[jira] [Created] (HIVE-7908) CBO: Handle Windowing functions part of expressions
Laljo John Pullokkaran created HIVE-7908:

Summary: CBO: Handle Windowing functions part of expressions
Key: HIVE-7908
URL: https://issues.apache.org/jira/browse/HIVE-7908
Project: Hive
Issue Type: Bug
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran

-- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 24688: parallel order by clause on a string column fails with IOException: Split points are out of order
On Aug. 28, 2014, 6:05 a.m., Szehon Ho wrote:
> common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, line 1040
> https://reviews.apache.org/r/24688/diff/3/?file=669965#file669965line1040
> Yep, that's what I meant.

I think this option is not useful. Any number of reducers bigger than one, which is the default for order-by, will give better performance, so why don't we try with that?

- Navis

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24688/#review51747
---

On Aug. 27, 2014, 2:18 a.m., Navis Ryu wrote:

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24688/
---

(Updated Aug. 27, 2014, 2:18 a.m.)

Review request for hive.

Bugs: HIVE-7669
https://issues.apache.org/jira/browse/HIVE-7669

Repository: hive-git

Description
---
The source table has 600 million rows, and it has a string column l_shipinstruct which has 4 unique values (i.e. these 4 values are repeated across the 600 million rows). We are sorting it based on this string column l_shipinstruct, as shown in the HiveQL below, with the following parameters.
{code:sql}
set hive.optimize.sampling.orderby=true;
set hive.optimize.sampling.orderby.number=1000;
set hive.optimize.sampling.orderby.percent=0.1f;

insert overwrite table lineitem_temp_report
select
  l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity,
  l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus,
  l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode,
  l_comment
from lineitem
order by l_shipinstruct;
{code}

Stack Trace

Diagnostic Messages for this Task:
{noformat}
Error: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
    at org.apache.hadoop.mapred.MapTask$OldOutputCollector.init(MapTask.java:569)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
    ...
10 more
Caused by: java.lang.IllegalArgumentException: Can't read partitions file
    at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:116)
    at org.apache.hadoop.mapred.lib.TotalOrderPartitioner.configure(TotalOrderPartitioner.java:42)
    at org.apache.hadoop.hive.ql.exec.HiveTotalOrderPartitioner.configure(HiveTotalOrderPartitioner.java:37)
    ... 15 more
Caused by: java.io.IOException: Split points are out of order
    at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:96)
    ... 17 more
{noformat}

Diffs
---
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 7f4afd9
common/src/java/org/apache/hadoop/hive/conf/Validator.java cea9c41
ql/src/java/org/apache/hadoop/hive/ql/exec/HiveTotalOrderPartitioner.java 6c22362
ql/src/java/org/apache/hadoop/hive/ql/exec/PartitionKeySampler.java 166461a
ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java ef72039
ql/src/test/org/apache/hadoop/hive/ql/exec/TestPartitionKeySampler.java PRE-CREATION

Diff: https://reviews.apache.org/r/24688/diff/

Testing
---

Thanks,
Navis Ryu
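The failure above follows from how sampling interacts with a total-order partitioner: split points must be strictly increasing, but a sample drawn from a column with only 4 distinct values repeated across 600 million rows inevitably produces duplicate split keys. The sketch below is illustrative only (it is not Hive's or Hadoop's actual code), showing why a low-cardinality sort key trips the same "Split points are out of order" check seen in the stack trace:

```python
def split_points(sample, num_reducers):
    """Pick num_reducers - 1 evenly spaced keys from a sorted sample."""
    sample = sorted(sample)
    step = len(sample) / num_reducers
    return [sample[int(step * (i + 1))] for i in range(num_reducers - 1)]

def validate(points):
    """TotalOrderPartitioner-style check: split keys must strictly increase."""
    for prev, cur in zip(points, points[1:]):
        if cur <= prev:
            raise ValueError("Split points are out of order")

# A high-cardinality sample partitions fine.
validate(split_points(range(1000), 10))

# Only 4 distinct values repeated across the sample (like l_shipinstruct):
# evenly spaced picks collide, so validation fails like the trace above.
low_card = ["COLLECT COD", "DELIVER IN PERSON", "NONE", "TAKE BACK RETURN"] * 250
try:
    validate(split_points(low_card, 10))
    failed = False
except ValueError:
    failed = True
```

With 10 reducers over 1000 sampled keys of which only 4 are distinct, neighboring split points are guaranteed to repeat, which is why the patch under review tightens validation and sampling rather than the query itself.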
[jira] [Commented] (HIVE-7857) Hive query fails after Tez session times out
[ https://issues.apache.org/jira/browse/HIVE-7857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114901#comment-14114901 ]

Hive QA commented on HIVE-7857:
---

{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12665141/HIVE-7857.2.patch

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 6127 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_stats_counter
org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
org.apache.hive.service.TestHS2ImpersonationWithRemoteMS.testImpersonation
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/552/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/552/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-552/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12665141

Hive query fails after Tez session times out
---
Key: HIVE-7857
URL: https://issues.apache.org/jira/browse/HIVE-7857
Project: Hive
Issue Type: Bug
Components: Tez
Affects Versions: 0.14.0
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
Attachments: HIVE-7857.1.patch, HIVE-7857.2.patch

Originally reported by [~deepesh]

Steps to reproduce: Open the Hive CLI and ensure that HIVE_AUX_JARS_PATH has hcatalog-core.jar in the path. Keep it idle for more than 5 minutes (this is the default Tez session timeout); essentially, the Tez session should time out. Then run a Hive on Tez query; the query fails. Here is a sample CLI session:
{noformat}
hive> select from_unixtime(unix_timestamp(), dd-MMM-) from vectortab10korc limit 1;
Query ID = hrt_qa_20140626002525_6e964079-4031-406b-85ed-cda9c65dca22
Total jobs = 1
Launching Job 1 out of 1
Tez session was closed. Reopening...
Session re-established.
Status: Running (application id: application_1403688364015_1930)
Map 1: -/-  Map 1: 0/1  Map 1: 0/1  Map 1: 0/1  Map 1: 0/1  Map 1: 0/1
Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1403688364015_1930_1_00, diagnostics=[Task failed, taskId=task_1403688364015_1930_1_00_00, diagnostics=[
AttemptID:attempt_1403688364015_1930_1_00_00_0 Info:Container container_1403688364015_1930_01_02 COMPLETED with diagnostics set to [Resource hdfs://ambari-sec-1403670773-others-2-1.cs1cloud.internal:8020/tmp/hive-hrt_qa/_tez_session_dir/3d3ef758-90f3-4bb3-86cb-902aeb3b8830/hive-hcatalog-core-0.13.0.2.1.3.0-554.jar changed on src filesystem (expected 1403741969169, was 1403742347351)],
AttemptID:attempt_1403688364015_1930_1_00_00_1 Info:Container container_1403688364015_1930_01_03 COMPLETED with diagnostics set to [Resource hdfs://ambari-sec-1403670773-others-2-1.cs1cloud.internal:8020/tmp/hive-hrt_qa/_tez_session_dir/3d3ef758-90f3-4bb3-86cb-902aeb3b8830/hive-hcatalog-core-0.13.0.2.1.3.0-554.jar changed on src filesystem (expected 1403741969169, was 1403742347351)],
AttemptID:attempt_1403688364015_1930_1_00_00_2 Info:Container container_1403688364015_1930_01_04 COMPLETED with diagnostics set to [Resource hdfs://ambari-sec-1403670773-others-2-1.cs1cloud.internal:8020/tmp/hive-hrt_qa/_tez_session_dir/3d3ef758-90f3-4bb3-86cb-902aeb3b8830/hive-hcatalog-core-0.13.0.2.1.3.0-554.jar changed on src filesystem (expected 1403741969169, was 1403742347351)],
AttemptID:attempt_1403688364015_1930_1_00_00_3 Info:Container container_1403688364015_1930_01_05 COMPLETED with diagnostics set to [Resource hdfs://ambari-sec-1403670773-others-2-1.cs1cloud.internal:8020/tmp/hive-hrt_qa/_tez_session_dir/3d3ef758-90f3-4bb3-86cb-902aeb3b8830/hive-hcatalog-core-0.13.0.2.1.3.0-554.jar changed on src filesystem (expected 1403741969169, was 1403742347351)]],
Vertex failed as one or more tasks failed. failedTasks:1]
DAG failed due to vertex failure. failedVertices:1 killedVertices:0
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask
{noformat}
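The repeated "changed on src filesystem" diagnostic above boils down to a localization consistency check: a recorded modification time for a session-local jar no longer matches what is actually on HDFS after the session was re-established. This is a hedged, illustrative sketch of that check (not Tez's actual code), reproducing the shape of the error message from the log:

```python
def check_resource(name, expected_mtime, actual_mtime):
    """Refuse to reuse a localized resource whose mtime changed."""
    if actual_mtime != expected_mtime:
        raise RuntimeError(
            "Resource %s changed on src filesystem (expected %d, was %d)"
            % (name, expected_mtime, actual_mtime))

# Matching timestamps pass silently.
check_resource("hive-hcatalog-core.jar", 1403741969169, 1403741969169)

# After the session re-opens, the re-uploaded jar has a newer mtime, so
# the container rejects it -- the timestamps here are the ones in the log.
try:
    check_resource("hive-hcatalog-core.jar", 1403741969169, 1403742347351)
    stale = False
except RuntimeError:
    stale = True
```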
[jira] [Commented] (HIVE-7497) Fix some default values in HiveConf
[ https://issues.apache.org/jira/browse/HIVE-7497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114902#comment-14114902 ]

Dong Chen commented on HIVE-7497:
---

[~vgumashta] Thanks for taking care of it. I'm OK with it; please go ahead. Thanks :)

Fix some default values in HiveConf
---
Key: HIVE-7497
URL: https://issues.apache.org/jira/browse/HIVE-7497
Project: Hive
Issue Type: Task
Reporter: Brock Noland
Assignee: Dong Chen
Labels: TODOC14
Fix For: 0.14.0
Attachments: HIVE-7497.1.patch, HIVE-7497.patch

HIVE-5160 resolves an env variable by calling System.getenv(). As long as the variable is not defined when you run the build, null is returned and the path is not placed in hive-default.template. However, if it is defined, it will populate hive-default.template with a path that differs based on the user running the build. We should use ${system:HIVE_CONF_DIR} instead.

-- This message was sent by Atlassian JIRA (v6.2#6252)
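The two behaviors contrasted in the issue above can be sketched as follows. This is an assumed, simplified model (not Hive's actual template generator): resolving the env variable while generating the template bakes the builder's value into hive-default.template, while a ${system:...} placeholder defers resolution until the configuration is loaded.

```python
def render_build_time(template, env):
    # HIVE-5160 behavior: resolve via the environment during the build;
    # an unset variable yields "None" in the generated template.
    return template.replace("__HIVE_CONF_DIR__", str(env.get("HIVE_CONF_DIR")))

def render_runtime_placeholder(template):
    # Proposed behavior: emit a ${system:HIVE_CONF_DIR} reference that
    # HiveConf expands only when the configuration is actually loaded.
    return template.replace("__HIVE_CONF_DIR__", "${system:HIVE_CONF_DIR}")

template = "hive.conf.dir=__HIVE_CONF_DIR__"

baked = render_build_time(template, {"HIVE_CONF_DIR": "/home/builder/conf"})
unset = render_build_time(template, {})
deferred = render_runtime_placeholder(template)
```

The build-time variant produces a template that depends on who ran the build (or "None" when the variable is unset), which is exactly the inconsistency the issue describes; the placeholder variant is stable across builds.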
Re: Review Request 24472: HIVE-7649: Support column stats with temporary tables
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24472/
---

(Updated Aug. 29, 2014, 6:13 a.m.)

Review request for hive and Prasanth_J.

Changes
---
Addressing review feedback from Prasanth

Bugs: HIVE-7649
https://issues.apache.org/jira/browse/HIVE-7649

Repository: hive-git

Description
---
Update SessionHiveMetastoreClient to get column stats to work for temp tables.

Diffs (updated)
---
metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 51c3f2c
ql/src/java/org/apache/hadoop/hive/ql/metadata/SessionHiveMetaStoreClient.java 4cf98d8
ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java 3f8648b
ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 9798cf3
ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java 7cb7c5e
ql/src/test/queries/clientnegative/temp_table_column_stats.q 9b7aa4a
ql/src/test/queries/clientpositive/temp_table_display_colstats_tbllvl.q PRE-CREATION
ql/src/test/results/clientnegative/temp_table_column_stats.q.out 486597a
ql/src/test/results/clientpositive/temp_table_display_colstats_tbllvl.q.out PRE-CREATION

Diff: https://reviews.apache.org/r/24472/diff/

Testing
---

Thanks,
Jason Dere
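The approach in this review — a session-level metastore client that makes column stats work for temp tables — can be sketched roughly as below. Class and method names here are assumptions for illustration, not Hive's actual API: the session client intercepts stats calls for tables it knows are temporary and serves them from session-local state, delegating everything else to the remote metastore.

```python
class RemoteMetastore:
    """Stand-in for the shared metastore service."""
    def __init__(self):
        self.stats = {}
    def get_table_column_statistics(self, table, col):
        return self.stats.get((table, col))

class SessionMetastoreClient:
    """Wraps the remote client; temp-table stats never leave the session."""
    def __init__(self, remote):
        self.remote = remote
        self.temp_tables = set()
        self.temp_stats = {}
    def create_temp_table(self, table):
        self.temp_tables.add(table)
    def update_column_statistics(self, table, col, stats):
        if table in self.temp_tables:
            self.temp_stats[(table, col)] = stats   # session-local only
        else:
            self.remote.stats[(table, col)] = stats
    def get_table_column_statistics(self, table, col):
        if table in self.temp_tables:
            return self.temp_stats.get((table, col))
        return self.remote.get_table_column_statistics(table, col)

client = SessionMetastoreClient(RemoteMetastore())
client.create_temp_table("tmp_t")
client.update_column_statistics("tmp_t", "key", {"numDVs": 4})
```

The design point is that other sessions (and the remote metastore itself) never observe the temp table's stats, mirroring how temp-table metadata stays scoped to the creating session.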
[jira] [Updated] (HIVE-7908) CBO: Handle Windowing functions part of expressions
[ https://issues.apache.org/jira/browse/HIVE-7908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Laljo John Pullokkaran updated HIVE-7908:
---
Attachment: HIVE-7908.patch

CBO: Handle Windowing functions part of expressions
---
Key: HIVE-7908
URL: https://issues.apache.org/jira/browse/HIVE-7908
Project: Hive
Issue Type: Bug
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
Attachments: HIVE-7908.patch

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7649) Support column stats with temporary tables
[ https://issues.apache.org/jira/browse/HIVE-7649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Dere updated HIVE-7649:
---
Attachment: HIVE-7649.4.patch

patch v4, changes per review comments from Prasanth

Support column stats with temporary tables
---
Key: HIVE-7649
URL: https://issues.apache.org/jira/browse/HIVE-7649
Project: Hive
Issue Type: Bug
Components: Statistics
Reporter: Jason Dere
Assignee: Jason Dere
Attachments: HIVE-7649.1.patch, HIVE-7649.2.patch, HIVE-7649.3.patch, HIVE-7649.4.patch

Column stats are currently not supported with temp tables; see if they can be added.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7908) CBO: Handle Windowing functions part of expressions
[ https://issues.apache.org/jira/browse/HIVE-7908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Laljo John Pullokkaran updated HIVE-7908:
---
Status: Patch Available (was: Open)

CBO: Handle Windowing functions part of expressions
---
Key: HIVE-7908
URL: https://issues.apache.org/jira/browse/HIVE-7908
Project: Hive
Issue Type: Bug
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
Attachments: HIVE-7908.patch

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7775) enable sample8.q.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114919#comment-14114919 ]

Szehon Ho commented on HIVE-7775:
---

Hi Chengxiang, sorry, do you mind opening a new JIRA, as this one is already resolved? It's one JIRA per commit.

enable sample8.q.[Spark Branch]
---
Key: HIVE-7775
URL: https://issues.apache.org/jira/browse/HIVE-7775
Project: Hive
Issue Type: Sub-task
Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li
Fix For: spark-branch
Attachments: HIVE-7775.1-spark.patch, HIVE-7775.2-spark.patch, HIVE-7775.3-spark.additional.patch

sample8.q contains a join query; this qtest should be enabled after Hive on Spark supports the join operation.

-- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 24688: parallel order by clause on a string column fails with IOException: Split points are out of order
On Aug. 28, 2014, 6:05 a.m., Szehon Ho wrote:
> common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, line 1040
> https://reviews.apache.org/r/24688/diff/3/?file=669965#file669965line1040
> Yep, that's what I meant.

Navis Ryu wrote:
> I think this option is not useful. Any number of reducers bigger than one, which is the default for order-by, will give better performance, so why don't we try with that?

You mean get rid of the error check? I was just trying to make this option easier to use; if we aren't going to expose it, I'm OK with that.

- Szehon

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24688/#review51747
---

On Aug. 27, 2014, 2:18 a.m., Navis Ryu wrote:

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24688/
---

(Updated Aug. 27, 2014, 2:18 a.m.)

Review request for hive.

Bugs: HIVE-7669
https://issues.apache.org/jira/browse/HIVE-7669

Repository: hive-git

Description
---
The source table has 600 million rows, and it has a string column l_shipinstruct which has 4 unique values (i.e. these 4 values are repeated across the 600 million rows). We are sorting it based on this string column l_shipinstruct, as shown in the HiveQL below, with the following parameters.
{code:sql}
set hive.optimize.sampling.orderby=true;
set hive.optimize.sampling.orderby.number=1000;
set hive.optimize.sampling.orderby.percent=0.1f;

insert overwrite table lineitem_temp_report
select
  l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity,
  l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus,
  l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode,
  l_comment
from lineitem
order by l_shipinstruct;
{code}

Stack Trace

Diagnostic Messages for this Task:
{noformat}
Error: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
    at org.apache.hadoop.mapred.MapTask$OldOutputCollector.init(MapTask.java:569)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
    ...
10 more
Caused by: java.lang.IllegalArgumentException: Can't read partitions file
    at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:116)
    at org.apache.hadoop.mapred.lib.TotalOrderPartitioner.configure(TotalOrderPartitioner.java:42)
    at org.apache.hadoop.hive.ql.exec.HiveTotalOrderPartitioner.configure(HiveTotalOrderPartitioner.java:37)
    ... 15 more
Caused by: java.io.IOException: Split points are out of order
    at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:96)
    ... 17 more
{noformat}

Diffs
---
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 7f4afd9
common/src/java/org/apache/hadoop/hive/conf/Validator.java cea9c41
ql/src/java/org/apache/hadoop/hive/ql/exec/HiveTotalOrderPartitioner.java 6c22362
ql/src/java/org/apache/hadoop/hive/ql/exec/PartitionKeySampler.java 166461a
ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java ef72039
ql/src/test/org/apache/hadoop/hive/ql/exec/TestPartitionKeySampler.java PRE-CREATION

Diff: https://reviews.apache.org/r/24688/diff/

Testing
---

Thanks,
Navis Ryu
[jira] [Commented] (HIVE-7907) Bring up tez branch to changes in TEZ-1038, TEZ-1500
[ https://issues.apache.org/jira/browse/HIVE-7907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114925#comment-14114925 ]

Gunther Hagleitner commented on HIVE-7907:
---

+1

Bring up tez branch to changes in TEZ-1038, TEZ-1500
---
Key: HIVE-7907
URL: https://issues.apache.org/jira/browse/HIVE-7907
Project: Hive
Issue Type: Sub-task
Components: Tez
Affects Versions: tez-branch
Reporter: Gopal V
Assignee: Gopal V
Attachments: HIVE-7907.1-tez.patch

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7870) Insert overwrite table query does not generate correct task plan [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Na Yang updated HIVE-7870:
---
Attachment: HIVE-7870.3-spark.patch

Insert overwrite table query does not generate correct task plan [Spark Branch]
---
Key: HIVE-7870
URL: https://issues.apache.org/jira/browse/HIVE-7870
Project: Hive
Issue Type: Sub-task
Components: Spark
Reporter: Na Yang
Assignee: Na Yang
Labels: Spark-M1
Attachments: HIVE-7870.1-spark.patch, HIVE-7870.2-spark.patch, HIVE-7870.3-spark.patch

Insert overwrite table query does not generate the correct task plan when the hive.optimize.union.remove and hive.merge.sparkfiles properties are ON.
{noformat}
set hive.optimize.union.remove=true;
set hive.merge.sparkfiles=true;

insert overwrite table outputTbl1
SELECT * FROM (
  select key, 1 as values from inputTbl1
  union all
  select * FROM (
    SELECT key, count(1) as values from inputTbl1 group by key
    UNION ALL
    SELECT key, 2 as values from inputTbl1
  ) a
) b;

select * from outputTbl1 order by key, values;
{noformat}

query result:
{noformat}
1 1
1 2
2 1
2 2
3 1
3 2
7 1
7 2
8 2
8 2
8 2
{noformat}

expected result:
{noformat}
1 1
1 1
1 2
2 1
2 1
2 2
3 1
3 1
3 2
7 1
7 1
7 2
8 1
8 1
8 2
8 2
8 2
{noformat}

The move work is not working properly, and some data goes missing during the move.

-- This message was sent by Atlassian JIRA (v6.2#6252)
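The expected result in the issue above follows directly from UNION ALL semantics. The sketch below derives it in plain Python; the contents of inputTbl1 are an assumption inferred from the expected output (keys 1, 2, 3, 7 once each and key 8 twice), not stated in the JIRA:

```python
from collections import Counter

input_keys = [1, 2, 3, 7, 8, 8]          # assumed inputTbl1 contents
counts = Counter(input_keys)

branch1 = [(k, 1) for k in input_keys]               # select key, 1
branch2 = sorted(counts.items())                     # count(1) group by key
branch3 = [(k, 2) for k in input_keys]               # select key, 2

# UNION ALL keeps every row from every branch; ORDER BY key, values.
expected = sorted(branch1 + branch2 + branch3)
```

Each key k occurring c times should contribute c rows of (k, 1), one row of (k, c), and c rows of (k, 2) — 17 rows in total for this input, matching the "expected result" block, while the buggy plan dropped rows during the merge/move step.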
[jira] [Commented] (HIVE-7907) Bring up tez branch to changes in TEZ-1038, TEZ-1500
[ https://issues.apache.org/jira/browse/HIVE-7907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114933#comment-14114933 ]

Gopal V commented on HIVE-7907:
---

Looks like this has to wait until 0.5.0-SNAPSHOT gets updated in the Apache snapshots repository.

Bring up tez branch to changes in TEZ-1038, TEZ-1500
---
Key: HIVE-7907
URL: https://issues.apache.org/jira/browse/HIVE-7907
Project: Hive
Issue Type: Sub-task
Components: Tez
Affects Versions: tez-branch
Reporter: Gopal V
Assignee: Gopal V
Attachments: HIVE-7907.1-tez.patch

-- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 25176: HIVE-7870: Insert overwrite table query does not generate correct task plan [Spark Branch]
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25176/
---

(Updated Aug. 29, 2014, 6:44 a.m.)

Review request for hive, Brock Noland, Szehon Ho, and Xuefu Zhang.

Bugs: HIVE-7870
https://issues.apache.org/jira/browse/HIVE-7870

Repository: hive-git

Description
---
HIVE-7870: Insert overwrite table query does not generate correct task plan [Spark Branch]

The cause of this problem is that during Spark/Tez task generation, the union file sink operator is cloned into two new file sink operators, and the linked FileSinkDesc info for those new file sink operators is missing. In addition, the two new file sink operators also need to be linked together.

Diffs
---
itests/src/test/resources/testconfiguration.properties 6393671
ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 9c808d4
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 5ddc16d
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 379a39c
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 76fc290
ql/src/test/queries/clientpositive/union_remove_spark_1.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_10.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_11.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_15.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_16.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_17.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_18.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_19.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_2.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_20.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_21.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_24.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_25.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_3.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_4.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_5.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_6.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_7.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_8.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_9.q PRE-CREATION
ql/src/test/results/clientpositive/spark/sample8.q.out c7e333b
ql/src/test/results/clientpositive/spark/union10.q.out 20c681e
ql/src/test/results/clientpositive/spark/union18.q.out 3f37a0a
ql/src/test/results/clientpositive/spark/union19.q.out 6922fcd
ql/src/test/results/clientpositive/spark/union28.q.out 8bd5218
ql/src/test/results/clientpositive/spark/union29.q.out b9546ef
ql/src/test/results/clientpositive/spark/union3.q.out 3ae6536
ql/src/test/results/clientpositive/spark/union30.q.out 12717a1
ql/src/test/results/clientpositive/spark/union33.q.out b89757f
ql/src/test/results/clientpositive/spark/union4.q.out 6341cd9
ql/src/test/results/clientpositive/spark/union6.q.out 263d9f4
ql/src/test/results/clientpositive/spark/union_remove_spark_1.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_10.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_11.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_15.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_16.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_17.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_18.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_19.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_2.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_20.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_21.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_24.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_25.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_3.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_4.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_5.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_6.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_7.q.out PRE-CREATION
Re: Review Request 25176: HIVE-7870: Insert overwrite table query does not generate correct task plan [Spark Branch]
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25176/
---

(Updated Aug. 29, 2014, 6:44 a.m.)

Review request for hive, Brock Noland, Szehon Ho, and Xuefu Zhang.

Changes
---
1. add .q.out for TestCliDriver test for all new Spark .q tests
2. update existing .q.out files because of the plan change

Bugs: HIVE-7870
https://issues.apache.org/jira/browse/HIVE-7870

Repository: hive-git

Description
---
HIVE-7870: Insert overwrite table query does not generate correct task plan [Spark Branch]

The cause of this problem is that during Spark/Tez task generation, the union file sink operator is cloned into two new file sink operators, and the linked FileSinkDesc info for those new file sink operators is missing. In addition, the two new file sink operators also need to be linked together.

Diffs (updated)
---
itests/src/test/resources/testconfiguration.properties 6393671
ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 9c808d4
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 5ddc16d
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 379a39c
ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 76fc290
ql/src/test/queries/clientpositive/union_remove_spark_1.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_10.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_11.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_15.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_16.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_17.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_18.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_19.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_2.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_20.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_21.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_24.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_25.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_3.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_4.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_5.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_6.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_7.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_8.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_9.q PRE-CREATION
ql/src/test/results/clientpositive/spark/sample8.q.out c7e333b
ql/src/test/results/clientpositive/spark/union10.q.out 20c681e
ql/src/test/results/clientpositive/spark/union18.q.out 3f37a0a
ql/src/test/results/clientpositive/spark/union19.q.out 6922fcd
ql/src/test/results/clientpositive/spark/union28.q.out 8bd5218
ql/src/test/results/clientpositive/spark/union29.q.out b9546ef
ql/src/test/results/clientpositive/spark/union3.q.out 3ae6536
ql/src/test/results/clientpositive/spark/union30.q.out 12717a1
ql/src/test/results/clientpositive/spark/union33.q.out b89757f
ql/src/test/results/clientpositive/spark/union4.q.out 6341cd9
ql/src/test/results/clientpositive/spark/union6.q.out 263d9f4
ql/src/test/results/clientpositive/spark/union_remove_spark_1.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_10.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_11.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_15.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_16.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_17.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_18.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_19.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_2.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_20.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_21.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_24.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_25.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_3.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_4.q.out PRE-CREATION
ql/src/test/results/clientpositive/spark/union_remove_spark_5.q.out PRE-CREATION
Re: Review Request 15449: session/operation timeout for hiveserver2
On Aug. 28, 2014, 7:56 a.m., Lefty Leverenz wrote:

Navis Ryu wrote:
> Addressing previous comments, I've revised the validator to describe itself in the description. For the StringSet validator, the description of the conf will start with something like "Expects one of [textfile, sequencefile, rcfile, orc]." and for TimeValidator, it's "Expects a numeric value with timeunit (d/day, h/hour, m/min, s/sec, ms/msec, us/usec, ns/nsec)", etc. That is the reason why some part of the description is removed. Could you generate the template and see the result? (cd common; mvn clean package -Phadoop-2 -Pdist -DskipTests). If you don't like this, I'll revert it.

Navis, that is cool to the nth degree! I applied patch 15, generated a template file, and checked each parameter changed by the patch. All the "Expects" phrases look great.

However, non-numeric values are lowercase. For example, hive.exec.orc.encoding.strategy used to say the values are SPEED and COMPRESSION, but now it's "Expects one of [speed, compression]." Are all parameter values case-insensitive? If so, the Configuration Properties docs should mention it.

Two parameters still give units in their descriptions, although that seems to be deliberate:
- hive.server2.long.polling.timeout: "Time in milliseconds that HiveServer2 will wait, ..." (has a non-zero default value, in milliseconds)
- hive.support.quoted.identifiers: "Whether to use quoted identifier. 'none' or 'column' can be used." (goes on to explain what 'none' and 'column' mean)

- Lefty

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15449/#review51760
---

On Aug. 28, 2014, 2:31 a.m., Navis Ryu wrote:

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15449/
---

(Updated Aug. 28, 2014, 2:31 a.m.)

Review request for hive.
Bugs: HIVE-5799 https://issues.apache.org/jira/browse/HIVE-5799 Repository: hive-git

Description
---
Need a timeout facility for preventing resource leaks from unstable or misbehaving clients.

Diffs
-
common/src/java/org/apache/hadoop/hive/ant/GenHiveTemplate.java 4293b7c
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 74bb863
common/src/java/org/apache/hadoop/hive/conf/Validator.java cea9c41
itests/hive-unit/src/test/java/org/apache/hive/jdbc/miniHS2/TestHiveServer2SessionTimeout.java PRE-CREATION
service/src/java/org/apache/hive/service/cli/CLIService.java ff5de4a
service/src/java/org/apache/hive/service/cli/OperationState.java 3e15f0c
service/src/java/org/apache/hive/service/cli/operation/Operation.java 0d6436e
service/src/java/org/apache/hive/service/cli/operation/OperationManager.java 2867301
service/src/java/org/apache/hive/service/cli/session/HiveSession.java 270e4a6
service/src/java/org/apache/hive/service/cli/session/HiveSessionBase.java 84e1c7e
service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 4e5f595
service/src/java/org/apache/hive/service/cli/session/HiveSessionImplwithUGI.java 39d2184
service/src/java/org/apache/hive/service/cli/session/SessionManager.java 17c1c7b
service/src/test/org/apache/hive/service/cli/CLIServiceTest.java d01e819

Diff: https://reviews.apache.org/r/15449/diff/

Testing
---
Confirmed in the local environment.

Thanks,
Navis Ryu
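The self-describing validator behavior discussed in this thread can be sketched roughly as follows. This is a hypothetical simplification for illustration, not Hive's actual `Validator` interface: a StringSet-style validator renders its allowed values as the "Expects one of [...]" prefix that GenHiveTemplate would prepend to a parameter's description, and matches values case-insensitively.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical simplification of the StringSet validator idea: the validator
// can describe its own constraint, so the config description template can be
// generated from it instead of being hand-written.
public class StringSetValidator {
    private final Set<String> expected = new LinkedHashSet<>();

    public StringSetValidator(String... values) {
        for (String v : values) {
            expected.add(v.toLowerCase()); // values are matched case-insensitively
        }
    }

    // Returns null when the value is valid, or an error message otherwise.
    public String validate(String value) {
        if (value != null && expected.contains(value.toLowerCase())) {
            return null;
        }
        return "Invalid value " + value + ". " + toDescription();
    }

    // The self-description that gets prepended to the parameter's doc text.
    public String toDescription() {
        return "Expects one of " + expected + ".";
    }

    public static void main(String[] args) {
        StringSetValidator v =
            new StringSetValidator("textfile", "sequencefile", "rcfile", "orc");
        System.out.println(v.toDescription());
        System.out.println(v.validate("ORC") == null);
    }
}
```

Because the allowed values are lowercased at construction time, the generated description naturally shows lowercase values, which matches the "Expects one of [speed, compression]" behavior Lefty observed.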
Re: Review Request 15449: session/operation timeout for hiveserver2
On Aug. 28, 2014, 7:56 a.m., Lefty Leverenz wrote: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, line 1523 https://reviews.apache.org/r/15449/diff/10/?file=670860#file670860line1523 Please restore (in seconds) to description and specify other time units that can be used, if any. Not an issue -- my mistake. On Aug. 28, 2014, 7:56 a.m., Lefty Leverenz wrote: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, line 1529 https://reviews.apache.org/r/15449/diff/10/?file=670860#file670860line1529 Please restore (in seconds) to description and specify other time units that can be used, if any. Not an issue -- my mistake. On Aug. 28, 2014, 7:56 a.m., Lefty Leverenz wrote: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, line 1601 https://reviews.apache.org/r/15449/diff/10/?file=670860#file670860line1601 Please add time unit information: Accepts time units like d/h/m/s/ms/us/ns. Not an issue -- my mistake. On Aug. 28, 2014, 7:56 a.m., Lefty Leverenz wrote: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, line 1604 https://reviews.apache.org/r/15449/diff/10/?file=670860#file670860line1604 Please add time unit information: Accepts time units like d/h/m/s/ms/us/ns. Not an issue -- my mistake. On Aug. 28, 2014, 7:56 a.m., Lefty Leverenz wrote: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, line 1607 https://reviews.apache.org/r/15449/diff/10/?file=670860#file670860line1607 Please add time unit information: Accepts time units like d/h/m/s/ms/us/ns. Not an issue -- my mistake. - Lefty --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15449/#review51760 --- On Aug. 28, 2014, 2:31 a.m., Navis Ryu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15449/ --- (Updated Aug. 28, 2014, 2:31 a.m.) Review request for hive. 
Bugs: HIVE-5799 https://issues.apache.org/jira/browse/HIVE-5799 Repository: hive-git

Description
---
Need a timeout facility for preventing resource leaks from unstable or misbehaving clients.

Diffs
-
common/src/java/org/apache/hadoop/hive/ant/GenHiveTemplate.java 4293b7c
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 74bb863
common/src/java/org/apache/hadoop/hive/conf/Validator.java cea9c41
itests/hive-unit/src/test/java/org/apache/hive/jdbc/miniHS2/TestHiveServer2SessionTimeout.java PRE-CREATION
service/src/java/org/apache/hive/service/cli/CLIService.java ff5de4a
service/src/java/org/apache/hive/service/cli/OperationState.java 3e15f0c
service/src/java/org/apache/hive/service/cli/operation/Operation.java 0d6436e
service/src/java/org/apache/hive/service/cli/operation/OperationManager.java 2867301
service/src/java/org/apache/hive/service/cli/session/HiveSession.java 270e4a6
service/src/java/org/apache/hive/service/cli/session/HiveSessionBase.java 84e1c7e
service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 4e5f595
service/src/java/org/apache/hive/service/cli/session/HiveSessionImplwithUGI.java 39d2184
service/src/java/org/apache/hive/service/cli/session/SessionManager.java 17c1c7b
service/src/test/org/apache/hive/service/cli/CLIServiceTest.java d01e819

Diff: https://reviews.apache.org/r/15449/diff/

Testing
---
Confirmed in the local environment.

Thanks,
Navis Ryu
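The "numeric value with time unit" convention the TimeValidator description promises (d/day, h/hour, m/min, s/sec, ms/msec, us/usec, ns/nsec) can be illustrated with a rough sketch. The parsing below is a guess at the general idea for illustration only, not Hive's actual TimeValidator code:

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of parsing values like "30s", "100ms", "1d" into a
// canonical nanosecond count, using the unit aliases listed in the
// TimeValidator description (d/day, h/hour, m/min, s/sec, ms/msec,
// us/usec, ns/nsec).
public class TimeSpec {
    public static long toNanos(String value) {
        String s = value.trim().toLowerCase();
        int i = 0;
        while (i < s.length() && Character.isDigit(s.charAt(i))) {
            i++;  // split the leading numeric part from the unit suffix
        }
        long num = Long.parseLong(s.substring(0, i));
        String unit = s.substring(i).trim();
        TimeUnit tu;
        switch (unit) {
            case "d":  case "day":  tu = TimeUnit.DAYS; break;
            case "h":  case "hour": tu = TimeUnit.HOURS; break;
            case "m":  case "min":  tu = TimeUnit.MINUTES; break;
            case "s":  case "sec":  tu = TimeUnit.SECONDS; break;
            case "ms": case "msec": tu = TimeUnit.MILLISECONDS; break;
            case "us": case "usec": tu = TimeUnit.MICROSECONDS; break;
            case "ns": case "nsec": tu = TimeUnit.NANOSECONDS; break;
            default: throw new IllegalArgumentException("Unknown time unit: " + unit);
        }
        return tu.toNanos(num);
    }

    public static void main(String[] args) {
        System.out.println(toNanos("30s"));   // 30 seconds expressed in nanoseconds
        System.out.println(toNanos("100ms")); // 100 milliseconds in nanoseconds
    }
}
```

Storing everything in one canonical unit internally is what lets the descriptions drop hard-coded "(in seconds)" phrases while still accepting any of the listed units.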
Re: Review Request 24472: HIVE-7649: Support column stats with temporary tables
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24472/#review51875 --- Ship it! Ship It! - Prasanth_J On Aug. 29, 2014, 6:13 a.m., Jason Dere wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24472/ --- (Updated Aug. 29, 2014, 6:13 a.m.) Review request for hive and Prasanth_J. Bugs: HIVE-7649 https://issues.apache.org/jira/browse/HIVE-7649 Repository: hive-git Description --- Update SessionHiveMetastoreClient to get column stats to work for temp tables. Diffs - metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 51c3f2c ql/src/java/org/apache/hadoop/hive/ql/metadata/SessionHiveMetaStoreClient.java 4cf98d8 ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java 3f8648b ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java 9798cf3 ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java 7cb7c5e ql/src/test/queries/clientnegative/temp_table_column_stats.q 9b7aa4a ql/src/test/queries/clientpositive/temp_table_display_colstats_tbllvl.q PRE-CREATION ql/src/test/results/clientnegative/temp_table_column_stats.q.out 486597a ql/src/test/results/clientpositive/temp_table_display_colstats_tbllvl.q.out PRE-CREATION Diff: https://reviews.apache.org/r/24472/diff/ Testing --- Thanks, Jason Dere
[jira] [Commented] (HIVE-7649) Support column stats with temporary tables
[ https://issues.apache.org/jira/browse/HIVE-7649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14114955#comment-14114955 ] Prasanth J commented on HIVE-7649: -- LGTM, +1 Support column stats with temporary tables -- Key: HIVE-7649 URL: https://issues.apache.org/jira/browse/HIVE-7649 Project: Hive Issue Type: Bug Components: Statistics Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-7649.1.patch, HIVE-7649.2.patch, HIVE-7649.3.patch, HIVE-7649.4.patch Column stats are currently not supported with temp tables; see if they can be added. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 15449: session/operation timeout for hiveserver2
On Aug. 28, 2014, 7:56 a.m., Lefty Leverenz wrote: Navis Ryu wrote: Addressing previous comments, I've revised the validators to describe themselves in the description. For the StringSet validator, the description of the conf will start with something like "Expects one of [textfile, sequencefile, rcfile, orc]." and for TimeValidator it's "Expects a numeric value with a time unit (d/day, h/hour, m/min, s/sec, ms/msec, us/usec, ns/nsec)", etc. That is why some parts of the descriptions were removed. Could you generate the template and see the result? (cd common; mvn clean package -Phadoop-2 -Pdist -DskipTests). If you don't like this, I'll revert it. Lefty Leverenz wrote: Navis, that is cool to the nth degree! I applied patch 15, generated a template file, and checked each parameter changed by the patch. All the "Expects" phrases look great. However, non-numeric values are lowercase. For example, hive.exec.orc.encoding.strategy used to say the values are SPEED and COMPRESSION, but now it's "Expects one of [speed, compression]". Are all parameter values case-insensitive? If so, the Configuration Properties docs should mention it. Two parameters still give units in their descriptions, although that seems to be deliberate:
- hive.server2.long.polling.timeout: "Time in milliseconds that HiveServer2 will wait, ..." (has a non-zero default value, in milliseconds)
- hive.support.quoted.identifiers: "Whether to use quoted identifier. 'none' or 'column' can be used." (goes on to explain what 'none' and 'column' mean)

bq. non-numeric values are lowercase
All values in StringSet are case-insensitive, but if you prefer uppercase strings in the description, that can be done.

bq. Two parameters still give units in their descriptions
I was considering a follow-up issue for applying time validators to the others, but I'll do that here.

- Navis --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15449/#review51760 --- On Aug. 28, 2014, 2:31 a.m., Navis Ryu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15449/ --- (Updated Aug. 28, 2014, 2:31 a.m.) Review request for hive. Bugs: HIVE-5799 https://issues.apache.org/jira/browse/HIVE-5799 Repository: hive-git

Description
---
Need a timeout facility for preventing resource leaks from unstable or misbehaving clients.

Diffs
-
common/src/java/org/apache/hadoop/hive/ant/GenHiveTemplate.java 4293b7c
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 74bb863
common/src/java/org/apache/hadoop/hive/conf/Validator.java cea9c41
itests/hive-unit/src/test/java/org/apache/hive/jdbc/miniHS2/TestHiveServer2SessionTimeout.java PRE-CREATION
service/src/java/org/apache/hive/service/cli/CLIService.java ff5de4a
service/src/java/org/apache/hive/service/cli/OperationState.java 3e15f0c
service/src/java/org/apache/hive/service/cli/operation/Operation.java 0d6436e
service/src/java/org/apache/hive/service/cli/operation/OperationManager.java 2867301
service/src/java/org/apache/hive/service/cli/session/HiveSession.java 270e4a6
service/src/java/org/apache/hive/service/cli/session/HiveSessionBase.java 84e1c7e
service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 4e5f595
service/src/java/org/apache/hive/service/cli/session/HiveSessionImplwithUGI.java 39d2184
service/src/java/org/apache/hive/service/cli/session/SessionManager.java 17c1c7b
service/src/test/org/apache/hive/service/cli/CLIServiceTest.java d01e819

Diff: https://reviews.apache.org/r/15449/diff/

Testing
---
Confirmed in the local environment.

Thanks,
Navis Ryu
[jira] [Commented] (HIVE-7904) Missing null check cause NPE when updating join column stats in statistics annotation
[ https://issues.apache.org/jira/browse/HIVE-7904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14114979#comment-14114979 ] Hive QA commented on HIVE-7904: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12665131/HIVE-7904.1.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6127 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/553/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/553/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-553/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12665131 Missing null check cause NPE when updating join column stats in statistics annotation - Key: HIVE-7904 URL: https://issues.apache.org/jira/browse/HIVE-7904 Project: Hive Issue Type: Sub-task Components: Query Processor, Statistics Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Priority: Trivial Fix For: 0.13.0 Attachments: HIVE-7904.1.patch Column stats updation in join stats rule annotation can cause NPE if column stats is missing from one relation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7870) Insert overwrite table query does not generate correct task plan [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14114985#comment-14114985 ] Hive QA commented on HIVE-7870: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12665274/HIVE-7870.3-spark.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6306 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/103/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/103/console Test logs: http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-103/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12665274 Insert overwrite table query does not generate correct task plan [Spark Branch] --- Key: HIVE-7870 URL: https://issues.apache.org/jira/browse/HIVE-7870 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Na Yang Assignee: Na Yang Labels: Spark-M1 Attachments: HIVE-7870.1-spark.patch, HIVE-7870.2-spark.patch, HIVE-7870.3-spark.patch Insert overwrite table query does not generate correct task plan when hive.optimize.union.remove and hive.merge.sparkfiles properties are ON. 
{noformat}
set hive.optimize.union.remove=true;
set hive.merge.sparkfiles=true;

insert overwrite table outputTbl1
SELECT * FROM (
  select key, 1 as values from inputTbl1
  union all
  select * FROM (
    SELECT key, count(1) as values from inputTbl1 group by key
    UNION ALL
    SELECT key, 2 as values from inputTbl1
  ) a
) b;

select * from outputTbl1 order by key, values;
{noformat}
query result:
{noformat}
1 1
1 2
2 1
2 2
3 1
3 2
7 1
7 2
8 2
8 2
8 2
{noformat}
expected result:
{noformat}
1 1
1 1
1 2
2 1
2 1
2 2
3 1
3 1
3 2
7 1
7 1
7 2
8 1
8 1
8 2
8 2
8 2
{noformat}
The move task is not working properly, and some data go missing during the move. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 15449: session/operation timeout for hiveserver2
On Aug. 28, 2014, 7:56 a.m., Lefty Leverenz wrote: Navis Ryu wrote: Addressing previous comments, I've revised the validators to describe themselves in the description. For the StringSet validator, the description of the conf will start with something like "Expects one of [textfile, sequencefile, rcfile, orc]." and for TimeValidator it's "Expects a numeric value with a time unit (d/day, h/hour, m/min, s/sec, ms/msec, us/usec, ns/nsec)", etc. That is why some parts of the descriptions were removed. Could you generate the template and see the result? (cd common; mvn clean package -Phadoop-2 -Pdist -DskipTests). If you don't like this, I'll revert it. Lefty Leverenz wrote: Navis, that is cool to the nth degree! I applied patch 15, generated a template file, and checked each parameter changed by the patch. All the "Expects" phrases look great. However, non-numeric values are lowercase. For example, hive.exec.orc.encoding.strategy used to say the values are SPEED and COMPRESSION, but now it's "Expects one of [speed, compression]". Are all parameter values case-insensitive? If so, the Configuration Properties docs should mention it. Two parameters still give units in their descriptions, although that seems to be deliberate:
- hive.server2.long.polling.timeout: "Time in milliseconds that HiveServer2 will wait, ..." (has a non-zero default value, in milliseconds)
- hive.support.quoted.identifiers: "Whether to use quoted identifier. 'none' or 'column' can be used." (goes on to explain what 'none' and 'column' mean)
Navis Ryu wrote: bq. non-numeric values are lowercase — All values in StringSet are case-insensitive, but if you prefer uppercase strings in the description, that can be done. bq. Two parameters still give units in their descriptions — I was considering a follow-up issue for applying time validators to the others, but I'll do that here.

bq. ... if you prefer uppercase strings in the description, that can be done.
No, lowercase is better. (If values appeared in uppercase, people would assume it's required.)

- Lefty --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15449/#review51760 --- On Aug. 28, 2014, 2:31 a.m., Navis Ryu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15449/ --- (Updated Aug. 28, 2014, 2:31 a.m.) Review request for hive. Bugs: HIVE-5799 https://issues.apache.org/jira/browse/HIVE-5799 Repository: hive-git

Description
---
Need a timeout facility for preventing resource leaks from unstable or misbehaving clients.

Diffs
-
common/src/java/org/apache/hadoop/hive/ant/GenHiveTemplate.java 4293b7c
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 74bb863
common/src/java/org/apache/hadoop/hive/conf/Validator.java cea9c41
itests/hive-unit/src/test/java/org/apache/hive/jdbc/miniHS2/TestHiveServer2SessionTimeout.java PRE-CREATION
service/src/java/org/apache/hive/service/cli/CLIService.java ff5de4a
service/src/java/org/apache/hive/service/cli/OperationState.java 3e15f0c
service/src/java/org/apache/hive/service/cli/operation/Operation.java 0d6436e
service/src/java/org/apache/hive/service/cli/operation/OperationManager.java 2867301
service/src/java/org/apache/hive/service/cli/session/HiveSession.java 270e4a6
service/src/java/org/apache/hive/service/cli/session/HiveSessionBase.java 84e1c7e
service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 4e5f595
service/src/java/org/apache/hive/service/cli/session/HiveSessionImplwithUGI.java 39d2184
service/src/java/org/apache/hive/service/cli/session/SessionManager.java 17c1c7b
service/src/test/org/apache/hive/service/cli/CLIServiceTest.java d01e819

Diff: https://reviews.apache.org/r/15449/diff/

Testing
---
Confirmed in the local environment.

Thanks,
Navis Ryu
[jira] [Updated] (HIVE-7904) Missing null check cause NPE when updating join column stats in statistics annotation
[ https://issues.apache.org/jira/browse/HIVE-7904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-7904: - Resolution: Fixed Fix Version/s: (was: 0.13.0) 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk Missing null check cause NPE when updating join column stats in statistics annotation - Key: HIVE-7904 URL: https://issues.apache.org/jira/browse/HIVE-7904 Project: Hive Issue Type: Sub-task Components: Query Processor, Statistics Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Priority: Trivial Fix For: 0.14.0 Attachments: HIVE-7904.1.patch Updating column stats in the join stats annotation rule can cause an NPE if column stats are missing from one relation. -- This message was sent by Atlassian JIRA (v6.2#6252)
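The kind of guard HIVE-7904 describes can be sketched in the abstract. The classes and method below are hypothetical stand-ins, not Hive's actual statistics-annotation code; the point is only that a join-stats update must tolerate a relation that carries no column statistics instead of dereferencing a null entry:

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch of the missing null check: when one side of a join has no
// column statistics for the join key, skip the stats update (here, signal
// with -1) rather than hit a NullPointerException.
public class JoinStatsSketch {
    static class ColStats {
        final long countDistinct;
        ColStats(long countDistinct) { this.countDistinct = countDistinct; }
    }

    // Joint distinct-value estimate for a join key, or -1 when stats are
    // missing on either side. Without the null guard, l.countDistinct or
    // r.countDistinct would throw an NPE.
    static long joinNdv(Map<String, ColStats> left, Map<String, ColStats> right, String col) {
        ColStats l = left.get(col);
        ColStats r = right.get(col);
        if (l == null || r == null) {  // the guard the patch adds
            return -1;
        }
        return Math.min(l.countDistinct, r.countDistinct);
    }

    public static void main(String[] args) {
        Map<String, ColStats> left = new HashMap<>();
        Map<String, ColStats> right = new HashMap<>();
        left.put("key", new ColStats(100));
        // right side has no stats for "key": returns -1 instead of throwing
        System.out.println(joinNdv(left, right, "key"));
    }
}
```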
[jira] [Updated] (HIVE-7627) FSStatsPublisher does fit into Spark multi-thread task mode[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-7627: Attachment: HIVE-7627.1-spark.patch Use {{mapPartitionToPairWithContext()}} instead of {{mapPartitionToPair()}} to get access to TaskContext in HiveMapFunction/HiveReduceFunction. *NOTICE*: this patch depends on SPARK-2895; we need to update the Spark dependency to the latest Spark master build after SPARK-2895 is merged. FSStatsPublisher does fit into Spark multi-thread task mode[Spark Branch] - Key: HIVE-7627 URL: https://issues.apache.org/jira/browse/HIVE-7627 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Attachments: HIVE-7627.1-spark.patch Hive table statistics collection failed in FSStatsPublisher mode, with the following exception on the Spark executor side: {noformat} 14/08/05 16:46:24 WARN hdfs.DFSClient: DataStreamer Exception java.io.FileNotFoundException: ID mismatch. Request id and saved id: 20277 , 20278 for file /tmp/hive-root/8833d172-1edd-4508-86db-fdd7a1b0af17/hive_2014-08-05_16-46-03_013_6279446857294757772-1/-ext-1/tmpstats-0 at org.apache.hadoop.hdfs.server.namenode.INodeId.checkId(INodeId.java:53) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2952) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2754) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2662) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at 
org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:525) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1442) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1261) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525) Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): ID mismatch. 
Request id and saved id: 20277 , 20278 for file /tmp/hive-root/8833d172-1edd-4508-86db-fdd7a1b0af17/hive_2014-08-05_16-46-03_013_6279446857294757772-1/-ext-1/tmpstats-0 at org.apache.hadoop.hdfs.server.namenode.INodeId.checkId(INodeId.java:53) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2952) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2754) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2662) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at
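The lease-ID mismatch above is symptomatic of two concurrent tasks writing the same fixed stats file name (tmpstats-0). A minimal sketch of the per-task naming idea follows; the class and method names are hypothetical illustrations (Hive's actual FSStatsPublisher differs), showing only why a task-specific identifier, such as one obtained from Spark's TaskContext, avoids the clash:

```java
// Hedged illustration: derive a stats temp-file name that is unique per task
// attempt, so concurrent tasks in one executor do not fight over a single
// HDFS lease on "tmpstats-0" as in the stack trace above.
public class StatsFileNames {
    static String statsFile(String dir, int partitionId, int attemptNumber) {
        return dir + "/tmpstats-" + partitionId + "_" + attemptNumber;
    }

    public static void main(String[] args) {
        // two concurrent tasks now get distinct files
        System.out.println(statsFile("/tmp/hive/-ext-1", 0, 0));
        System.out.println(statsFile("/tmp/hive/-ext-1", 1, 0));
    }
}
```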
[jira] [Commented] (HIVE-7775) enable sample8.q.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115002#comment-14115002 ] Chengxiang Li commented on HIVE-7775: - Oh, I got it. enable sample8.q.[Spark Branch] --- Key: HIVE-7775 URL: https://issues.apache.org/jira/browse/HIVE-7775 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Fix For: spark-branch Attachments: HIVE-7775.1-spark.patch, HIVE-7775.2-spark.patch, HIVE-7775.3-spark.additional.patch sample8.q contains a join query; this qtest should be enabled after Hive on Spark supports the join operation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7909) Fix samaple8.q automatic test failure[Spark Branch]
Chengxiang Li created HIVE-7909: --- Summary: Fix samaple8.q automatic test failure[Spark Branch] Key: HIVE-7909 URL: https://issues.apache.org/jira/browse/HIVE-7909 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7843) orc_analyze.q fails with an assertion in FileSinkOperator [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115003#comment-14115003 ] Venki Korukanti commented on HIVE-7843: --- Linking SPARK-2895 which is adding support for accessing TaskContext within a function. orc_analyze.q fails with an assertion in FileSinkOperator [Spark Branch] Key: HIVE-7843 URL: https://issues.apache.org/jira/browse/HIVE-7843 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Venki Korukanti Assignee: Venki Korukanti Labels: Spark-M1 Fix For: spark-branch {code} java.lang.AssertionError: data length is different from num of DP columns org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynPartDirectory(FileSinkOperator.java:809) org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynOutPaths(FileSinkOperator.java:730) org.apache.hadoop.hive.ql.exec.FileSinkOperator.startGroup(FileSinkOperator.java:829) org.apache.hadoop.hive.ql.exec.Operator.defaultStartGroup(Operator.java:502) org.apache.hadoop.hive.ql.exec.Operator.startGroup(Operator.java:525) org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:198) org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:47) org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:27) org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:98) scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) scala.collection.Iterator$class.foreach(Iterator.scala:727) scala.collection.AbstractIterator.foreach(Iterator.scala:1157) org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:759) org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:759) org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121) 
org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121) org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62) org.apache.spark.scheduler.Task.run(Task.scala:54) org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) java.lang.Thread.run(Thread.java:744) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7909) Fix samaple8.q automatic test failure[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-7909: Attachment: HIVE-7909.1-spark.patch Some stats changed in the explain part; it is now consistent with the sample8.q output in MR mode. Fix samaple8.q automatic test failure[Spark Branch] --- Key: HIVE-7909 URL: https://issues.apache.org/jira/browse/HIVE-7909 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M1 Attachments: HIVE-7909.1-spark.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7909) Fix samaple8.q automatic test failure[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-7909: Status: Patch Available (was: Open) Fix samaple8.q automatic test failure[Spark Branch] --- Key: HIVE-7909 URL: https://issues.apache.org/jira/browse/HIVE-7909 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M1 Attachments: HIVE-7909.1-spark.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7775) enable sample8.q.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-7775: Status: Open (was: Patch Available) enable sample8.q.[Spark Branch] --- Key: HIVE-7775 URL: https://issues.apache.org/jira/browse/HIVE-7775 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Fix For: spark-branch Attachments: HIVE-7775.1-spark.patch, HIVE-7775.2-spark.patch, HIVE-7775.3-spark.additional.patch sample8.q contains a join query; this qtest should be enabled after Hive on Spark supports the join operation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HIVE-7775) enable sample8.q.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li resolved HIVE-7775. - Resolution: Fixed enable sample8.q.[Spark Branch] --- Key: HIVE-7775 URL: https://issues.apache.org/jira/browse/HIVE-7775 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Fix For: spark-branch Attachments: HIVE-7775.1-spark.patch, HIVE-7775.2-spark.patch, HIVE-7775.3-spark.additional.patch sample8.q contains a join query; this qtest should be enabled after Hive on Spark supports the join operation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7775) enable sample8.q.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115006#comment-14115006 ] Chengxiang Li commented on HIVE-7775: - Hi Szehon, I've created HIVE-7909 to track it. enable sample8.q.[Spark Branch] --- Key: HIVE-7775 URL: https://issues.apache.org/jira/browse/HIVE-7775 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Fix For: spark-branch Attachments: HIVE-7775.1-spark.patch, HIVE-7775.2-spark.patch, HIVE-7775.3-spark.additional.patch sample8.q contains a join query; this qtest should be enabled after Hive on Spark supports the join operation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7909) Fix samaple8.q automatic test failure[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115030#comment-14115030 ] Hive QA commented on HIVE-7909: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12665292/HIVE-7909.1-spark.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 6266 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2 org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/104/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/104/console Test logs: http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-104/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12665292 Fix samaple8.q automatic test failure[Spark Branch] --- Key: HIVE-7909 URL: https://issues.apache.org/jira/browse/HIVE-7909 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M1 Attachments: HIVE-7909.1-spark.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7557) When reduce is vectorized, dynpart_sort_opt_vectorization.q under Tez fails
[ https://issues.apache.org/jira/browse/HIVE-7557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-7557: Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Matt McCline. When reduce is vectorized, dynpart_sort_opt_vectorization.q under Tez fails --- Key: HIVE-7557 URL: https://issues.apache.org/jira/browse/HIVE-7557 Project: Hive Issue Type: Bug Reporter: Matt McCline Assignee: Matt McCline Fix For: 0.14.0 Attachments: HIVE-7557.1.patch Turned off dynpart_sort_opt_vectorization.q (Tez) since it fails when reduce is vectorized to get HIVE-7029 checked in. Stack trace: {code} Container released by application, AttemptID:attempt_1406747677386_0003_2_00_00_2 Info:Error: java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing vector batch (tag=0) [Error getting row data with exception java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.DoubleColumnVector cannot be cast to org.apache.hadoop.hive.ql.exec.vector.LongColumnVector at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$VectorExpressionWriterLong.writeValue(VectorExpressionWriterFactory.java:168) at org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch.toString(VectorizedRowBatch.java:159) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.processVectors(ReduceRecordProcessor.java:481) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.processRows(ReduceRecordProcessor.java:371) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:291) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:165) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:307) at org.apache.hadoop.mapred.YarnTezDagChild$5.run(YarnTezDagChild.java:562) at 
java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:394) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.mapred.YarnTezDagChild.main(YarnTezDagChild.java:551) ] at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:188) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:307) at org.apache.hadoop.mapred.YarnTezDagChild$5.run(YarnTezDagChild.java:562) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:394) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.mapred.YarnTezDagChild.main(YarnTezDagChild.java:551) Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing vector batch (tag=0) [Error getting row data with exception java.lang.ClassCastException: org.apache.hadoop.hive.ql.exec.vector.DoubleColumnVector cannot be cast to org.apache.hadoop.hive.ql.exec.vector.LongColumnVector at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$VectorExpressionWriterLong.writeValue(VectorExpressionWriterFactory.java:168) at org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch.toString(VectorizedRowBatch.java:159) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.processVectors(ReduceRecordProcessor.java:481) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.processRows(ReduceRecordProcessor.java:371) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:291) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:165) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:307) at org.apache.hadoop.mapred.YarnTezDagChild$5.run(YarnTezDagChild.java:562) at 
java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:394) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.mapred.YarnTezDagChild.main(YarnTezDagChild.java:551) ] at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.processRows(ReduceRecordProcessor.java:382) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:291) at
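The HIVE-7557 failure above boils down to a writer that was bound to one column-vector type receiving a batch whose column is a different type. The following is an illustrative sketch only, with simplified stand-ins for Hive's column-vector classes (the class and method names below mirror but are not the real Hive implementations), showing how a writer that blindly downcasts to LongColumnVector throws exactly this ClassCastException when handed a DoubleColumnVector:

```java
// Simplified stand-ins for Hive's vectorized column types (illustration only).
abstract class ColumnVector {}

class LongColumnVector extends ColumnVector {
    long[] vector = new long[] {42L};
}

class DoubleColumnVector extends ColumnVector {
    double[] vector = new double[] {42.0};
}

class LongValueWriter {
    // Mirrors the blind downcast pattern in VectorExpressionWriterLong.writeValue:
    // the writer is chosen from the expected schema type, not the actual vector type.
    long read(ColumnVector col, int row) {
        return ((LongColumnVector) col).vector[row];
    }
}

public class CastMismatchDemo {
    public static void main(String[] args) {
        LongValueWriter writer = new LongValueWriter();
        System.out.println(writer.read(new LongColumnVector(), 0)); // fine: 42
        try {
            // Reduce side produced a double-typed column for this position.
            writer.read(new DoubleColumnVector(), 0);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException, as in the trace above");
        }
    }
}
```

The fix in such cases is to make the planner and the vectorized reduce path agree on column types so the writer chosen matches the vector actually produced.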
[jira] [Commented] (HIVE-7803) Enable Hadoop speculative execution may cause corrupt output directory (dynamic partition)
[ https://issues.apache.org/jira/browse/HIVE-7803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115037#comment-14115037 ] Hive QA commented on HIVE-7803: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12665172/HIVE-7803.2.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6127 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/554/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/554/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-554/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12665172 Enable Hadoop speculative execution may cause corrupt output directory (dynamic partition) -- Key: HIVE-7803 URL: https://issues.apache.org/jira/browse/HIVE-7803 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.13.1 Environment: Reporter: Selina Zhang Assignee: Selina Zhang Priority: Critical Attachments: HIVE-7803.1.patch, HIVE-7803.2.patch One of our users reports intermittent failures due to attempt directories in the input paths. We found that with speculative execution turned on, two mappers tried to commit the task at the same time using the same committed-task path, which caused the corrupt output directory.
The original Pig script: {code} STORE AdvertiserDataParsedClean INTO '$DB_NAME.$ADVERTISER_META_TABLE_NAME' USING org.apache.hcatalog.pig.HCatStorer(); {code} Two mappers attempt_1405021984947_5394024_m_000523_0: KILLED attempt_1405021984947_5394024_m_000523_1: SUCCEEDED attempt_1405021984947_5394024_m_000523_0 was killed right after the commit. As a result, it created corrupt directory as /projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/task_1405021984947_5394024_m_000523/ containing part-m-00523 (from attempt_1405021984947_5394024_m_000523_0) and attempt_1405021984947_5394024_m_000523_1/part-m-00523 Namenode Audit log == 1. 2014-08-05 05:04:36,811 INFO FSNamesystem.audit: ugi=* ip=ipaddress1 cmd=create src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_0/part-m-00523 dst=null perm=user:group:rw-r- 2. 2014-08-05 05:04:53,112 INFO FSNamesystem.audit: ugi=* ip=ipaddress2 cmd=create src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_1/part-m-00523 dst=null perm=user:group:rw-r- 3. 2014-08-05 05:05:13,001 INFO FSNamesystem.audit: ugi=* ip=ipaddress1 cmd=rename src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_0 dst=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/task_1405021984947_5394024_m_000523 perm=user:group:rwxr-x--- 4. 
2014-08-05 05:05:13,004 INFO FSNamesystem.audit: ugi=* ip=ipaddress2 cmd=rename src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_1 dst=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/task_1405021984947_5394024_m_000523 perm=user:group:rwxr-x--- After consulting our Hadoop core team, we were told that some HCat code does not participate in the two-phase commit protocol, for example in FileRecordWriterContainer.close(): {code} for (Map.Entry<String, org.apache.hadoop.mapred.OutputCommitter> entry : baseDynamicCommitters.entrySet()) { org.apache.hadoop.mapred.TaskAttemptContext currContext = dynamicContexts.get(entry.getKey()); OutputCommitter baseOutputCommitter = entry.getValue(); if (baseOutputCommitter.needsTaskCommit(currContext)) {
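The core problem is that committing a task from the record writer's close() path means every attempt commits itself, so two speculative attempts of the same task can both rename into the same committed-task path. The sketch below is illustrative only (class and attempt names are hypothetical): it models the "exactly one attempt may commit a task" rule that the framework's two-phase commit protocol provides and that the close()-path commit bypasses, using a single compare-and-set gate:

```java
import java.util.concurrent.atomic.AtomicReference;

// Illustrative model: the framework arbitrates commit so that only one
// attempt of a task may promote its output. Committing inside close()
// skips this arbitration, allowing both speculative attempts to "win".
public class CommitArbiter {
    private final AtomicReference<String> committed = new AtomicReference<>();

    // Returns true only for the first attempt that asks to commit this task.
    public boolean tryCommit(String attemptId) {
        return committed.compareAndSet(null, attemptId);
    }

    public static void main(String[] args) {
        CommitArbiter arbiter = new CommitArbiter();
        // Two speculative attempts of the same task reach the commit step.
        boolean first = arbiter.tryCommit("attempt_..._m_000523_0");
        boolean second = arbiter.tryCommit("attempt_..._m_000523_1");
        // Exactly one rename into the committed task path succeeds.
        System.out.println(first + " " + second); // true false
    }
}
```

In the audit log above, both attempts performed the rename (steps 3 and 4) because no such gate was consulted, producing the mixed-contents task directory.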
[jira] [Commented] (HIVE-7906) Missing Index on Hive metastore query
[ https://issues.apache.org/jira/browse/HIVE-7906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115040#comment-14115040 ] Hive QA commented on HIVE-7906: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12665188/HIVE-456.patch.txt Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/556/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/556/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-556/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee 
/data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-556/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-trunk-source ]] + [[ ! -d apache-svn-trunk-source/.svn ]] + [[ ! -d apache-svn-trunk-source ]] + cd apache-svn-trunk-source + svn revert -R . ++ awk '{print $2}' ++ egrep -v '^X|^Performing status on external' ++ svn status --no-ignore + rm -rf + svn update Fetching external item into 'hcatalog/src/test/e2e/harness' External at revision 1621263. At revision 1621263. + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch patch: Only garbage was found in the patch input. patch: Only garbage was found in the patch input. patch: Only garbage was found in the patch input. The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12665188 Missing Index on Hive metastore query - Key: HIVE-7906 URL: https://issues.apache.org/jira/browse/HIVE-7906 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.13.1 Reporter: Chu Tong Attachments: HIVE-456.patch.txt When it comes to SELECT statement on a table with large number of partitions on Windows Azure DB, the query in the word document below causes major performance degradation. Adding this missing index to turn index scan into seek. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7908) CBO: Handle Windowing functions part of expressions
[ https://issues.apache.org/jira/browse/HIVE-7908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115041#comment-14115041 ] Hive QA commented on HIVE-7908: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12665264/HIVE-7908.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/557/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/557/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-557/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee 
/data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-557/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-trunk-source ]] + [[ ! -d apache-svn-trunk-source/.svn ]] + [[ ! -d apache-svn-trunk-source ]] + cd apache-svn-trunk-source + svn revert -R . ++ egrep -v '^X|^Performing status on external' ++ awk '{print $2}' ++ svn status --no-ignore + rm -rf + svn update Fetching external item into 'hcatalog/src/test/e2e/harness' External at revision 1621263. At revision 1621263. + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12665264 CBO: Handle Windowing functions part of expressions --- Key: HIVE-7908 URL: https://issues.apache.org/jira/browse/HIVE-7908 Project: Hive Issue Type: Bug Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Attachments: HIVE-7908.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-5799) session/operation timeout for hiveserver2
[ https://issues.apache.org/jira/browse/HIVE-5799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-5799: Attachment: HIVE-5799.16.patch.txt session/operation timeout for hiveserver2 - Key: HIVE-5799 URL: https://issues.apache.org/jira/browse/HIVE-5799 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-5799.1.patch.txt, HIVE-5799.10.patch.txt, HIVE-5799.11.patch.txt, HIVE-5799.12.patch.txt, HIVE-5799.13.patch.txt, HIVE-5799.14.patch.txt, HIVE-5799.15.patch.txt, HIVE-5799.16.patch.txt, HIVE-5799.2.patch.txt, HIVE-5799.3.patch.txt, HIVE-5799.4.patch.txt, HIVE-5799.5.patch.txt, HIVE-5799.6.patch.txt, HIVE-5799.7.patch.txt, HIVE-5799.8.patch.txt, HIVE-5799.9.patch.txt A timeout facility is needed to prevent resource leaks from unstable or misbehaving clients. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 15449: session/operation timeout for hiveserver2
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15449/ --- (Updated Aug. 29, 2014, 9:05 a.m.) Review request for hive. Changes --- Addressed comments Bugs: HIVE-5799 https://issues.apache.org/jira/browse/HIVE-5799 Repository: hive-git Description --- Need some timeout facility for preventing resource leakages from instable or bad clients. Diffs (updated) - common/src/java/org/apache/hadoop/hive/ant/GenHiveTemplate.java 4293b7c common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 74bb863 common/src/java/org/apache/hadoop/hive/conf/Validator.java cea9c41 itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestRetryingHMSHandler.java 39e7005 itests/hive-unit/src/test/java/org/apache/hive/jdbc/miniHS2/TestHiveServer2SessionTimeout.java PRE-CREATION metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 9e3481a metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 4e76236 metastore/src/java/org/apache/hadoop/hive/metastore/RetryingHMSHandler.java 84e6dcd metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java 063dee6 metastore/src/test/org/apache/hadoop/hive/metastore/txn/TestTxnHandler.java 8287c60 ql/src/java/org/apache/hadoop/hive/ql/exec/AutoProgressor.java d7323cb ql/src/java/org/apache/hadoop/hive/ql/exec/Heartbeater.java 7fdb4e7 ql/src/java/org/apache/hadoop/hive/ql/exec/ScriptOperator.java 5b857e2 ql/src/java/org/apache/hadoop/hive/ql/exec/UDTFOperator.java afd7bcf ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 70047a2 ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java eb2851b ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java ebe9f92 ql/src/java/org/apache/hadoop/hive/ql/lockmgr/EmbeddedLockManager.java 11434a0 ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java 46044d0 ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsAggregator.java f636cff 
ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsPublisher.java db62721 ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java 3211759 ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestInitiator.java f34b5ad ql/src/test/results/clientnegative/set_hiveconf_validation2.q.out 33f9360 service/src/java/org/apache/hadoop/hive/service/HiveServer.java 32729f2 service/src/java/org/apache/hive/service/cli/CLIService.java ff5de4a service/src/java/org/apache/hive/service/cli/OperationState.java 3e15f0c service/src/java/org/apache/hive/service/cli/operation/Operation.java 0d6436e service/src/java/org/apache/hive/service/cli/operation/OperationManager.java 2867301 service/src/java/org/apache/hive/service/cli/session/HiveSession.java 270e4a6 service/src/java/org/apache/hive/service/cli/session/HiveSessionBase.java 84e1c7e service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 4e5f595 service/src/java/org/apache/hive/service/cli/session/HiveSessionImplwithUGI.java 7668904 service/src/java/org/apache/hive/service/cli/session/SessionManager.java 17c1c7b service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 86ed4b4 service/src/java/org/apache/hive/service/cli/thrift/ThriftHttpCLIService.java 21d1563 service/src/test/org/apache/hive/service/cli/CLIServiceTest.java d01e819 Diff: https://reviews.apache.org/r/15449/diff/ Testing --- Confirmed in the local environment. Thanks, Navis Ryu
[jira] [Commented] (HIVE-7811) Compactions need to update table/partition stats
[ https://issues.apache.org/jira/browse/HIVE-7811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115039#comment-14115039 ] Hive QA commented on HIVE-7811: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12665185/HIVE-7811.3.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/555/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/555/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-555/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee 
/data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-555/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-trunk-source ]] + [[ ! -d apache-svn-trunk-source/.svn ]] + [[ ! -d apache-svn-trunk-source ]] + cd apache-svn-trunk-source + svn revert -R . Reverted 'hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/DynamicPartitionFileRecordWriterContainer.java' Reverted 'hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/FileOutputCommitterContainer.java' ++ awk '{print $2}' ++ egrep -v '^X|^Performing status on external' ++ svn status --no-ignore + rm -rf target datanucleus.log ant/target shims/target shims/0.20/target shims/0.20S/target shims/0.23/target shims/aggregator/target shims/common/target shims/common-secure/target packaging/target hbase-handler/target testutils/target jdbc/target metastore/target itests/target itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target itests/hive-unit-hadoop2/target itests/hive-minikdc/target itests/hive-unit/target itests/custom-serde/target itests/util/target hcatalog/target hcatalog/core/target hcatalog/streaming/target hcatalog/server-extensions/target hcatalog/hcatalog-pig-adapter/target hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target accumulo-handler/target hwi/target common/target common/src/gen service/target contrib/target serde/target beeline/target odbc/target cli/target ql/dependency-reduced-pom.xml ql/target + svn update Uql/src/test/results/clientpositive/tez/dynpart_sort_opt_vectorization.q.out Uql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java U ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java Fetching external item into 'hcatalog/src/test/e2e/harness' Updated external to revision 1621263. Updated to revision 1621263. 
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12665185 Compactions need to update table/partition stats Key: HIVE-7811 URL: https://issues.apache.org/jira/browse/HIVE-7811 Project: Hive Issue Type: Sub-task Components: Transactions Affects Versions: 0.13.1 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-7811.3.patch Compactions should trigger stats recalculation for columns that already have stats. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7669) parallel order by clause on a string column fails with IOException: Split points are out of order
[ https://issues.apache.org/jira/browse/HIVE-7669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-7669: Attachment: HIVE-7669.4.patch.txt parallel order by clause on a string column fails with IOException: Split points are out of order - Key: HIVE-7669 URL: https://issues.apache.org/jira/browse/HIVE-7669 Project: Hive Issue Type: Bug Components: HiveServer2, Query Processor, SQL Affects Versions: 0.12.0 Environment: Hive 0.12.0-cdh5.0.0 OS: Redhat linux Reporter: Vishal Kamath Assignee: Navis Labels: orderby Attachments: HIVE-7669.1.patch.txt, HIVE-7669.2.patch.txt, HIVE-7669.3.patch.txt, HIVE-7669.4.patch.txt The source table has 600 Million rows and it has a String column l_shipinstruct which has 4 unique values. (Ie. these 4 values are repeated across the 600 million rows) We are sorting it based on this string column l_shipinstruct as shown in the below HiveQL with the following parameters. {code:sql} set hive.optimize.sampling.orderby=true; set hive.optimize.sampling.orderby.number=1000; set hive.optimize.sampling.orderby.percent=0.1f; insert overwrite table lineitem_temp_report select l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment from lineitem order by l_shipinstruct; {code} Stack Trace Diagnostic Messages for this Task: {noformat} Error: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapred.MapTask$OldOutputCollector.init(MapTask.java:569) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342) at 
org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) ... 10 more Caused by: java.lang.IllegalArgumentException: Can't read partitions file at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:116) at org.apache.hadoop.mapred.lib.TotalOrderPartitioner.configure(TotalOrderPartitioner.java:42) at org.apache.hadoop.hive.ql.exec.HiveTotalOrderPartitioner.configure(HiveTotalOrderPartitioner.java:37) ... 15 more Caused by: java.io.IOException: Split points are out of order at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:96) ... 17 more {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
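The root cause behind "Split points are out of order" is that TotalOrderPartitioner requires a strictly increasing sequence of split points, but sampling 1000 keys from a column with only 4 distinct values inevitably produces duplicates. The sketch below is illustrative only (the validation and dedup helpers are simplified stand-ins, not Hive's actual code, and the sample values are just example strings); the attached patches pursue this kind of fix, though the exact approach may differ:

```java
import java.util.Arrays;

// Illustration of the TotalOrderPartitioner invariant that fails here:
// sampled split points must be strictly increasing, so duplicated keys
// from a low-cardinality column trip the check.
public class SplitPointCheck {
    static void validate(String[] splitPoints) {
        for (int i = 1; i < splitPoints.length; i++) {
            if (splitPoints[i].compareTo(splitPoints[i - 1]) <= 0) {
                throw new IllegalArgumentException("Split points are out of order");
            }
        }
    }

    // One remedy: drop duplicates from the sorted samples before writing
    // the partition file (fewer split points means fewer reducers used).
    static String[] dedup(String[] sorted) {
        return Arrays.stream(sorted).distinct().toArray(String[]::new);
    }

    public static void main(String[] args) {
        // Sorted samples of a 4-value column collapse to repeated keys.
        String[] sampled = {"COLLECT COD", "COLLECT COD", "DELIVER IN PERSON", "NONE", "NONE"};
        try {
            validate(sampled);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // same error as the stack trace
        }
        validate(dedup(sampled)); // passes once duplicates are removed
        System.out.println("ok after dedup");
    }
}
```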
Re: Review Request 24688: parallel order by clause on a string column fails with IOException: Split points are out of order
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24688/ --- (Updated Aug. 29, 2014, 9:08 a.m.) Review request for hive. Changes --- Removed the conf, as commented Bugs: HIVE-7669 https://issues.apache.org/jira/browse/HIVE-7669 Repository: hive-git Description --- The source table has 600 Million rows and it has a String column l_shipinstruct which has 4 unique values. (Ie. these 4 values are repeated across the 600 million rows) We are sorting it based on this string column l_shipinstruct as shown in the below HiveQL with the following parameters. {code:sql} set hive.optimize.sampling.orderby=true; set hive.optimize.sampling.orderby.number=1000; set hive.optimize.sampling.orderby.percent=0.1f; insert overwrite table lineitem_temp_report select l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment from lineitem order by l_shipinstruct; {code} Stack Trace Diagnostic Messages for this Task: {noformat} Error: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapred.MapTask$OldOutputCollector.init(MapTask.java:569) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: java.lang.reflect.InvocationTargetException at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) ... 10 more Caused by: java.lang.IllegalArgumentException: Can't read partitions file at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:116) at org.apache.hadoop.mapred.lib.TotalOrderPartitioner.configure(TotalOrderPartitioner.java:42) at org.apache.hadoop.hive.ql.exec.HiveTotalOrderPartitioner.configure(HiveTotalOrderPartitioner.java:37) ... 15 more Caused by: java.io.IOException: Split points are out of order at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:96) ... 17 more {noformat} Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 74bb863 common/src/java/org/apache/hadoop/hive/conf/Validator.java cea9c41 ql/src/java/org/apache/hadoop/hive/ql/exec/HiveTotalOrderPartitioner.java 6c22362 ql/src/java/org/apache/hadoop/hive/ql/exec/PartitionKeySampler.java 166461a ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java ef72039 ql/src/test/org/apache/hadoop/hive/ql/exec/TestPartitionKeySampler.java PRE-CREATION Diff: https://reviews.apache.org/r/24688/diff/ Testing --- Thanks, Navis Ryu
[jira] [Resolved] (HIVE-7910) Enhance natural order scheduler to prevent downstream vertex from monopolizing the cluster resources
[ https://issues.apache.org/jira/browse/HIVE-7910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan resolved HIVE-7910.
    Resolution: Won't Fix

Apologies, this was meant for the Tez project. Closing this bug.

Enhance natural order scheduler to prevent downstream vertex from monopolizing the cluster resources

Key: HIVE-7910
URL: https://issues.apache.org/jira/browse/HIVE-7910
Project: Hive
Issue Type: Bug
Reporter: Rajesh Balamohan
Labels: performance

{noformat}
M2 M7 \ / (sg) \/ R3/ (b) \ / (b) \ / \ / M5 | R6
{noformat}

Please refer to the attachment (task runtime SVG). In this case, M5 got scheduled much earlier than R3 (R3 is shown in green in the diagram) and retained lots of containers, so R3 got fewer containers to work with. Attaching the output from the status monitor when the job ran: Map_5 has taken up almost all containers, whereas Reducer_3 got a fraction of the capacity.

{noformat}
Map_2: 1/1  Map_5: 0(+373)/1000  Map_7: 1/1  Reducer_3: 0/8000        Reducer_6: 0/1
Map_2: 1/1  Map_5: 0(+374)/1000  Map_7: 1/1  Reducer_3: 0/8000        Reducer_6: 0/1
Map_2: 1/1  Map_5: 0(+374)/1000  Map_7: 1/1  Reducer_3: 0(+1)/8000    Reducer_6: 0/1
Map_2: 1/1  Map_5: 0(+374)/1000  Map_7: 1/1  Reducer_3: 14(+7)/8000   Reducer_6: 0/1
Map_2: 1/1  Map_5: 0(+374)/1000  Map_7: 1/1  Reducer_3: 63(+14)/8000  Reducer_6: 0/1
Map_2: 1/1  Map_5: 0(+374)/1000  Map_7: 1/1  Reducer_3: 159(+22)/8000 Reducer_6: 0/1
Map_2: 1/1  Map_5: 0(+374)/1000  Map_7: 1/1  Reducer_3: 308(+29)/8000 Reducer_6: 0/1
...
{noformat}

Creating this JIRA as a placeholder for scheduler enhancement. One possibility could be to schedule fewer tasks in downstream vertices, based on the information available for the upstream vertex.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7910) Enhance natural order scheduler to prevent downstream vertex from monopolizing the cluster resources
Rajesh Balamohan created HIVE-7910:
--
Summary: Enhance natural order scheduler to prevent downstream vertex from monopolizing the cluster resources
Key: HIVE-7910
URL: https://issues.apache.org/jira/browse/HIVE-7910
Project: Hive
Issue Type: Bug
Reporter: Rajesh Balamohan

{noformat}
M2 M7 \ / (sg) \/ R3/ (b) \ / (b) \ / \ / M5 | R6
{noformat}

Please refer to the attachment (task runtime SVG). In this case, M5 got scheduled much earlier than R3 (R3 is shown in green in the diagram) and retained lots of containers, so R3 got fewer containers to work with. Attaching the output from the status monitor when the job ran: Map_5 has taken up almost all containers, whereas Reducer_3 got a fraction of the capacity.

{noformat}
Map_2: 1/1  Map_5: 0(+373)/1000  Map_7: 1/1  Reducer_3: 0/8000        Reducer_6: 0/1
Map_2: 1/1  Map_5: 0(+374)/1000  Map_7: 1/1  Reducer_3: 0/8000        Reducer_6: 0/1
Map_2: 1/1  Map_5: 0(+374)/1000  Map_7: 1/1  Reducer_3: 0(+1)/8000    Reducer_6: 0/1
Map_2: 1/1  Map_5: 0(+374)/1000  Map_7: 1/1  Reducer_3: 14(+7)/8000   Reducer_6: 0/1
Map_2: 1/1  Map_5: 0(+374)/1000  Map_7: 1/1  Reducer_3: 63(+14)/8000  Reducer_6: 0/1
Map_2: 1/1  Map_5: 0(+374)/1000  Map_7: 1/1  Reducer_3: 159(+22)/8000 Reducer_6: 0/1
Map_2: 1/1  Map_5: 0(+374)/1000  Map_7: 1/1  Reducer_3: 308(+29)/8000 Reducer_6: 0/1
...
{noformat}

Creating this JIRA as a placeholder for scheduler enhancement. One possibility could be to schedule fewer tasks in downstream vertices, based on the information available for the upstream vertex.

-- This message was sent by Atlassian JIRA (v6.2#6252)
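The placeholder idea above (throttle a vertex's tasks using progress information from the vertex it depends on) can be sketched as a simple slow-start policy, similar in spirit to what Tez's shuffle vertex manager exposes via min/max source-fraction settings. The method name and thresholds below are illustrative assumptions, not an actual Tez or Hive API.

```java
public class SlowStart {
    // Hypothetical slow-start policy: schedule no downstream tasks until a
    // minimum fraction of the upstream vertex has finished, then ramp up
    // linearly until a maximum fraction, at which point all tasks may run.
    public static int allowedDownstreamTasks(int totalDownstream,
                                             double upstreamDoneFraction,
                                             double minFraction,
                                             double maxFraction) {
        if (upstreamDoneFraction < minFraction) {
            return 0;                       // hold back: upstream barely started
        }
        if (upstreamDoneFraction >= maxFraction) {
            return totalDownstream;         // upstream nearly done: open the gates
        }
        double ramp = (upstreamDoneFraction - minFraction)
                    / (maxFraction - minFraction);
        return (int) (ramp * totalDownstream);
    }

    public static void main(String[] args) {
        // With min=0.25 and max=0.75 for an 8000-task reducer (like Reducer_3):
        System.out.println(allowedDownstreamTasks(8000, 0.10, 0.25, 0.75)); // 0
        System.out.println(allowedDownstreamTasks(8000, 0.50, 0.25, 0.75)); // 4000
        System.out.println(allowedDownstreamTasks(8000, 0.90, 0.25, 0.75)); // 8000
    }
}
```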
[jira] [Commented] (HIVE-7649) Support column stats with temporary tables
[ https://issues.apache.org/jira/browse/HIVE-7649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115083#comment-14115083 ] Hive QA commented on HIVE-7649: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12665265/HIVE-7649.4.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6127 tests executed *Failed tests:* {noformat} org.apache.hive.service.TestHS2ImpersonationWithRemoteMS.testImpersonation {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/558/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/558/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-558/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12665265 Support column stats with temporary tables -- Key: HIVE-7649 URL: https://issues.apache.org/jira/browse/HIVE-7649 Project: Hive Issue Type: Bug Components: Statistics Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-7649.1.patch, HIVE-7649.2.patch, HIVE-7649.3.patch, HIVE-7649.4.patch Column stats currently not supported with temp tables, see if they can be added. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7669) parallel order by clause on a string column fails with IOException: Split points are out of order
[ https://issues.apache.org/jira/browse/HIVE-7669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115160#comment-14115160 ] Hive QA commented on HIVE-7669: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12665301/HIVE-7669.4.patch.txt {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6128 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_8 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/559/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/559/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-559/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12665301 parallel order by clause on a string column fails with IOException: Split points are out of order - Key: HIVE-7669 URL: https://issues.apache.org/jira/browse/HIVE-7669 Project: Hive Issue Type: Bug Components: HiveServer2, Query Processor, SQL Affects Versions: 0.12.0 Environment: Hive 0.12.0-cdh5.0.0 OS: Redhat linux Reporter: Vishal Kamath Assignee: Navis Labels: orderby Attachments: HIVE-7669.1.patch.txt, HIVE-7669.2.patch.txt, HIVE-7669.3.patch.txt, HIVE-7669.4.patch.txt The source table has 600 Million rows and it has a String column l_shipinstruct which has 4 unique values. (Ie. 
these 4 values are repeated across the 600 million rows.) We sort on this string column l_shipinstruct, as shown in the HiveQL below, with the following parameters.

{code:sql}
set hive.optimize.sampling.orderby=true;
set hive.optimize.sampling.orderby.number=1000;
set hive.optimize.sampling.orderby.percent=0.1f;

insert overwrite table lineitem_temp_report
select
  l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity,
  l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus,
  l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode,
  l_comment
from lineitem
order by l_shipinstruct;
{code}

Stack Trace

Diagnostic Messages for this Task:
{noformat}
Error: java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
	at org.apache.hadoop.mapred.MapTask$OldOutputCollector.init(MapTask.java:569)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:601)
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
	... 10 more
Caused by: java.lang.IllegalArgumentException: Can't read partitions file
	at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:116)
	at org.apache.hadoop.mapred.lib.TotalOrderPartitioner.configure(TotalOrderPartitioner.java:42)
	at org.apache.hadoop.hive.ql.exec.HiveTotalOrderPartitioner.configure(HiveTotalOrderPartitioner.java:37)
	... 15 more
Caused by: java.io.IOException: Split points are out of order at
[jira] [Commented] (HIVE-5799) session/operation timeout for hiveserver2
[ https://issues.apache.org/jira/browse/HIVE-5799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115227#comment-14115227 ] Hive QA commented on HIVE-5799: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12665300/HIVE-5799.16.patch.txt {color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 6128 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_show_conf org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoinopt10 org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_set_hiveconf_validation2 org.apache.hadoop.hive.ql.txn.compactor.TestInitiator.recoverFailedRemoteWorkers org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection org.apache.hive.service.TestHS2ImpersonationWithRemoteMS.testImpersonation {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/560/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/560/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-560/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 7 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12665300

session/operation timeout for hiveserver2

Key: HIVE-5799
URL: https://issues.apache.org/jira/browse/HIVE-5799
Project: Hive
Issue Type: Improvement
Components: HiveServer2
Reporter: Navis
Assignee: Navis
Priority: Minor
Attachments: HIVE-5799.1.patch.txt, HIVE-5799.10.patch.txt, HIVE-5799.11.patch.txt, HIVE-5799.12.patch.txt, HIVE-5799.13.patch.txt, HIVE-5799.14.patch.txt, HIVE-5799.15.patch.txt, HIVE-5799.16.patch.txt, HIVE-5799.2.patch.txt, HIVE-5799.3.patch.txt, HIVE-5799.4.patch.txt, HIVE-5799.5.patch.txt, HIVE-5799.6.patch.txt, HIVE-5799.7.patch.txt, HIVE-5799.8.patch.txt, HIVE-5799.9.patch.txt

Need a timeout facility to prevent resource leaks from unstable or misbehaving clients.

-- This message was sent by Atlassian JIRA (v6.2#6252)
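The timeout facility requested here boils down to a background reaper that closes sessions idle longer than a configured threshold. The sketch below shows that core check in isolation; the Session class and method names are illustrative, not the actual HiveServer2 SessionManager API.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class SessionReaper {
    // Simplified stand-in for a server-side session; in HiveServer2 closing
    // it would also release operation handles, locks, and scratch dirs.
    static class Session {
        final String id;
        long lastAccessTime;   // millis; refreshed on each client call
        boolean closed;
        Session(String id, long lastAccessTime) {
            this.id = id;
            this.lastAccessTime = lastAccessTime;
        }
    }

    // Close every open session idle longer than timeoutMs; a real server
    // would run this periodically from a scheduled background thread.
    public static int closeIdleSessions(Collection<Session> sessions,
                                        long now, long timeoutMs) {
        int closedCount = 0;
        for (Session s : sessions) {
            if (!s.closed && now - s.lastAccessTime > timeoutMs) {
                s.closed = true;
                closedCount++;
            }
        }
        return closedCount;
    }

    public static void main(String[] args) {
        List<Session> sessions = Arrays.asList(
            new Session("s1", 0L),         // idle since t=0
            new Session("s2", 3_000_000L)  // recently active
        );
        // At t = 1 hour with a 30-minute timeout, only s1 is reaped.
        System.out.println(closeIdleSessions(sessions, 3_600_000L, 1_800_000L));
    }
}
```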
Re: Review Request 15449: session/operation timeout for hiveserver2
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15449/#review51882 --- Ship it! Only minor comments mostly on lines exceeding Checkstyle's configuration. common/src/java/org/apache/hadoop/hive/conf/HiveConf.java https://reviews.apache.org/r/15449/#comment90540 long line common/src/java/org/apache/hadoop/hive/conf/HiveConf.java https://reviews.apache.org/r/15449/#comment90541 long line common/src/java/org/apache/hadoop/hive/conf/HiveConf.java https://reviews.apache.org/r/15449/#comment90542 long line common/src/java/org/apache/hadoop/hive/conf/HiveConf.java https://reviews.apache.org/r/15449/#comment90543 long lines common/src/java/org/apache/hadoop/hive/conf/HiveConf.java https://reviews.apache.org/r/15449/#comment90544 long line common/src/java/org/apache/hadoop/hive/conf/HiveConf.java https://reviews.apache.org/r/15449/#comment90545 long line common/src/java/org/apache/hadoop/hive/conf/HiveConf.java https://reviews.apache.org/r/15449/#comment90546 long line common/src/java/org/apache/hadoop/hive/conf/HiveConf.java https://reviews.apache.org/r/15449/#comment90547 long line common/src/java/org/apache/hadoop/hive/conf/HiveConf.java https://reviews.apache.org/r/15449/#comment90548 long line common/src/java/org/apache/hadoop/hive/conf/HiveConf.java https://reviews.apache.org/r/15449/#comment90549 long lines common/src/java/org/apache/hadoop/hive/conf/HiveConf.java https://reviews.apache.org/r/15449/#comment90550 long line common/src/java/org/apache/hadoop/hive/conf/Validator.java https://reviews.apache.org/r/15449/#comment90551 missing @Override metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java https://reviews.apache.org/r/15449/#comment90552 long line metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java https://reviews.apache.org/r/15449/#comment90553 long line ql/src/java/org/apache/hadoop/hive/ql/exec/Heartbeater.java https://reviews.apache.org/r/15449/#comment90554 This comment needs 
updating ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java https://reviews.apache.org/r/15449/#comment90555 extra space before semicolon service/src/java/org/apache/hive/service/cli/operation/Operation.java https://reviews.apache.org/r/15449/#comment90556 + - can be changed to just -, maybe warrants a comment. - Lars Francke On Aug. 29, 2014, 9:05 a.m., Navis Ryu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15449/ --- (Updated Aug. 29, 2014, 9:05 a.m.) Review request for hive. Bugs: HIVE-5799 https://issues.apache.org/jira/browse/HIVE-5799 Repository: hive-git Description --- Need some timeout facility for preventing resource leakages from instable or bad clients. Diffs - common/src/java/org/apache/hadoop/hive/ant/GenHiveTemplate.java 4293b7c common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 74bb863 common/src/java/org/apache/hadoop/hive/conf/Validator.java cea9c41 itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestRetryingHMSHandler.java 39e7005 itests/hive-unit/src/test/java/org/apache/hive/jdbc/miniHS2/TestHiveServer2SessionTimeout.java PRE-CREATION metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 9e3481a metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 4e76236 metastore/src/java/org/apache/hadoop/hive/metastore/RetryingHMSHandler.java 84e6dcd metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java 063dee6 metastore/src/test/org/apache/hadoop/hive/metastore/txn/TestTxnHandler.java 8287c60 ql/src/java/org/apache/hadoop/hive/ql/exec/AutoProgressor.java d7323cb ql/src/java/org/apache/hadoop/hive/ql/exec/Heartbeater.java 7fdb4e7 ql/src/java/org/apache/hadoop/hive/ql/exec/ScriptOperator.java 5b857e2 ql/src/java/org/apache/hadoop/hive/ql/exec/UDTFOperator.java afd7bcf ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 70047a2 ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java eb2851b 
ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java ebe9f92 ql/src/java/org/apache/hadoop/hive/ql/lockmgr/EmbeddedLockManager.java 11434a0 ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java 46044d0 ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsAggregator.java f636cff ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsPublisher.java db62721
[jira] [Updated] (HIVE-7405) Vectorize GROUP BY on the Reduce-Side (Part 1 – Basic)
[ https://issues.apache.org/jira/browse/HIVE-7405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7405: --- Status: Patch Available (was: Open) +1 Vectorize GROUP BY on the Reduce-Side (Part 1 – Basic) -- Key: HIVE-7405 URL: https://issues.apache.org/jira/browse/HIVE-7405 Project: Hive Issue Type: Sub-task Reporter: Matt McCline Assignee: Matt McCline Attachments: HIVE-7405.1.patch, HIVE-7405.2.patch, HIVE-7405.3.patch, HIVE-7405.4.patch, HIVE-7405.5.patch, HIVE-7405.6.patch, HIVE-7405.7.patch, HIVE-7405.8.patch, HIVE-7405.9.patch, HIVE-7405.91.patch, HIVE-7405.92.patch, HIVE-7405.93.patch Vectorize the basic case that does not have any count distinct aggregation. Add a 4th processing mode in VectorGroupByOperator for reduce where each input VectorizedRowBatch has only values for one key at a time. Thus, the values in the batch can be aggregated quickly. -- This message was sent by Atlassian JIRA (v6.2#6252)
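The reduce-side mode described in HIVE-7405 is cheap precisely because each incoming batch carries values for only one grouping key: the aggregate degenerates into a tight loop over a column vector with no per-row hash lookup. A minimal self-contained sketch of that idea (illustrative names, not Hive's actual VectorGroupByOperator):

```java
public class OneKeyBatchAgg {
    // When a batch is known to hold values for a single key, a sum aggregate
    // is just a scan over the primitive array up to the batch's row count.
    public static long sumBatch(long[] values, int size) {
        long sum = 0;
        for (int i = 0; i < size; i++) {
            sum += values[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        // One batch, one key: aggregate the whole batch in a single pass.
        long[] batchForKeyA = {1, 2, 3, 4};
        System.out.println(sumBatch(batchForKeyA, batchForKeyA.length)); // 10
    }
}
```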
[jira] [Created] (HIVE-7911) Guaranteed ClassCastException in AccumuloRangeGenerator
Lars Francke created HIVE-7911: -- Summary: Guaranteed ClassCastException in AccumuloRangeGenerator Key: HIVE-7911 URL: https://issues.apache.org/jira/browse/HIVE-7911 Project: Hive Issue Type: Bug Reporter: Lars Francke AccumuloRangeGenerator has a typo where it should say {{WritableConstantFloatObjectInspector}} instead of {{WritableConstantDoubleObjectInspector}}. I've changed the method to avoid the multiple if-else statements as all that is expected is a {{PrimitiveObjectInspector}}. -- This message was sent by Atlassian JIRA (v6.2#6252)
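The bug pattern described (an instanceof check for one concrete inspector paired with a cast to a different one) and the proposed fix (accept the shared interface and drop the per-type branches) can be shown with a self-contained analogue. This is deliberately not Hive's ObjectInspector API; the nested types are hypothetical stand-ins.

```java
public class InspectorRefactor {
    interface PrimitiveInspector { Object constantValue(); }
    static class FloatInspector implements PrimitiveInspector {
        public Object constantValue() { return 1.5f; }
    }
    static class DoubleInspector implements PrimitiveInspector {
        public Object constantValue() { return 2.5d; }
    }

    // Buggy shape (analogue of the reported typo): the branch tests for
    // FloatInspector but casts to DoubleInspector, so it throws a
    // ClassCastException on every float constant it is meant to handle.
    public static Object buggy(PrimitiveInspector oi) {
        if (oi instanceof FloatInspector) {
            return ((DoubleInspector) oi).constantValue();  // guaranteed CCE
        }
        return null;
    }

    // Fixed shape: every branch collapses into one call on the shared
    // interface, leaving no concrete cast to get wrong.
    public static Object fixed(PrimitiveInspector oi) {
        return oi.constantValue();
    }

    public static void main(String[] args) {
        System.out.println(fixed(new FloatInspector()));  // 1.5
        try {
            buggy(new FloatInspector());
        } catch (ClassCastException e) {
            System.out.println("ClassCastException, as the issue title says");
        }
    }
}
```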
[jira] [Assigned] (HIVE-7911) Guaranteed ClassCastException in AccumuloRangeGenerator
[ https://issues.apache.org/jira/browse/HIVE-7911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Francke reassigned HIVE-7911: -- Assignee: Lars Francke Guaranteed ClassCastException in AccumuloRangeGenerator --- Key: HIVE-7911 URL: https://issues.apache.org/jira/browse/HIVE-7911 Project: Hive Issue Type: Bug Reporter: Lars Francke Assignee: Lars Francke Attachments: HIVE-7911.1.patch AccumuloRangeGenerator has a typo where it should say {{WritableConstantFloatObjectInspector}} instead of {{WritableConstantDoubleObjectInspector}}. I've changed the method to avoid the multiple if-else statements as all that is expected is a {{PrimitiveObjectInspector}}. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7911) Guaranteed ClassCastException in AccumuloRangeGenerator
[ https://issues.apache.org/jira/browse/HIVE-7911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Francke updated HIVE-7911: --- Attachment: HIVE-7911.1.patch Guaranteed ClassCastException in AccumuloRangeGenerator --- Key: HIVE-7911 URL: https://issues.apache.org/jira/browse/HIVE-7911 Project: Hive Issue Type: Bug Reporter: Lars Francke Attachments: HIVE-7911.1.patch AccumuloRangeGenerator has a typo where it should say {{WritableConstantFloatObjectInspector}} instead of {{WritableConstantDoubleObjectInspector}}. I've changed the method to avoid the multiple if-else statements as all that is expected is a {{PrimitiveObjectInspector}}. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7911) Guaranteed ClassCastException in AccumuloRangeGenerator
[ https://issues.apache.org/jira/browse/HIVE-7911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Francke updated HIVE-7911: --- Status: Patch Available (was: Open) Guaranteed ClassCastException in AccumuloRangeGenerator --- Key: HIVE-7911 URL: https://issues.apache.org/jira/browse/HIVE-7911 Project: Hive Issue Type: Bug Reporter: Lars Francke Attachments: HIVE-7911.1.patch AccumuloRangeGenerator has a typo where it should say {{WritableConstantFloatObjectInspector}} instead of {{WritableConstantDoubleObjectInspector}}. I've changed the method to avoid the multiple if-else statements as all that is expected is a {{PrimitiveObjectInspector}}. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7551) expand spark accumulator to support hive counter [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115369#comment-14115369 ] Suhas Satish commented on HIVE-7551:

Assigning to myself after talking to Na. Is this for milestone Spark-M3, as the dependent JIRAs are labeled?

expand spark accumulator to support hive counter [Spark Branch]

Key: HIVE-7551
URL: https://issues.apache.org/jira/browse/HIVE-7551
Project: Hive
Issue Type: New Feature
Components: Spark
Reporter: Chengxiang Li
Assignee: Na Yang

Hive collects some operator statistics through counters; we need to support the MR/Tez counter counterpart through Spark accumulators. NO PRECOMMIT TESTS. This is for the spark branch only.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HIVE-7551) expand spark accumulator to support hive counter [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suhas Satish reassigned HIVE-7551:
--
Assignee: Suhas Satish (was: Na Yang)

expand spark accumulator to support hive counter [Spark Branch]

Key: HIVE-7551
URL: https://issues.apache.org/jira/browse/HIVE-7551
Project: Hive
Issue Type: New Feature
Components: Spark
Reporter: Chengxiang Li
Assignee: Suhas Satish

Hive collects some operator statistics through counters; we need to support the MR/Tez counter counterpart through Spark accumulators. NO PRECOMMIT TESTS. This is for the spark branch only.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7775) enable sample8.q.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115380#comment-14115380 ] Suhas Satish commented on HIVE-7775:

What kind of join did Szehon enable? Does Hive on Spark support full outer join?

enable sample8.q.[Spark Branch]

Key: HIVE-7775
URL: https://issues.apache.org/jira/browse/HIVE-7775
Project: Hive
Issue Type: Sub-task
Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li
Fix For: spark-branch
Attachments: HIVE-7775.1-spark.patch, HIVE-7775.2-spark.patch, HIVE-7775.3-spark.additional.patch

sample8.q contains a join query; this qtest should be enabled once Hive on Spark supports the join operation.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7775) enable sample8.q.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115389#comment-14115389 ] Brock Noland commented on HIVE-7775:

Hi Suhas, The JIRA is here: HIVE-7815. Basically it's a non-parallel reduce-side join which supports full, left, right, and inner joins. Cheers!

enable sample8.q.[Spark Branch]

Key: HIVE-7775
URL: https://issues.apache.org/jira/browse/HIVE-7775
Project: Hive
Issue Type: Sub-task
Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li
Fix For: spark-branch
Attachments: HIVE-7775.1-spark.patch, HIVE-7775.2-spark.patch, HIVE-7775.3-spark.additional.patch

sample8.q contains a join query; this qtest should be enabled once Hive on Spark supports the join operation.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7405) Vectorize GROUP BY on the Reduce-Side (Part 1 – Basic)
[ https://issues.apache.org/jira/browse/HIVE-7405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115387#comment-14115387 ] Hive QA commented on HIVE-7405: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12665164/HIVE-7405.93.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6127 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_join_hash {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/561/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/561/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-561/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12665164 Vectorize GROUP BY on the Reduce-Side (Part 1 – Basic) -- Key: HIVE-7405 URL: https://issues.apache.org/jira/browse/HIVE-7405 Project: Hive Issue Type: Sub-task Reporter: Matt McCline Assignee: Matt McCline Attachments: HIVE-7405.1.patch, HIVE-7405.2.patch, HIVE-7405.3.patch, HIVE-7405.4.patch, HIVE-7405.5.patch, HIVE-7405.6.patch, HIVE-7405.7.patch, HIVE-7405.8.patch, HIVE-7405.9.patch, HIVE-7405.91.patch, HIVE-7405.92.patch, HIVE-7405.93.patch Vectorize the basic case that does not have any count distinct aggregation. Add a 4th processing mode in VectorGroupByOperator for reduce where each input VectorizedRowBatch has only values for one key at a time. 
Thus, the values in the batch can be aggregated quickly. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7811) Compactions need to update table/partition stats
[ https://issues.apache.org/jira/browse/HIVE-7811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-7811:
-
Attachment: HIVE-7811.4.patch

Compactions need to update table/partition stats

Key: HIVE-7811
URL: https://issues.apache.org/jira/browse/HIVE-7811
Project: Hive
Issue Type: Sub-task
Components: Transactions
Affects Versions: 0.13.1
Reporter: Eugene Koifman
Assignee: Eugene Koifman
Attachments: HIVE-7811.3.patch, HIVE-7811.4.patch

Compactions should trigger stats recalculation for columns that already have stats.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HIVE-7884) When partition filter containing single column from multiple partition is used in HCatInputFormat.setFilter, it returns empty set
[ https://issues.apache.org/jira/browse/HIVE-7884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prafulla T resolved HIVE-7884.
--
Resolution: Invalid

We found that this was due to an error in our program that fetches from Hive. Resolving as Invalid.

When partition filter containing single column from multiple partition is used in HCatInputFormat.setFilter, it returns empty set

Key: HIVE-7884
URL: https://issues.apache.org/jira/browse/HIVE-7884
Project: Hive
Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Prafulla T

In one of my company's products, we use HCatInputFormat to import data from Hadoop/Hive into our database. We use HCatInputFormat.setFilter to pass partition filters based on partition columns, and we see the following issue in recent Hive: when a Hive table has multiple partition columns and the partition filter uses only a single column of them, we get an empty set instead of the rows from partitions that match the single column used in the filter. This used to work earlier (Hive 0.10.0 or 0.11.0); we experience this problem in Hive 0.13.0.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7911) Guaranteed ClassCastException in AccumuloRangeGenerator
[ https://issues.apache.org/jira/browse/HIVE-7911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115469#comment-14115469 ] Ashutosh Chauhan commented on HIVE-7911:

+1

Guaranteed ClassCastException in AccumuloRangeGenerator

Key: HIVE-7911
URL: https://issues.apache.org/jira/browse/HIVE-7911
Project: Hive
Issue Type: Bug
Reporter: Lars Francke
Assignee: Lars Francke
Attachments: HIVE-7911.1.patch

AccumuloRangeGenerator has a typo where it should say {{WritableConstantFloatObjectInspector}} instead of {{WritableConstantDoubleObjectInspector}}. I've changed the method to avoid the multiple if-else statements as all that is expected is a {{PrimitiveObjectInspector}}.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7902) Cleanup hbase-handler/pom.xml dependency list
[ https://issues.apache.org/jira/browse/HIVE-7902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7902: --- Resolution: Fixed Status: Resolved (was: Patch Available) Thank you very much for cleaning up my mess! :) I have committed this to trunk. Cleanup hbase-handler/pom.xml dependency list - Key: HIVE-7902 URL: https://issues.apache.org/jira/browse/HIVE-7902 Project: Hive Issue Type: Bug Components: Build Infrastructure Affects Versions: 0.13.0, 0.13.1 Reporter: Venki Korukanti Assignee: Venki Korukanti Priority: Minor Fix For: 0.14.0 Attachments: HIVE-7902.1.patch Noticed an extra dependency {{hive-service}} when changing dependency version of {{hive-hbase-handler}} from 0.12.0 to 0.13.0 in a third party application. Tracing the log of hbase-handler/pom.xml file, it is added as part of ant to maven migration and not because of any specific functionality requirement. Dependency {{hive-service}} is not needed in {{hive-hbase-handler}} and can be removed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7649) Support column stats with temporary tables
[ https://issues.apache.org/jira/browse/HIVE-7649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-7649: - Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks [~jdere]! Support column stats with temporary tables -- Key: HIVE-7649 URL: https://issues.apache.org/jira/browse/HIVE-7649 Project: Hive Issue Type: Bug Components: Statistics Reporter: Jason Dere Assignee: Jason Dere Fix For: 0.14.0 Attachments: HIVE-7649.1.patch, HIVE-7649.2.patch, HIVE-7649.3.patch, HIVE-7649.4.patch Column stats currently not supported with temp tables, see if they can be added. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7911) Guaranteed ClassCastException in AccumuloRangeGenerator
[ https://issues.apache.org/jira/browse/HIVE-7911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115531#comment-14115531 ] Hive QA commented on HIVE-7911: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12665356/HIVE-7911.1.patch {color:green}SUCCESS:{color} +1 6127 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/562/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/562/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-562/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12665356 Guaranteed ClassCastException in AccumuloRangeGenerator --- Key: HIVE-7911 URL: https://issues.apache.org/jira/browse/HIVE-7911 Project: Hive Issue Type: Bug Reporter: Lars Francke Assignee: Lars Francke Attachments: HIVE-7911.1.patch AccumuloRangeGenerator has a typo where it should say {{WritableConstantFloatObjectInspector}} instead of {{WritableConstantDoubleObjectInspector}}. I've changed the method to avoid the multiple if-else statements as all that is expected is a {{PrimitiveObjectInspector}}. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 25176: HIVE-7870: Insert overwrite table query does not generate correct task plan [Spark Branch]
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25176/#review51889 --- Hi Na, Thank you very much for the patch! I have one high-level question: it appears we created the union_remove_spark* files because we wanted to add an additional property to the union_remove .q files? Meaning, what is the delta between union_remove_spark_1.q and union_remove_? Cheers! - Brock Noland On Aug. 29, 2014, 6:44 a.m., Na Yang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25176/ --- (Updated Aug. 29, 2014, 6:44 a.m.) Review request for hive, Brock Noland, Szehon Ho, and Xuefu Zhang. Bugs: HIVE-7870 https://issues.apache.org/jira/browse/HIVE-7870 Repository: hive-git Description --- HIVE-7870: Insert overwrite table query does not generate correct task plan [Spark Branch] The cause of this problem is that during Spark/Tez task generation the union's FileSink operator is cloned into two new FileSink operators. The linked FileSinkDesc info for those new FileSink operators is missing. In addition, the two new FileSink operators also need to be linked together. 
Diffs - itests/src/test/resources/testconfiguration.properties 6393671 ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 9c808d4 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 5ddc16d ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 379a39c ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 76fc290 ql/src/test/queries/clientpositive/union_remove_spark_1.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_10.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_11.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_15.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_16.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_17.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_18.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_19.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_2.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_20.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_21.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_24.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_25.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_3.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_4.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_5.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_6.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_7.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_8.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_9.q PRE-CREATION ql/src/test/results/clientpositive/spark/sample8.q.out c7e333b ql/src/test/results/clientpositive/spark/union10.q.out 20c681e ql/src/test/results/clientpositive/spark/union18.q.out 
3f37a0a ql/src/test/results/clientpositive/spark/union19.q.out 6922fcd ql/src/test/results/clientpositive/spark/union28.q.out 8bd5218 ql/src/test/results/clientpositive/spark/union29.q.out b9546ef ql/src/test/results/clientpositive/spark/union3.q.out 3ae6536 ql/src/test/results/clientpositive/spark/union30.q.out 12717a1 ql/src/test/results/clientpositive/spark/union33.q.out b89757f ql/src/test/results/clientpositive/spark/union4.q.out 6341cd9 ql/src/test/results/clientpositive/spark/union6.q.out 263d9f4 ql/src/test/results/clientpositive/spark/union_remove_spark_1.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/union_remove_spark_10.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/union_remove_spark_11.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/union_remove_spark_15.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/union_remove_spark_16.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/union_remove_spark_17.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/union_remove_spark_18.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/union_remove_spark_19.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/union_remove_spark_2.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/union_remove_spark_20.q.out PRE-CREATION
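The linking step the patch summary describes can be sketched as follows. The classes below are minimal stand-ins, not Hive's actual FileSinkDesc: when a union's FileSink operator is cloned into two during Spark/Tez task generation, each clone must carry the linked-FileSink metadata, and the clones must reference each other so downstream planning can gather all of the union's output directories.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-in for a file sink descriptor with linked-sink bookkeeping.
class SinkDesc {
    final String outputDir;
    List<SinkDesc> linkedSinks = new ArrayList<>();
    SinkDesc(String outputDir) { this.outputDir = outputDir; }
}

public class LinkClonedSinks {
    // After cloning, give both clones one shared linked-sink list so each can
    // see the other's output directory -- the bookkeeping that was missing.
    static void link(SinkDesc a, SinkDesc b) {
        List<SinkDesc> shared = new ArrayList<>();
        shared.add(a);
        shared.add(b);
        a.linkedSinks = shared;
        b.linkedSinks = shared;
    }
}
```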
[jira] [Commented] (HIVE-7803) Enable Hadoop speculative execution may cause corrupt output directory (dynamic partition)
[ https://issues.apache.org/jira/browse/HIVE-7803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115538#comment-14115538 ] Selina Zhang commented on HIVE-7803: The test failures do not seem related to this patch. Saw the same failures for HIVE-7890. Enable Hadoop speculative execution may cause corrupt output directory (dynamic partition) -- Key: HIVE-7803 URL: https://issues.apache.org/jira/browse/HIVE-7803 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.13.1 Environment: Reporter: Selina Zhang Assignee: Selina Zhang Priority: Critical Attachments: HIVE-7803.1.patch, HIVE-7803.2.patch One of our users reports intermittent failures due to attempt directories in the input paths. We found that with speculative execution turned on, two mappers tried to commit the task at the same time using the same committed task path, which caused the corrupt output directory. The original Pig script: {code} STORE AdvertiserDataParsedClean INTO '$DB_NAME.$ADVERTISER_META_TABLE_NAME' USING org.apache.hcatalog.pig.HCatStorer(); {code} Two mappers attempt_1405021984947_5394024_m_000523_0: KILLED attempt_1405021984947_5394024_m_000523_1: SUCCEEDED attempt_1405021984947_5394024_m_000523_0 was killed right after the commit. As a result, it created a corrupt directory at /projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/task_1405021984947_5394024_m_000523/ containing both part-m-00523 (from attempt_1405021984947_5394024_m_000523_0) and attempt_1405021984947_5394024_m_000523_1/part-m-00523 Namenode Audit log == 1. 2014-08-05 05:04:36,811 INFO FSNamesystem.audit: ugi=* ip=ipaddress1 cmd=create src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_0/part-m-00523 dst=null perm=user:group:rw-r- 2. 
2014-08-05 05:04:53,112 INFO FSNamesystem.audit: ugi=* ip=ipaddress2 cmd=create src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_1/part-m-00523 dst=null perm=user:group:rw-r- 3. 2014-08-05 05:05:13,001 INFO FSNamesystem.audit: ugi=* ip=ipaddress1 cmd=rename src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_0 dst=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/task_1405021984947_5394024_m_000523 perm=user:group:rwxr-x--- 4. 2014-08-05 05:05:13,004 INFO FSNamesystem.audit: ugi=* ip=ipaddress2 cmd=rename src=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/_temporary/attempt_1405021984947_5394024_m_000523_1 dst=/projects/.../tablename/_DYN0.7192688458252056/load_time=20140805/type=complete/_temporary/1/task_1405021984947_5394024_m_000523 perm=user:group:rwxr-x--- After consulting our Hadoop core team, we were pointed to some HCat code that does not participate in the two-phase commit protocol, for example in FileRecordWriterContainer.close(): {code} for (Map.Entry<String, org.apache.hadoop.mapred.OutputCommitter> entry : baseDynamicCommitters.entrySet()) { org.apache.hadoop.mapred.TaskAttemptContext currContext = dynamicContexts.get(entry.getKey()); OutputCommitter baseOutputCommitter = entry.getValue(); if (baseOutputCommitter.needsTaskCommit(currContext)) { baseOutputCommitter.commitTask(currContext); } } {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
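The race described above can be illustrated in miniature. This is a hedged sketch, not Hadoop's actual implementation: with speculative execution, two attempts of the same task may both reach their commit step; the two-phase commit protocol avoids corruption by letting the framework authorize exactly one attempt to commit, whereas calling commitTask directly from RecordWriter.close() lets both attempts rename into the same committed task path.

```java
import java.util.concurrent.atomic.AtomicReference;

public class CommitRace {
    // Stand-in for the coordinator deciding which attempt may commit its output.
    // compareAndSet ensures exactly one attempt wins, no matter how the two
    // speculative attempts interleave.
    static final AtomicReference<String> committed = new AtomicReference<>();

    // Returns true iff this attempt won the right to commit (first caller wins).
    static boolean tryCommit(String attemptId) {
        return committed.compareAndSet(null, attemptId);
    }
}
```

In the buggy path there is no such single authorizer: each attempt's close() checks needsTaskCommit and commits independently, which is how both attempt_..._0 and attempt_..._1 ended up renamed into the same task directory.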
[jira] [Updated] (HIVE-7100) Users of hive should be able to specify skipTrash when dropping tables.
[ https://issues.apache.org/jira/browse/HIVE-7100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Prakash updated HIVE-7100: --- Status: Open (was: Patch Available) Users of hive should be able to specify skipTrash when dropping tables. --- Key: HIVE-7100 URL: https://issues.apache.org/jira/browse/HIVE-7100 Project: Hive Issue Type: Improvement Affects Versions: 0.13.0 Reporter: Ravi Prakash Assignee: Jayesh Attachments: HIVE-7100.1.patch, HIVE-7100.2.patch, HIVE-7100.3.patch, HIVE-7100.4.patch, HIVE-7100.patch Users of our clusters are often running up against their quota limits because of Hive tables. When they drop tables, they have to then manually delete the files from HDFS using skipTrash. This is cumbersome and unnecessary. We should enable users to skipTrash directly when dropping tables. We should also be able to provide this functionality without polluting SQL syntax. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7100) Users of hive should be able to specify skipTrash when dropping tables.
[ https://issues.apache.org/jira/browse/HIVE-7100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Prakash updated HIVE-7100: --- Status: Patch Available (was: Open) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7901) CLONE - pig -useHCatalog with embedded metastore fails to pass command line args to metastore (org.apache.hive.hcatalog version)
[ https://issues.apache.org/jira/browse/HIVE-7901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Hanson updated HIVE-7901: -- Attachment: hive-7901.01.patch I modified the original HIVE-6633 patch to put the changes in the right place, under apache/hive. This is a new patch for those changes based directly off the current hive trunk. CLONE - pig -useHCatalog with embedded metastore fails to pass command line args to metastore (org.apache.hive.hcatalog version) Key: HIVE-7901 URL: https://issues.apache.org/jira/browse/HIVE-7901 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.14.0 Reporter: Sushanth Sowmyan Assignee: Eric Hanson Attachments: hive-7901.01.patch This fails because the embedded metastore can't connect to the database: the command line -D arguments passed to pig are not passed on to the metastore when the embedded metastore is created. Setting hive.metastore.uris to the empty string causes creation of an embedded metastore. pig -useHCatalog -Dhive.metastore.uris= -Djavax.jdo.option.ConnectionPassword=AzureSQLDBXYZ The goal is to allow a pig job submitted via WebHCat to specify a metastore to use via job arguments. That is not working because it is not possible to pass -Djavax.jdo.option.ConnectionPassword and other necessary arguments to the embedded metastore. -- This message was sent by Atlassian JIRA (v6.2#6252)
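The fix direction can be sketched as follows. Method and parameter names here are hypothetical, not the actual patch: when hive.metastore.uris is empty, the metastore runs in-process, so JVM system properties supplied via -D (for example javax.jdo.option.ConnectionPassword) must be copied into the metastore's configuration explicitly or they never reach it.

```java
import java.util.Properties;

public class EmbeddedMetastoreConf {
    // Overlay system properties matching a prefix (e.g. "javax.jdo.option.")
    // onto the metastore configuration, so -D arguments survive into the
    // embedded metastore instead of being silently dropped.
    static Properties overlaySystemProps(Properties conf, Properties sysProps, String prefix) {
        Properties merged = new Properties();
        merged.putAll(conf);
        for (String name : sysProps.stringPropertyNames()) {
            if (name.startsWith(prefix)) {
                merged.setProperty(name, sysProps.getProperty(name));
            }
        }
        return merged;
    }
}
```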
[jira] [Updated] (HIVE-7901) CLONE - pig -useHCatalog with embedded metastore fails to pass command line args to metastore (org.apache.hive.hcatalog version)
[ https://issues.apache.org/jira/browse/HIVE-7901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Hanson updated HIVE-7901: -- Status: Patch Available (was: Open) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7901) CLONE - pig -useHCatalog with embedded metastore fails to pass command line args to metastore (org.apache.hive.hcatalog version)
[ https://issues.apache.org/jira/browse/HIVE-7901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115550#comment-14115550 ] Eric Hanson commented on HIVE-7901: --- [~sushanth], please have a look and +1/commit if you think it's ready. Thanks! -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7909) Fix samaple8.q automatic test failure[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115549#comment-14115549 ] Szehon Ho commented on HIVE-7909: - +1 Fix samaple8.q automatic test failure[Spark Branch] --- Key: HIVE-7909 URL: https://issues.apache.org/jira/browse/HIVE-7909 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M1 Attachments: HIVE-7909.1-spark.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7669) parallel order by clause on a string column fails with IOException: Split points are out of order
[ https://issues.apache.org/jira/browse/HIVE-7669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115559#comment-14115559 ] Szehon Ho commented on HIVE-7669: - Can you please add the license header, though? parallel order by clause on a string column fails with IOException: Split points are out of order - Key: HIVE-7669 URL: https://issues.apache.org/jira/browse/HIVE-7669 Project: Hive Issue Type: Bug Components: HiveServer2, Query Processor, SQL Affects Versions: 0.12.0 Environment: Hive 0.12.0-cdh5.0.0 OS: Redhat linux Reporter: Vishal Kamath Assignee: Navis Labels: orderby Attachments: HIVE-7669.1.patch.txt, HIVE-7669.2.patch.txt, HIVE-7669.3.patch.txt, HIVE-7669.4.patch.txt The source table has 600 Million rows and it has a String column l_shipinstruct which has 4 unique values. (Ie. these 4 values are repeated across the 600 million rows) We are sorting it based on this string column l_shipinstruct as shown in the below HiveQL with the following parameters. 
{code:sql} set hive.optimize.sampling.orderby=true; set hive.optimize.sampling.orderby.number=1000; set hive.optimize.sampling.orderby.percent=0.1f; insert overwrite table lineitem_temp_report select l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment from lineitem order by l_shipinstruct; {code} Stack Trace Diagnostic Messages for this Task: {noformat} Error: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapred.MapTask$OldOutputCollector.init(MapTask.java:569) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) ... 
10 more Caused by: java.lang.IllegalArgumentException: Can't read partitions file at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:116) at org.apache.hadoop.mapred.lib.TotalOrderPartitioner.configure(TotalOrderPartitioner.java:42) at org.apache.hadoop.hive.ql.exec.HiveTotalOrderPartitioner.configure(HiveTotalOrderPartitioner.java:37) ... 15 more Caused by: java.io.IOException: Split points are out of order at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:96) ... 17 more {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
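The failure above has a simple shape: TotalOrderPartitioner requires the partition file's split points to be strictly increasing, but when a 600-million-row column holds only 4 distinct values, the sampled split points inevitably repeat, which is exactly the "Split points are out of order" check tripping. A hedged sketch of the problem and one fix direction (method names hypothetical, not the actual patch) is to deduplicate the sampled keys before writing the partition file:

```java
import java.util.Arrays;
import java.util.TreeSet;

public class SplitPoints {
    // Mirrors TotalOrderPartitioner's validation: each split point must be
    // strictly greater than the previous one.
    static boolean strictlyIncreasing(String[] keys) {
        for (int i = 1; i < keys.length; i++) {
            if (keys[i].compareTo(keys[i - 1]) <= 0) return false; // duplicate or out of order
        }
        return true;
    }

    // Deduplicate and sort sampled keys so the partition file is valid even
    // when the sampled column has very low cardinality.
    static String[] dedup(String[] sampled) {
        return new TreeSet<>(Arrays.asList(sampled)).toArray(new String[0]);
    }
}
```

Note that after deduplication there may be fewer split points than requested reducers, so the number of reducers must be capped accordingly.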
[jira] [Commented] (HIVE-7669) parallel order by clause on a string column fails with IOException: Split points are out of order
[ https://issues.apache.org/jira/browse/HIVE-7669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1411#comment-1411 ] Szehon Ho commented on HIVE-7669: - +1 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7912) Don't add is not null filter for partitioning column
Ashutosh Chauhan created HIVE-7912: -- Summary: Don't add is not null filter for partitioning column Key: HIVE-7912 URL: https://issues.apache.org/jira/browse/HIVE-7912 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.14.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan HIVE-7159 introduced an optimization that adds an is-not-null filter on inner-join columns, which is wasteful for a partitioning column. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7912) Don't add is not null filter for partitioning column
[ https://issues.apache.org/jira/browse/HIVE-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7912: --- Attachment: HIVE-7912.patch Don't add is not null filter for partitioning column Key: HIVE-7912 URL: https://issues.apache.org/jira/browse/HIVE-7912 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.14.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-7912.patch HIVE-7159 introduced an optimization that adds an is-not-null filter on inner-join columns, which is wasteful for a partitioning column. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7912) Don't add is not null filter for partitioning column
[ https://issues.apache.org/jira/browse/HIVE-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7912: --- Status: Patch Available (was: Open) Don't add is not null filter for partitioning column Key: HIVE-7912 URL: https://issues.apache.org/jira/browse/HIVE-7912 Project: Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.14.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-7912.patch HIVE-7159 introduced an optimization that adds an is-not-null filter on inner-join columns, which is wasteful for a partitioning column. -- This message was sent by Atlassian JIRA (v6.2#6252)
Review Request 25194: Don't add is not null filter for partitioning column
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25194/ --- Review request for hive and Harish Butani. Bugs: HIVE-7912 https://issues.apache.org/jira/browse/HIVE-7912 Repository: hive-git Description --- Don't add is not null filter for partitioning column Diffs - ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 0106707 Diff: https://reviews.apache.org/r/25194/diff/ Testing --- Existing tests. Verified in debugger. Thanks, Ashutosh Chauhan
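The change under review can be sketched as follows. Names here are hypothetical and this is not the actual SemanticAnalyzer code: when synthesizing is-not-null predicates for inner-join keys (the HIVE-7159 optimization), skip keys that are partitioning columns, since partition values are never null and the extra predicate only costs evaluation time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class NotNullFilterSketch {
    // Generate the synthetic not-null predicates for a set of inner-join keys,
    // omitting partitioning columns: partition pruning already guarantees a
    // concrete, non-null partition value for every surviving row.
    static List<String> notNullPredicates(List<String> joinKeys, Set<String> partitionCols) {
        List<String> preds = new ArrayList<>();
        for (String key : joinKeys) {
            if (!partitionCols.contains(key)) {
                preds.add(key + " is not null");
            }
        }
        return preds;
    }
}
```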
Re: Review Request 25176: HIVE-7870: Insert overwrite table query does not generate correct task plan [Spark Branch]
On Aug. 29, 2014, 5:30 p.m., Brock Noland wrote: Hi Na, Thank you very much for the patch! I have one high-level question: it appears we created the union_remove_spark* files because we wanted to add an additional property to the union_remove .q files? Meaning, what is the delta between union_remove_spark_1.q and union_remove_? Cheers! Hi Brock, That is correct. The union_remove_spark* files include an extra config property, hive.merge.sparkfile, compared with the corresponding union_remove_* files. Apart from that extra config property, all other queries in each union_remove_spark* file are the same as the queries in the corresponding union_remove_* file. The hive.merge.sparkfile value is set according to the hive.merge.mapfile and hive.merge.mapredfile property values in the original union_remove_* file. Regarding the test results, we expect the union_remove_spark* queries to return the same data as the corresponding union_remove_* queries. Thanks, Na - Na --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25176/#review51889 --- On Aug. 29, 2014, 6:44 a.m., Na Yang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25176/ --- (Updated Aug. 29, 2014, 6:44 a.m.) Review request for hive, Brock Noland, Szehon Ho, and Xuefu Zhang. Bugs: HIVE-7870 https://issues.apache.org/jira/browse/HIVE-7870 Repository: hive-git Description --- HIVE-7870: Insert overwrite table query does not generate correct task plan [Spark Branch] The cause of this problem is that during Spark/Tez task generation the union's FileSink operator is cloned into two new FileSink operators. The linked FileSinkDesc info for those new FileSink operators is missing. In addition, the two new FileSink operators also need to be linked together. 
Diffs - itests/src/test/resources/testconfiguration.properties 6393671 ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 9c808d4 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 5ddc16d ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 379a39c ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 76fc290 ql/src/test/queries/clientpositive/union_remove_spark_1.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_10.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_11.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_15.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_16.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_17.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_18.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_19.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_2.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_20.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_21.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_24.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_25.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_3.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_4.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_5.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_6.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_7.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_8.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_9.q PRE-CREATION ql/src/test/results/clientpositive/spark/sample8.q.out c7e333b ql/src/test/results/clientpositive/spark/union10.q.out 20c681e ql/src/test/results/clientpositive/spark/union18.q.out 
3f37a0a ql/src/test/results/clientpositive/spark/union19.q.out 6922fcd ql/src/test/results/clientpositive/spark/union28.q.out 8bd5218 ql/src/test/results/clientpositive/spark/union29.q.out b9546ef ql/src/test/results/clientpositive/spark/union3.q.out 3ae6536 ql/src/test/results/clientpositive/spark/union30.q.out 12717a1 ql/src/test/results/clientpositive/spark/union33.q.out b89757f ql/src/test/results/clientpositive/spark/union4.q.out 6341cd9 ql/src/test/results/clientpositive/spark/union6.q.out 263d9f4 ql/src/test/results/clientpositive/spark/union_remove_spark_1.q.out PRE-CREATION ql/src/test/results/clientpositive/spark/union_remove_spark_10.q.out PRE-CREATION
[jira] [Updated] (HIVE-7909) Fix samaple8.q automatic test failure[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-7909: Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to spark branch. Thanks Chengxiang. Fix samaple8.q automatic test failure[Spark Branch] --- Key: HIVE-7909 URL: https://issues.apache.org/jira/browse/HIVE-7909 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M1 Fix For: 0.14.0 Attachments: HIVE-7909.1-spark.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7913) Simplify predicates for CBO
[ https://issues.apache.org/jira/browse/HIVE-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-7913: -- Description: (was: I noticed that the estimated number of rows in map joins is higher after the join than before the join, with column stats fetch ON or OFF. TPC-DS Q55 was a good example of this; the issue is that the current statistics give us enough information to estimate with strong confidence that the joins are one-to-many, not many-to-many. Joining store_sales x item on ss_item_sk = i_item_sk, we know that the NDV, min, and max values for both join columns match while the row counts differ; this pattern indicates a PK/FK relationship between store_sales and item. Yet when a filter is applied on item and reduces the number of rows from 462K to 7K, we estimate a many-to-many join between the filtered item and store_sales, and as a result the estimated number of rows coming out of the join is off by several orders of magnitude. Available information from the stats:
{code}
Table        Join column      NDV from describe  NDV actual  min        max
item         i_item_sk        439,501            462,000     1          462,000
date_dim     d_date_sk        65,332             73,049      2,415,022  2,488,070
store_sales  ss_item_sk      439,501            462,000     1          462,000
store_sales  ss_sold_date_sk  2,226              1,823       2,450,816  2,452,642
{code}
The same thing applies to store_sales and date_dim, with the caveat that the NDV, min, and max values don't match: date_dim has a bigger domain and accordingly a higher NDV count. For joining store_sales and item on ss_item_sk = i_item_sk, since both columns have the same NDV, min, and max values, we can safely conclude that selectivity on item will translate to similar selectivity on store_sales. This is not the case for joining store_sales and date_dim on ss_sold_date_sk = d_date_sk, since the domain of d_date_sk is much bigger than that of ss_sold_date_sk; differences in domain need to be taken into account when inferring selectivity onto store_sales.) 
Simplify predicates for CBO --- Key: HIVE-7913 URL: https://issues.apache.org/jira/browse/HIVE-7913 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 0.13.1 Reporter: Mostafa Mokhtar Assignee: Laljo John Pullokkaran Fix For: 0.14.0 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7913) Simplify predicates for CBO
Mostafa Mokhtar created HIVE-7913: - Summary: Simplify predicates for CBO Key: HIVE-7913 URL: https://issues.apache.org/jira/browse/HIVE-7913 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 0.13.1 Reporter: Mostafa Mokhtar Fix For: 0.14.0 I noticed that the estimated number of rows in Map joins is higher after the join than before the join, with column stats fetch either ON or OFF. TPC-DS Q55 was a good example of this; the issue is that the current statistics provide us enough information to estimate with strong confidence that the joins are one-to-many and not many-to-many. Joining store_sales x item on ss_item_sk = i_item_sk, we know that the NDV, min and max values for both join columns match while the row counts are different; this pattern indicates a PK/FK relationship between store_sales and item. Yet when a filter is applied on item and reduces the number of rows from 462K to 7K, we estimate a many-to-many join between the filtered item and store_sales, and as a result the estimated number of rows coming out of the join is off by several orders of magnitude. Available information from the stats:
{code}
Table        Join column      NDV from describe  NDV actual  min        max
item         i_item_sk        439,501            462,000     1          462,000
date_dim     d_date_sk        65,332             73,049      2,415,022  2,488,070
store_sales  ss_item_sk       439,501            462,000     1          462,000
store_sales  ss_sold_date_sk  2,226              1,823       2,450,816  2,452,642
{code}
The same applies to store_sales and date_dim, with the caveat that the NDV, min and max values don't match: date_dim has a bigger domain and accordingly a higher NDV count. For joining store_sales and item on ss_item_sk = i_item_sk, since both columns have the same NDV, min and max values, we can safely conclude that selectivity on item will translate to similar selectivity on store_sales.
This is not the case for joining store_sales and date_dim on ss_sold_date_sk = d_date_sk, since the domain of d_date_sk is much bigger than that of ss_sold_date_sk; differences in domain need to be taken into account when inferring selectivity onto store_sales.
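The one-to-many inference described above can be sketched roughly as follows. This is a minimal illustration, not Hive's actual estimator code; the function and field names are invented, and the stats come from the table quoted in the issue:

```python
# Hypothetical sketch of PK/FK-aware join cardinality estimation.
# Column stats are plain dicts with 'ndv', 'min', 'max' and 'rows' (assumed names).

def looks_like_pk_fk(fact_col, dim_col):
    """If NDV, min and max of both join columns match, treat it as a PK/FK join."""
    return (fact_col["ndv"] == dim_col["ndv"]
            and fact_col["min"] == dim_col["min"]
            and fact_col["max"] == dim_col["max"])

def join_row_estimate(fact_col, dim_col, dim_rows_after_filter):
    if looks_like_pk_fk(fact_col, dim_col):
        # One-to-many: selectivity on the dimension carries over to the fact side,
        # instead of assuming a many-to-many join.
        selectivity = dim_rows_after_filter / dim_col["rows"]
        return fact_col["rows"] * selectivity
    # Generic fallback: |R| * |S| / max(ndv) — the many-to-many style estimate.
    return (fact_col["rows"] * dim_rows_after_filter
            / max(fact_col["ndv"], dim_col["ndv"]))

# Stats quoted in the issue: item filtered from 462K to 7K rows should scale
# store_sales (600M rows) by roughly 7K/462K rather than blowing up the estimate.
ss_item = {"ndv": 439501, "min": 1, "max": 462000, "rows": 600_000_000}
i_item  = {"ndv": 439501, "min": 1, "max": 462000, "rows": 462_000}
est = join_row_estimate(ss_item, i_item, 7_000)
```

With these numbers the PK/FK path yields roughly 9.1M output rows, i.e. fewer rows after the join than before it, which matches the expectation stated in the issue.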
[jira] [Updated] (HIVE-7913) Simplify predicates for CBO
[ https://issues.apache.org/jira/browse/HIVE-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-7913: -- Assignee: Laljo John Pullokkaran Simplify predicates for CBO --- Key: HIVE-7913 URL: https://issues.apache.org/jira/browse/HIVE-7913 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 0.13.1 Reporter: Mostafa Mokhtar Assignee: Laljo John Pullokkaran Fix For: 0.14.0 I noticed that the estimated number of rows in Map joins is higher after the join than before the join, with column stats fetch either ON or OFF. TPC-DS Q55 was a good example of this; the issue is that the current statistics provide us enough information to estimate with strong confidence that the joins are one-to-many and not many-to-many. Joining store_sales x item on ss_item_sk = i_item_sk, we know that the NDV, min and max values for both join columns match while the row counts are different; this pattern indicates a PK/FK relationship between store_sales and item. Yet when a filter is applied on item and reduces the number of rows from 462K to 7K, we estimate a many-to-many join between the filtered item and store_sales, and as a result the estimated number of rows coming out of the join is off by several orders of magnitude. Available information from the stats:
{code}
Table        Join column      NDV from describe  NDV actual  min        max
item         i_item_sk        439,501            462,000     1          462,000
date_dim     d_date_sk        65,332             73,049      2,415,022  2,488,070
store_sales  ss_item_sk       439,501            462,000     1          462,000
store_sales  ss_sold_date_sk  2,226              1,823       2,450,816  2,452,642
{code}
The same applies to store_sales and date_dim, with the caveat that the NDV, min and max values don't match: date_dim has a bigger domain and accordingly a higher NDV count. For joining store_sales and item on ss_item_sk = i_item_sk, since both columns have the same NDV, min and max values, we can safely conclude that selectivity on item will translate to similar selectivity on store_sales.
This is not the case for joining store_sales and date_dim on ss_sold_date_sk = d_date_sk, since the domain of d_date_sk is much bigger than that of ss_sold_date_sk; differences in domain need to be taken into account when inferring selectivity onto store_sales.
[jira] [Updated] (HIVE-7913) Simplify predicates for CBO
[ https://issues.apache.org/jira/browse/HIVE-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-7913: -- Description: Simplify predicates for disjunctive predicates so that they can get pushed down to the scan
{code}
select avg(ss_quantity)
      ,avg(ss_ext_sales_price)
      ,avg(ss_ext_wholesale_cost)
      ,sum(ss_ext_wholesale_cost)
from store_sales
    ,store
    ,customer_demographics
    ,household_demographics
    ,customer_address
    ,date_dim
where store.s_store_sk = store_sales.ss_store_sk
  and store_sales.ss_sold_date_sk = date_dim.d_date_sk
  and date_dim.d_year = 2001
  and ((store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk
        and customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk
        and customer_demographics.cd_marital_status = 'M'
        and customer_demographics.cd_education_status = '4 yr Degree'
        and store_sales.ss_sales_price between 100.00 and 150.00
        and household_demographics.hd_dep_count = 3)
       or (store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk
        and customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk
        and customer_demographics.cd_marital_status = 'D'
        and customer_demographics.cd_education_status = 'Primary'
        and store_sales.ss_sales_price between 50.00 and 100.00
        and household_demographics.hd_dep_count = 1)
       or (store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk
        and customer_demographics.cd_demo_sk = ss_cdemo_sk
        and customer_demographics.cd_marital_status = 'U'
        and customer_demographics.cd_education_status = 'Advanced Degree'
        and store_sales.ss_sales_price between 150.00 and 200.00
        and household_demographics.hd_dep_count = 1))
  and ((store_sales.ss_addr_sk = customer_address.ca_address_sk
        and customer_address.ca_country = 'United States'
        and customer_address.ca_state in ('KY', 'GA', 'NM')
        and store_sales.ss_net_profit between 100 and 200)
       or (store_sales.ss_addr_sk = customer_address.ca_address_sk
        and customer_address.ca_country = 'United States'
        and customer_address.ca_state in ('MT', 'OR', 'IN')
        and store_sales.ss_net_profit between 150 and 300)
       or (store_sales.ss_addr_sk = customer_address.ca_address_sk
        and customer_address.ca_country = 'United States'
        and customer_address.ca_state in ('WI', 'MO', 'WV')
        and store_sales.ss_net_profit between 50 and 250));
{code}
Simplify predicates for CBO --- Key: HIVE-7913 URL: https://issues.apache.org/jira/browse/HIVE-7913 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 0.13.1 Reporter: Mostafa Mokhtar Assignee: Laljo John Pullokkaran Fix For: 0.14.0
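The simplification requested in HIVE-7913 amounts to factoring conjuncts that appear in every disjunct out of the OR, so they surface at the AND level where the optimizer can push them to the scan. A minimal sketch of that factoring step (not Hive's actual rewrite code; predicates are modelled as plain strings for illustration):

```python
# Hypothetical common-conjunct factoring over a disjunction of conjunctions.
# Each disjunct is a set of conjunct strings; names are illustrative only.

def factor_common_conjuncts(disjuncts):
    """Return (common, residual): conjuncts shared by every disjunct,
    and each disjunct with the shared part removed."""
    common = set.intersection(*(set(d) for d in disjuncts))
    residual = [set(d) - common for d in disjuncts]
    return common, residual

# The three demographics disjuncts from the query above share two join conjuncts.
d1 = {"ss_hdemo_sk = hd_demo_sk", "cd_demo_sk = ss_cdemo_sk", "cd_marital_status = 'M'"}
d2 = {"ss_hdemo_sk = hd_demo_sk", "cd_demo_sk = ss_cdemo_sk", "cd_marital_status = 'D'"}
d3 = {"ss_hdemo_sk = hd_demo_sk", "cd_demo_sk = ss_cdemo_sk", "cd_marital_status = 'U'"}
common, residual = factor_common_conjuncts([d1, d2, d3])
```

Here the two join predicates are factored out of the OR, and the residual disjunction over `cd_marital_status` collapses to an IN list that can be pushed to the customer_demographics scan.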
[jira] [Commented] (HIVE-7811) Compactions need to update table/partition stats
[ https://issues.apache.org/jira/browse/HIVE-7811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115638#comment-14115638 ] Hive QA commented on HIVE-7811: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12665364/HIVE-7811.4.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6128 tests executed *Failed tests:* {noformat} org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/563/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/563/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-563/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12665364 Compactions need to update table/partition stats Key: HIVE-7811 URL: https://issues.apache.org/jira/browse/HIVE-7811 Project: Hive Issue Type: Sub-task Components: Transactions Affects Versions: 0.13.1 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-7811.3.patch, HIVE-7811.4.patch Compactions should trigger stats recalculation for columns that already have stats.
[jira] [Commented] (HIVE-7902) Cleanup hbase-handler/pom.xml dependency list
[ https://issues.apache.org/jira/browse/HIVE-7902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115655#comment-14115655 ] Szehon Ho commented on HIVE-7902: - You committed to spark branch :) I just merged the same patch from spark to trunk, as was intended; hopefully that kept the history. Cleanup hbase-handler/pom.xml dependency list - Key: HIVE-7902 URL: https://issues.apache.org/jira/browse/HIVE-7902 Project: Hive Issue Type: Bug Components: Build Infrastructure Affects Versions: 0.13.0, 0.13.1 Reporter: Venki Korukanti Assignee: Venki Korukanti Priority: Minor Fix For: 0.14.0 Attachments: HIVE-7902.1.patch Noticed an extra dependency {{hive-service}} when changing the dependency version of {{hive-hbase-handler}} from 0.12.0 to 0.13.0 in a third-party application. Tracing the log of the hbase-handler/pom.xml file, it was added as part of the Ant-to-Maven migration and not because of any specific functionality requirement. Dependency {{hive-service}} is not needed in {{hive-hbase-handler}} and can be removed.
[jira] [Commented] (HIVE-7902) Cleanup hbase-handler/pom.xml dependency list
[ https://issues.apache.org/jira/browse/HIVE-7902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14115667#comment-14115667 ] Brock Noland commented on HIVE-7902: Shoot... Yes thank you very much. Cleanup hbase-handler/pom.xml dependency list - Key: HIVE-7902 URL: https://issues.apache.org/jira/browse/HIVE-7902 Project: Hive Issue Type: Bug Components: Build Infrastructure Affects Versions: 0.13.0, 0.13.1 Reporter: Venki Korukanti Assignee: Venki Korukanti Priority: Minor Fix For: 0.14.0 Attachments: HIVE-7902.1.patch Noticed an extra dependency {{hive-service}} when changing the dependency version of {{hive-hbase-handler}} from 0.12.0 to 0.13.0 in a third-party application. Tracing the log of the hbase-handler/pom.xml file, it was added as part of the Ant-to-Maven migration and not because of any specific functionality requirement. Dependency {{hive-service}} is not needed in {{hive-hbase-handler}} and can be removed.
Re: Review Request 25176: HIVE-7870: Insert overwrite table query does not generate correct task plan [Spark Branch]
On Aug. 29, 2014, 5:30 p.m., Brock Noland wrote: Hi Na, Thank you very much for the patch! I have one high level question: It appears we created the union_remove_spark* files because we wanted to add an additional property to the union_remove .q file? Meaning what is the delta between union_remove_spark_1.q and union_remove_? Cheers! Na Yang wrote: Hi Brock, That is correct. The union_remove_spark* files include an extra config property hive.merge.sparkfile compared to the corresponding union_remove_* files. Except for that extra config property, all other queries in the union_remove_spark* files are the same as the queries in the union_remove_* files. The hive.merge.sparkfile value is set according to the hive.merge.mapfile and hive.merge.mapredfile property values in the original union_remove_* file. Regarding the test result, we expect the same data to be returned from the union_remove_spark* queries and the corresponding union_remove_* queries. Thanks, Na Hi, Thank you very much for the information! I think instead of adding the new union_remove_spark tests we should just add the hive.merge.sparkfile property to the union_remove q files. The extra property won't impact the existing tests other than an extra line of output. If instead we'd like to keep the union_remove_spark* tests then we'd need to add a check to QTestUtil that does not run spark files for MR: https://github.com/apache/hive/blob/trunk/itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java#L431 as the tests are currently running for both spark and MR. As such, I think the first solution (just add the property to the existing tests) makes sense. Thoughts? - Brock --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25176/#review51889 --- On Aug. 29, 2014, 6:44 a.m., Na Yang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25176/ --- (Updated Aug. 29, 2014, 6:44 a.m.)
Review request for hive, Brock Noland, Szehon Ho, and Xuefu Zhang. Bugs: HIVE-7870 https://issues.apache.org/jira/browse/HIVE-7870 Repository: hive-git Description --- HIVE-7870: Insert overwrite table query does not generate correct task plan [Spark Branch] The cause of this problem is that during spark/tez task generation, the union file sink operator is cloned into two new filesink operators. The linkedfilesinkdesc info for those new filesink operators is missing. In addition, the two new filesink operators also need to be linked together. Diffs - itests/src/test/resources/testconfiguration.properties 6393671 ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 9c808d4 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 5ddc16d ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 379a39c ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 76fc290 ql/src/test/queries/clientpositive/union_remove_spark_1.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_10.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_11.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_15.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_16.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_17.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_18.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_19.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_2.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_20.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_21.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_24.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_25.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_3.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_4.q
PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_5.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_6.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_7.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_8.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_9.q PRE-CREATION ql/src/test/results/clientpositive/spark/sample8.q.out c7e333b ql/src/test/results/clientpositive/spark/union10.q.out 20c681e
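The fix described in the review request above — cloned union file sinks must carry the linked-file-sink metadata and must point at each other — can be sketched abstractly as follows. This is a hypothetical illustration, not the actual Java code in GenSparkUtils; all class and field names are invented:

```python
# Hypothetical model of cloning a union's file sink into two linked sinks.

class FileSink:
    def __init__(self, desc):
        self.desc = desc      # stands in for the linkedFileSinkDesc-style metadata
        self.linked = []      # sibling sinks writing into the same union output

def clone_union_sink(sink):
    """Clone a union file sink into two sinks, copying the descriptor and
    linking the clones to each other (the step the bug report says was missing)."""
    a = FileSink(dict(sink.desc))
    b = FileSink(dict(sink.desc))
    a.linked, b.linked = [b], [a]
    return a, b

original = FileSink({"dest": "/tmp/union_out"})
a, b = clone_union_sink(original)
```

Without the linking step, the downstream plan generation would see two unrelated sinks and produce an incorrect task plan, which is the symptom HIVE-7870 reports.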
[jira] [Updated] (HIVE-7913) Simplify predicates for CBO
[ https://issues.apache.org/jira/browse/HIVE-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-7913: -- Description: Simplify predicates for disjunctive predicates so that they can get pushed down to the scan. For TPC-DS query 13 we push down predicates of the following form: cd_marital_status in ('M','D','U'), etc.
{code}
select avg(ss_quantity)
      ,avg(ss_ext_sales_price)
      ,avg(ss_ext_wholesale_cost)
      ,sum(ss_ext_wholesale_cost)
from store_sales
    ,store
    ,customer_demographics
    ,household_demographics
    ,customer_address
    ,date_dim
where store.s_store_sk = store_sales.ss_store_sk
  and store_sales.ss_sold_date_sk = date_dim.d_date_sk
  and date_dim.d_year = 2001
  and ((store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk
        and customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk
        and customer_demographics.cd_marital_status = 'M'
        and customer_demographics.cd_education_status = '4 yr Degree'
        and store_sales.ss_sales_price between 100.00 and 150.00
        and household_demographics.hd_dep_count = 3)
       or (store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk
        and customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk
        and customer_demographics.cd_marital_status = 'D'
        and customer_demographics.cd_education_status = 'Primary'
        and store_sales.ss_sales_price between 50.00 and 100.00
        and household_demographics.hd_dep_count = 1)
       or (store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk
        and customer_demographics.cd_demo_sk = ss_cdemo_sk
        and customer_demographics.cd_marital_status = 'U'
        and customer_demographics.cd_education_status = 'Advanced Degree'
        and store_sales.ss_sales_price between 150.00 and 200.00
        and household_demographics.hd_dep_count = 1))
  and ((store_sales.ss_addr_sk = customer_address.ca_address_sk
        and customer_address.ca_country = 'United States'
        and customer_address.ca_state in ('KY', 'GA', 'NM')
        and store_sales.ss_net_profit between 100 and 200)
       or (store_sales.ss_addr_sk = customer_address.ca_address_sk
        and customer_address.ca_country = 'United States'
        and customer_address.ca_state in ('MT', 'OR', 'IN')
        and store_sales.ss_net_profit between 150 and 300)
       or (store_sales.ss_addr_sk = customer_address.ca_address_sk
        and customer_address.ca_country = 'United States'
        and customer_address.ca_state in ('WI', 'MO', 'WV')
        and store_sales.ss_net_profit between 50 and 250));
{code}
was: Simplify predicates for disjunctive predicates so that they can get pushed down to the scan, followed by the same query as above.
Simplify predicates for CBO --- Key: HIVE-7913 URL:
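Deriving a scan-level IN list from a disjunction, as the issue proposes, is only safe if adding the derived filter cannot change the result. A small check of that equivalence, using sqlite3 as a stand-in engine (table name, values, and column subset are invented for illustration; the predicates mirror the customer_address branch of the query above):

```python
# Verify that the derived IN-list pre-filter is implied by the original
# disjunction, so pushing it to the scan preserves the result set.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ca_state TEXT, ss_net_profit REAL)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [("KY", 150), ("KY", 500), ("MT", 200), ("WI", 60), ("CA", 100)])

original = """
  (ca_state IN ('KY','GA','NM') AND ss_net_profit BETWEEN 100 AND 200)
  OR (ca_state IN ('MT','OR','IN') AND ss_net_profit BETWEEN 150 AND 300)
  OR (ca_state IN ('WI','MO','WV') AND ss_net_profit BETWEEN 50 AND 250)"""

# Derived scan-level filter: the union of the states named in each disjunct.
pushed = "ca_state IN ('KY','GA','NM','MT','OR','IN','WI','MO','WV')"

rows_orig = conn.execute(
    f"SELECT * FROM t WHERE {original} ORDER BY 1, 2").fetchall()
rows_push = conn.execute(
    f"SELECT * FROM t WHERE {pushed} AND ({original}) ORDER BY 1, 2").fetchall()
```

Because the IN list is a weakening of the disjunction, prepending it filters nothing that the original predicate would have kept, while letting the scan skip rows (here, the 'CA' row) before the expensive disjunction is evaluated.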
Re: Review Request 25176: HIVE-7870: Insert overwrite table query does not generate correct task plan [Spark Branch]
On Aug. 29, 2014, 5:30 p.m., Brock Noland wrote: Hi Na, Thank you very much for the patch! I have one high level question: It appears we created the union_remove_spark* files because we wanted to add an additional property to the union_remove .q file? Meaning what is the delta between union_remove_spark_1.q and union_remove_? Cheers! Na Yang wrote: Hi Brock, That is correct. The union_remove_spark* files include an extra config property hive.merge.sparkfile compared to the corresponding union_remove_* files. Except for that extra config property, all other queries in the union_remove_spark* files are the same as the queries in the union_remove_* files. The hive.merge.sparkfile value is set according to the hive.merge.mapfile and hive.merge.mapredfile property values in the original union_remove_* file. Regarding the test result, we expect the same data to be returned from the union_remove_spark* queries and the corresponding union_remove_* queries. Thanks, Na Brock Noland wrote: Hi, Thank you very much for the information! I think instead of adding the new union_remove_spark tests we should just add the hive.merge.sparkfile property to the union_remove q files. The extra property won't impact the existing tests other than an extra line of output. If instead we'd like to keep the union_remove_spark* tests then we'd need to add a check to QTestUtil that does not run spark files for MR: https://github.com/apache/hive/blob/trunk/itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java#L431 as the tests are currently running for both spark and MR. As such, I think the first solution (just add the property to the existing tests) makes sense. Thoughts? Hi Brock, Thank you for your suggestion. I also prefer the first solution. Let me modify the existing union_remove q files and re-generate the .q.out files for both MR and Spark. Thanks, Na - Na --- This is an automatically generated e-mail.
To reply, visit: https://reviews.apache.org/r/25176/#review51889 --- On Aug. 29, 2014, 6:44 a.m., Na Yang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25176/ --- (Updated Aug. 29, 2014, 6:44 a.m.) Review request for hive, Brock Noland, Szehon Ho, and Xuefu Zhang. Bugs: HIVE-7870 https://issues.apache.org/jira/browse/HIVE-7870 Repository: hive-git Description --- HIVE-7870: Insert overwrite table query does not generate correct task plan [Spark Branch] The cause of this problem is that during spark/tez task generation, the union file sink operator is cloned into two new filesink operators. The linkedfilesinkdesc info for those new filesink operators is missing. In addition, the two new filesink operators also need to be linked together. Diffs - itests/src/test/resources/testconfiguration.properties 6393671 ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 9c808d4 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 5ddc16d ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 379a39c ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 76fc290 ql/src/test/queries/clientpositive/union_remove_spark_1.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_10.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_11.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_15.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_16.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_17.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_18.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_19.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_2.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_20.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_21.q PRE-CREATION
ql/src/test/queries/clientpositive/union_remove_spark_24.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_25.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_3.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_4.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_5.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_6.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_7.q PRE-CREATION
[jira] [Updated] (HIVE-7913) Simplify predicates for CBO
[ https://issues.apache.org/jira/browse/HIVE-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-7913: -- Description: Simplify predicates for disjunctive predicates so that they can get pushed down to the scan. For TPC-DS query 13 we push down predicates of the following form: cd_marital_status in ('M','D','U'), etc. {code} select avg(ss_quantity) ,avg(ss_ext_sales_price) ,avg(ss_ext_wholesale_cost) ,sum(ss_ext_wholesale_cost) from store_sales ,store ,customer_demographics ,household_demographics ,customer_address ,date_dim where store.s_store_sk = store_sales.ss_store_sk and store_sales.ss_sold_date_sk = date_dim.d_date_sk and date_dim.d_year = 2001 and((store_sales.ss_hdemo_sk=household_demographics.hd_demo_sk and customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk and customer_demographics.cd_marital_status = 'M' and customer_demographics.cd_education_status = '4 yr Degree' and store_sales.ss_sales_price between 100.00 and 150.00 and household_demographics.hd_dep_count = 3 )or (store_sales.ss_hdemo_sk=household_demographics.hd_demo_sk and customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk and customer_demographics.cd_marital_status = 'D' and customer_demographics.cd_education_status = 'Primary' and store_sales.ss_sales_price between 50.00 and 100.00 and household_demographics.hd_dep_count = 1 ) or (store_sales.ss_hdemo_sk=household_demographics.hd_demo_sk and customer_demographics.cd_demo_sk = ss_cdemo_sk and customer_demographics.cd_marital_status = 'U' and customer_demographics.cd_education_status = 'Advanced Degree' and store_sales.ss_sales_price between 150.00 and 200.00 and household_demographics.hd_dep_count = 1 )) and((store_sales.ss_addr_sk = customer_address.ca_address_sk and customer_address.ca_country = 'United States' and customer_address.ca_state in ('KY', 'GA', 'NM') and store_sales.ss_net_profit between 100 and 200 ) or (store_sales.ss_addr_sk = customer_address.ca_address_sk and
customer_address.ca_country = 'United States' and customer_address.ca_state in ('MT', 'OR', 'IN') and store_sales.ss_net_profit between 150 and 300 ) or (store_sales.ss_addr_sk = customer_address.ca_address_sk and customer_address.ca_country = 'United States' and customer_address.ca_state in ('WI', 'MO', 'WV') and store_sales.ss_net_profit between 50 and 250 )) ; {code} This is the plan currently generated without any predicate simplification {code} STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Tez Edges: Map 7 - Map 8 (BROADCAST_EDGE) Map 8 - Map 5 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE) Reducer 2 - Map 1 (SIMPLE_EDGE), Map 4 (BROADCAST_EDGE), Map 7 (SIMPLE_EDGE) Reducer 3 - Reducer 2 (SIMPLE_EDGE) DagName: mmokhtar_20140828155050_7059c24b-501b-4683-86c0-4f3c023f0b0e:1 Vertices: Map 1 Map Operator Tree: TableScan alias: customer_address Statistics: Num rows: 4000 Data size: 40595195284 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: ca_address_sk (type: int), ca_state (type: string), ca_country (type: string) outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 4000 Data size: 40595195284 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator sort order: Statistics: Num rows: 4000 Data size: 40595195284 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: int), _col1 (type: string), _col2 (type: string) Execution mode: vectorized Map 4 Map Operator Tree: TableScan alias: date_dim filterExpr: ((d_year = 2001) and d_date_sk is not null) (type: boolean) Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: ((d_year = 2001) and d_date_sk is not null) (type: boolean) Statistics: Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: d_date_sk (type: int) outputColumnNames: _col0 Statistics: Num rows: 18262 Data size: 20435178 Basic stats: 
COMPLETE Column stats: NONE Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce
[jira] [Updated] (HIVE-7913) Simplify filter predicates for CBO
[ https://issues.apache.org/jira/browse/HIVE-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-7913: -- Summary: Simplify filter predicates for CBO (was: Simplify predicates for CBO) Simplify filter predicates for CBO -- Key: HIVE-7913 URL: https://issues.apache.org/jira/browse/HIVE-7913 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 0.13.1 Reporter: Mostafa Mokhtar Assignee: Laljo John Pullokkaran Fix For: 0.14.0 Simplify predicates for disjunctive predicates so that they can get pushed down to the scan. For TPC-DS query 13 we push down predicates of the following form: cd_marital_status in ('M','D','U'), etc. {code} select avg(ss_quantity) ,avg(ss_ext_sales_price) ,avg(ss_ext_wholesale_cost) ,sum(ss_ext_wholesale_cost) from store_sales ,store ,customer_demographics ,household_demographics ,customer_address ,date_dim where store.s_store_sk = store_sales.ss_store_sk and store_sales.ss_sold_date_sk = date_dim.d_date_sk and date_dim.d_year = 2001 and((store_sales.ss_hdemo_sk=household_demographics.hd_demo_sk and customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk and customer_demographics.cd_marital_status = 'M' and customer_demographics.cd_education_status = '4 yr Degree' and store_sales.ss_sales_price between 100.00 and 150.00 and household_demographics.hd_dep_count = 3 )or (store_sales.ss_hdemo_sk=household_demographics.hd_demo_sk and customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk and customer_demographics.cd_marital_status = 'D' and customer_demographics.cd_education_status = 'Primary' and store_sales.ss_sales_price between 50.00 and 100.00 and household_demographics.hd_dep_count = 1 ) or (store_sales.ss_hdemo_sk=household_demographics.hd_demo_sk and customer_demographics.cd_demo_sk = ss_cdemo_sk and customer_demographics.cd_marital_status = 'U' and customer_demographics.cd_education_status = 'Advanced Degree' and store_sales.ss_sales_price between 150.00 and 200.00 and
household_demographics.hd_dep_count = 1 )) and((store_sales.ss_addr_sk = customer_address.ca_address_sk and customer_address.ca_country = 'United States' and customer_address.ca_state in ('KY', 'GA', 'NM') and store_sales.ss_net_profit between 100 and 200 ) or (store_sales.ss_addr_sk = customer_address.ca_address_sk and customer_address.ca_country = 'United States' and customer_address.ca_state in ('MT', 'OR', 'IN') and store_sales.ss_net_profit between 150 and 300 ) or (store_sales.ss_addr_sk = customer_address.ca_address_sk and customer_address.ca_country = 'United States' and customer_address.ca_state in ('WI', 'MO', 'WV') and store_sales.ss_net_profit between 50 and 250 )) ; {code} This is the plan currently generated without any predicate simplification {code} STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Tez Edges: Map 7 - Map 8 (BROADCAST_EDGE) Map 8 - Map 5 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE) Reducer 2 - Map 1 (SIMPLE_EDGE), Map 4 (BROADCAST_EDGE), Map 7 (SIMPLE_EDGE) Reducer 3 - Reducer 2 (SIMPLE_EDGE) DagName: mmokhtar_20140828155050_7059c24b-501b-4683-86c0-4f3c023f0b0e:1 Vertices: Map 1 Map Operator Tree: TableScan alias: customer_address Statistics: Num rows: 4000 Data size: 40595195284 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: ca_address_sk (type: int), ca_state (type: string), ca_country (type: string) outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 4000 Data size: 40595195284 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator sort order: Statistics: Num rows: 4000 Data size: 40595195284 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: int), _col1 (type: string), _col2 (type: string) Execution mode: vectorized Map 4 Map Operator Tree: TableScan alias: date_dim filterExpr: ((d_year = 2001) and d_date_sk is not null) (type: boolean) Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column 
stats: NONE Filter Operator predicate: ((d_year = 2001) and d_date_sk is not null)
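The simplification HIVE-7913 asks for can be sketched as follows. This is an illustrative Python sketch, not Hive's optimizer code: predicates are modeled as plain strings, and a disjunction as a list of per-branch conjunct sets. Conjuncts shared by every OR branch are implied by the whole disjunction, so they can be factored out and pushed to the scan; the leftover single-column equalities are what collapse into an IN predicate.

```python
# Hypothetical sketch of factoring a disjunction (not Hive internals).
def factor_disjunction(branches):
    """Split a disjunction of conjunct sets into (common, residual) parts."""
    # A conjunct present in every branch is implied by the whole OR.
    common = set.intersection(*branches)
    # What remains must still be evaluated per branch.
    residual = [b - common for b in branches]
    return common, residual

# Simplified fragment of the TPC-DS query 13 predicate above.
branches = [
    {"cd_demo_sk = ss_cdemo_sk", "cd_marital_status = 'M'"},
    {"cd_demo_sk = ss_cdemo_sk", "cd_marital_status = 'D'"},
    {"cd_demo_sk = ss_cdemo_sk", "cd_marital_status = 'U'"},
]
common, residual = factor_disjunction(branches)
print(common)  # {"cd_demo_sk = ss_cdemo_sk"} -- pushable below the OR
# The residual equalities on one column together imply:
#   cd_marital_status in ('M', 'D', 'U')
```

The derived IN predicate is what can then be pushed down to the customer_demographics scan.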
[jira] [Created] (HIVE-7914) Simplify join predicates for CBO to avoid cross products
Mostafa Mokhtar created HIVE-7914: - Summary: Simplify join predicates for CBO to avoid cross products Key: HIVE-7914 URL: https://issues.apache.org/jira/browse/HIVE-7914 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 0.13.1 Reporter: Mostafa Mokhtar Assignee: Laljo John Pullokkaran Fix For: 0.14.0 Simplify disjunctive predicates so that they can be pushed down to the scan. For TPC-DS query 13 we could push down predicates of the form cd_marital_status in ('M','D','U'), etc.
{code}
select avg(ss_quantity)
       ,avg(ss_ext_sales_price)
       ,avg(ss_ext_wholesale_cost)
       ,sum(ss_ext_wholesale_cost)
from store_sales
     ,store
     ,customer_demographics
     ,household_demographics
     ,customer_address
     ,date_dim
where store.s_store_sk = store_sales.ss_store_sk
and store_sales.ss_sold_date_sk = date_dim.d_date_sk
and date_dim.d_year = 2001
and ((store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk
  and customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk
  and customer_demographics.cd_marital_status = 'M'
  and customer_demographics.cd_education_status = '4 yr Degree'
  and store_sales.ss_sales_price between 100.00 and 150.00
  and household_demographics.hd_dep_count = 3)
 or (store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk
  and customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk
  and customer_demographics.cd_marital_status = 'D'
  and customer_demographics.cd_education_status = 'Primary'
  and store_sales.ss_sales_price between 50.00 and 100.00
  and household_demographics.hd_dep_count = 1)
 or (store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk
  and customer_demographics.cd_demo_sk = ss_cdemo_sk
  and customer_demographics.cd_marital_status = 'U'
  and customer_demographics.cd_education_status = 'Advanced Degree'
  and store_sales.ss_sales_price between 150.00 and 200.00
  and household_demographics.hd_dep_count = 1))
and ((store_sales.ss_addr_sk = customer_address.ca_address_sk
  and customer_address.ca_country = 'United States'
  and customer_address.ca_state in ('KY', 'GA', 'NM')
  and store_sales.ss_net_profit between 100 and 200)
 or (store_sales.ss_addr_sk = customer_address.ca_address_sk
  and customer_address.ca_country = 'United States'
  and customer_address.ca_state in ('MT', 'OR', 'IN')
  and store_sales.ss_net_profit between 150 and 300)
 or (store_sales.ss_addr_sk = customer_address.ca_address_sk
  and customer_address.ca_country = 'United States'
  and customer_address.ca_state in ('WI', 'MO', 'WV')
  and store_sales.ss_net_profit between 50 and 250));
{code}
This is the plan currently generated without any predicate simplification
{code}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Map 7 <- Map 8 (BROADCAST_EDGE)
        Map 8 <- Map 5 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE)
        Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 4 (BROADCAST_EDGE), Map 7 (SIMPLE_EDGE)
        Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
      DagName: mmokhtar_20140828155050_7059c24b-501b-4683-86c0-4f3c023f0b0e:1
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: customer_address
                  Statistics: Num rows: 4000 Data size: 40595195284 Basic stats: COMPLETE Column stats: NONE
                  Select Operator
                    expressions: ca_address_sk (type: int), ca_state (type: string), ca_country (type: string)
                    outputColumnNames: _col0, _col1, _col2
                    Statistics: Num rows: 4000 Data size: 40595195284 Basic stats: COMPLETE Column stats: NONE
                    Reduce Output Operator
                      sort order:
                      Statistics: Num rows: 4000 Data size: 40595195284 Basic stats: COMPLETE Column stats: NONE
                      value expressions: _col0 (type: int), _col1 (type: string), _col2 (type: string)
            Execution mode: vectorized
        Map 4
            Map Operator Tree:
                TableScan
                  alias: date_dim
                  filterExpr: ((d_year = 2001) and d_date_sk is not null) (type: boolean)
                  Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: ((d_year = 2001) and d_date_sk is not null) (type: boolean)
                    Statistics: Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE
                    Select Operator
                      expressions: d_date_sk (type: int)
                      outputColumnNames: _col0
                      Statistics:
[jira] [Updated] (HIVE-7914) Simplify join predicates for CBO to avoid cross products
[ https://issues.apache.org/jira/browse/HIVE-7914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-7914: -- Description: Simplify join predicates for disjunctive predicates to avoid cross products. For TPC-DS query 13 we generate cross products. The join predicates on (store_sales x customer_demographics), (store_sales x household_demographics) and (store_sales x customer_address) can be pulled up to avoid the cross products.
{code}
select avg(ss_quantity)
       ,avg(ss_ext_sales_price)
       ,avg(ss_ext_wholesale_cost)
       ,sum(ss_ext_wholesale_cost)
from store_sales
     ,store
     ,customer_demographics
     ,household_demographics
     ,customer_address
     ,date_dim
where store.s_store_sk = store_sales.ss_store_sk
and store_sales.ss_sold_date_sk = date_dim.d_date_sk
and date_dim.d_year = 2001
and ((store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk
  and customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk
  and customer_demographics.cd_marital_status = 'M'
  and customer_demographics.cd_education_status = '4 yr Degree'
  and store_sales.ss_sales_price between 100.00 and 150.00
  and household_demographics.hd_dep_count = 3)
 or (store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk
  and customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk
  and customer_demographics.cd_marital_status = 'D'
  and customer_demographics.cd_education_status = 'Primary'
  and store_sales.ss_sales_price between 50.00 and 100.00
  and household_demographics.hd_dep_count = 1)
 or (store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk
  and customer_demographics.cd_demo_sk = ss_cdemo_sk
  and customer_demographics.cd_marital_status = 'U'
  and customer_demographics.cd_education_status = 'Advanced Degree'
  and store_sales.ss_sales_price between 150.00 and 200.00
  and household_demographics.hd_dep_count = 1))
and ((store_sales.ss_addr_sk = customer_address.ca_address_sk
  and customer_address.ca_country = 'United States'
  and customer_address.ca_state in ('KY', 'GA', 'NM')
  and store_sales.ss_net_profit between 100 and 200)
 or (store_sales.ss_addr_sk = customer_address.ca_address_sk
  and customer_address.ca_country = 'United States'
  and customer_address.ca_state in ('MT', 'OR', 'IN')
  and store_sales.ss_net_profit between 150 and 300)
 or (store_sales.ss_addr_sk = customer_address.ca_address_sk
  and customer_address.ca_country = 'United States'
  and customer_address.ca_state in ('WI', 'MO', 'WV')
  and store_sales.ss_net_profit between 50 and 250));
{code}
This is the plan currently generated without any predicate simplification
{code}
Warning: Map Join MAPJOIN[59][bigTable=?] in task 'Map 8' is a cross product
Warning: Map Join MAPJOIN[58][bigTable=?] in task 'Map 8' is a cross product
Warning: Shuffle Join JOIN[29][tables = [$hdt$_5, $hdt$_6]] in Stage 'Reducer 2' is a cross product
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Map 7 <- Map 8 (BROADCAST_EDGE)
        Map 8 <- Map 5 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE)
        Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 4 (BROADCAST_EDGE), Map 7 (SIMPLE_EDGE)
        Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
      DagName: mmokhtar_20140828155050_7059c24b-501b-4683-86c0-4f3c023f0b0e:1
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: customer_address
                  Statistics: Num rows: 4000 Data size: 40595195284 Basic stats: COMPLETE Column stats: NONE
                  Select Operator
                    expressions: ca_address_sk (type: int), ca_state (type: string), ca_country (type: string)
                    outputColumnNames: _col0, _col1, _col2
                    Statistics: Num rows: 4000 Data size: 40595195284 Basic stats: COMPLETE Column stats: NONE
                    Reduce Output Operator
                      sort order:
                      Statistics: Num rows: 4000 Data size: 40595195284 Basic stats: COMPLETE Column stats: NONE
                      value expressions: _col0 (type: int), _col1 (type: string), _col2 (type: string)
            Execution mode: vectorized
        Map 4
            Map Operator Tree:
                TableScan
                  alias: date_dim
                  filterExpr: ((d_year = 2001) and d_date_sk is not null) (type: boolean)
                  Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: ((d_year = 2001) and d_date_sk is not null) (type: boolean)
                    Statistics: Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE
                    Select Operator
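The pull-up HIVE-7914 describes can be sketched in the same spirit. This is an illustrative Python sketch, not Hive internals: an equi-join conjunct that appears in every branch of a disjunction is implied by the whole OR, so it can be hoisted above the OR and handed to the join planner as a join condition, turning the cross product into an equi-join. The `is_join_pred` test below is a deliberately crude stand-in.

```python
# Hypothetical sketch of join-predicate pull-up (not Hive's optimizer code).
def pull_up_common_join_keys(branches, is_join_pred):
    """Hoist join conjuncts shared by every OR branch above the disjunction."""
    common = set.intersection(*branches)
    # Keep only the conjuncts that look like join predicates.
    join_preds = {p for p in common if is_join_pred(p)}
    residual = [b - join_preds for b in branches]
    return join_preds, residual

# Simplified fragment of the (store_sales x customer_address) disjunction.
branches = [
    {"ss_addr_sk = ca_address_sk", "ca_state in ('KY','GA','NM')"},
    {"ss_addr_sk = ca_address_sk", "ca_state in ('MT','OR','IN')"},
    {"ss_addr_sk = ca_address_sk", "ca_state in ('WI','MO','WV')"},
]
join_preds, residual = pull_up_common_join_keys(
    branches, lambda p: "_sk = " in p)  # crude surrogate-key heuristic
print(join_preds)  # {'ss_addr_sk = ca_address_sk'} -- becomes the join condition
```

With the equality hoisted, the remaining per-branch state filters stay under the OR and no longer force a cross product between store_sales and customer_address.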
[jira] [Updated] (HIVE-7908) CBO: Handle Windowing functions part of expressions
[ https://issues.apache.org/jira/browse/HIVE-7908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-7908: - Labels: cbo (was: ) CBO: Handle Windowing functions part of expressions --- Key: HIVE-7908 URL: https://issues.apache.org/jira/browse/HIVE-7908 Project: Hive Issue Type: Bug Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Labels: cbo Attachments: HIVE-7908.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7908) CBO: Handle Windowing functions part of expressions
[ https://issues.apache.org/jira/browse/HIVE-7908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-7908: - Resolution: Fixed Status: Resolved (was: Patch Available) Committed to branch. Thanks [~jpullokkaran]! CBO: Handle Windowing functions part of expressions --- Key: HIVE-7908 URL: https://issues.apache.org/jira/browse/HIVE-7908 Project: Hive Issue Type: Bug Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Labels: cbo Attachments: HIVE-7908.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7901) CLONE - pig -useHCatalog with embedded metastore fails to pass command line args to metastore (org.apache.hive.hcatalog version)
[ https://issues.apache.org/jira/browse/HIVE-7901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115734#comment-14115734 ] Hive QA commented on HIVE-7901: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12665374/hive-7901.01.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6127 tests executed *Failed tests:* {noformat} org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes org.apache.hive.service.TestHS2ImpersonationWithRemoteMS.testImpersonation {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/564/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/564/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-564/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12665374 CLONE - pig -useHCatalog with embedded metastore fails to pass command line args to metastore (org.apache.hive.hcatalog version) Key: HIVE-7901 URL: https://issues.apache.org/jira/browse/HIVE-7901 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.14.0 Reporter: Sushanth Sowmyan Assignee: Eric Hanson Attachments: hive-7901.01.patch This fails because the embedded metastore can't connect to the database because the command line -D arguments passed to pig are not getting passed to the metastore when the embedded metastore is created. Using hive.metastore.uris set to the empty string causes creation of an embedded metastore.
pig -useHCatalog -Dhive.metastore.uris= -Djavax.jdo.option.ConnectionPassword=AzureSQLDBXYZ The goal is to allow a pig job submitted via WebHCat to specify a metastore to use via job arguments. That is not working because it is not possible to pass -Djavax.jdo.option.ConnectionPassword and other necessary arguments to the embedded metastore. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 25176: HIVE-7870: Insert overwrite table query does not generate correct task plan [Spark Branch]
On Aug. 29, 2014, 5:30 p.m., Brock Noland wrote: Hi Na, Thank you very much for the patch! I have one high-level question: It appears we created the union_remove_spark* files because we wanted to add an additional property to the union_remove .q files? Meaning, what is the delta between union_remove_spark_1.q and union_remove_? Cheers! Na Yang wrote: Hi Brock, That is correct. The union_remove_spark* files include an extra config property, hive.merge.sparkfile, compared to the corresponding union_remove_* files. Except for that extra config property, all other queries in the union_remove_spark* files are the same as the queries in the union_remove_* files. The hive.merge.sparkfile value is set according to the hive.merge.mapfile and hive.merge.mapredfile property values in the original union_remove_* file. Regarding the test results, we expect the union_remove_spark* queries and the corresponding union_remove_* queries to return the same data. Thanks, Na Brock Noland wrote: Hi, Thank you very much for the information! I think instead of adding the new union_remove_spark tests we should just add the hive.merge.sparkfile property to the union_remove q files. The extra property won't impact the existing tests other than an extra line of output. If instead we'd like to keep the union_remove_spark* files then we'd need to add a check to QTestUtil that does not run spark files for MR: https://github.com/apache/hive/blob/trunk/itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java#L431 as the tests are currently running for both spark and MR. As such, I think the first solution (just add the property to the existing tests) makes sense. Thoughts? Na Yang wrote: Hi Brock, Thank you for your suggestion. I also prefer the first solution. Let me modify the existing union_remove q files and re-generate the .q.out files for both MR and Spark. Thanks, Na Awesome, thanks!! - Brock --- This is an automatically generated e-mail.
To reply, visit: https://reviews.apache.org/r/25176/#review51889 --- On Aug. 29, 2014, 6:44 a.m., Na Yang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25176/ --- (Updated Aug. 29, 2014, 6:44 a.m.) Review request for hive, Brock Noland, Szehon Ho, and Xuefu Zhang. Bugs: HIVE-7870 https://issues.apache.org/jira/browse/HIVE-7870 Repository: hive-git Description --- HIVE-7870: Insert overwrite table query does not generate correct task plan [Spark Branch] The cause of this problem is during spark/tez task generation, the union file sink operator are cloned to two new filesink operator. The linkedfilesinkdesc info for those new filesink operators are missing. In addition, the two new filesink operators also need to be linked together. Diffs - itests/src/test/resources/testconfiguration.properties 6393671 ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 9c808d4 ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkProcContext.java 5ddc16d ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java 379a39c ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 76fc290 ql/src/test/queries/clientpositive/union_remove_spark_1.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_10.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_11.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_15.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_16.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_17.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_18.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_19.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_2.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_20.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_21.q PRE-CREATION 
ql/src/test/queries/clientpositive/union_remove_spark_24.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_25.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_3.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_4.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_5.q PRE-CREATION ql/src/test/queries/clientpositive/union_remove_spark_6.q PRE-CREATION
[jira] [Created] (HIVE-7915) Expose High and Low value in plan.ColStatistics
Harish Butani created HIVE-7915: --- Summary: Expose High and Low value in plan.ColStatistics Key: HIVE-7915 URL: https://issues.apache.org/jira/browse/HIVE-7915 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Harish Butani Assignee: Harish Butani Priority: Minor These are being read from the Metastore but not populated in ColumnStatistics. One of the uses of this is HIVE-7905 -- This message was sent by Atlassian JIRA (v6.2#6252)
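One reason a cost-based planner wants the high/low values HIVE-7915 exposes: with a column's min/max known, the selectivity of a range predicate such as "ss_net_profit between 100 and 200" can be estimated by linear interpolation under a uniform-distribution assumption. The sketch below is an illustrative estimator, not Hive's actual implementation.

```python
# Hedged sketch: estimate range-predicate selectivity from column min/max,
# assuming values are roughly uniformly distributed over [col_low, col_high].
def range_selectivity(col_low, col_high, pred_low, pred_high):
    if col_high <= col_low:
        return 1.0  # degenerate stats: assume everything qualifies
    lo = max(pred_low, col_low)   # clamp predicate range to column domain
    hi = min(pred_high, col_high)
    if hi < lo:
        return 0.0  # predicate range misses the column's domain entirely
    return (hi - lo) / (col_high - col_low)

# e.g. a column spanning [0, 1000] filtered to "between 100 and 200":
print(range_selectivity(0, 1000, 100, 200))  # 0.1
```

Without the high/low values, an estimator like this must fall back to a fixed guess for range predicates, which is exactly the gap HIVE-7905 is after.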