[jira] [Commented] (HIVE-8943) Fix memory limit check for combine nested mapjoins [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225898#comment-14225898 ] Hive QA commented on HIVE-8943: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12683740/HIVE-8943.1-spark.patch {color:red}ERROR:{color} -1 due to 53 failed/errored test(s), 7179 tests executed *Failed tests:* {noformat} TestAuthorizationApiAuthorizer - did not produce a TEST-*.xml file TestGenericUDFOPNumeric - did not produce a TEST-*.xml file TestHBaseKeyFactory - did not produce a TEST-*.xml file TestHBaseKeyFactory2 - did not produce a TEST-*.xml file TestHBaseKeyFactory3 - did not produce a TEST-*.xml file TestHBasePredicateDecomposer - did not produce a TEST-*.xml file TestHS2ImpersonationWithRemoteMS - did not produce a TEST-*.xml file TestTezSessionState - did not produce a TEST-*.xml file TestURLHook - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_stats org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_stats2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join32 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucket_map_join_1 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucket_map_join_2 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin1 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin7 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_column_access_stats org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join19 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_hive_626 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_reorder2 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_reorder3 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_reorder4 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_view org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mapjoin_subquery2 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mergejoins org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mergejoins_mixed org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_join4 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_join5 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_outer_join5 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoin org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoin_union_remove_1 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoin_union_remove_2 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoinopt1 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoinopt11 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoinopt12 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoinopt13 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoinopt14 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoinopt16 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoinopt17 
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoinopt19 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoinopt2 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoinopt20 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoinopt3 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoinopt4 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoinopt5 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoinopt6 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoinopt7 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoinopt8 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoinopt9 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin9 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/442/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/442/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-442/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 53 tests failed {noformat} This message is automatically
[jira] [Commented] (HIVE-8848) data loading from text files or text file processing doesn't handle nulls correctly
[ https://issues.apache.org/jira/browse/HIVE-8848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225921#comment-14225921 ] Hive QA commented on HIVE-8848: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12683492/HIVE-8848.3.patch.txt {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 6683 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.hbase.TestHBaseSerDe.testHBaseSerDeWithAvroExternalSchema org.apache.hadoop.hive.hbase.TestHBaseSerDe.testHBaseSerDeWithAvroSerClass org.apache.hadoop.hive.hbase.TestHBaseSerDe.testHBaseSerDeWithHiveMapToHBaseAvroColumnFamily org.apache.hadoop.hive.serde2.lazy.TestLazyArrayMapStruct.testLazyMapWithBadEntries org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1905/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1905/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1905/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12683492 - PreCommit-HIVE-TRUNK-Build data loading from text files or text file processing doesn't handle nulls correctly --- Key: HIVE-8848 URL: https://issues.apache.org/jira/browse/HIVE-8848 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-8848.01.patch, HIVE-8848.2.patch.txt, HIVE-8848.3.patch.txt, HIVE-8848.patch I am not sure how nulls are supposed to be stored in text tables, but after loading some data with null or NULL strings, or x00 characters, we get bunch of annoying logging from LazyPrimitive that data is not in INT format and was converted to null, with data being null (string saying null, I assume from the code). Either load should load them as nulls, or there should be some defined way to load nulls. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
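For context on HIVE-8848 above, a minimal HiveQL sketch of the null-handling behavior being described, assuming the standard LazySimpleSerDe text layout and Hive 0.13+ DDL; the table name and delimiter are placeholders, not taken from the issue. Only the configured null string (by default \N, settable with NULL DEFINED AS or the serialization.null.format serde property) is read back as SQL NULL; literal "null"/"NULL" text in an INT column is parsed as data and triggers the LazyPrimitive conversion warning mentioned above.
{noformat}
-- Hypothetical table: '\N' in the data file is read back as NULL,
-- while the strings "null"/"NULL" are treated as malformed INT data.
CREATE TABLE t_nulls (i INT, s STRING)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  NULL DEFINED AS '\N';

SELECT i, s FROM t_nulls;
{noformat}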
[jira] [Commented] (HIVE-8828) Remove hadoop 20 shims
[ https://issues.apache.org/jira/browse/HIVE-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225922#comment-14225922 ] Hive QA commented on HIVE-8828: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12683566/HIVE-8828.9.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1906/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1906/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1906/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-1906/source-prep.txt + [[ true == \t\r\u\e ]] + rm -rf ivy maven + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-trunk-source ]] + [[ ! -d apache-svn-trunk-source/.svn ]] + [[ ! -d apache-svn-trunk-source ]] + cd apache-svn-trunk-source + svn revert -R . 
Reverted 'hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseKeyFactory.java' Reverted 'hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseKeyFactory2.java' Reverted 'hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java' Reverted 'hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseCellMap.java' Reverted 'data/files/cbo_t2.txt' Reverted 'data/files/cbo_t4.txt' Reverted 'data/files/cbo_t6.txt' Reverted 'data/files/cbo_t1.txt' Reverted 'data/files/cbo_t3.txt' Reverted 'data/files/cbo_t5.txt' Reverted 'accumulo-handler/src/test/org/apache/hadoop/hive/accumulo/serde/FirstCharAccumuloCompositeRowId.java' Reverted 'accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/LazyAccumuloRow.java' Reverted 'accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/LazyAccumuloMap.java' Reverted 'serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyObjectBase.java' Reverted 'serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyString.java' Reverted 'serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyObject.java' Reverted 'serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyPrimitive.java' Reverted 'serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyMap.java' Reverted 'serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyArray.java' Reverted 'serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyNonPrimitive.java' Reverted 'serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyBinary.java' Reverted 'serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyStruct.java' Reverted 'serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyUnion.java' Reverted 'serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryObject.java' Reverted 'ql/src/test/results/clientpositive/cbo_windowing.q.out' Reverted 'ql/src/test/results/clientpositive/cbo_udf_udaf.q.out' Reverted 'ql/src/test/results/clientpositive/cbo_limit.q.out' Reverted 'ql/src/test/results/clientpositive/cbo_gby.q.out' Reverted 'ql/src/test/results/clientpositive/tez/cbo_union.q.out' Reverted 'ql/src/test/results/clientpositive/tez/cbo_windowing.q.out' Reverted 'ql/src/test/results/clientpositive/tez/cbo_join.q.out' Reverted 'ql/src/test/results/clientpositive/tez/cbo_gby.q.out' Reverted 'ql/src/test/results/clientpositive/tez/cbo_limit.q.out' Reverted 'ql/src/test/results/clientpositive/tez/cbo_views.q.out' Reverted 'ql/src/test/results/clientpositive/tez/cbo_simple_select.q.out' Reverted 'ql/src/test/results/clientpositive/tez/cbo_udf_udaf.q.out' Reverted 'ql/src/test/results/clientpositive/tez/cbo_semijoin.q.out' Reverted 'ql/src/test/results/clientpositive/cbo_union.q.out' Reverted 'ql/src/test/results/clientpositive/cbo_simple_select.q.out' Reverted 'ql/src/test/results/clientpositive/cbo_semijoin.q.out' Reverted
[jira] [Commented] (HIVE-8953) 0.5.2-SNAPSHOT Dependency
[ https://issues.apache.org/jira/browse/HIVE-8953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225937#comment-14225937 ] Olaf Flebbe commented on HIVE-8953: --- Seems to be fixed in the 0.14 branch. Any plans to release 0.14.1 as an urgent bug-fix release? 0.5.2-SNAPSHOT Dependency - Key: HIVE-8953 URL: https://issues.apache.org/jira/browse/HIVE-8953 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Environment: Compiling for Apache BIGTOP. Reporter: Olaf Flebbe I have the issue that the Hive 0.23 shim needs Tez version 0.5.2-SNAPSHOT. Hm, I have no clue which SNAPSHOT of Apache Tez should be used. There is no 0.5.2-SNAPSHOT in the Maven Central Repository. Can I use 0.5.2? (This seems to be released.) This relates to: [HIVE-8614] I have the same problem as the reporter. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8934) Investigate test failure on bucketmapjoin10.q and bucketmapjoin11.q [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225953#comment-14225953 ] Hive QA commented on HIVE-8934: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12683746/HIVE-8934.1-spark.patch {color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 7179 tests executed *Failed tests:* {noformat} TestAuthorizationApiAuthorizer - did not produce a TEST-*.xml file TestGenericUDFOPNumeric - did not produce a TEST-*.xml file TestHBaseKeyFactory - did not produce a TEST-*.xml file TestHBaseKeyFactory2 - did not produce a TEST-*.xml file TestHBaseKeyFactory3 - did not produce a TEST-*.xml file TestHBasePredicateDecomposer - did not produce a TEST-*.xml file TestHS2ImpersonationWithRemoteMS - did not produce a TEST-*.xml file TestTezSessionState - did not produce a TEST-*.xml file TestURLHook - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin10 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin11 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin7 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/443/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/443/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-443/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 14 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12683746 - PreCommit-HIVE-SPARK-Build Investigate test failure on bucketmapjoin10.q and bucketmapjoin11.q [Spark Branch] -- Key: HIVE-8934 URL: https://issues.apache.org/jira/browse/HIVE-8934 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Assignee: Chao Attachments: HIVE-8934.1-spark.patch With MapJoin enabled, these two tests will generate incorrect results. This seem to be related to the HiveInputFormat that these two are using. We need to investigate the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
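For the HiveInputFormat angle mentioned in the HIVE-8934 description above, a hedged sketch of the settings such bucket map join q-files typically pin; the exact statements in bucketmapjoin10.q/bucketmapjoin11.q may differ.
{noformat}
-- Force the plain per-path input format instead of the default CombineHiveInputFormat,
-- and enable bucket map join conversion; the wrong results appear tied to this combination.
SET hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
SET hive.optimize.bucketmapjoin=true;
{noformat}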
[jira] [Commented] (HIVE-6914) parquet-hive cannot write nested map (map value is map)
[ https://issues.apache.org/jira/browse/HIVE-6914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225996#comment-14225996 ] Mickael Lacour commented on HIVE-6914: -- Thx parquet-hive cannot write nested map (map value is map) --- Key: HIVE-6914 URL: https://issues.apache.org/jira/browse/HIVE-6914 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 0.12.0, 0.13.0 Reporter: Tongjie Chen Assignee: Ryan Blue Labels: parquet, serialization Fix For: 0.15.0 Attachments: HIVE-6914.1.patch, HIVE-6914.1.patch, HIVE-6914.2.patch, HIVE-6914.3.patch, HIVE-6914.4.patch, NestedMap.parquet // table schema (identical for both plain text version and parquet version) desc hive desc text_mmap; m map // sample nested map entry {level1:{level2_key1:value1,level2_key2:value2}} The following query will fail, insert overwrite table parquet_mmap select * from text_mmap; Caused by: parquet.io.ParquetEncodingException: This should be an ArrayWritable or MapWritable: org.apache.hadoop.hive.ql.io.parquet.writable.BinaryWritable@f2f8106 at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeData(DataWritableWriter.java:85) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeArray(DataWritableWriter.java:118) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeData(DataWritableWriter.java:80) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeData(DataWritableWriter.java:82) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:55) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:59) at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:31) at parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:115) at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:81) at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:37) at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:77) at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:90) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:622) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540) ... 9 more -- This message was sent by Atlassian JIRA (v6.3.4#6332)
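A hedged reconstruction of the failing scenario from the HIVE-6914 description above, since the nested map type parameters were lost in the quoted schema; table and column names follow the snippet, the exact originals may differ.
{noformat}
-- Map whose value is itself a map, e.g. {level1:{level2_key1:value1, level2_key2:value2}}
CREATE TABLE text_mmap (m MAP<STRING, MAP<STRING, STRING>>);
CREATE TABLE parquet_mmap (m MAP<STRING, MAP<STRING, STRING>>) STORED AS PARQUET;

-- This is the statement that fails with "This should be an ArrayWritable or MapWritable"
INSERT OVERWRITE TABLE parquet_mmap SELECT * FROM text_mmap;
{noformat}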
[jira] [Commented] (HIVE-8924) Investigate test failure for join_empty.q [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226009#comment-14226009 ] Hive QA commented on HIVE-8924: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12683752/HIVE-8924-spark.patch {color:red}ERROR:{color} -1 due to 16 failed/errored test(s), 7179 tests executed *Failed tests:* {noformat} TestAuthorizationApiAuthorizer - did not produce a TEST-*.xml file TestGenericUDFOPNumeric - did not produce a TEST-*.xml file TestHBaseKeyFactory - did not produce a TEST-*.xml file TestHBaseKeyFactory2 - did not produce a TEST-*.xml file TestHBaseKeyFactory3 - did not produce a TEST-*.xml file TestHBasePredicateDecomposer - did not produce a TEST-*.xml file TestTezSessionState - did not produce a TEST-*.xml file TestURLHook - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin1 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin7 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_view org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mapjoin_mapjoin org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin9 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/444/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/444/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-444/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 16 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12683752 - PreCommit-HIVE-SPARK-Build Investigate test failure for join_empty.q [Spark Branch] Key: HIVE-8924 URL: https://issues.apache.org/jira/browse/HIVE-8924 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Assignee: Szehon Ho Attachments: HIVE-8924-spark.patch This query has an interesting case where the big table work is empty. Here's the MR plan: {noformat} STAGE DEPENDENCIES: Stage-4 is a root stage Stage-3 depends on stages: Stage-4 Stage-0 depends on stages: Stage-3 STAGE PLANS: Stage: Stage-4 Map Reduce Local Work Alias - Map Local Tables: b Fetch Operator limit: -1 Alias - Map Local Operator Tree: b TableScan alias: b Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: UDFToDouble(key) is not null (type: boolean) Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator condition expressions: 0 {key} 1 {value} keys: 0 UDFToDouble(key) (type: double) 1 UDFToDouble(key) (type: double) Stage: Stage-3 Map Reduce Local Work: Map Reduce Local Work Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: ListSink {noformat} The plan for Spark is not correct. We need to investigate the issue. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
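A hypothetical reproduction of the shape of the problem described in HIVE-8924 above (not the literal join_empty.q contents): the probe ("big table") side of the auto-converted map join is an empty table, so only the local hash-table work for b survives in the plan.
{noformat}
CREATE TABLE a (key STRING, value STRING);   -- intentionally left empty
SET hive.auto.convert.join=true;

-- src is assumed to stand in for the populated small-table side (b in the plan above)
SELECT a.value, b.value
FROM a JOIN src b ON CAST(a.key AS DOUBLE) = CAST(b.key AS DOUBLE);
{noformat}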
[jira] [Commented] (HIVE-8970) Enable map join optimization only when hive.auto.convert.join is true [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226010#comment-14226010 ] Hive QA commented on HIVE-8970: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12683781/HIVE-8970.1-spark.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/445/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/445/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-445/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/lib64/qt-3.3/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/lib64/qt-3.3/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-SPARK-Build-445/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-spark-source ]] + [[ ! -d apache-svn-spark-source/.svn ]] + [[ ! -d apache-svn-spark-source ]] + cd apache-svn-spark-source + svn revert -R . 
Reverted 'itests/src/test/resources/testconfiguration.properties' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java' ++ svn status --no-ignore ++ egrep -v '^X|^Performing status on external' ++ awk '{print $2}' + rm -rf target datanucleus.log ant/target shims/target shims/0.20/target shims/0.20S/target shims/0.23/target shims/aggregator/target shims/common/target shims/common-secure/target shims/scheduler/target packaging/target hbase-handler/target testutils/target jdbc/target metastore/target itests/target itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target itests/hive-unit-hadoop2/target itests/hive-minikdc/target itests/hive-unit/target itests/custom-serde/target itests/util/target itests/qtest-spark/target hcatalog/target hcatalog/core/target hcatalog/streaming/target hcatalog/server-extensions/target hcatalog/hcatalog-pig-adapter/target hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target accumulo-handler/target hwi/target common/target common/src/gen spark-client/target service/target contrib/target serde/target beeline/target cli/target odbc/target ql/dependency-reduced-pom.xml ql/target + svn update Fetching external item into 'hcatalog/src/test/e2e/harness' External at revision 1641793. At revision 1641793. + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12683781 - PreCommit-HIVE-SPARK-Build Enable map join optimization only when hive.auto.convert.join is true [Spark Branch] Key: HIVE-8970 URL: https://issues.apache.org/jira/browse/HIVE-8970 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Assignee: Chao Fix For: spark-branch Attachments: HIVE-8970.1-spark.patch Right now, in Spark branch we enable MJ without looking at this configuration. The related code in {{SparkMapJoinOptimizer}} is commented out. We should only enable MJ when the flag is true. -- This message
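To illustrate the intent of HIVE-8970 above, a hedged sketch of the expected behavior once the flag is honored; src and src1 are the standard Hive test tables, used here only as placeholders.
{noformat}
SET hive.execution.engine=spark;
SET hive.auto.convert.join=false;

-- With the flag off, EXPLAIN should keep the common (shuffle) join; before the patch
-- the Spark branch converted to a map join regardless of this setting.
EXPLAIN SELECT a.key, b.value FROM src a JOIN src1 b ON a.key = b.key;
{noformat}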
[jira] [Commented] (HIVE-8962) Add SORT_QUERY_RESULTS for join tests that do not guarantee order #2
[ https://issues.apache.org/jira/browse/HIVE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226050#comment-14226050 ] Hive QA commented on HIVE-8962: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12683651/HIVE-8962.patch {color:green}SUCCESS:{color} +1 6683 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1907/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1907/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1907/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12683651 - PreCommit-HIVE-TRUNK-Build Add SORT_QUERY_RESULTS for join tests that do not guarantee order #2 Key: HIVE-8962 URL: https://issues.apache.org/jira/browse/HIVE-8962 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chao Assignee: Chao Priority: Minor Attachments: HIVE-8962.patch Similar to HIVE-8936, we need to add {{SORT_QUERY_RESULTS}} to the following q-files: {noformat} ppd_multi_insert.q ptf_streaming.q subquery_exists.q subquery_multiinsert.q vectorized_ptf.q {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
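As a reminder of what the directive in HIVE-8962 above does, a hedged sketch of how it is placed in a q-file (the query itself is a placeholder, not from the listed files): the test driver sorts the query output before diffing it against the expected .q.out, so row-order differences do not fail the test.
{noformat}
-- SORT_QUERY_RESULTS

SELECT src1.key, src2.value
FROM src src1 JOIN src src2 ON src1.key = src2.key;
{noformat}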
[jira] [Commented] (HIVE-8970) Enable map join optimization only when hive.auto.convert.join is true [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226114#comment-14226114 ] Chao commented on HIVE-8970: This patch applies cleanly on my machine, not sure why it failed. Enable map join optimization only when hive.auto.convert.join is true [Spark Branch] Key: HIVE-8970 URL: https://issues.apache.org/jira/browse/HIVE-8970 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Assignee: Chao Fix For: spark-branch Attachments: HIVE-8970.1-spark.patch Right now, in Spark branch we enable MJ without looking at this configuration. The related code in {{SparkMapJoinOptimizer}} is commented out. We should only enable MJ when the flag is true. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8970) Enable map join optimization only when hive.auto.convert.join is true [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao updated HIVE-8970: --- Attachment: HIVE-8970.2-spark.patch Re-attach the same patch to trigger test run. Enable map join optimization only when hive.auto.convert.join is true [Spark Branch] Key: HIVE-8970 URL: https://issues.apache.org/jira/browse/HIVE-8970 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Assignee: Chao Fix For: spark-branch Attachments: HIVE-8970.1-spark.patch, HIVE-8970.2-spark.patch Right now, in Spark branch we enable MJ without looking at this configuration. The related code in {{SparkMapJoinOptimizer}} is commented out. We should only enable MJ when the flag is true. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8934) Investigate test failure on bucketmapjoin10.q and bucketmapjoin11.q [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao updated HIVE-8934: --- Attachment: HIVE-8934.2-spark.patch Re-attach the same patch to trigger test run. Investigate test failure on bucketmapjoin10.q and bucketmapjoin11.q [Spark Branch] -- Key: HIVE-8934 URL: https://issues.apache.org/jira/browse/HIVE-8934 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Assignee: Chao Attachments: HIVE-8934.1-spark.patch, HIVE-8934.2-spark.patch With MapJoin enabled, these two tests will generate incorrect results. This seems to be related to the HiveInputFormat that these two are using. We need to investigate the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8924) Investigate test failure for join_empty.q [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao updated HIVE-8924: --- Attachment: HIVE-8924.2-spark.patch Re-attach the same patch to trigger test run. Investigate test failure for join_empty.q [Spark Branch] Key: HIVE-8924 URL: https://issues.apache.org/jira/browse/HIVE-8924 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Assignee: Szehon Ho Attachments: HIVE-8924-spark.patch, HIVE-8924.2-spark.patch This query has an interesting case where the big table work is empty. Here's the MR plan: {noformat} STAGE DEPENDENCIES: Stage-4 is a root stage Stage-3 depends on stages: Stage-4 Stage-0 depends on stages: Stage-3 STAGE PLANS: Stage: Stage-4 Map Reduce Local Work Alias - Map Local Tables: b Fetch Operator limit: -1 Alias - Map Local Operator Tree: b TableScan alias: b Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: UDFToDouble(key) is not null (type: boolean) Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator condition expressions: 0 {key} 1 {value} keys: 0 UDFToDouble(key) (type: double) 1 UDFToDouble(key) (type: double) Stage: Stage-3 Map Reduce Local Work: Map Reduce Local Work Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: ListSink {noformat} The plan for Spark is not correct. We need to investigate the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8934) Investigate test failure on bucketmapjoin10.q and bucketmapjoin11.q [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226191#comment-14226191 ] Hive QA commented on HIVE-8934: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12683832/HIVE-8934.2-spark.patch {color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 7180 tests executed *Failed tests:* {noformat} TestAuthorizationApiAuthorizer - did not produce a TEST-*.xml file TestGenericUDFOPNumeric - did not produce a TEST-*.xml file TestHBaseKeyFactory - did not produce a TEST-*.xml file TestHBaseKeyFactory2 - did not produce a TEST-*.xml file TestHBaseKeyFactory3 - did not produce a TEST-*.xml file TestHBasePredicateDecomposer - did not produce a TEST-*.xml file TestTezSessionState - did not produce a TEST-*.xml file TestURLHook - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin10 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin11 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin7 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_subquery_multiinsert {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/446/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/446/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-446/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 14 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12683832 - PreCommit-HIVE-SPARK-Build Investigate test failure on bucketmapjoin10.q and bucketmapjoin11.q [Spark Branch] -- Key: HIVE-8934 URL: https://issues.apache.org/jira/browse/HIVE-8934 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Assignee: Chao Attachments: HIVE-8934.1-spark.patch, HIVE-8934.2-spark.patch With MapJoin enabled, these two tests will generate incorrect results. This seem to be related to the HiveInputFormat that these two are using. We need to investigate the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8970) Enable map join optimization only when hive.auto.convert.join is true [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226193#comment-14226193 ] Hive QA commented on HIVE-8970: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12683831/HIVE-8970.2-spark.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/447/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/447/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-447/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/lib64/qt-3.3/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/lib64/qt-3.3/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-SPARK-Build-447/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-spark-source ]] + [[ ! -d apache-svn-spark-source/.svn ]] + [[ ! -d apache-svn-spark-source ]] + cd apache-svn-spark-source + svn revert -R . 
Reverted 'itests/src/test/resources/testconfiguration.properties' Reverted 'ql/src/test/results/clientpositive/spark/bucketmapjoin10.q.out' Reverted 'ql/src/test/results/clientpositive/spark/bucketmapjoin11.q.out' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinEagerRowContainer.java' ++ svn status --no-ignore ++ egrep -v '^X|^Performing status on external' ++ awk '{print $2}' + rm -rf target datanucleus.log ant/target shims/target shims/0.20/target shims/0.20S/target shims/0.23/target shims/aggregator/target shims/common/target shims/common-secure/target shims/scheduler/target packaging/target hbase-handler/target testutils/target jdbc/target metastore/target itests/target itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target itests/hive-unit-hadoop2/target itests/hive-minikdc/target itests/hive-unit/target itests/custom-serde/target itests/util/target itests/qtest-spark/target hcatalog/target hcatalog/core/target hcatalog/streaming/target hcatalog/server-extensions/target hcatalog/hcatalog-pig-adapter/target hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target accumulo-handler/target hwi/target common/target common/src/gen spark-client/target service/target contrib/target serde/target beeline/target cli/target odbc/target ql/dependency-reduced-pom.xml ql/target + svn update Fetching external item into 'hcatalog/src/test/e2e/harness' External at revision 1641818. At revision 1641818. + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12683831 - PreCommit-HIVE-SPARK-Build Enable map join optimization only when hive.auto.convert.join is true [Spark Branch] Key: HIVE-8970 URL: https://issues.apache.org/jira/browse/HIVE-8970 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Assignee: Chao Fix For: spark-branch
[jira] [Commented] (HIVE-8875) hive.optimize.sort.dynamic.partition should be turned off for ACID
[ https://issues.apache.org/jira/browse/HIVE-8875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226211#comment-14226211 ] Hive QA commented on HIVE-8875: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12683677/HIVE-8875.2.patch {color:green}SUCCESS:{color} +1 6683 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1908/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1908/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1908/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12683677 - PreCommit-HIVE-TRUNK-Build hive.optimize.sort.dynamic.partition should be turned off for ACID -- Key: HIVE-8875 URL: https://issues.apache.org/jira/browse/HIVE-8875 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-8875.2.patch, HIVE-8875.patch Turning this on causes ACID insert, updates, and deletes to produce non-optimal plans with extra reduce phases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
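A hedged illustration of the workaround implied by HIVE-8875 above; acid_tbl is a hypothetical transactional table, not one from the patch, and the patch presumably makes this behavior automatic for ACID writes.
{noformat}
-- Disable the sort-dynamic-partition optimization so ACID DML does not pick up the
-- extra reduce phase described in the issue.
SET hive.optimize.sort.dynamic.partition=false;
UPDATE acid_tbl SET value = 'x' WHERE key = 1;
{noformat}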
[jira] [Commented] (HIVE-8836) Enable automatic tests with remote spark client [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226270#comment-14226270 ] Xuefu Zhang commented on HIVE-8836: --- [~ruili], I think the number of reducers changed because of the cluster changes. Previously the plan was generated with one node with 4 cores (local[4]). Now the cluster has 2 nodes with one core each. The memory configuration is also different. I guess it's hard to tweak the cluster configuration so that the same number of reducers results. For now, I think we have to go through the list and analyze the failures one by one. It's a long list, and maybe it can be divided among people so that each only takes a slice of it. Briefly checking the results, it seems the failures are caused by one of the following reasons: 1. reducer number change, which is okay. 2. result diff. It could be a matter of ordering, but it could also be a genuinely different result. 3. test failed to run. I noticed that we are using local-cluster[2,1,2048]. Maybe we should have a more general case where one node has more than one core. Also, we may need to adjust the memory settings. Once we have a representative small cluster, we will probably stay with it for some time. Enable automatic tests with remote spark client [Spark Branch] -- Key: HIVE-8836 URL: https://issues.apache.org/jira/browse/HIVE-8836 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chengxiang Li Assignee: Rui Li Labels: Spark-M3 Fix For: spark-branch Attachments: HIVE-8836.1-spark.patch, HIVE-8836.2-spark.patch, HIVE-8836.3-spark.patch, HIVE-8836.4-spark.patch, HIVE-8836.5-spark.patch, HIVE-8836.6-spark.patch, HIVE-8836.6-spark.patch, HIVE-8836.6-spark.patch In a real production environment, the remote Spark client will mostly be used to submit Spark jobs for Hive, so we should enable automatic tests with the remote Spark client to make sure Hive features work with it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
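For reference on the mini-cluster discussion in HIVE-8836 above, a hedged sketch of the kind of settings being compared, written in the SET style used elsewhere in this thread; the test harness may wire these properties differently.
{noformat}
-- Old setup: one node with 4 cores
SET spark.master=local[4];

-- Current setup under discussion: 2 workers, 1 core each, 2048 MB per worker
SET spark.master=local-cluster[2,1,2048];
SET spark.executor.memory=1024m;
{noformat}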
[jira] [Commented] (HIVE-8924) Investigate test failure for join_empty.q [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226272#comment-14226272 ] Hive QA commented on HIVE-8924: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12683834/HIVE-8924.2-spark.patch {color:red}ERROR:{color} -1 due to 16 failed/errored test(s), 7179 tests executed *Failed tests:* {noformat} TestAuthorizationApiAuthorizer - did not produce a TEST-*.xml file TestGenericUDFOPNumeric - did not produce a TEST-*.xml file TestHBaseKeyFactory - did not produce a TEST-*.xml file TestHBaseKeyFactory2 - did not produce a TEST-*.xml file TestHBaseKeyFactory3 - did not produce a TEST-*.xml file TestHBasePredicateDecomposer - did not produce a TEST-*.xml file TestTezSessionState - did not produce a TEST-*.xml file TestURLHook - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_coltype org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin1 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin7 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin9 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_view org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin9 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/448/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/448/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-448/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 16 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12683834 - PreCommit-HIVE-SPARK-Build Investigate test failure for join_empty.q [Spark Branch] Key: HIVE-8924 URL: https://issues.apache.org/jira/browse/HIVE-8924 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Assignee: Szehon Ho Attachments: HIVE-8924-spark.patch, HIVE-8924.2-spark.patch This query has an interesting case where the big table work is empty. Here's the MR plan: {noformat} STAGE DEPENDENCIES: Stage-4 is a root stage Stage-3 depends on stages: Stage-4 Stage-0 depends on stages: Stage-3 STAGE PLANS: Stage: Stage-4 Map Reduce Local Work Alias - Map Local Tables: b Fetch Operator limit: -1 Alias - Map Local Operator Tree: b TableScan alias: b Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: UDFToDouble(key) is not null (type: boolean) Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator condition expressions: 0 {key} 1 {value} keys: 0 UDFToDouble(key) (type: double) 1 UDFToDouble(key) (type: double) Stage: Stage-3 Map Reduce Local Work: Map Reduce Local Work Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: ListSink {noformat} The plan for Spark is not correct. We need to investigate the issue. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8924) Investigate test failure for join_empty.q [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226277#comment-14226277 ] Xuefu Zhang commented on HIVE-8924: --- [~csun], I think you will need to regenerate your .out because of HIVE recently resolved HIVE-8961. Investigate test failure for join_empty.q [Spark Branch] Key: HIVE-8924 URL: https://issues.apache.org/jira/browse/HIVE-8924 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Assignee: Szehon Ho Attachments: HIVE-8924-spark.patch, HIVE-8924.2-spark.patch This query has an interesting case where the big table work is empty. Here's the MR plan: {noformat} STAGE DEPENDENCIES: Stage-4 is a root stage Stage-3 depends on stages: Stage-4 Stage-0 depends on stages: Stage-3 STAGE PLANS: Stage: Stage-4 Map Reduce Local Work Alias - Map Local Tables: b Fetch Operator limit: -1 Alias - Map Local Operator Tree: b TableScan alias: b Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: UDFToDouble(key) is not null (type: boolean) Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator condition expressions: 0 {key} 1 {value} keys: 0 UDFToDouble(key) (type: double) 1 UDFToDouble(key) (type: double) Stage: Stage-3 Map Reduce Local Work: Map Reduce Local Work Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: ListSink {noformat} The plan for Spark is not correct. We need to investigate the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-8924) Investigate test failure for join_empty.q [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226277#comment-14226277 ] Xuefu Zhang edited comment on HIVE-8924 at 11/26/14 3:08 PM: - [~szehon], I think you will need to regenerate your .out because of HIVE recently resolved HIVE-8961. was (Author: xuefuz): [~csun], I think you will need to regenerate your .out because of HIVE recently resolved HIVE-8961. Investigate test failure for join_empty.q [Spark Branch] Key: HIVE-8924 URL: https://issues.apache.org/jira/browse/HIVE-8924 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Assignee: Szehon Ho Attachments: HIVE-8924-spark.patch, HIVE-8924.2-spark.patch This query has an interesting case where the big table work is empty. Here's the MR plan: {noformat} STAGE DEPENDENCIES: Stage-4 is a root stage Stage-3 depends on stages: Stage-4 Stage-0 depends on stages: Stage-3 STAGE PLANS: Stage: Stage-4 Map Reduce Local Work Alias - Map Local Tables: b Fetch Operator limit: -1 Alias - Map Local Operator Tree: b TableScan alias: b Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: UDFToDouble(key) is not null (type: boolean) Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator condition expressions: 0 {key} 1 {value} keys: 0 UDFToDouble(key) (type: double) 1 UDFToDouble(key) (type: double) Stage: Stage-3 Map Reduce Local Work: Map Reduce Local Work Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: ListSink {noformat} The plan for Spark is not correct. We need to investigate the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-8924) Investigate test failure for join_empty.q [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226277#comment-14226277 ] Xuefu Zhang edited comment on HIVE-8924 at 11/26/14 3:08 PM: - [~szehon], [~csun], I think you will need to regenerate your .out because of HIVE recently resolved HIVE-8961. was (Author: xuefuz): [~szehon], I think you will need to regenerate your .out because of HIVE recently resolved HIVE-8961. Investigate test failure for join_empty.q [Spark Branch] Key: HIVE-8924 URL: https://issues.apache.org/jira/browse/HIVE-8924 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Assignee: Szehon Ho Attachments: HIVE-8924-spark.patch, HIVE-8924.2-spark.patch This query has an interesting case where the big table work is empty. Here's the MR plan: {noformat} STAGE DEPENDENCIES: Stage-4 is a root stage Stage-3 depends on stages: Stage-4 Stage-0 depends on stages: Stage-3 STAGE PLANS: Stage: Stage-4 Map Reduce Local Work Alias - Map Local Tables: b Fetch Operator limit: -1 Alias - Map Local Operator Tree: b TableScan alias: b Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: UDFToDouble(key) is not null (type: boolean) Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator condition expressions: 0 {key} 1 {value} keys: 0 UDFToDouble(key) (type: double) 1 UDFToDouble(key) (type: double) Stage: Stage-3 Map Reduce Local Work: Map Reduce Local Work Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: ListSink {noformat} The plan for Spark is not correct. We need to investigate the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8934) Investigate test failure on bucketmapjoin10.q and bucketmapjoin11.q [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226283#comment-14226283 ] Xuefu Zhang commented on HIVE-8934: --- [~csun], I think you will need to regenerate your .out files because HIVE-8961 was recently resolved. Investigate test failure on bucketmapjoin10.q and bucketmapjoin11.q [Spark Branch] -- Key: HIVE-8934 URL: https://issues.apache.org/jira/browse/HIVE-8934 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Assignee: Chao Attachments: HIVE-8934.1-spark.patch, HIVE-8934.2-spark.patch With MapJoin enabled, these two tests will generate incorrect results. This seems to be related to the HiveInputFormat that these two are using. We need to investigate the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8962) Add SORT_QUERY_RESULTS for join tests that do not guarantee order #2
[ https://issues.apache.org/jira/browse/HIVE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8962: -- Resolution: Fixed Fix Version/s: 0.15.0 Status: Resolved (was: Patch Available) Patch committed to trunk. Thanks to Chao for the contribution. Add SORT_QUERY_RESULTS for join tests that do not guarantee order #2 Key: HIVE-8962 URL: https://issues.apache.org/jira/browse/HIVE-8962 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chao Assignee: Chao Priority: Minor Fix For: 0.15.0 Attachments: HIVE-8962.patch Similar to HIVE-8936, we need to add {{SORT_QUERY_RESULTS}} to the following q-files: {noformat} ppd_multi_insert.q ptf_streaming.q subquery_exists.q subquery_multiinsert.q vectorized_ptf.q {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (HIVE-7329) Create SparkWork [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-7329: -- Comment: was deleted (was: hi,XueFu. i builted hive on spark (spark branch on https://github.com/apache/hive.git),and spark (master branch on https://github.com/apache/spark.git),and my spark assembly jar is spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar,and set this jar path into hive-env.sh (set to HIVE_AUX_JARS_PATH),and start hive ,do follow commnad to start a query : set hive.execution.engine=spark; set spark.master=spark://:7077; set spark.eventLog.enabled=true; set spark.executor.memory=1024m; set spark.serializer=org.apache.spark.serializer.KryoSerializer; but it seems it still use mr for query engine,then i remote debug it ,and found it can't jump to spark engine. i do what this https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started be told,can u tell me anything wrong?) Create SparkWork [Spark Branch] --- Key: HIVE-7329 URL: https://issues.apache.org/jira/browse/HIVE-7329 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: 0.13.1 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: spark-branch Attachments: HIVE-7329.patch This class encapsulates all the work objects that can be executed in a single Spark job. NO PRECOMMIT TESTS. This is for spark branch only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8828) Remove hadoop 20 shims
[ https://issues.apache.org/jira/browse/HIVE-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8828: --- Status: Open (was: Patch Available) Remove hadoop 20 shims -- Key: HIVE-8828 URL: https://issues.apache.org/jira/browse/HIVE-8828 Project: Hive Issue Type: Task Components: Shims Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-8828.1.patch, HIVE-8828.2.patch, HIVE-8828.3.patch, HIVE-8828.4.patch, HIVE-8828.5.patch, HIVE-8828.6.patch, HIVE-8828.7.patch, HIVE-8828.8.patch, HIVE-8828.9.patch, HIVE-8828.patch CLEAR LIBRARY CACHE See : [mailing list discussion | http://mail-archives.apache.org/mod_mbox/hive-dev/201410.mbox/%3CCABgNGzfSB5VGTecONg0GgLCDdLLFfzLuZvP%2BGSBc0i0joqf3fg%40mail.gmail.com%3E] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8828) Remove hadoop 20 shims
[ https://issues.apache.org/jira/browse/HIVE-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8828: --- Status: Patch Available (was: Open) Remove hadoop 20 shims -- Key: HIVE-8828 URL: https://issues.apache.org/jira/browse/HIVE-8828 Project: Hive Issue Type: Task Components: Shims Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-8828.1.patch, HIVE-8828.10.patch, HIVE-8828.2.patch, HIVE-8828.3.patch, HIVE-8828.4.patch, HIVE-8828.5.patch, HIVE-8828.6.patch, HIVE-8828.7.patch, HIVE-8828.8.patch, HIVE-8828.9.patch, HIVE-8828.patch CLEAR LIBRARY CACHE See : [mailing list discussion | http://mail-archives.apache.org/mod_mbox/hive-dev/201410.mbox/%3CCABgNGzfSB5VGTecONg0GgLCDdLLFfzLuZvP%2BGSBc0i0joqf3fg%40mail.gmail.com%3E] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8828) Remove hadoop 20 shims
[ https://issues.apache.org/jira/browse/HIVE-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8828: --- Attachment: HIVE-8828.10.patch Rebased to trunk. Remove hadoop 20 shims -- Key: HIVE-8828 URL: https://issues.apache.org/jira/browse/HIVE-8828 Project: Hive Issue Type: Task Components: Shims Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-8828.1.patch, HIVE-8828.10.patch, HIVE-8828.2.patch, HIVE-8828.3.patch, HIVE-8828.4.patch, HIVE-8828.5.patch, HIVE-8828.6.patch, HIVE-8828.7.patch, HIVE-8828.8.patch, HIVE-8828.9.patch, HIVE-8828.patch CLEAR LIBRARY CACHE See : [mailing list discussion | http://mail-archives.apache.org/mod_mbox/hive-dev/201410.mbox/%3CCABgNGzfSB5VGTecONg0GgLCDdLLFfzLuZvP%2BGSBc0i0joqf3fg%40mail.gmail.com%3E] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8967) Fix bucketmapjoin7.q determinism
[ https://issues.apache.org/jira/browse/HIVE-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226445#comment-14226445 ] Hive QA commented on HIVE-8967: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12683703/HIVE-8967.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6683 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1909/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1909/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1909/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12683703 - PreCommit-HIVE-TRUNK-Build Fix bucketmapjoin7.q determinism Key: HIVE-8967 URL: https://issues.apache.org/jira/browse/HIVE-8967 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Fix For: 0.15.0 Attachments: HIVE-8967.patch In working on HIVE-8963, we found the output is not deterministic. We can add an order by to make sure the output is fixed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8836) Enable automatic tests with remote spark client [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226461#comment-14226461 ] Brock Noland commented on HIVE-8836: I will change it to two cores and then re-generate the outputs. This should allow us to differentiate between failed tests, changed outputs, and just reducer changes. Enable automatic tests with remote spark client [Spark Branch] -- Key: HIVE-8836 URL: https://issues.apache.org/jira/browse/HIVE-8836 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chengxiang Li Assignee: Rui Li Labels: Spark-M3 Fix For: spark-branch Attachments: HIVE-8836.1-spark.patch, HIVE-8836.2-spark.patch, HIVE-8836.3-spark.patch, HIVE-8836.4-spark.patch, HIVE-8836.5-spark.patch, HIVE-8836.6-spark.patch, HIVE-8836.6-spark.patch, HIVE-8836.6-spark.patch In real production environment, remote spark client should be used to submit spark job for Hive mostly, we should enable automatic test with remote spark client to make sure the Hive feature workable with it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8956) Hive hangs while some error/exception happens beyond job execution [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226504#comment-14226504 ] Marcelo Vanzin commented on HIVE-8956: -- I haven't looked at akka in that much detail to see if there is some API to catch those. You can enable akka logging (set {{spark.akka.logLifecycleEvents}} to true) and that will print these errors to the logs. Spark tries to serialize data before sending it to akka, to try to catch serialization issues, but that adds overhead, and it also doesn't help in the deserialization path... Hive hangs while some error/exception happens beyond job execution [Spark Branch] - Key: HIVE-8956 URL: https://issues.apache.org/jira/browse/HIVE-8956 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Rui Li Labels: Spark-M3 Fix For: spark-branch Attachments: HIVE-8956.1-spark.patch The remote spark client communicates with the remote spark context asynchronously. If an error/exception is thrown during job execution in the remote spark context, it is wrapped and sent back to the remote spark client, but if an error/exception is thrown outside of job execution, such as a job serialization failure, the remote spark client would never know what's going on in the remote spark context, and it would hang there. Setting a timeout on the remote spark client side may not be a great idea, as we are not sure how long the query will execute in the spark cluster. We need to find a way to check whether the job has failed (over its whole life cycle) in the remote spark context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
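For reference, the property mentioned above can be set like any other spark.* property. Below is a minimal, generic sketch using Spark's public SparkConf API; it is not Hive code, and the application name is a placeholder. In a Hive on Spark session the equivalent would presumably just be {{set spark.akka.logLifecycleEvents=true;}}.
{code}
import org.apache.spark.SparkConf;

public class AkkaLoggingConf {
  public static void main(String[] args) {
    // Turn on akka lifecycle event logging so actor-level failures
    // (e.g. deserialization errors) show up in the logs.
    SparkConf conf = new SparkConf()
        .setAppName("hive-on-spark-debug")             // placeholder app name
        .set("spark.akka.logLifecycleEvents", "true");
    System.out.println(conf.get("spark.akka.logLifecycleEvents"));
  }
}
{code}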
[jira] [Commented] (HIVE-8957) Remote spark context needs to clean up itself in case of connection timeout [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226515#comment-14226515 ] Marcelo Vanzin commented on HIVE-8957: -- I think a fix here will be a little more complicated than that. Let me look at the code and think about it. Remote spark context needs to clean up itself in case of connection timeout [Spark Branch] -- Key: HIVE-8957 URL: https://issues.apache.org/jira/browse/HIVE-8957 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-8957.1-spark.patch In the current SparkClient implementation (class SparkClientImpl), the constructor does some initialization and in the end waits for the remote driver to connect. In case of timeout, it just throws an exception without cleaning itself. The cleanup is necessary to release system resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8574) Enhance metrics gathering in Spark Client [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226517#comment-14226517 ] Marcelo Vanzin commented on HIVE-8574: -- Actually, after a quick look at the code again, this might not be a problem. Metrics are kept per-job handle. Job handles are managed by the code submitting jobs - leave them for garbage collection and metrics go away. So unless we're worried about a single job creating so many tasks that it will run the driver out of memory with all the metrics data, this shouldn't really be an issue. Enhance metrics gathering in Spark Client [Spark Branch] Key: HIVE-8574 URL: https://issues.apache.org/jira/browse/HIVE-8574 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin The current implementation of metrics gathering in the Spark client is a little hacky. First, it's awkward to use (and the implementation is also pretty ugly). Second, it will just collect metrics indefinitely, so in the long term it turns into a huge memory leak. We need a simplified interface and some mechanism for disposing of old metrics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-6421) abs() should preserve precision/scale of decimal input
[ https://issues.apache.org/jira/browse/HIVE-6421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226551#comment-14226551 ] Jason Dere commented on HIVE-6421: -- Failure has been occurring in other precommit runs and does not appear to be related. [~ashutoshc], does this one look ok? abs() should preserve precision/scale of decimal input -- Key: HIVE-6421 URL: https://issues.apache.org/jira/browse/HIVE-6421 Project: Hive Issue Type: Bug Components: UDF Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-6421.1.txt, HIVE-6421.2.patch, HIVE-6421.3.patch {noformat} hive> describe dec1; OK c1 decimal(10,2) None hive> explain select c1, abs(c1) from dec1; ... Select Operator expressions: c1 (type: decimal(10,2)), abs(c1) (type: decimal(38,18)) {noformat} Given that abs() is a GenericUDF it should be possible for the return type precision/scale to match the input precision/scale. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
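For illustration, here is a minimal, hypothetical GenericUDF sketch (not the actual HIVE-6421 patch) showing how initialize() can derive the return ObjectInspector from the input decimal's precision/scale instead of defaulting to decimal(38,18); the class name and the identity evaluate() are placeholders.
{code}
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.typeinfo.DecimalTypeInfo;
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils;

// Illustrative only: echoes the input decimal's precision/scale in the return type,
// which is the behavior this JIRA asks abs() to have.
public class GenericUDFIdentityDecimal extends GenericUDF {
  @Override
  public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
    DecimalTypeInfo inType =
        (DecimalTypeInfo) TypeInfoUtils.getTypeInfoFromObjectInspector(arguments[0]);
    // Build the return type from the input's precision/scale rather than decimal(38,18).
    DecimalTypeInfo outType =
        TypeInfoFactory.getDecimalTypeInfo(inType.precision(), inType.scale());
    return PrimitiveObjectInspectorFactory.getPrimitiveWritableObjectInspector(outType);
  }

  @Override
  public Object evaluate(DeferredObject[] arguments) throws HiveException {
    // Identity for illustration; abs() would additionally negate negative values.
    return arguments[0].get();
  }

  @Override
  public String getDisplayString(String[] children) {
    return "identity_decimal(" + children[0] + ")";
  }
}
{code}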
[jira] [Updated] (HIVE-8967) Fix bucketmapjoin7.q determinism
[ https://issues.apache.org/jira/browse/HIVE-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-8967: -- Resolution: Fixed Status: Resolved (was: Patch Available) Patch committed to trunk and also merged to Spark branch. Thanks, Jimmy. Fix bucketmapjoin7.q determinism Key: HIVE-8967 URL: https://issues.apache.org/jira/browse/HIVE-8967 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Fix For: 0.15.0 Attachments: HIVE-8967.patch In working on HIVE-8963, we found the output is not deterministic. We can add an order by to make sure the output is fixed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8971) HIVE-8965 exposed some classes which start with Test but are not tests
[ https://issues.apache.org/jira/browse/HIVE-8971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226579#comment-14226579 ] Hive QA commented on HIVE-8971: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12683767/HIVE-8971.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6683 tests executed *Failed tests:* {noformat} org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1910/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1910/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1910/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12683767 - PreCommit-HIVE-TRUNK-Build HIVE-8965 exposed some classes which start with Test but are not tests -- Key: HIVE-8971 URL: https://issues.apache.org/jira/browse/HIVE-8971 Project: Hive Issue Type: Improvement Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-8971.patch From the output here: https://issues.apache.org/jira/browse/HIVE-8836?focusedCommentId=14225742page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14225742 I've looked at the TestHBase* classes and they are not tests. PTest cannot support classes which start with Test but are not tests. {noformat} TestAuthorizationApiAuthorizer - did not produce a TEST-*.xml file TestGenericUDFOPNumeric - did not produce a TEST-*.xml file TestHBaseKeyFactory - did not produce a TEST-*.xml file TestHBaseKeyFactory2 - did not produce a TEST-*.xml file TestHBaseKeyFactory3 - did not produce a TEST-*.xml file TestHBasePredicateDecomposer - did not produce a TEST-*.xml file TestTezSessionState - did not produce a TEST-*.xml file TestURLHook - did not produce a TEST-*.xml file {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8971) HIVE-8965 exposed some classes which start with Test but are not tests
[ https://issues.apache.org/jira/browse/HIVE-8971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-8971: --- Resolution: Fixed Fix Version/s: 0.15.0 Status: Resolved (was: Patch Available) Committed this as I will be updating the trunk ptest server shortly to take advantage of HIVE-8965 in order to improve some of the delays we've seen in testing lately. HIVE-8965 exposed some classes which start with Test but are not tests -- Key: HIVE-8971 URL: https://issues.apache.org/jira/browse/HIVE-8971 Project: Hive Issue Type: Improvement Reporter: Brock Noland Assignee: Brock Noland Fix For: 0.15.0 Attachments: HIVE-8971.patch From the output here: https://issues.apache.org/jira/browse/HIVE-8836?focusedCommentId=14225742page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14225742 I've looked at the TestHBase* classes and they are not tests. PTest cannot support classes which start with Test but are not tests. {noformat} TestAuthorizationApiAuthorizer - did not produce a TEST-*.xml file TestGenericUDFOPNumeric - did not produce a TEST-*.xml file TestHBaseKeyFactory - did not produce a TEST-*.xml file TestHBaseKeyFactory2 - did not produce a TEST-*.xml file TestHBaseKeyFactory3 - did not produce a TEST-*.xml file TestHBasePredicateDecomposer - did not produce a TEST-*.xml file TestTezSessionState - did not produce a TEST-*.xml file TestURLHook - did not produce a TEST-*.xml file {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7896) orcfiledump should be able to dump data
[ https://issues.apache.org/jira/browse/HIVE-7896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-7896: - Status: Patch Available (was: Open) orcfiledump should be able to dump data --- Key: HIVE-7896 URL: https://issues.apache.org/jira/browse/HIVE-7896 Project: Hive Issue Type: Improvement Components: File Formats Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-7896.2.patch, HIVE-7896.patch, alltypes.orc, alltypes2.txt The FileDumper utility in orc, exposed as a service as orcfiledump, can print out metadata from Orc files but not the actual data. Being able to dump the data is also useful in some debugging contexts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7896) orcfiledump should be able to dump data
[ https://issues.apache.org/jira/browse/HIVE-7896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-7896: - Attachment: HIVE-7896.2.patch Fixed error found by Prasanth and added unit test to confirm it. Also changed the help message per Prasanth's suggestion. orcfiledump should be able to dump data --- Key: HIVE-7896 URL: https://issues.apache.org/jira/browse/HIVE-7896 Project: Hive Issue Type: Improvement Components: File Formats Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-7896.2.patch, HIVE-7896.patch, alltypes.orc, alltypes2.txt The FileDumper utility in orc, exposed as a service as orcfiledump, can print out metadata from Orc files but not the actual data. Being able to dump the data is also useful in some debugging contexts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8974) Upgrade to Calcite 1.0.0-SNAPSHOT (with lots of renames)
Julian Hyde created HIVE-8974: - Summary: Upgrade to Calcite 1.0.0-SNAPSHOT (with lots of renames) Key: HIVE-8974 URL: https://issues.apache.org/jira/browse/HIVE-8974 Project: Hive Issue Type: Task Reporter: Julian Hyde Assignee: Gunther Hagleitner Calcite recently (after 0.9.2, before 1.0.0) re-organized its package structure and renamed a lot of classes. CALCITE-296 has the details, including a description of the before:after mapping. This task is to upgrade to the version of Calcite that has the renamed packages. There is a 1.0.0-SNAPSHOT in Apache nexus. Calcite functionality has not changed significantly, so it should be straightforward to rename. This task should be completed ASAP, before Calcite moves on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-8974) Upgrade to Calcite 1.0.0-SNAPSHOT (with lots of renames)
[ https://issues.apache.org/jira/browse/HIVE-8974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesús Camacho Rodríguez reassigned HIVE-8974: - Assignee: Jesús Camacho Rodríguez (was: Gunther Hagleitner) Upgrade to Calcite 1.0.0-SNAPSHOT (with lots of renames) Key: HIVE-8974 URL: https://issues.apache.org/jira/browse/HIVE-8974 Project: Hive Issue Type: Task Reporter: Julian Hyde Assignee: Jesús Camacho Rodríguez Calcite recently (after 0.9.2, before 1.0.0) re-organized its package structure and renamed a lot of classes. CALCITE-296 has the details, including a description of the before:after mapping. This task is to upgrade to the version of Calcite that has the renamed packages. There is a 1.0.0-SNAPSHOT in Apache nexus. Calcite functionality has not changed significantly, so it should be straightforward to rename. This task should be completed ASAP, before Calcite moves on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8875) hive.optimize.sort.dynamic.partition should be turned off for ACID
[ https://issues.apache.org/jira/browse/HIVE-8875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8875: - Resolution: Fixed Status: Resolved (was: Patch Available) hive.optimize.sort.dynamic.partition should be turned off for ACID -- Key: HIVE-8875 URL: https://issues.apache.org/jira/browse/HIVE-8875 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.15.0 Attachments: HIVE-8875.2.patch, HIVE-8875.patch Turning this on causes ACID insert, updates, and deletes to produce non-optimal plans with extra reduce phases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8875) hive.optimize.sort.dynamic.partition should be turned off for ACID
[ https://issues.apache.org/jira/browse/HIVE-8875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8875: - Fix Version/s: 0.15.0 hive.optimize.sort.dynamic.partition should be turned off for ACID -- Key: HIVE-8875 URL: https://issues.apache.org/jira/browse/HIVE-8875 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.15.0 Attachments: HIVE-8875.2.patch, HIVE-8875.patch Turning this on causes ACID insert, updates, and deletes to produce non-optimal plans with extra reduce phases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8574) Enhance metrics gathering in Spark Client [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226651#comment-14226651 ] Brock Noland commented on HIVE-8574: bq. So unless we're worried about a single job creating so many tasks that it will run the driver out of memory with all the metrics data, this shouldn't really be an issue. Any idea how much memory would be consumed for say 100K tasks? Enhance metrics gathering in Spark Client [Spark Branch] Key: HIVE-8574 URL: https://issues.apache.org/jira/browse/HIVE-8574 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin The current implementation of metrics gathering in the Spark client is a little hacky. First, it's awkward to use (and the implementation is also pretty ugly). Second, it will just collect metrics indefinitely, so in the long term it turns into a huge memory leak. We need a simplified interface and some mechanism for disposing of old metrics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-6421) abs() should preserve precision/scale of decimal input
[ https://issues.apache.org/jira/browse/HIVE-6421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226659#comment-14226659 ] Ashutosh Chauhan commented on HIVE-6421: yup. +1 abs() should preserve precision/scale of decimal input -- Key: HIVE-6421 URL: https://issues.apache.org/jira/browse/HIVE-6421 Project: Hive Issue Type: Bug Components: UDF Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-6421.1.txt, HIVE-6421.2.patch, HIVE-6421.3.patch {noformat} hive> describe dec1; OK c1 decimal(10,2) None hive> explain select c1, abs(c1) from dec1; ... Select Operator expressions: c1 (type: decimal(10,2)), abs(c1) (type: decimal(38,18)) {noformat} Given that abs() is a GenericUDF it should be possible for the return type precision/scale to match the input precision/scale. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8574) Enhance metrics gathering in Spark Client [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226668#comment-14226668 ] Marcelo Vanzin commented on HIVE-8574: -- Rounding up, each task metrics data structure will take around 256 bytes. So ~25MB? Enhance metrics gathering in Spark Client [Spark Branch] Key: HIVE-8574 URL: https://issues.apache.org/jira/browse/HIVE-8574 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin The current implementation of metrics gathering in the Spark client is a little hacky. First, it's awkward to use (and the implementation is also pretty ugly). Second, it will just collect metrics indefinitely, so in the long term it turns into a huge memory leak. We need a simplified interface and some mechanism for disposing of old metrics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
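Spelling out the arithmetic behind that estimate, using the roughly 256 bytes of retained metrics per task mentioned above and the 100K-task figure from the question:
{noformat}
100,000 tasks x 256 bytes/task = 25,600,000 bytes, i.e. roughly 25 MB
{noformat}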
[jira] [Created] (HIVE-8975) Possible performance regression on bucket_map_join_tez2.q
Jesus Camacho Rodriguez created HIVE-8975: - Summary: Possible performance regression on bucket_map_join_tez2.q Key: HIVE-8975 URL: https://issues.apache.org/jira/browse/HIVE-8975 Project: Hive Issue Type: Bug Components: Logical Optimizer, Statistics Affects Versions: 0.15.0 Reporter: Jesus Camacho Rodriguez After introducing the identity project removal optimization in HIVE-8435, the plan in bucket_map_join_tez2.q that runs on Tez changed to be sub-optimal. In particular, earlier it was doing a map-join and after HIVE-8435 it changed to a reduce-join. The query is the following one: {noformat} select a.key, b.key from (select distinct key from tab) a join tab b on b.key = a.key {noformat} The plan before removing the projections is: {noformat} TS[0]-FIL[16]-SEL[1]-GBY[2]-RS[3]-GBY[4]-SEL[5]-RS[8]-JOIN[11]-SEL[12]-FS[13] TS[6]-FIL[17]-RS[10]-JOIN[11] {noformat} And after removing identity projections: {noformat} TS[0]-FIL[16]-GBY[2]-RS[3]-GBY[4]-RS[8]-JOIN[11]-SEL[12]-FS[13] TS[6]-FIL[17]-RS[10]-JOIN[11] {noformat} After digging a bit, I realized it is not converting the reduce-join into a map-join because stats for GBY\[4\] change if SEL\[5\] is removed; thus the optimization does not kick in. The reason for the stats change in the GroupBy operator is in [this line|https://github.com/apache/hive/blob/6f4365e8a21e7b480bf595d079a71303a50bf1b2/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L633], where it is checked whether the GBY is immediately followed by a RS operator or not, and stats are calculated differently depending on it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8975) Possible performance regression on bucket_map_join_tez2.q
[ https://issues.apache.org/jira/browse/HIVE-8975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226695#comment-14226695 ] Jesus Camacho Rodriguez commented on HIVE-8975: --- [~prasanth_j], what do you think? Possible performance regression on bucket_map_join_tez2.q - Key: HIVE-8975 URL: https://issues.apache.org/jira/browse/HIVE-8975 Project: Hive Issue Type: Bug Components: Logical Optimizer, Statistics Affects Versions: 0.15.0 Reporter: Jesus Camacho Rodriguez After introducing the identity project removal optimization in HIVE-8435, the plan in bucket_map_join_tez2.q that runs on Tez changed to be sub-optimal. In particular, earlier it was doing a map-join and after HIVE-8435 it changed to a reduce-join. The query is the following one: {noformat} select a.key, b.key from (select distinct key from tab) a join tab b on b.key = a.key {noformat} The plan before removing the projections is: {noformat} TS[0]-FIL[16]-SEL[1]-GBY[2]-RS[3]-GBY[4]-SEL[5]-RS[8]-JOIN[11]-SEL[12]-FS[13] TS[6]-FIL[17]-RS[10]-JOIN[11] {noformat} And after removing identity projections: {noformat} TS[0]-FIL[16]-GBY[2]-RS[3]-GBY[4]-RS[8]-JOIN[11]-SEL[12]-FS[13] TS[6]-FIL[17]-RS[10]-JOIN[11] {noformat} After digging a bit, I realized it is not converting the reduce-join into a map-join because stats for GBY\[4\] change if SEL\[5\] is removed; thus the optimization does not kick in. The reason for the stats change in the GroupBy operator is in [this line|https://github.com/apache/hive/blob/6f4365e8a21e7b480bf595d079a71303a50bf1b2/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L633], where it is checked whether the GBY is immediately followed by a RS operator or not, and stats are calculated differently depending on it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8828) Remove hadoop 20 shims
[ https://issues.apache.org/jira/browse/HIVE-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226696#comment-14226696 ] Hive QA commented on HIVE-8828: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12683859/HIVE-8828.10.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1913/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1913/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1913/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-1913/source-prep.txt + [[ true == \t\r\u\e ]] + rm -rf ivy maven + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-trunk-source ]] + [[ ! -d apache-svn-trunk-source/.svn ]] + [[ ! -d apache-svn-trunk-source ]] + cd apache-svn-trunk-source + svn revert -R . 
Reverted 'hbase-handler/src/test/results/positive/hbase_custom_key.q.out' Reverted 'hbase-handler/src/test/results/positive/hbase_custom_key2.q.out' Reverted 'hbase-handler/src/test/results/positive/hbase_custom_key3.q.out' Reverted 'hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseKeyFactory.java' Reverted 'hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseKeyFactory2.java' Reverted 'hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseKeyFactory3.java' Reverted 'hbase-handler/src/test/queries/positive/hbase_custom_key.q' Reverted 'hbase-handler/src/test/queries/positive/hbase_custom_key2.q' Reverted 'hbase-handler/src/test/queries/positive/hbase_custom_key3.q' Reverted 'itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestAuthzApiEmbedAuthorizerInRemote.java' Reverted 'itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestAuthorizationApiAuthorizer.java' Reverted 'itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestAuthzApiEmbedAuthorizerInEmbed.java' Reverted 'contrib/src/test/queries/clientpositive/url_hook.q' Reverted 'contrib/src/java/org/apache/hadoop/hive/contrib/metastore/hooks/TestURLHook.java' Reverted 'ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestTezSessionPool.java' Reverted 'ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestTezSessionState.java' Reverted 'ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFOPDivide.java' Reverted 'ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFOPMod.java' Reverted 'ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFOPMultiply.java' Reverted 'ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFOPPlus.java' Reverted 'ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFOPMinus.java' Reverted 'ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFPosMod.java' Reverted 'ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFOPNumeric.java' ++ egrep -v '^X|^Performing status on external' ++ awk '{print $2}' ++ svn status --no-ignore + rm -rf target datanucleus.log ant/target shims/target shims/0.20/target shims/0.20S/target shims/0.23/target shims/aggregator/target shims/common/target shims/common-secure/target shims/scheduler/target packaging/target hbase-handler/target hbase-handler/src/test/org/apache/hadoop/hive/hbase/SampleHBaseKeyFactory.java hbase-handler/src/test/org/apache/hadoop/hive/hbase/SampleHBaseKeyFactory2.java hbase-handler/src/test/org/apache/hadoop/hive/hbase/SampleHBaseKeyFactory3.java testutils/target jdbc/target metastore/target itests/target
[jira] [Updated] (HIVE-8836) Enable automatic tests with remote spark client [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-8836: --- Attachment: HIVE-8836.7-spark.patch Enable automatic tests with remote spark client [Spark Branch] -- Key: HIVE-8836 URL: https://issues.apache.org/jira/browse/HIVE-8836 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chengxiang Li Assignee: Rui Li Labels: Spark-M3 Fix For: spark-branch Attachments: HIVE-8836.1-spark.patch, HIVE-8836.2-spark.patch, HIVE-8836.3-spark.patch, HIVE-8836.4-spark.patch, HIVE-8836.5-spark.patch, HIVE-8836.6-spark.patch, HIVE-8836.6-spark.patch, HIVE-8836.6-spark.patch, HIVE-8836.7-spark.patch In real production environment, remote spark client should be used to submit spark job for Hive mostly, we should enable automatic test with remote spark client to make sure the Hive feature workable with it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8836) Enable automatic tests with remote spark client [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226730#comment-14226730 ] Brock Noland commented on HIVE-8836: Attached patch has regenerated output for queries which had a different plan (number of reducers). It does not update the following: *Query Result differences* {noformat} auto_join_without_localtask.q.out count.q.out join_filters_overlap.q.out limit_pushdown.q.out mapreduce2.q.out multi_insert_gby3.q.out multi_join_union.q.out ppd_outer_join3.q.out ptf_decimal.q.out ptf_general_queries.q.out smb_mapjoin_1.q.out smb_mapjoin_2.q.out smb_mapjoin_4.q.out smb_mapjoin_5.q.out smb_mapjoin_8.q.out stats_counter.q.out table_access_keys_stats.q.out uniquejoin.q.out vector_decimal_aggregate.q.out vectorization_13.q.out join_reorder.q.out outer_join_ppr.q.out *Failed* {noformat} bucketmapjoin1.q.out groupby_multi_insert_common_distinct.q.out groupby_multi_single_reducer.q.out infer_bucket_sort_convert_join.q.out mapjoin_hook.q.out smb_mapjoin9 {noformat} Enable automatic tests with remote spark client [Spark Branch] -- Key: HIVE-8836 URL: https://issues.apache.org/jira/browse/HIVE-8836 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chengxiang Li Assignee: Rui Li Labels: Spark-M3 Fix For: spark-branch Attachments: HIVE-8836.1-spark.patch, HIVE-8836.2-spark.patch, HIVE-8836.3-spark.patch, HIVE-8836.4-spark.patch, HIVE-8836.5-spark.patch, HIVE-8836.6-spark.patch, HIVE-8836.6-spark.patch, HIVE-8836.6-spark.patch, HIVE-8836.7-spark.patch In real production environment, remote spark client should be used to submit spark job for Hive mostly, we should enable automatic test with remote spark client to make sure the Hive feature workable with it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-8836) Enable automatic tests with remote spark client [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226730#comment-14226730 ] Brock Noland edited comment on HIVE-8836 at 11/26/14 7:54 PM: -- Attached patch has regenerated output for queries which had a different plan (number of reducers). It does not update the following: *Query Result differences* {noformat} auto_join_without_localtask.q.out count.q.out join_filters_overlap.q.out limit_pushdown.q.out mapreduce2.q.out multi_insert_gby3.q.out multi_join_union.q.out ppd_outer_join3.q.out ptf_decimal.q.out ptf_general_queries.q.out smb_mapjoin_1.q.out smb_mapjoin_2.q.out smb_mapjoin_4.q.out smb_mapjoin_5.q.out smb_mapjoin_8.q.out stats_counter.q.out table_access_keys_stats.q.out uniquejoin.q.out vector_decimal_aggregate.q.out vectorization_13.q.out join_reorder.q.out outer_join_ppr.q.out {noformat} *Failed* {noformat} bucketmapjoin1.q.out groupby_multi_insert_common_distinct.q.out groupby_multi_single_reducer.q.out infer_bucket_sort_convert_join.q.out mapjoin_hook.q.out smb_mapjoin9 {noformat} was (Author: brocknoland): Attached patch has regenerated output for queries which had a different plan (number of reducers). It does not update the following: *Query Result differences* {noformat} auto_join_without_localtask.q.out count.q.out join_filters_overlap.q.out limit_pushdown.q.out mapreduce2.q.out multi_insert_gby3.q.out multi_join_union.q.out ppd_outer_join3.q.out ptf_decimal.q.out ptf_general_queries.q.out smb_mapjoin_1.q.out smb_mapjoin_2.q.out smb_mapjoin_4.q.out smb_mapjoin_5.q.out smb_mapjoin_8.q.out stats_counter.q.out table_access_keys_stats.q.out uniquejoin.q.out vector_decimal_aggregate.q.out vectorization_13.q.out join_reorder.q.out outer_join_ppr.q.out *Failed* {noformat} bucketmapjoin1.q.out groupby_multi_insert_common_distinct.q.out groupby_multi_single_reducer.q.out infer_bucket_sort_convert_join.q.out mapjoin_hook.q.out smb_mapjoin9 {noformat} Enable automatic tests with remote spark client [Spark Branch] -- Key: HIVE-8836 URL: https://issues.apache.org/jira/browse/HIVE-8836 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chengxiang Li Assignee: Rui Li Labels: Spark-M3 Fix For: spark-branch Attachments: HIVE-8836.1-spark.patch, HIVE-8836.2-spark.patch, HIVE-8836.3-spark.patch, HIVE-8836.4-spark.patch, HIVE-8836.5-spark.patch, HIVE-8836.6-spark.patch, HIVE-8836.6-spark.patch, HIVE-8836.6-spark.patch, HIVE-8836.7-spark.patch In real production environment, remote spark client should be used to submit spark job for Hive mostly, we should enable automatic test with remote spark client to make sure the Hive feature workable with it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8828) Remove hadoop 20 shims
[ https://issues.apache.org/jira/browse/HIVE-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8828: --- Status: Patch Available (was: Open) Remove hadoop 20 shims -- Key: HIVE-8828 URL: https://issues.apache.org/jira/browse/HIVE-8828 Project: Hive Issue Type: Task Components: Shims Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-8828.1.patch, HIVE-8828.10.patch, HIVE-8828.11.patch, HIVE-8828.2.patch, HIVE-8828.3.patch, HIVE-8828.4.patch, HIVE-8828.5.patch, HIVE-8828.6.patch, HIVE-8828.7.patch, HIVE-8828.8.patch, HIVE-8828.9.patch, HIVE-8828.patch CLEAR LIBRARY CACHE See : [mailing list discussion | http://mail-archives.apache.org/mod_mbox/hive-dev/201410.mbox/%3CCABgNGzfSB5VGTecONg0GgLCDdLLFfzLuZvP%2BGSBc0i0joqf3fg%40mail.gmail.com%3E] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8828) Remove hadoop 20 shims
[ https://issues.apache.org/jira/browse/HIVE-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8828: --- Status: Open (was: Patch Available) Remove hadoop 20 shims -- Key: HIVE-8828 URL: https://issues.apache.org/jira/browse/HIVE-8828 Project: Hive Issue Type: Task Components: Shims Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-8828.1.patch, HIVE-8828.10.patch, HIVE-8828.11.patch, HIVE-8828.2.patch, HIVE-8828.3.patch, HIVE-8828.4.patch, HIVE-8828.5.patch, HIVE-8828.6.patch, HIVE-8828.7.patch, HIVE-8828.8.patch, HIVE-8828.9.patch, HIVE-8828.patch CLEAR LIBRARY CACHE See : [mailing list discussion | http://mail-archives.apache.org/mod_mbox/hive-dev/201410.mbox/%3CCABgNGzfSB5VGTecONg0GgLCDdLLFfzLuZvP%2BGSBc0i0joqf3fg%40mail.gmail.com%3E] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8828) Remove hadoop 20 shims
[ https://issues.apache.org/jira/browse/HIVE-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-8828: --- Attachment: HIVE-8828.11.patch Another rebase after HIVE-8971 Remove hadoop 20 shims -- Key: HIVE-8828 URL: https://issues.apache.org/jira/browse/HIVE-8828 Project: Hive Issue Type: Task Components: Shims Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-8828.1.patch, HIVE-8828.10.patch, HIVE-8828.11.patch, HIVE-8828.2.patch, HIVE-8828.3.patch, HIVE-8828.4.patch, HIVE-8828.5.patch, HIVE-8828.6.patch, HIVE-8828.7.patch, HIVE-8828.8.patch, HIVE-8828.9.patch, HIVE-8828.patch CLEAR LIBRARY CACHE See : [mailing list discussion | http://mail-archives.apache.org/mod_mbox/hive-dev/201410.mbox/%3CCABgNGzfSB5VGTecONg0GgLCDdLLFfzLuZvP%2BGSBc0i0joqf3fg%40mail.gmail.com%3E] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8828) Remove hadoop 20 shims
[ https://issues.apache.org/jira/browse/HIVE-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226742#comment-14226742 ] Brock Noland commented on HIVE-8828: +1 If the tests look ok we should commit this immediately. Remove hadoop 20 shims -- Key: HIVE-8828 URL: https://issues.apache.org/jira/browse/HIVE-8828 Project: Hive Issue Type: Task Components: Shims Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-8828.1.patch, HIVE-8828.10.patch, HIVE-8828.11.patch, HIVE-8828.2.patch, HIVE-8828.3.patch, HIVE-8828.4.patch, HIVE-8828.5.patch, HIVE-8828.6.patch, HIVE-8828.7.patch, HIVE-8828.8.patch, HIVE-8828.9.patch, HIVE-8828.patch CLEAR LIBRARY CACHE See : [mailing list discussion | http://mail-archives.apache.org/mod_mbox/hive-dev/201410.mbox/%3CCABgNGzfSB5VGTecONg0GgLCDdLLFfzLuZvP%2BGSBc0i0joqf3fg%40mail.gmail.com%3E] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-6361) Un-fork Sqlline
[ https://issues.apache.org/jira/browse/HIVE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julian Hyde updated HIVE-6361: -- Attachment: HIVE-6361.4.patch Un-fork Sqlline --- Key: HIVE-6361 URL: https://issues.apache.org/jira/browse/HIVE-6361 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 0.12.0 Reporter: Julian Hyde Assignee: Julian Hyde Attachments: HIVE-6361.2.patch, HIVE-6361.3.patch, HIVE-6361.4.patch, HIVE-6361.patch I propose to merge the two development forks of sqlline: Hive's beeline module, and the fork at https://github.com/julianhyde/sqlline. How did the forks come about? Hive’s SQL command-line interface Beeline was created by forking Sqlline (see HIVE-987, HIVE-3100), which at the time it was a useful but low-activity project languishing on SourceForge without an active owner. Around the same time, Julian Hyde independently started a github repo based on the same code base. Now several projects are using Julian Hyde's sqlline, including Apache Drill, Apache Phoenix, Cascading Lingual and Optiq. Merging these two forks will allow us to pool our resources. (Case in point: Drill issue DRILL-327 had already been fixed in a later version of sqlline; it still exists in beeline.) I propose the following steps: 1. Copy Julian Hyde's sqlline as a new Hive module, hive-sqlline. 2. Port fixes to hive-beeline into hive-sqlline. 3. Make hive-beeline depend on hive-sqlline, and remove code that is identical. What remains in the hive-beeline module is Beeline.java (a derived class of Sqlline.java) and Hive-specific extensions. 4. Make the hive-sqlline the official successor to Julian Hyde's sqlline. This achieves continuity for Hive’s users, gives the users of the non-Hive sqlline a version with minimal dependencies, unifies the two code lines, and brings everything under the Apache roof. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-6361) Un-fork Sqlline
[ https://issues.apache.org/jira/browse/HIVE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julian Hyde updated HIVE-6361: -- Affects Version/s: (was: 0.12.0) 0.14.0 Status: Patch Available (was: Open) Un-fork Sqlline --- Key: HIVE-6361 URL: https://issues.apache.org/jira/browse/HIVE-6361 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 0.14.0 Reporter: Julian Hyde Assignee: Julian Hyde Attachments: HIVE-6361.2.patch, HIVE-6361.3.patch, HIVE-6361.4.patch, HIVE-6361.patch I propose to merge the two development forks of sqlline: Hive's beeline module, and the fork at https://github.com/julianhyde/sqlline. How did the forks come about? Hive’s SQL command-line interface Beeline was created by forking Sqlline (see HIVE-987, HIVE-3100), which at the time it was a useful but low-activity project languishing on SourceForge without an active owner. Around the same time, Julian Hyde independently started a github repo based on the same code base. Now several projects are using Julian Hyde's sqlline, including Apache Drill, Apache Phoenix, Cascading Lingual and Optiq. Merging these two forks will allow us to pool our resources. (Case in point: Drill issue DRILL-327 had already been fixed in a later version of sqlline; it still exists in beeline.) I propose the following steps: 1. Copy Julian Hyde's sqlline as a new Hive module, hive-sqlline. 2. Port fixes to hive-beeline into hive-sqlline. 3. Make hive-beeline depend on hive-sqlline, and remove code that is identical. What remains in the hive-beeline module is Beeline.java (a derived class of Sqlline.java) and Hive-specific extensions. 4. Make the hive-sqlline the official successor to Julian Hyde's sqlline. This achieves continuity for Hive’s users, gives the users of the non-Hive sqlline a version with minimal dependencies, unifies the two code lines, and brings everything under the Apache roof. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-6361) Un-fork Sqlline
[ https://issues.apache.org/jira/browse/HIVE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julian Hyde updated HIVE-6361: -- Status: Open (was: Patch Available) Un-fork Sqlline --- Key: HIVE-6361 URL: https://issues.apache.org/jira/browse/HIVE-6361 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 0.12.0 Reporter: Julian Hyde Assignee: Julian Hyde Attachments: HIVE-6361.2.patch, HIVE-6361.3.patch, HIVE-6361.patch I propose to merge the two development forks of sqlline: Hive's beeline module, and the fork at https://github.com/julianhyde/sqlline. How did the forks come about? Hive’s SQL command-line interface Beeline was created by forking Sqlline (see HIVE-987, HIVE-3100), which at the time it was a useful but low-activity project languishing on SourceForge without an active owner. Around the same time, Julian Hyde independently started a github repo based on the same code base. Now several projects are using Julian Hyde's sqlline, including Apache Drill, Apache Phoenix, Cascading Lingual and Optiq. Merging these two forks will allow us to pool our resources. (Case in point: Drill issue DRILL-327 had already been fixed in a later version of sqlline; it still exists in beeline.) I propose the following steps: 1. Copy Julian Hyde's sqlline as a new Hive module, hive-sqlline. 2. Port fixes to hive-beeline into hive-sqlline. 3. Make hive-beeline depend on hive-sqlline, and remove code that is identical. What remains in the hive-beeline module is Beeline.java (a derived class of Sqlline.java) and Hive-specific extensions. 4. Make the hive-sqlline the official successor to Julian Hyde's sqlline. This achieves continuity for Hive’s users, gives the users of the non-Hive sqlline a version with minimal dependencies, unifies the two code lines, and brings everything under the Apache roof. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226794#comment-14226794 ] Alan Gates commented on HIVE-8966: -- This flush length file should be removed when the batch is closed. Are you closing the transaction batch on a regular basis? Delta files created by hive hcatalog streaming cannot be compacted -- Key: HIVE-8966 URL: https://issues.apache.org/jira/browse/HIVE-8966 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.14.0 Environment: hive Reporter: Jihong Liu Assignee: Alan Gates Priority: Critical hive hcatalog streaming will also create a file like bucket_n_flush_length in each delta directory, where n is the bucket number. But compactor.CompactorMR thinks this file also needs to be compacted. However, this file of course cannot be compacted, so compactor.CompactorMR will not continue with the compaction. In a test, after removing the bucket_n_flush_length file, the alter table partition compact finished successfully. If that file is not deleted, nothing will be compacted. This is probably a very severe bug. Both 0.13 and 0.14 have this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
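To illustrate the "are you closing the transaction batch" question, here is a minimal, hypothetical HCatalog streaming sketch; the metastore URI, database, table, columns, partition value, and record contents are placeholders, and error handling is omitted. Per the comment above, closing the batch is what should let the bucket_N_flush_length side file be removed.
{code}
import java.util.Arrays;
import org.apache.hive.hcatalog.streaming.DelimitedInputWriter;
import org.apache.hive.hcatalog.streaming.HiveEndPoint;
import org.apache.hive.hcatalog.streaming.StreamingConnection;
import org.apache.hive.hcatalog.streaming.TransactionBatch;

public class StreamingCloseSketch {
  public static void main(String[] args) throws Exception {
    // Placeholder metastore URI, database, table, and partition value.
    HiveEndPoint endPoint = new HiveEndPoint("thrift://metastore-host:9083",
        "default", "acid_table", Arrays.asList("2014-11-26"));
    StreamingConnection conn = endPoint.newConnection(true);
    DelimitedInputWriter writer =
        new DelimitedInputWriter(new String[]{"id", "msg"}, ",", endPoint);
    TransactionBatch batch = conn.fetchTransactionBatch(10, writer);
    try {
      while (batch.remainingTransactions() > 0) {
        batch.beginNextTransaction();
        batch.write("1,hello".getBytes());
        batch.commit();
      }
    } finally {
      // Close the batch (and the connection) on a regular basis so the
      // bucket_N_flush_length side file can be cleaned up.
      batch.close();
      conn.close();
    }
  }
}
{code}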
[jira] [Commented] (HIVE-8797) Simultaneous dynamic inserts can result in partition already exists error
[ https://issues.apache.org/jira/browse/HIVE-8797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226801#comment-14226801 ] Alan Gates commented on HIVE-8797: -- So are you proposing we change it to {code} if (CheckJDOException.isJDODataStoreException(e) && tpart == null) { // Using utility method above, so that JDODataStoreException doesn't // have to be used here. This helps avoid adding jdo dependency for // hcatalog client uses LOG.debug("Caught JDO exception, trying to alter partition instead"); tpart = getMSC().getPartitionWithAuthInfo(tbl.getDbName(), tbl.getTableName(), pvals, getUserName(), getGroupNames()); alterPartitionSpec(tbl, partSpec, tpart, inheritTableSpecs, partPath); {code} or {code} if (CheckJDOException.isJDODataStoreException(e)) { // Using utility method above, so that JDODataStoreException doesn't // have to be used here. This helps avoid adding jdo dependency for // hcatalog client uses LOG.debug("Caught JDO exception, trying to alter partition instead"); tpart = getMSC().getPartitionWithAuthInfo(tbl.getDbName(), tbl.getTableName(), pvals, getUserName(), getGroupNames()); if (tpart != null) { alterPartitionSpec(tbl, partSpec, tpart, inheritTableSpecs, partPath); } {code} ? Simultaneous dynamic inserts can result in partition already exists error --- Key: HIVE-8797 URL: https://issues.apache.org/jira/browse/HIVE-8797 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-8797.2.patch, HIVE-8797.patch If two users attempt a dynamic insert into the same new partition at the same time, a possible race condition exists where both will attempt to create the partition and one will fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8721) Enable transactional unit tests against other databases
[ https://issues.apache.org/jira/browse/HIVE-8721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226814#comment-14226814 ] Alan Gates commented on HIVE-8721: -- I'm fine to move this out of TxnDbUtil and put it somewhere more generic. But I'm not sure where. It works in TxnDbUtil because the transaction tests always start by calling TxnDbUtil.prepDb. Is there an equivalent place we can guarantee gets called first in all unit tests? Enable transactional unit tests against other databases --- Key: HIVE-8721 URL: https://issues.apache.org/jira/browse/HIVE-8721 Project: Hive Issue Type: Test Components: Testing Infrastructure, Transactions Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-8721.patch Since TxnHandler and subclasses use JDBC to directly connect to the underlying database (rather than relying on DataNucleus) it is important to test that all of the operations work against different database flavors. An easy way to do this is to enable the unit tests to run against an external database. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8797) Simultaneous dynamic inserts can result in partition already exists error
[ https://issues.apache.org/jira/browse/HIVE-8797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226819#comment-14226819 ] Thejas M Nair commented on HIVE-8797: - I am proposing - {code} if (CheckJDOException.isJDODataStoreException(e)) { // Using utility method above, so that JDODataStoreException doesn't // have to be used here. This helps avoid adding jdo dependency for // hcatalog client uses LOG.debug("Caught JDO exception, will attempt alter partition instead if partition exists now"); tpart = getMSC().getPartitionWithAuthInfo(tbl.getDbName(), tbl.getTableName(), pvals, getUserName(), getGroupNames()); if (tpart == null) { // The exception was not caused by partition getting created by another call throw e; } alterPartitionSpec(tbl, partSpec, tpart, inheritTableSpecs, partPath); {code} Simultaneous dynamic inserts can result in partition already exists error --- Key: HIVE-8797 URL: https://issues.apache.org/jira/browse/HIVE-8797 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-8797.2.patch, HIVE-8797.patch If two users attempt a dynamic insert into the same new partition at the same time, a possible race condition exists where both will attempt to create the partition and one will fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8836) Enable automatic tests with remote spark client [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226820#comment-14226820 ] Hive QA commented on HIVE-8836: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12683900/HIVE-8836.7-spark.patch {color:red}ERROR:{color} -1 due to 48 failed/errored test(s), 7177 tests executed *Failed tests:* {noformat} TestHS2ImpersonationWithRemoteMS - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join_without_localtask org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin1 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin7 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_count org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_custom_input_output_format org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_complex_types_multi_single_reducer org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_multi_insert_common_distinct org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_multi_single_reducer org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_infer_bucket_sort_convert_join org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_filters_overlap org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_reorder org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_limit_pushdown org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mapjoin_hook org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mapreduce2 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_multi_insert_gby3 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_multi_join_union org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_outer_join_ppr org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_parquet_join org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_outer_join3 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ptf_decimal org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ptf_general_queries org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin9 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_1 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_2 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_3 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_4 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_5 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_8 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_stats_counter org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_table_access_keys_stats org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_timestamp_1 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_timestamp_2 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_timestamp_3 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_timestamp_lazy org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_uniquejoin org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vector_between_in 
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vector_data_types org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vector_decimal_aggregate org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_13 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_14 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_15 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_16 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_9 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_short_regress org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorized_timestamp_funcs {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/449/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/449/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-449/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 48 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12683900 - PreCommit-HIVE-SPARK-Build Enable
[jira] [Created] (HIVE-8976) Make nine additional tests deterministic
Brock Noland created HIVE-8976: -- Summary: Make nine additional tests deterministic Key: HIVE-8976 URL: https://issues.apache.org/jira/browse/HIVE-8976 Project: Hive Issue Type: Improvement Reporter: Brock Noland Assignee: Brock Noland -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8975) Possible performance regression on bucket_map_join_tez2.q
[ https://issues.apache.org/jira/browse/HIVE-8975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226865#comment-14226865 ] Prasanth J commented on HIVE-8975: -- [~jcamachorodriguez] I see what the issue is here. That check (RS after GBY) was used to determine the map-reduce boundary. The map-side GBY has different stats logic compared to the reduce-side GBY. Now, after the identity projection removal optimization, {code} TS[0]-FIL[16]-GBY[2]-RS[3]-GBY[4]-RS[8]-JOIN[11]-SEL[12]-FS[13] TS[6]-FIL[17]-RS[10]-JOIN[11] {code} both GBY[2] and GBY[4] are identified as map-side GBYs. I think we need to improve that if condition to better differentiate map-side and reduce-side GBYs. A somewhat better check would be: if an RS is contained in the upstream operators of a GBY, then that GBY is reduce-side. In the above case, GBY[4] contains RS[3] in its upstream operators. Any thoughts? Possible performance regression on bucket_map_join_tez2.q - Key: HIVE-8975 URL: https://issues.apache.org/jira/browse/HIVE-8975 Project: Hive Issue Type: Bug Components: Logical Optimizer, Statistics Affects Versions: 0.15.0 Reporter: Jesus Camacho Rodriguez After introducing the identity project removal optimization in HIVE-8435, the plan for bucket_map_join_tez2.q that runs on Tez changed to be sub-optimal. In particular, earlier it was doing a map-join and after HIVE-8435 it changed to a reduce-join. The query is the following one: {noformat} select a.key, b.key from (select distinct key from tab) a join tab b on b.key = a.key {noformat} The plan before removing the projections is: {noformat} TS[0]-FIL[16]-SEL[1]-GBY[2]-RS[3]-GBY[4]-SEL[5]-RS[8]-JOIN[11]-SEL[12]-FS[13] TS[6]-FIL[17]-RS[10]-JOIN[11] {noformat} And after removing identity projections: {noformat} TS[0]-FIL[16]-GBY[2]-RS[3]-GBY[4]-RS[8]-JOIN[11]-SEL[12]-FS[13] TS[6]-FIL[17]-RS[10]-JOIN[11] {noformat} After digging a bit, I realized it is not converting the reduce-join into a map-join because stats for GBY\[4\] change if SEL\[5\] is removed; thus the optimization does not kick in. The reason for the stats change in the GroupBy operator is in [this line|https://github.com/apache/hive/blob/6f4365e8a21e7b480bf595d079a71303a50bf1b2/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L633], where it is checked whether the GBY is immediately followed by an RS operator or not, and stats are calculated differently depending on it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
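[Editor's note] As an illustration of the upstream-RS check suggested above, here is a minimal sketch. It uses a simplified stand-in for Hive's operator tree rather than the real Operator API in org.apache.hadoop.hive.ql.exec; the interface and method names are assumptions for illustration only.
{code}
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified stand-in for Hive's operator tree; the real classes live in
// org.apache.hadoop.hive.ql.exec and expose a richer API.
interface Op {
  String name();      // e.g. "GBY", "RS", "TS"
  List<Op> parents(); // upstream operators
}

final class GroupBySideCheck {
  // Returns true if any upstream (ancestor) operator of the given GroupBy
  // is a ReduceSink, i.e. the GBY runs on the reduce side.
  static boolean isReduceSideGroupBy(Op gby) {
    Deque<Op> pending = new ArrayDeque<>(gby.parents());
    Set<Op> seen = new HashSet<>();
    while (!pending.isEmpty()) {
      Op cur = pending.pop();
      if (!seen.add(cur)) {
        continue;                      // already visited
      }
      if ("RS".equals(cur.name())) {
        return true;                   // found a ReduceSink ancestor
      }
      pending.addAll(cur.parents());   // keep walking upstream
    }
    return false;                      // no RS upstream: map-side GBY
  }
}
{code}
On the plan quoted above, GBY[2] has no RS ancestor (map-side), while GBY[4] reaches RS[3] through its parents (reduce-side).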
[jira] [Updated] (HIVE-8326) Using DbTxnManager with concurrency off results in run time error
[ https://issues.apache.org/jira/browse/HIVE-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8326: - Attachment: HIVE-8326.patch This patch changes DbTxnManager to check that concurrency is set to true when it is handed the config file. Using DbTxnManager with concurrency off results in run time error - Key: HIVE-8326 URL: https://issues.apache.org/jira/browse/HIVE-8326 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-8326.patch Setting {code} hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager hive.support.concurrency=false {code} results in queries failing at runtime with an NPE in DbTxnManager.heartbeat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
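[Editor's note] For illustration, the kind of guard the patch describes could look like the following sketch. The class, method name, and exception type are assumptions rather than the actual patch; only the standard Hadoop Configuration getter is used.
{code}
import org.apache.hadoop.conf.Configuration;

// Hypothetical guard illustrating the fail-fast check described above: refuse
// to use DbTxnManager when hive.support.concurrency is false, instead of
// failing later with an NPE in heartbeat(). Names here are illustrative.
final class TxnManagerConfigCheck {
  static void verifyConcurrencyEnabled(Configuration conf) {
    boolean concurrency = conf.getBoolean("hive.support.concurrency", false);
    if (!concurrency) {
      throw new IllegalStateException(
          "DbTxnManager requires hive.support.concurrency=true");
    }
  }
}
{code}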
[jira] [Updated] (HIVE-8326) Using DbTxnManager with concurrency off results in run time error
[ https://issues.apache.org/jira/browse/HIVE-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8326: - Status: Patch Available (was: Open) Using DbTxnManager with concurrency off results in run time error - Key: HIVE-8326 URL: https://issues.apache.org/jira/browse/HIVE-8326 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-8326.patch Setting {code} hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager hive.support.concurrency=false {code} results in queries failing at runtime with an NPE in DbTxnManager.heartbeat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226872#comment-14226872 ] Jihong Liu commented on HIVE-8966: -- Yes, the transaction batch was closed. I suggest doing either of the following two updates, or both: 1. If a file is a non-bucket file, don't try to compact it. In org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.java, change the following code: private void addFileToMap(Matcher matcher, Path file, boolean sawBase, Map<Integer, BucketTracker> splitToBucketMap) { if (!matcher.find()) { LOG.warn("Found a non-bucket file that we thought matched the bucket pattern! " + file.toString()); } . to: private void addFileToMap(Matcher matcher, Path file, boolean sawBase, Map<Integer, BucketTracker> splitToBucketMap) { if (!matcher.find()) { LOG.warn("Found a non-bucket file that we thought matched the bucket pattern! " + file.toString()); return; } 2. Don't use the bucket file pattern to name the flush_length file. In org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.java, change the following code: static Path getSideFile(Path main) { return new Path(main + "_flush_length"); } to: static Path getSideFile(Path main) { if (main.toString().startsWith("bucket_")) { return new Path("bkt" + main.toString().substring(6) + "_flush_length"); } else return new Path(main + "_flush_length"); } After making the above updates and re-compiling hive-exec.jar, the compaction works fine now. Delta files created by hive hcatalog streaming cannot be compacted -- Key: HIVE-8966 URL: https://issues.apache.org/jira/browse/HIVE-8966 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.14.0 Environment: hive Reporter: Jihong Liu Assignee: Alan Gates Priority: Critical Hive hcatalog streaming will also create a file like bucket_n_flush_length in each delta directory, where n is the bucket number. But compactor.CompactorMR thinks this file also needs to be compacted. However, this file of course cannot be compacted, so compactor.CompactorMR will not continue with the compaction. In a test, after the bucket_n_flush_length file was removed, the alter table partition compact finished successfully. If that file is not deleted, nothing will be compacted. This is probably a very severe bug. Both 0.13 and 0.14 have this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8976) Make nine additional tests deterministic
[ https://issues.apache.org/jira/browse/HIVE-8976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-8976: --- Attachment: HIVE-8976.patch Make nine additional tests deterministic Key: HIVE-8976 URL: https://issues.apache.org/jira/browse/HIVE-8976 Project: Hive Issue Type: Improvement Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-8976.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8943) Fix memory limit check for combine nested mapjoins [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-8943: Attachment: HIVE-8943.2-spark.patch Giving it another try. The refactoring of the big-table calculation algorithm made it choose a different big table when more than one is available; I tweaked the algorithm to choose the same one as before to minimize the diffs. Fix memory limit check for combine nested mapjoins [Spark Branch] - Key: HIVE-8943 URL: https://issues.apache.org/jira/browse/HIVE-8943 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-8943.1-spark.patch, HIVE-8943.1-spark.patch, HIVE-8943.2-spark.patch It's the opposite problem of what we thought in HIVE-8701. SparkMapJoinOptimizer does combine nested mapjoins into one work due to the removal of the RS for the big table. So we need to enhance the check to calculate whether all the MapJoins in that work (spark-stage) will fit into memory; otherwise it might overwhelm memory for that particular spark executor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
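[Editor's note] To make the combined check concrete, here is a toy sketch of summing the estimated small-table sizes of all map joins merged into one Spark work against a single per-executor budget. The class and method names, and how the sizes would be obtained, are illustrative assumptions, not the patch itself.
{code}
import java.util.Arrays;
import java.util.List;

// Toy illustration of the combined memory check described above: every map
// join merged into the same work must fit, together, under one budget.
final class CombinedMapJoinMemoryCheck {
  static boolean fitsInMemory(List<Long> smallTableSizes, long memoryLimit) {
    long total = 0L;
    for (long size : smallTableSizes) {
      total += size;
      if (total > memoryLimit) {
        return false; // combined hash tables would overwhelm the executor
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Two nested map joins of 600MB and 500MB do not fit in a 1GB budget,
    // even though each one fits on its own.
    List<Long> sizes = Arrays.asList(600L << 20, 500L << 20);
    System.out.println(fitsInMemory(sizes, 1L << 30)); // prints false
  }
}
{code}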
[jira] [Updated] (HIVE-8976) Make nine additional tests deterministic
[ https://issues.apache.org/jira/browse/HIVE-8976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-8976: --- Affects Version/s: 0.15.0 Status: Patch Available (was: Open) Make nine additional tests deterministic Key: HIVE-8976 URL: https://issues.apache.org/jira/browse/HIVE-8976 Project: Hive Issue Type: Improvement Affects Versions: 0.15.0 Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-8976.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8976) Make nine additional tests deterministic
[ https://issues.apache.org/jira/browse/HIVE-8976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-8976: --- Description: {noformat} auto_join_without_localtask.q count.q limit_pushdown.q mapreduce2.q multi_insert_gby3.q multi_join_union.q ppd_outer_join3.q ptf_decimal.q ptf_general_queries.q {noformat} Make nine additional tests deterministic Key: HIVE-8976 URL: https://issues.apache.org/jira/browse/HIVE-8976 Project: Hive Issue Type: Improvement Affects Versions: 0.15.0 Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-8976.patch {noformat} auto_join_without_localtask.q count.q limit_pushdown.q mapreduce2.q multi_insert_gby3.q multi_join_union.q ppd_outer_join3.q ptf_decimal.q ptf_general_queries.q {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8836) Enable automatic tests with remote spark client [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226883#comment-14226883 ] Brock Noland commented on HIVE-8836: I am making the following tests deterministic over in HIVE-8976. {noformat} auto_join_without_localtask.q count.q limit_pushdown.q mapreduce2.q multi_insert_gby3.q multi_join_union.q ppd_outer_join3.q ptf_decimal.q ptf_general_queries.q {noformat} Enable automatic tests with remote spark client [Spark Branch] -- Key: HIVE-8836 URL: https://issues.apache.org/jira/browse/HIVE-8836 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chengxiang Li Assignee: Rui Li Labels: Spark-M3 Fix For: spark-branch Attachments: HIVE-8836.1-spark.patch, HIVE-8836.2-spark.patch, HIVE-8836.3-spark.patch, HIVE-8836.4-spark.patch, HIVE-8836.5-spark.patch, HIVE-8836.6-spark.patch, HIVE-8836.6-spark.patch, HIVE-8836.6-spark.patch, HIVE-8836.7-spark.patch In real production environment, remote spark client should be used to submit spark job for Hive mostly, we should enable automatic test with remote spark client to make sure the Hive feature workable with it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226890#comment-14226890 ] Alan Gates commented on HIVE-8966: -- 1 might be the right thing to do. 2 breaks backward compatibility. Before we do that, though, I'd like to understand why you still see the flush length files hanging around. In my tests I don't see this issue because the flush length file is properly cleaned up. I want to make sure that its existence doesn't mean something else is wrong. Do you see the flush length files in all delta directories or only the most recent? Delta files created by hive hcatalog streaming cannot be compacted -- Key: HIVE-8966 URL: https://issues.apache.org/jira/browse/HIVE-8966 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.14.0 Environment: hive Reporter: Jihong Liu Assignee: Alan Gates Priority: Critical Hive hcatalog streaming will also create a file like bucket_n_flush_length in each delta directory, where n is the bucket number. But compactor.CompactorMR thinks this file also needs to be compacted. However, this file of course cannot be compacted, so compactor.CompactorMR will not continue with the compaction. In a test, after the bucket_n_flush_length file was removed, the alter table partition compact finished successfully. If that file is not deleted, nothing will be compacted. This is probably a very severe bug. Both 0.13 and 0.14 have this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7896) orcfiledump should be able to dump data
[ https://issues.apache.org/jira/browse/HIVE-7896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226891#comment-14226891 ] Hive QA commented on HIVE-7896: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12683890/HIVE-7896.2.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 6684 tests executed *Failed tests:* {noformat} TestMiniTezCliDriver-vectorization_16.q-mapjoin_mapjoin.q-groupby2.q-and-12-more - did not produce a TEST-*.xml file TestParquetDirect - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1914/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1914/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1914/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12683890 - PreCommit-HIVE-TRUNK-Build orcfiledump should be able to dump data --- Key: HIVE-7896 URL: https://issues.apache.org/jira/browse/HIVE-7896 Project: Hive Issue Type: Improvement Components: File Formats Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-7896.2.patch, HIVE-7896.patch, alltypes.orc, alltypes2.txt The FileDumper utility in orc, exposed as a service as orcfiledump, can print out metadata from Orc files but not the actual data. Being able to dump the data is also useful in some debugging contexts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8977) TestParquetDirect should be abstract
Brock Noland created HIVE-8977: -- Summary: TestParquetDirect should be abstract Key: HIVE-8977 URL: https://issues.apache.org/jira/browse/HIVE-8977 Project: Hive Issue Type: Improvement Reporter: Brock Noland Priority: Minor The class {{TestParquetDirect}} does not contain any tests but starts with Test. Thus the build system runs it and expects an output file. We should rename the file to {{AbstractTestParquetDirect}} and make the class abstract. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7896) orcfiledump should be able to dump data
[ https://issues.apache.org/jira/browse/HIVE-7896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226898#comment-14226898 ] Prasanth J commented on HIVE-7896: -- LGTM, +1 orcfiledump should be able to dump data --- Key: HIVE-7896 URL: https://issues.apache.org/jira/browse/HIVE-7896 Project: Hive Issue Type: Improvement Components: File Formats Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-7896.2.patch, HIVE-7896.patch, alltypes.orc, alltypes2.txt The FileDumper utility in orc, exposed as a service as orcfiledump, can print out metadata from Orc files but not the actual data. Being able to dump the data is also useful in some debugging contexts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7896) orcfiledump should be able to dump data
[ https://issues.apache.org/jira/browse/HIVE-7896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226899#comment-14226899 ] Brock Noland commented on HIVE-7896: I'll handle the parquet test in HIVE-8977. orcfiledump should be able to dump data --- Key: HIVE-7896 URL: https://issues.apache.org/jira/browse/HIVE-7896 Project: Hive Issue Type: Improvement Components: File Formats Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-7896.2.patch, HIVE-7896.patch, alltypes.orc, alltypes2.txt The FileDumper utility in orc, exposed as a service as orcfiledump, can print out metadata from Orc files but not the actual data. Being able to dump the data is also useful in some debugging contexts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8976) Make nine additional tests deterministic
[ https://issues.apache.org/jira/browse/HIVE-8976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-8976: --- Attachment: HIVE-8976.patch Including tez tests Make nine additional tests deterministic Key: HIVE-8976 URL: https://issues.apache.org/jira/browse/HIVE-8976 Project: Hive Issue Type: Improvement Affects Versions: 0.15.0 Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-8976.patch, HIVE-8976.patch {noformat} auto_join_without_localtask.q count.q limit_pushdown.q mapreduce2.q multi_insert_gby3.q multi_join_union.q ppd_outer_join3.q ptf_decimal.q ptf_general_queries.q {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8977) TestParquetDirect should be abstract
[ https://issues.apache.org/jira/browse/HIVE-8977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-8977: --- Assignee: Brock Noland Affects Version/s: 0.15.0 Status: Patch Available (was: Open) TestParquetDirect should be abstract Key: HIVE-8977 URL: https://issues.apache.org/jira/browse/HIVE-8977 Project: Hive Issue Type: Improvement Affects Versions: 0.15.0 Reporter: Brock Noland Assignee: Brock Noland Priority: Minor Attachments: HIVE-8977.patch The class {{TestParquetDirect}} does not contain any tests but starts with Test. Thus the build system runs it and expects an output file. We should rename the file to {{AbstractTestParquetDirect}} and make the class abstract. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8977) TestParquetDirect should be abstract
[ https://issues.apache.org/jira/browse/HIVE-8977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-8977: --- Attachment: HIVE-8977.patch TestParquetDirect should be abstract Key: HIVE-8977 URL: https://issues.apache.org/jira/browse/HIVE-8977 Project: Hive Issue Type: Improvement Affects Versions: 0.15.0 Reporter: Brock Noland Priority: Minor Attachments: HIVE-8977.patch The class {{TestParquetDirect}} does not contain any tests but starts with Test. Thus the build system runs it and expects an output file. We should rename the file to {{AbstractTestParquetDirect}} and make the class abstract. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8797) Simultaneous dynamic inserts can result in partition already exists error
[ https://issues.apache.org/jira/browse/HIVE-8797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8797: - Status: Open (was: Patch Available) Makes sense. I'll put up a new patch. Simultaneous dynamic inserts can result in partition already exists error --- Key: HIVE-8797 URL: https://issues.apache.org/jira/browse/HIVE-8797 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-8797.2.patch, HIVE-8797.patch If two users attempt a dynamic insert into the same new partition at the same time, a possible race condition exists where both will attempt to create the partition and one will fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8836) Enable automatic tests with remote spark client [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-8836: --- Attachment: HIVE-8836.8-spark.patch Latest patch incorporates my changes in HIVE-8976. We'll commit HIVE-8976 to trunk and the same changes in that patch in this JIRA to spark. Enable automatic tests with remote spark client [Spark Branch] -- Key: HIVE-8836 URL: https://issues.apache.org/jira/browse/HIVE-8836 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chengxiang Li Assignee: Rui Li Labels: Spark-M3 Fix For: spark-branch Attachments: HIVE-8836.1-spark.patch, HIVE-8836.2-spark.patch, HIVE-8836.3-spark.patch, HIVE-8836.4-spark.patch, HIVE-8836.5-spark.patch, HIVE-8836.6-spark.patch, HIVE-8836.6-spark.patch, HIVE-8836.6-spark.patch, HIVE-8836.7-spark.patch, HIVE-8836.8-spark.patch In real production environment, remote spark client should be used to submit spark job for Hive mostly, we should enable automatic test with remote spark client to make sure the Hive feature workable with it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226925#comment-14226925 ] Jihong Liu commented on HIVE-8966: -- That flush_length file is only in the most recent delta. By the way, for streaming loading, a transaction batch is probably always open since data keeps coming. Is it possible to do compaction in a streaming loading environment? Thanks. Delta files created by hive hcatalog streaming cannot be compacted -- Key: HIVE-8966 URL: https://issues.apache.org/jira/browse/HIVE-8966 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.14.0 Environment: hive Reporter: Jihong Liu Assignee: Alan Gates Priority: Critical Hive hcatalog streaming will also create a file like bucket_n_flush_length in each delta directory, where n is the bucket number. But compactor.CompactorMR thinks this file also needs to be compacted. However, this file of course cannot be compacted, so compactor.CompactorMR will not continue with the compaction. In a test, after the bucket_n_flush_length file was removed, the alter table partition compact finished successfully. If that file is not deleted, nothing will be compacted. This is probably a very severe bug. Both 0.13 and 0.14 have this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8797) Simultaneous dynamic inserts can result in partition already exists error
[ https://issues.apache.org/jira/browse/HIVE-8797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8797: - Status: Patch Available (was: Open) Simultaneous dynamic inserts can result in partition already exists error --- Key: HIVE-8797 URL: https://issues.apache.org/jira/browse/HIVE-8797 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-8797.2.patch, HIVE-8797.3.patch, HIVE-8797.patch If two users attempt a dynamic insert into the same new partition at the same time, a possible race condition exists where both will attempt to create the partition and one will fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8797) Simultaneous dynamic inserts can result in partition already exists error
[ https://issues.apache.org/jira/browse/HIVE-8797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-8797: - Attachment: HIVE-8797.3.patch Simultaneous dynamic inserts can result in partition already exists error --- Key: HIVE-8797 URL: https://issues.apache.org/jira/browse/HIVE-8797 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-8797.2.patch, HIVE-8797.3.patch, HIVE-8797.patch If two users attempt a dynamic insert into the same new partition at the same time, a possible race condition exists where both will attempt to create the partition and one will fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8836) Enable automatic tests with remote spark client [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-8836: --- Attachment: HIVE-8836.9-spark.patch v9 of the patch does *not* clear the environment before starting spark-submit as this was causing issues on various machines finding java. Enable automatic tests with remote spark client [Spark Branch] -- Key: HIVE-8836 URL: https://issues.apache.org/jira/browse/HIVE-8836 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chengxiang Li Assignee: Rui Li Labels: Spark-M3 Fix For: spark-branch Attachments: HIVE-8836.1-spark.patch, HIVE-8836.2-spark.patch, HIVE-8836.3-spark.patch, HIVE-8836.4-spark.patch, HIVE-8836.5-spark.patch, HIVE-8836.6-spark.patch, HIVE-8836.6-spark.patch, HIVE-8836.6-spark.patch, HIVE-8836.7-spark.patch, HIVE-8836.8-spark.patch, HIVE-8836.9-spark.patch In real production environment, remote spark client should be used to submit spark job for Hive mostly, we should enable automatic test with remote spark client to make sure the Hive feature workable with it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted
[ https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226943#comment-14226943 ] Alan Gates commented on HIVE-8966: -- Ok, that makes sense. Your current delta has the file because it's still open and being written to. It also explains why my tests don't see it, as they don't run long enough. The streaming is always done by the time the compactor kicks in. Why don't you post a patch to this JIRA with the change for 1, and I can get that committed. [~hagleitn], I'd like to put this in 0.14.1 as well as trunk if you're ok with it, since it blocks compaction for users using the streaming interface. Delta files created by hive hcatalog streaming cannot be compacted -- Key: HIVE-8966 URL: https://issues.apache.org/jira/browse/HIVE-8966 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.14.0 Environment: hive Reporter: Jihong Liu Assignee: Alan Gates Priority: Critical Hive hcatalog streaming will also create a file like bucket_n_flush_length in each delta directory, where n is the bucket number. But compactor.CompactorMR thinks this file also needs to be compacted. However, this file of course cannot be compacted, so compactor.CompactorMR will not continue with the compaction. In a test, after the bucket_n_flush_length file was removed, the alter table partition compact finished successfully. If that file is not deleted, nothing will be compacted. This is probably a very severe bug. Both 0.13 and 0.14 have this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8975) Possible performance regression on bucket_map_join_tez2.q
[ https://issues.apache.org/jira/browse/HIVE-8975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226955#comment-14226955 ] Ashutosh Chauhan commented on HIVE-8975: [~prasanth_j] Instead of trying to determine whether it's running in map or reduce, I think the stats logic should really make a different stats calculation based on the mode the GBY is running in. That mode can be determined via GBYDesc.Mode. All we want is an estimate of the # of rows coming out of the GBY, and that is dependent on whether it is a partial aggregation or a full aggregation, not whether it's in map or reduce. Thoughts? Possible performance regression on bucket_map_join_tez2.q - Key: HIVE-8975 URL: https://issues.apache.org/jira/browse/HIVE-8975 Project: Hive Issue Type: Bug Components: Logical Optimizer, Statistics Affects Versions: 0.15.0 Reporter: Jesus Camacho Rodriguez After introducing the identity project removal optimization in HIVE-8435, the plan for bucket_map_join_tez2.q that runs on Tez changed to be sub-optimal. In particular, earlier it was doing a map-join and after HIVE-8435 it changed to a reduce-join. The query is the following one: {noformat} select a.key, b.key from (select distinct key from tab) a join tab b on b.key = a.key {noformat} The plan before removing the projections is: {noformat} TS[0]-FIL[16]-SEL[1]-GBY[2]-RS[3]-GBY[4]-SEL[5]-RS[8]-JOIN[11]-SEL[12]-FS[13] TS[6]-FIL[17]-RS[10]-JOIN[11] {noformat} And after removing identity projections: {noformat} TS[0]-FIL[16]-GBY[2]-RS[3]-GBY[4]-RS[8]-JOIN[11]-SEL[12]-FS[13] TS[6]-FIL[17]-RS[10]-JOIN[11] {noformat} After digging a bit, I realized it is not converting the reduce-join into a map-join because stats for GBY\[4\] change if SEL\[5\] is removed; thus the optimization does not kick in. The reason for the stats change in the GroupBy operator is in [this line|https://github.com/apache/hive/blob/6f4365e8a21e7b480bf595d079a71303a50bf1b2/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L633], where it is checked whether the GBY is immediately followed by an RS operator or not, and stats are calculated differently depending on it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
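[Editor's note] A rough sketch of the mode-driven approach suggested above: branch the row estimate on the aggregation mode instead of inspecting the operator tree. The enum values and formulas below are placeholders, not Hive's actual GroupByDesc.Mode values or StatsRulesProcFactory logic.
{code}
// Rough sketch only: estimate GroupBy output rows from the aggregation mode.
// The Mode enum here is a stand-in and the formulas are placeholders.
final class GroupByRowEstimate {
  enum Mode { HASH, PARTIAL, FINAL, COMPLETE }

  static long estimateOutputRows(Mode mode, long inputRows, long distinctKeys) {
    switch (mode) {
      case HASH:
      case PARTIAL:
        // Partial (map-side style) aggregation: hash-table flushes mean the
        // output can exceed the number of distinct keys; the factor of 2 is
        // an arbitrary placeholder, not Hive's formula.
        return Math.min(inputRows, 2 * distinctKeys);
      case FINAL:
      case COMPLETE:
        // Full aggregation: one output row per distinct grouping key.
        return Math.min(inputRows, distinctKeys);
      default:
        return inputRows;
    }
  }
}
{code}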
[jira] [Commented] (HIVE-8977) TestParquetDirect should be abstract
[ https://issues.apache.org/jira/browse/HIVE-8977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226977#comment-14226977 ] Szehon Ho commented on HIVE-8977: - +1 pending tests TestParquetDirect should be abstract Key: HIVE-8977 URL: https://issues.apache.org/jira/browse/HIVE-8977 Project: Hive Issue Type: Improvement Affects Versions: 0.15.0 Reporter: Brock Noland Assignee: Brock Noland Priority: Minor Attachments: HIVE-8977.patch The class {{TestParquetDirect}} does not contain any tests but starts with Test. Thus the build system runs it and expects an output file. We should rename the file to {{AbstractTestParquetDirect}} and make the class abstract. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8924) Investigate test failure for join_empty.q [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-8924: Attachment: HIVE-8924.3-spark.patch This changed the diffs of some tests, including those with empty stages. And join_view has a larger diff. Not sure why, but the plan looks similar and the results are still the same. Investigate test failure for join_empty.q [Spark Branch] Key: HIVE-8924 URL: https://issues.apache.org/jira/browse/HIVE-8924 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Assignee: Szehon Ho Attachments: HIVE-8924-spark.patch, HIVE-8924.2-spark.patch, HIVE-8924.3-spark.patch This query has an interesting case where the big table work is empty. Here's the MR plan: {noformat} STAGE DEPENDENCIES: Stage-4 is a root stage Stage-3 depends on stages: Stage-4 Stage-0 depends on stages: Stage-3 STAGE PLANS: Stage: Stage-4 Map Reduce Local Work Alias - Map Local Tables: b Fetch Operator limit: -1 Alias - Map Local Operator Tree: b TableScan alias: b Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: UDFToDouble(key) is not null (type: boolean) Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator condition expressions: 0 {key} 1 {value} keys: 0 UDFToDouble(key) (type: double) 1 UDFToDouble(key) (type: double) Stage: Stage-3 Map Reduce Local Work: Map Reduce Local Work Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: ListSink {noformat} The plan for Spark is not correct. We need to investigate the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8924) Investigate test failure for join_empty.q [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226999#comment-14226999 ] Szehon Ho commented on HIVE-8924: - Correction: join_view looks ok, but optimize_nullscan has a larger diff. Investigate test failure for join_empty.q [Spark Branch] Key: HIVE-8924 URL: https://issues.apache.org/jira/browse/HIVE-8924 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Assignee: Szehon Ho Attachments: HIVE-8924-spark.patch, HIVE-8924.2-spark.patch, HIVE-8924.3-spark.patch This query has an interesting case where the big table work is empty. Here's the MR plan: {noformat} STAGE DEPENDENCIES: Stage-4 is a root stage Stage-3 depends on stages: Stage-4 Stage-0 depends on stages: Stage-3 STAGE PLANS: Stage: Stage-4 Map Reduce Local Work Alias - Map Local Tables: b Fetch Operator limit: -1 Alias - Map Local Operator Tree: b TableScan alias: b Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: UDFToDouble(key) is not null (type: boolean) Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE Column stats: NONE HashTable Sink Operator condition expressions: 0 {key} 1 {value} keys: 0 UDFToDouble(key) (type: double) 1 UDFToDouble(key) (type: double) Stage: Stage-3 Map Reduce Local Work: Map Reduce Local Work Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: ListSink {noformat} The plan for Spark is not correct. We need to investigate the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8797) Simultaneous dynamic inserts can result in partition already exists error
[ https://issues.apache.org/jira/browse/HIVE-8797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14227003#comment-14227003 ] Thejas M Nair commented on HIVE-8797: - +1 Simultaneous dynamic inserts can result in partition already exists error --- Key: HIVE-8797 URL: https://issues.apache.org/jira/browse/HIVE-8797 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-8797.2.patch, HIVE-8797.3.patch, HIVE-8797.patch If two users attempt a dynamic insert into the same new partition at the same time, a possible race condition exists where both will attempt to create the partition and one will fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8943) Fix memory limit check for combine nested mapjoins [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14227004#comment-14227004 ] Hive QA commented on HIVE-8943: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12683931/HIVE-8943.2-spark.patch {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 7180 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_stats org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_stats2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join31 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin7 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/450/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/450/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-450/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12683931 - PreCommit-HIVE-SPARK-Build Fix memory limit check for combine nested mapjoins [Spark Branch] - Key: HIVE-8943 URL: https://issues.apache.org/jira/browse/HIVE-8943 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-8943.1-spark.patch, HIVE-8943.1-spark.patch, HIVE-8943.2-spark.patch Its the opposite problem of what we thought in HIVE-8701. SparkMapJoinOptimizer does combine nested mapjoins into one work due to removal of RS for big-table. So we need to enhance the check to calculate if all the MapJoins in that work (spark-stage) will fit into the memory, otherwise it might overwhelm memory for that particular spark executor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8975) Possible performance regression on bucket_map_join_tez2.q
[ https://issues.apache.org/jira/browse/HIVE-8975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14227005#comment-14227005 ] Prasanth J commented on HIVE-8975: -- [~ashutoshc] What are all the possible modes for map-side and reduce-side? The stats calculation also has some logic for hash-aggregation enabled vs. disabled. Is it safe to assume that if the mode is HASH/PARTIAL it is map-side? And if the mode is FULL then reduce-side? If so, I can change the logic accordingly without depending on the child/parent checks in the operator tree. Possible performance regression on bucket_map_join_tez2.q - Key: HIVE-8975 URL: https://issues.apache.org/jira/browse/HIVE-8975 Project: Hive Issue Type: Bug Components: Logical Optimizer, Statistics Affects Versions: 0.15.0 Reporter: Jesus Camacho Rodriguez After introducing the identity project removal optimization in HIVE-8435, the plan for bucket_map_join_tez2.q that runs on Tez changed to be sub-optimal. In particular, earlier it was doing a map-join and after HIVE-8435 it changed to a reduce-join. The query is the following one: {noformat} select a.key, b.key from (select distinct key from tab) a join tab b on b.key = a.key {noformat} The plan before removing the projections is: {noformat} TS[0]-FIL[16]-SEL[1]-GBY[2]-RS[3]-GBY[4]-SEL[5]-RS[8]-JOIN[11]-SEL[12]-FS[13] TS[6]-FIL[17]-RS[10]-JOIN[11] {noformat} And after removing identity projections: {noformat} TS[0]-FIL[16]-GBY[2]-RS[3]-GBY[4]-RS[8]-JOIN[11]-SEL[12]-FS[13] TS[6]-FIL[17]-RS[10]-JOIN[11] {noformat} After digging a bit, I realized it is not converting the reduce-join into a map-join because stats for GBY\[4\] change if SEL\[5\] is removed; thus the optimization does not kick in. The reason for the stats change in the GroupBy operator is in [this line|https://github.com/apache/hive/blob/6f4365e8a21e7b480bf595d079a71303a50bf1b2/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L633], where it is checked whether the GBY is immediately followed by an RS operator or not, and stats are calculated differently depending on it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8934) Investigate test failure on bucketmapjoin10.q and bucketmapjoin11.q [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao updated HIVE-8934: --- Attachment: HIVE-8934.3-spark.patch Regenerated test outputs. Investigate test failure on bucketmapjoin10.q and bucketmapjoin11.q [Spark Branch] -- Key: HIVE-8934 URL: https://issues.apache.org/jira/browse/HIVE-8934 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chao Assignee: Chao Attachments: HIVE-8934.1-spark.patch, HIVE-8934.2-spark.patch, HIVE-8934.3-spark.patch With MapJoin enabled, these two tests will generate incorrect results. This seem to be related to the HiveInputFormat that these two are using. We need to investigate the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8828) Remove hadoop 20 shims
[ https://issues.apache.org/jira/browse/HIVE-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14227028#comment-14227028 ] Hive QA commented on HIVE-8828: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12683906/HIVE-8828.11.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6680 tests executed *Failed tests:* {noformat} TestMiniTezCliDriver-vectorization_16.q-mapjoin_mapjoin.q-groupby2.q-and-12-more - did not produce a TEST-*.xml file TestParquetDirect - did not produce a TEST-*.xml file {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1915/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1915/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1915/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12683906 - PreCommit-HIVE-TRUNK-Build Remove hadoop 20 shims -- Key: HIVE-8828 URL: https://issues.apache.org/jira/browse/HIVE-8828 Project: Hive Issue Type: Task Components: Shims Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-8828.1.patch, HIVE-8828.10.patch, HIVE-8828.11.patch, HIVE-8828.2.patch, HIVE-8828.3.patch, HIVE-8828.4.patch, HIVE-8828.5.patch, HIVE-8828.6.patch, HIVE-8828.7.patch, HIVE-8828.8.patch, HIVE-8828.9.patch, HIVE-8828.patch CLEAR LIBRARY CACHE See : [mailing list discussion | http://mail-archives.apache.org/mod_mbox/hive-dev/201410.mbox/%3CCABgNGzfSB5VGTecONg0GgLCDdLLFfzLuZvP%2BGSBc0i0joqf3fg%40mail.gmail.com%3E] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8943) Fix memory limit check for combine nested mapjoins [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-8943: Attachment: HIVE-8943.3-spark.patch Fix memory limit check for combine nested mapjoins [Spark Branch] - Key: HIVE-8943 URL: https://issues.apache.org/jira/browse/HIVE-8943 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-8943.1-spark.patch, HIVE-8943.1-spark.patch, HIVE-8943.2-spark.patch, HIVE-8943.3-spark.patch Its the opposite problem of what we thought in HIVE-8701. SparkMapJoinOptimizer does combine nested mapjoins into one work due to removal of RS for big-table. So we need to enhance the check to calculate if all the MapJoins in that work (spark-stage) will fit into the memory, otherwise it might overwhelm memory for that particular spark executor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8943) Fix memory limit check for combine nested mapjoins [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14227032#comment-14227032 ] Szehon Ho commented on HIVE-8943: - Forgot to generate the golden files for new tests in CLIDriver. Fix memory limit check for combine nested mapjoins [Spark Branch] - Key: HIVE-8943 URL: https://issues.apache.org/jira/browse/HIVE-8943 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-8943.1-spark.patch, HIVE-8943.1-spark.patch, HIVE-8943.2-spark.patch, HIVE-8943.3-spark.patch Its the opposite problem of what we thought in HIVE-8701. SparkMapJoinOptimizer does combine nested mapjoins into one work due to removal of RS for big-table. So we need to enhance the check to calculate if all the MapJoins in that work (spark-stage) will fit into the memory, otherwise it might overwhelm memory for that particular spark executor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8964) Some TestMiniTezCliDriver tests taking two hours
[ https://issues.apache.org/jira/browse/HIVE-8964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14227040#comment-14227040 ] Brock Noland commented on HIVE-8964: I am pretty sure this is {{lvj_mapjoin.q}} which was added in HIVE-. I've excluded that test on the PTest side. We'll see if that helps. Some TestMiniTezCliDriver tests taking two hours Key: HIVE-8964 URL: https://issues.apache.org/jira/browse/HIVE-8964 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Gunther Hagleitner Priority: Blocker The test {{TestMiniTezCliDriver}} with the following query files: vectorization_16.q,mapjoin_mapjoin.q,groupby2.q,lvj_mapjoin.q,vectorization_5.q,vectorization_pushdown.q,orc_merge_incompat1.q,cbo_gby.q,vectorization_4.q,auto_join0.q,cross_product_check_1.q,vectorization_not.q,update_where_no_match.q,ctas.q,cbo_udf_udaf.q is timing out after two hours severely delaying the Hive precommits http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1898/failed/TestMiniTezCliDriver-vectorization_16.q-mapjoin_mapjoin.q-groupby2.q-and-12-more/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)