[jira] [Commented] (HIVE-8943) Fix memory limit check for combine nested mapjoins [Spark Branch]

2014-11-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225898#comment-14225898
 ] 

Hive QA commented on HIVE-8943:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12683740/HIVE-8943.1-spark.patch

{color:red}ERROR:{color} -1 due to 53 failed/errored test(s), 7179 tests 
executed
*Failed tests:*
{noformat}
TestAuthorizationApiAuthorizer - did not produce a TEST-*.xml file
TestGenericUDFOPNumeric - did not produce a TEST-*.xml file
TestHBaseKeyFactory - did not produce a TEST-*.xml file
TestHBaseKeyFactory2 - did not produce a TEST-*.xml file
TestHBaseKeyFactory3 - did not produce a TEST-*.xml file
TestHBasePredicateDecomposer - did not produce a TEST-*.xml file
TestHS2ImpersonationWithRemoteMS - did not produce a TEST-*.xml file
TestTezSessionState - did not produce a TEST-*.xml file
TestURLHook - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_stats
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_stats2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join32
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucket_map_join_1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucket_map_join_2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin7
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_column_access_stats
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join19
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_hive_626
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_reorder2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_reorder3
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_reorder4
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_view
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mapjoin_subquery2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mergejoins
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mergejoins_mixed
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_join4
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_join5
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_outer_join5
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoin
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoin_union_remove_1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoin_union_remove_2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoinopt1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoinopt11
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoinopt12
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoinopt13
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoinopt14
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoinopt16
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoinopt17
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoinopt19
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoinopt2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoinopt20
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoinopt3
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoinopt4
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoinopt5
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoinopt6
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoinopt7
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoinopt8
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_skewjoinopt9
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin9
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/442/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/442/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-442/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 53 tests failed
{noformat}

This message is automatically generated.

[jira] [Commented] (HIVE-8848) data loading from text files or text file processing doesn't handle nulls correctly

2014-11-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225921#comment-14225921
 ] 

Hive QA commented on HIVE-8848:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12683492/HIVE-8848.3.patch.txt

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 6683 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.hbase.TestHBaseSerDe.testHBaseSerDeWithAvroExternalSchema
org.apache.hadoop.hive.hbase.TestHBaseSerDe.testHBaseSerDeWithAvroSerClass
org.apache.hadoop.hive.hbase.TestHBaseSerDe.testHBaseSerDeWithHiveMapToHBaseAvroColumnFamily
org.apache.hadoop.hive.serde2.lazy.TestLazyArrayMapStruct.testLazyMapWithBadEntries
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1905/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1905/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1905/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12683492 - PreCommit-HIVE-TRUNK-Build

 data loading from text files or text file processing doesn't handle nulls 
 correctly
 ---

 Key: HIVE-8848
 URL: https://issues.apache.org/jira/browse/HIVE-8848
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-8848.01.patch, HIVE-8848.2.patch.txt, 
 HIVE-8848.3.patch.txt, HIVE-8848.patch


 I am not sure how nulls are supposed to be stored in text tables, but after 
 loading some data with "null" or "NULL" strings, or \x00 characters, we get a 
 bunch of annoying logging from LazyPrimitive that the data is not in INT 
 format and was converted to null, with the data being null (the string saying 
 "null", I assume, comes from the code).
 Either the load should store them as nulls, or there should be some defined 
 way to load nulls.
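 As a hedged illustration of the "defined way" being asked for, here is a 
 minimal HiveQL sketch (table names are hypothetical; '\N' as the default null 
 marker and the serialization.null.format property are standard Hive behavior):
 {noformat}
 -- By default only the literal sequence \N in a text file is read as NULL;
 -- a field like "null" in an INT column triggers the LazyPrimitive warning
 -- and is converted to NULL anyway.
 CREATE TABLE t_default (i INT)
   ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

 -- Declaring a custom null marker makes "NULL" strings load as nulls silently.
 CREATE TABLE t_custom (i INT)
   ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
   TBLPROPERTIES ('serialization.null.format'='NULL');
 {noformat}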



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8828) Remove hadoop 20 shims

2014-11-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225922#comment-14225922
 ] 

Hive QA commented on HIVE-8828:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12683566/HIVE-8828.9.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1906/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1906/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1906/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-1906/source-prep.txt
+ [[ true == \t\r\u\e ]]
+ rm -rf ivy maven
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
Reverted 
'hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseKeyFactory.java'
Reverted 
'hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseKeyFactory2.java'
Reverted 'hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java'
Reverted 
'hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseCellMap.java'
Reverted 'data/files/cbo_t2.txt'
Reverted 'data/files/cbo_t4.txt'
Reverted 'data/files/cbo_t6.txt'
Reverted 'data/files/cbo_t1.txt'
Reverted 'data/files/cbo_t3.txt'
Reverted 'data/files/cbo_t5.txt'
Reverted 
'accumulo-handler/src/test/org/apache/hadoop/hive/accumulo/serde/FirstCharAccumuloCompositeRowId.java'
Reverted 
'accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/LazyAccumuloRow.java'
Reverted 
'accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/LazyAccumuloMap.java'
Reverted 'serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyObjectBase.java'
Reverted 'serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyString.java'
Reverted 'serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyObject.java'
Reverted 'serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyPrimitive.java'
Reverted 'serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyMap.java'
Reverted 'serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyArray.java'
Reverted 
'serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyNonPrimitive.java'
Reverted 'serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyBinary.java'
Reverted 'serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyStruct.java'
Reverted 'serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyUnion.java'
Reverted 
'serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryObject.java'
Reverted 'ql/src/test/results/clientpositive/cbo_windowing.q.out'
Reverted 'ql/src/test/results/clientpositive/cbo_udf_udaf.q.out'
Reverted 'ql/src/test/results/clientpositive/cbo_limit.q.out'
Reverted 'ql/src/test/results/clientpositive/cbo_gby.q.out'
Reverted 'ql/src/test/results/clientpositive/tez/cbo_union.q.out'
Reverted 'ql/src/test/results/clientpositive/tez/cbo_windowing.q.out'
Reverted 'ql/src/test/results/clientpositive/tez/cbo_join.q.out'
Reverted 'ql/src/test/results/clientpositive/tez/cbo_gby.q.out'
Reverted 'ql/src/test/results/clientpositive/tez/cbo_limit.q.out'
Reverted 'ql/src/test/results/clientpositive/tez/cbo_views.q.out'
Reverted 'ql/src/test/results/clientpositive/tez/cbo_simple_select.q.out'
Reverted 'ql/src/test/results/clientpositive/tez/cbo_udf_udaf.q.out'
Reverted 'ql/src/test/results/clientpositive/tez/cbo_semijoin.q.out'
Reverted 'ql/src/test/results/clientpositive/cbo_union.q.out'
Reverted 'ql/src/test/results/clientpositive/cbo_simple_select.q.out'
Reverted 'ql/src/test/results/clientpositive/cbo_semijoin.q.out'
Reverted 

[jira] [Commented] (HIVE-8953) 0.5.2-SNAPSHOT Dependency

2014-11-26 Thread Olaf Flebbe (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225937#comment-14225937
 ] 

Olaf Flebbe commented on HIVE-8953:
---

This seems to be fixed in the 0.14 branch.

Are there any plans to release 0.14.1 as an urgent bug-fix release?

 0.5.2-SNAPSHOT Dependency
 -

 Key: HIVE-8953
 URL: https://issues.apache.org/jira/browse/HIVE-8953
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
 Environment: Compiling for Apache BIGTOP. 
Reporter: Olaf Flebbe

 I have the issue that the hive shim 0.23 needs Tez version 0.5.2-SNAPSHOT.
 I have no clue which SNAPSHOT of Apache Tez should be used; there is no 
 0.5.2-SNAPSHOT in the Maven Central Repository.
 Can I use 0.5.2? (That version appears to be released.)
 This relates to [HIVE-8614]; I have the same problem as the reporter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8934) Investigate test failure on bucketmapjoin10.q and bucketmapjoin11.q [Spark Branch]

2014-11-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225953#comment-14225953
 ] 

Hive QA commented on HIVE-8934:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12683746/HIVE-8934.1-spark.patch

{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 7179 tests 
executed
*Failed tests:*
{noformat}
TestAuthorizationApiAuthorizer - did not produce a TEST-*.xml file
TestGenericUDFOPNumeric - did not produce a TEST-*.xml file
TestHBaseKeyFactory - did not produce a TEST-*.xml file
TestHBaseKeyFactory2 - did not produce a TEST-*.xml file
TestHBaseKeyFactory3 - did not produce a TEST-*.xml file
TestHBasePredicateDecomposer - did not produce a TEST-*.xml file
TestHS2ImpersonationWithRemoteMS - did not produce a TEST-*.xml file
TestTezSessionState - did not produce a TEST-*.xml file
TestURLHook - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin10
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin11
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin7
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/443/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/443/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-443/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12683746 - PreCommit-HIVE-SPARK-Build

 Investigate test failure on bucketmapjoin10.q and bucketmapjoin11.q [Spark 
 Branch]
 --

 Key: HIVE-8934
 URL: https://issues.apache.org/jira/browse/HIVE-8934
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Chao
Assignee: Chao
 Attachments: HIVE-8934.1-spark.patch


 With MapJoin enabled, these two tests will generate incorrect results.
 This seems to be related to the HiveInputFormat that these two are using.
 We need to investigate the issue.
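 A minimal sketch of the settings involved (both properties are real Hive 
 settings; that these exact values are what bucketmapjoin10.q and 
 bucketmapjoin11.q set is an assumption):
 {noformat}
 -- The two q-files force the non-combining input format, which is the
 -- suspected interaction with map join on Spark.
 set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
 set hive.optimize.bucketmapjoin=true;
 {noformat}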



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-6914) parquet-hive cannot write nested map (map value is map)

2014-11-26 Thread Mickael Lacour (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225996#comment-14225996
 ] 

Mickael Lacour commented on HIVE-6914:
--

Thx 

 parquet-hive cannot write nested map (map value is map)
 ---

 Key: HIVE-6914
 URL: https://issues.apache.org/jira/browse/HIVE-6914
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Affects Versions: 0.12.0, 0.13.0
Reporter: Tongjie Chen
Assignee: Ryan Blue
  Labels: parquet, serialization
 Fix For: 0.15.0

 Attachments: HIVE-6914.1.patch, HIVE-6914.1.patch, HIVE-6914.2.patch, 
 HIVE-6914.3.patch, HIVE-6914.4.patch, NestedMap.parquet


 // table schema (identical for both plain text version and parquet version)
 hive> desc text_mmap;
 m map<string,map<string,string>>
 // sample nested map entry
 {level1:{level2_key1:value1,level2_key2:value2}}
 The following query will fail:
 insert overwrite table parquet_mmap select * from text_mmap;
 Caused by: parquet.io.ParquetEncodingException: This should be an 
 ArrayWritable or MapWritable: 
 org.apache.hadoop.hive.ql.io.parquet.writable.BinaryWritable@f2f8106
 at 
 org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeData(DataWritableWriter.java:85)
 at 
 org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeArray(DataWritableWriter.java:118)
 at 
 org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeData(DataWritableWriter.java:80)
 at 
 org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeData(DataWritableWriter.java:82)
 at 
 org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:55)
 at 
 org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:59)
 at 
 org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:31)
 at 
 parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:115)
 at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:81)
 at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:37)
 at 
 org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:77)
 at 
 org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:90)
 at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:622)
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
 at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
 at 
 org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
 at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
 ... 9 more
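 A minimal HiveQL sketch reconstructing the repro (the DDL is assumed from the 
 schema above; only the table names and the failing INSERT come from the 
 report):
 {noformat}
 -- Same nested-map schema in a text table and a parquet table.
 CREATE TABLE text_mmap (m MAP<STRING, MAP<STRING, STRING>>);
 CREATE TABLE parquet_mmap (m MAP<STRING, MAP<STRING, STRING>>)
   STORED AS PARQUET;

 -- This is the statement reported above to fail with ParquetEncodingException.
 INSERT OVERWRITE TABLE parquet_mmap SELECT * FROM text_mmap;
 {noformat}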



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8924) Investigate test failure for join_empty.q [Spark Branch]

2014-11-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226009#comment-14226009
 ] 

Hive QA commented on HIVE-8924:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12683752/HIVE-8924-spark.patch

{color:red}ERROR:{color} -1 due to 16 failed/errored test(s), 7179 tests 
executed
*Failed tests:*
{noformat}
TestAuthorizationApiAuthorizer - did not produce a TEST-*.xml file
TestGenericUDFOPNumeric - did not produce a TEST-*.xml file
TestHBaseKeyFactory - did not produce a TEST-*.xml file
TestHBaseKeyFactory2 - did not produce a TEST-*.xml file
TestHBaseKeyFactory3 - did not produce a TEST-*.xml file
TestHBasePredicateDecomposer - did not produce a TEST-*.xml file
TestTezSessionState - did not produce a TEST-*.xml file
TestURLHook - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin7
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_view
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mapjoin_mapjoin
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin9
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/444/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/444/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-444/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 16 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12683752 - PreCommit-HIVE-SPARK-Build

 Investigate test failure for join_empty.q [Spark Branch]
 

 Key: HIVE-8924
 URL: https://issues.apache.org/jira/browse/HIVE-8924
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Chao
Assignee: Szehon Ho
 Attachments: HIVE-8924-spark.patch


 This query has an interesting case where the big table work is empty. Here's 
 the MR plan:
 {noformat}
 STAGE DEPENDENCIES:
   Stage-4 is a root stage
   Stage-3 depends on stages: Stage-4
   Stage-0 depends on stages: Stage-3
 STAGE PLANS:
   Stage: Stage-4
 Map Reduce Local Work
   Alias -> Map Local Tables:
 b 
   Fetch Operator
 limit: -1
   Alias -> Map Local Operator Tree:
 b 
   TableScan
 alias: b
 Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE 
 Column stats: NONE
 Filter Operator
   predicate: UDFToDouble(key) is not null (type: boolean)
   Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 condition expressions:
   0 {key}
   1 {value}
 keys:
   0 UDFToDouble(key) (type: double)
   1 UDFToDouble(key) (type: double)
   Stage: Stage-3
 Map Reduce
   Local Work:
 Map Reduce Local Work
   Stage: Stage-0
 Fetch Operator
   limit: -1
   Processor Tree:
 ListSink
 {noformat}
 The plan for Spark is not correct. We need to investigate the issue.
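 A hedged sketch of a query shape that would produce this plan (table names 
 are hypothetical; the double cast on the join key is taken from the plan 
 above, where both sides are keyed on UDFToDouble(key)):
 {noformat}
 -- The big-table side (a) is empty, so only the small-table (b) hash-table
 -- stage carries real work; casting both keys reproduces the UDFToDouble
 -- join keys shown in the plan.
 SELECT a.key, b.value
 FROM empty_table a JOIN src b
   ON CAST(a.key AS DOUBLE) = CAST(b.key AS DOUBLE);
 {noformat}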



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8970) Enable map join optimization only when hive.auto.convert.join is true [Spark Branch]

2014-11-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226010#comment-14226010
 ] 

Hive QA commented on HIVE-8970:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12683781/HIVE-8970.1-spark.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/445/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/445/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-445/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/lib64/qt-3.3/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/lib64/qt-3.3/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-SPARK-Build-445/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-spark-source ]]
+ [[ ! -d apache-svn-spark-source/.svn ]]
+ [[ ! -d apache-svn-spark-source ]]
+ cd apache-svn-spark-source
+ svn revert -R .
Reverted 'itests/src/test/resources/testconfiguration.properties'
Reverted 
'ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java'
++ svn status --no-ignore
++ egrep -v '^X|^Performing status on external'
++ awk '{print $2}'
+ rm -rf target datanucleus.log ant/target shims/target shims/0.20/target 
shims/0.20S/target shims/0.23/target shims/aggregator/target 
shims/common/target shims/common-secure/target shims/scheduler/target 
packaging/target hbase-handler/target testutils/target jdbc/target 
metastore/target itests/target itests/hcatalog-unit/target 
itests/test-serde/target itests/qtest/target itests/hive-unit-hadoop2/target 
itests/hive-minikdc/target itests/hive-unit/target itests/custom-serde/target 
itests/util/target itests/qtest-spark/target hcatalog/target 
hcatalog/core/target hcatalog/streaming/target 
hcatalog/server-extensions/target hcatalog/hcatalog-pig-adapter/target 
hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target 
accumulo-handler/target hwi/target common/target common/src/gen 
spark-client/target service/target contrib/target serde/target beeline/target 
cli/target odbc/target ql/dependency-reduced-pom.xml ql/target
+ svn update

Fetching external item into 'hcatalog/src/test/e2e/harness'
External at revision 1641793.

At revision 1641793.
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12683781 - PreCommit-HIVE-SPARK-Build

 Enable map join optimization only when hive.auto.convert.join is true [Spark 
 Branch]
 

 Key: HIVE-8970
 URL: https://issues.apache.org/jira/browse/HIVE-8970
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Chao
Assignee: Chao
 Fix For: spark-branch

 Attachments: HIVE-8970.1-spark.patch


 Right now, in the Spark branch we enable MJ without looking at this 
 configuration. The related code in {{SparkMapJoinOptimizer}} is commented 
 out. We should only enable MJ when the flag is true.
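 A minimal sketch of the intended behavior once the flag is honored (both 
 properties are real Hive settings; the comments describe this issue's goal, 
 not the current code):
 {noformat}
 set hive.execution.engine=spark;
 -- With the flag on, SparkMapJoinOptimizer may convert joins to map joins.
 set hive.auto.convert.join=true;
 -- With the flag off, the conversion should be skipped entirely.
 set hive.auto.convert.join=false;
 {noformat}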



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-8962) Add SORT_QUERY_RESULTS for join tests that do not guarantee order #2

2014-11-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226050#comment-14226050
 ] 

Hive QA commented on HIVE-8962:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12683651/HIVE-8962.patch

{color:green}SUCCESS:{color} +1 6683 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1907/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1907/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1907/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12683651 - PreCommit-HIVE-TRUNK-Build

 Add SORT_QUERY_RESULTS for join tests that do not guarantee order #2
 

 Key: HIVE-8962
 URL: https://issues.apache.org/jira/browse/HIVE-8962
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Chao
Assignee: Chao
Priority: Minor
 Attachments: HIVE-8962.patch


 Similar to HIVE-8936, we need to add {{SORT_QUERY_RESULTS}} to the following 
 q-files:
 {noformat}
 ppd_multi_insert.q
 ptf_streaming.q
 subquery_exists.q
 subquery_multiinsert.q
 vectorized_ptf.q
 {noformat}
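 For reference, the directive is a comment placed at the top of a q-file; a 
 minimal sketch (the query below is hypothetical):
 {noformat}
 -- SORT_QUERY_RESULTS

 select a.key, b.value from src a join src b on a.key = b.key;
 {noformat}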



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8970) Enable map join optimization only when hive.auto.convert.join is true [Spark Branch]

2014-11-26 Thread Chao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226114#comment-14226114
 ] 

Chao commented on HIVE-8970:


This patch applies cleanly on my machine; I'm not sure why it failed.

 Enable map join optimization only when hive.auto.convert.join is true [Spark 
 Branch]
 

 Key: HIVE-8970
 URL: https://issues.apache.org/jira/browse/HIVE-8970
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Chao
Assignee: Chao
 Fix For: spark-branch

 Attachments: HIVE-8970.1-spark.patch


 Right now, in the Spark branch we enable MJ without looking at this 
 configuration. The related code in {{SparkMapJoinOptimizer}} is commented 
 out. We should only enable MJ when the flag is true.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8970) Enable map join optimization only when hive.auto.convert.join is true [Spark Branch]

2014-11-26 Thread Chao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao updated HIVE-8970:
---
Attachment: HIVE-8970.2-spark.patch

Re-attaching the same patch to trigger a test run.

 Enable map join optimization only when hive.auto.convert.join is true [Spark 
 Branch]
 

 Key: HIVE-8970
 URL: https://issues.apache.org/jira/browse/HIVE-8970
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Chao
Assignee: Chao
 Fix For: spark-branch

 Attachments: HIVE-8970.1-spark.patch, HIVE-8970.2-spark.patch


 Right now, in the Spark branch we enable MJ without looking at this 
 configuration. The related code in {{SparkMapJoinOptimizer}} is commented 
 out. We should only enable MJ when the flag is true.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8934) Investigate test failure on bucketmapjoin10.q and bucketmapjoin11.q [Spark Branch]

2014-11-26 Thread Chao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao updated HIVE-8934:
---
Attachment: HIVE-8934.2-spark.patch

Re-attaching the same patch to trigger a test run.

 Investigate test failure on bucketmapjoin10.q and bucketmapjoin11.q [Spark 
 Branch]
 --

 Key: HIVE-8934
 URL: https://issues.apache.org/jira/browse/HIVE-8934
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Chao
Assignee: Chao
 Attachments: HIVE-8934.1-spark.patch, HIVE-8934.2-spark.patch


 With MapJoin enabled, these two tests will generate incorrect results.
 This seems to be related to the HiveInputFormat that these two are using.
 We need to investigate the issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8924) Investigate test failure for join_empty.q [Spark Branch]

2014-11-26 Thread Chao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao updated HIVE-8924:
---
Attachment: HIVE-8924.2-spark.patch

Re-attaching the same patch to trigger a test run.

 Investigate test failure for join_empty.q [Spark Branch]
 

 Key: HIVE-8924
 URL: https://issues.apache.org/jira/browse/HIVE-8924
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Chao
Assignee: Szehon Ho
 Attachments: HIVE-8924-spark.patch, HIVE-8924.2-spark.patch


 This query has an interesting case where the big table work is empty. Here's 
 the MR plan:
 {noformat}
 STAGE DEPENDENCIES:
   Stage-4 is a root stage
   Stage-3 depends on stages: Stage-4
   Stage-0 depends on stages: Stage-3
 STAGE PLANS:
   Stage: Stage-4
 Map Reduce Local Work
   Alias -> Map Local Tables:
 b 
   Fetch Operator
 limit: -1
   Alias -> Map Local Operator Tree:
 b 
   TableScan
 alias: b
 Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE 
 Column stats: NONE
 Filter Operator
   predicate: UDFToDouble(key) is not null (type: boolean)
   Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 condition expressions:
   0 {key}
   1 {value}
 keys:
   0 UDFToDouble(key) (type: double)
   1 UDFToDouble(key) (type: double)
   Stage: Stage-3
 Map Reduce
   Local Work:
 Map Reduce Local Work
   Stage: Stage-0
 Fetch Operator
   limit: -1
   Processor Tree:
 ListSink
 {noformat}
 The plan for Spark is not correct. We need to investigate the issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8934) Investigate test failure on bucketmapjoin10.q and bucketmapjoin11.q [Spark Branch]

2014-11-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226191#comment-14226191
 ] 

Hive QA commented on HIVE-8934:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12683832/HIVE-8934.2-spark.patch

{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 7180 tests 
executed
*Failed tests:*
{noformat}
TestAuthorizationApiAuthorizer - did not produce a TEST-*.xml file
TestGenericUDFOPNumeric - did not produce a TEST-*.xml file
TestHBaseKeyFactory - did not produce a TEST-*.xml file
TestHBaseKeyFactory2 - did not produce a TEST-*.xml file
TestHBaseKeyFactory3 - did not produce a TEST-*.xml file
TestHBasePredicateDecomposer - did not produce a TEST-*.xml file
TestTezSessionState - did not produce a TEST-*.xml file
TestURLHook - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin10
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin11
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin7
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_subquery_multiinsert
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/446/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/446/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-446/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12683832 - PreCommit-HIVE-SPARK-Build

 Investigate test failure on bucketmapjoin10.q and bucketmapjoin11.q [Spark 
 Branch]
 --

 Key: HIVE-8934
 URL: https://issues.apache.org/jira/browse/HIVE-8934
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Chao
Assignee: Chao
 Attachments: HIVE-8934.1-spark.patch, HIVE-8934.2-spark.patch


 With MapJoin enabled, these two tests will generate incorrect results.
 This seems to be related to the HiveInputFormat that these two are using.
 We need to investigate the issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8970) Enable map join optimization only when hive.auto.convert.join is true [Spark Branch]

2014-11-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226193#comment-14226193
 ] 

Hive QA commented on HIVE-8970:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12683831/HIVE-8970.2-spark.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/447/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/447/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-447/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/lib64/qt-3.3/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/lib64/qt-3.3/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-SPARK-Build-447/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-spark-source ]]
+ [[ ! -d apache-svn-spark-source/.svn ]]
+ [[ ! -d apache-svn-spark-source ]]
+ cd apache-svn-spark-source
+ svn revert -R .
Reverted 'itests/src/test/resources/testconfiguration.properties'
Reverted 'ql/src/test/results/clientpositive/spark/bucketmapjoin10.q.out'
Reverted 'ql/src/test/results/clientpositive/spark/bucketmapjoin11.q.out'
Reverted 
'ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java'
Reverted 
'ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinEagerRowContainer.java'
++ svn status --no-ignore
++ egrep -v '^X|^Performing status on external'
++ awk '{print $2}'
+ rm -rf target datanucleus.log ant/target shims/target shims/0.20/target 
shims/0.20S/target shims/0.23/target shims/aggregator/target 
shims/common/target shims/common-secure/target shims/scheduler/target 
packaging/target hbase-handler/target testutils/target jdbc/target 
metastore/target itests/target itests/hcatalog-unit/target 
itests/test-serde/target itests/qtest/target itests/hive-unit-hadoop2/target 
itests/hive-minikdc/target itests/hive-unit/target itests/custom-serde/target 
itests/util/target itests/qtest-spark/target hcatalog/target 
hcatalog/core/target hcatalog/streaming/target 
hcatalog/server-extensions/target hcatalog/hcatalog-pig-adapter/target 
hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target 
accumulo-handler/target hwi/target common/target common/src/gen 
spark-client/target service/target contrib/target serde/target beeline/target 
cli/target odbc/target ql/dependency-reduced-pom.xml ql/target
+ svn update

Fetching external item into 'hcatalog/src/test/e2e/harness'
External at revision 1641818.

At revision 1641818.
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12683831 - PreCommit-HIVE-SPARK-Build

 Enable map join optimization only when hive.auto.convert.join is true [Spark 
 Branch]
 

 Key: HIVE-8970
 URL: https://issues.apache.org/jira/browse/HIVE-8970
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Chao
Assignee: Chao
 Fix For: spark-branch

 

[jira] [Commented] (HIVE-8875) hive.optimize.sort.dynamic.partition should be turned off for ACID

2014-11-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226211#comment-14226211
 ] 

Hive QA commented on HIVE-8875:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12683677/HIVE-8875.2.patch

{color:green}SUCCESS:{color} +1 6683 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1908/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1908/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1908/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12683677 - PreCommit-HIVE-TRUNK-Build

 hive.optimize.sort.dynamic.partition should be turned off for ACID
 --

 Key: HIVE-8875
 URL: https://issues.apache.org/jira/browse/HIVE-8875
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
 Attachments: HIVE-8875.2.patch, HIVE-8875.patch


 Turning this on causes ACID insert, updates, and deletes to produce 
 non-optimal plans with extra reduce phases.
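 A minimal sketch of the manual workaround this patch automates (the property 
 and the UPDATE syntax are real; the table is hypothetical and assumed to be 
 bucketed, transactional, and dynamically partitioned):
 {noformat}
 -- Until the fix, turning the optimization off by hand avoids the extra
 -- reduce phase in ACID insert/update/delete plans.
 set hive.optimize.sort.dynamic.partition=false;
 UPDATE acid_tbl SET value = 'x' WHERE key = 1;
 {noformat}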



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8836) Enable automatic tests with remote spark client [Spark Branch]

2014-11-26 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226270#comment-14226270
 ] 

Xuefu Zhang commented on HIVE-8836:
---

[~ruili], I think the number of reducers changed because of the cluster 
changes. Previously the plan was generated with one node with 4 cores 
(local[4]); now the cluster has 2 nodes with one core each. The memory 
configuration is also different. I guess it's hard to tweak the cluster 
configuration so that the same number of reducers results.

For now, I think we have to go through the list and analyze the failures one 
by one. It's a long list, and maybe it can be divided among people so that 
each person only takes a slice of it.

Briefly checking the result, it seems the failures are caused by one of the 
following reasons:
1. Reducer number change, which is okay.
2. Result diff. It could be a matter of ordering, but it could also be a 
genuinely different result.
3. Test failed to run.

I noticed that we are using local-cluster[2,1,2048]. Maybe we should have a 
more general case where one node has more than one core. Also, we may need to 
adjust the memory settings. Once we have a representative small cluster, we 
will probably stay with it for some time.
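For reference, a minimal sketch of the two master strings being compared (the 
local-cluster bracket format is Spark's [workers, coresPerWorker, 
memPerWorkerMB]; setting it via a Hive set command follows the pattern shown 
elsewhere in this thread):
{noformat}
-- Old test setup: one node with 4 cores.
set spark.master=local[4];
-- Current setup: 2 workers, 1 core per worker, 2048 MB per worker.
set spark.master=local-cluster[2,1,2048];
{noformat}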



 Enable automatic tests with remote spark client [Spark Branch]
 --

 Key: HIVE-8836
 URL: https://issues.apache.org/jira/browse/HIVE-8836
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Chengxiang Li
Assignee: Rui Li
  Labels: Spark-M3
 Fix For: spark-branch

 Attachments: HIVE-8836.1-spark.patch, HIVE-8836.2-spark.patch, 
 HIVE-8836.3-spark.patch, HIVE-8836.4-spark.patch, HIVE-8836.5-spark.patch, 
 HIVE-8836.6-spark.patch, HIVE-8836.6-spark.patch, HIVE-8836.6-spark.patch


 In a real production environment, the remote Spark client will mostly be used 
 to submit Spark jobs for Hive, so we should enable automatic tests with the 
 remote Spark client to make sure Hive features work with it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8924) Investigate test failure for join_empty.q [Spark Branch]

2014-11-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226272#comment-14226272
 ] 

Hive QA commented on HIVE-8924:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12683834/HIVE-8924.2-spark.patch

{color:red}ERROR:{color} -1 due to 16 failed/errored test(s), 7179 tests 
executed
*Failed tests:*
{noformat}
TestAuthorizationApiAuthorizer - did not produce a TEST-*.xml file
TestGenericUDFOPNumeric - did not produce a TEST-*.xml file
TestHBaseKeyFactory - did not produce a TEST-*.xml file
TestHBaseKeyFactory2 - did not produce a TEST-*.xml file
TestHBaseKeyFactory3 - did not produce a TEST-*.xml file
TestHBasePredicateDecomposer - did not produce a TEST-*.xml file
TestTezSessionState - did not produce a TEST-*.xml file
TestURLHook - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_coltype
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin7
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin9
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_view
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin9
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/448/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/448/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-448/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 16 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12683834 - PreCommit-HIVE-SPARK-Build

 Investigate test failure for join_empty.q [Spark Branch]
 

 Key: HIVE-8924
 URL: https://issues.apache.org/jira/browse/HIVE-8924
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Chao
Assignee: Szehon Ho
 Attachments: HIVE-8924-spark.patch, HIVE-8924.2-spark.patch


 This query has an interesting case where the big table work is empty. Here's 
 the MR plan:
 {noformat}
 STAGE DEPENDENCIES:
   Stage-4 is a root stage
   Stage-3 depends on stages: Stage-4
   Stage-0 depends on stages: Stage-3
 STAGE PLANS:
   Stage: Stage-4
 Map Reduce Local Work
   Alias -> Map Local Tables:
 b 
   Fetch Operator
 limit: -1
   Alias -> Map Local Operator Tree:
 b 
   TableScan
 alias: b
 Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE 
 Column stats: NONE
 Filter Operator
   predicate: UDFToDouble(key) is not null (type: boolean)
   Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 condition expressions:
   0 {key}
   1 {value}
 keys:
   0 UDFToDouble(key) (type: double)
   1 UDFToDouble(key) (type: double)
   Stage: Stage-3
 Map Reduce
   Local Work:
 Map Reduce Local Work
   Stage: Stage-0
 Fetch Operator
   limit: -1
   Processor Tree:
 ListSink
 {noformat}
 The plan for Spark is not correct. We need to investigate the issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8924) Investigate test failure for join_empty.q [Spark Branch]

2014-11-26 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226277#comment-14226277
 ] 

Xuefu Zhang commented on HIVE-8924:
---

[~csun], I think you will need to regenerate your .out files because Hive 
recently resolved HIVE-8961.

 Investigate test failure for join_empty.q [Spark Branch]
 

 Key: HIVE-8924
 URL: https://issues.apache.org/jira/browse/HIVE-8924
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Chao
Assignee: Szehon Ho
 Attachments: HIVE-8924-spark.patch, HIVE-8924.2-spark.patch


 This query has an interesting case where the big table work is empty. Here's 
 the MR plan:
 {noformat}
 STAGE DEPENDENCIES:
   Stage-4 is a root stage
   Stage-3 depends on stages: Stage-4
   Stage-0 depends on stages: Stage-3
 STAGE PLANS:
   Stage: Stage-4
 Map Reduce Local Work
   Alias -> Map Local Tables:
 b 
   Fetch Operator
 limit: -1
   Alias -> Map Local Operator Tree:
 b 
   TableScan
 alias: b
 Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE 
 Column stats: NONE
 Filter Operator
   predicate: UDFToDouble(key) is not null (type: boolean)
   Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 condition expressions:
   0 {key}
   1 {value}
 keys:
   0 UDFToDouble(key) (type: double)
   1 UDFToDouble(key) (type: double)
   Stage: Stage-3
 Map Reduce
   Local Work:
 Map Reduce Local Work
   Stage: Stage-0
 Fetch Operator
   limit: -1
   Processor Tree:
 ListSink
 {noformat}
 The plan for Spark is not correct. We need to investigate the issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-8924) Investigate test failure for join_empty.q [Spark Branch]

2014-11-26 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226277#comment-14226277
 ] 

Xuefu Zhang edited comment on HIVE-8924 at 11/26/14 3:08 PM:
-

[~szehon], I think you will need to regenerate your .out files because Hive 
recently resolved HIVE-8961.


was (Author: xuefuz):
[~csun], I think you will need to regenerate your .out files because Hive 
recently resolved HIVE-8961.

 Investigate test failure for join_empty.q [Spark Branch]
 

 Key: HIVE-8924
 URL: https://issues.apache.org/jira/browse/HIVE-8924
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Chao
Assignee: Szehon Ho
 Attachments: HIVE-8924-spark.patch, HIVE-8924.2-spark.patch


 This query has an interesting case where the big table work is empty. Here's 
 the MR plan:
 {noformat}
 STAGE DEPENDENCIES:
   Stage-4 is a root stage
   Stage-3 depends on stages: Stage-4
   Stage-0 depends on stages: Stage-3
 STAGE PLANS:
   Stage: Stage-4
 Map Reduce Local Work
   Alias -> Map Local Tables:
 b 
   Fetch Operator
 limit: -1
   Alias -> Map Local Operator Tree:
 b 
   TableScan
 alias: b
 Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE 
 Column stats: NONE
 Filter Operator
   predicate: UDFToDouble(key) is not null (type: boolean)
   Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 condition expressions:
   0 {key}
   1 {value}
 keys:
   0 UDFToDouble(key) (type: double)
   1 UDFToDouble(key) (type: double)
   Stage: Stage-3
 Map Reduce
   Local Work:
 Map Reduce Local Work
   Stage: Stage-0
 Fetch Operator
   limit: -1
   Processor Tree:
 ListSink
 {noformat}
 The plan for Spark is not correct. We need to investigate the issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-8924) Investigate test failure for join_empty.q [Spark Branch]

2014-11-26 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226277#comment-14226277
 ] 

Xuefu Zhang edited comment on HIVE-8924 at 11/26/14 3:08 PM:
-

[~szehon], [~csun], I think you will need to regenerate your .out files 
because Hive recently resolved HIVE-8961.


was (Author: xuefuz):
[~szehon], I think you will need to regenerate your .out files because Hive 
recently resolved HIVE-8961.

 Investigate test failure for join_empty.q [Spark Branch]
 

 Key: HIVE-8924
 URL: https://issues.apache.org/jira/browse/HIVE-8924
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Chao
Assignee: Szehon Ho
 Attachments: HIVE-8924-spark.patch, HIVE-8924.2-spark.patch


 This query has an interesting case where the big table work is empty. Here's 
 the MR plan:
 {noformat}
 STAGE DEPENDENCIES:
   Stage-4 is a root stage
   Stage-3 depends on stages: Stage-4
   Stage-0 depends on stages: Stage-3
 STAGE PLANS:
   Stage: Stage-4
 Map Reduce Local Work
   Alias -> Map Local Tables:
 b 
   Fetch Operator
 limit: -1
   Alias -> Map Local Operator Tree:
 b 
   TableScan
 alias: b
 Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE 
 Column stats: NONE
 Filter Operator
   predicate: UDFToDouble(key) is not null (type: boolean)
   Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 condition expressions:
   0 {key}
   1 {value}
 keys:
   0 UDFToDouble(key) (type: double)
   1 UDFToDouble(key) (type: double)
   Stage: Stage-3
 Map Reduce
   Local Work:
 Map Reduce Local Work
   Stage: Stage-0
 Fetch Operator
   limit: -1
   Processor Tree:
 ListSink
 {noformat}
 The plan for Spark is not correct. We need to investigate the issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8934) Investigate test failure on bucketmapjoin10.q and bucketmapjoin11.q [Spark Branch]

2014-11-26 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226283#comment-14226283
 ] 

Xuefu Zhang commented on HIVE-8934:
---

[~csun], I think you will need to regenerate your .out files because Hive 
recently resolved HIVE-8961.

 Investigate test failure on bucketmapjoin10.q and bucketmapjoin11.q [Spark 
 Branch]
 --

 Key: HIVE-8934
 URL: https://issues.apache.org/jira/browse/HIVE-8934
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Chao
Assignee: Chao
 Attachments: HIVE-8934.1-spark.patch, HIVE-8934.2-spark.patch


 With MapJoin enabled, these two tests generate incorrect results.
 This seems to be related to the HiveInputFormat that these two tests are using.
 We need to investigate the issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8962) Add SORT_QUERY_RESULTS for join tests that do not guarantee order #2

2014-11-26 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8962:
--
   Resolution: Fixed
Fix Version/s: 0.15.0
   Status: Resolved  (was: Patch Available)

Patch committed to trunk. Thanks to Chao for the contribution.

 Add SORT_QUERY_RESULTS for join tests that do not guarantee order #2
 

 Key: HIVE-8962
 URL: https://issues.apache.org/jira/browse/HIVE-8962
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Chao
Assignee: Chao
Priority: Minor
 Fix For: 0.15.0

 Attachments: HIVE-8962.patch


 Similar to HIVE-8936, we need to add {{SORT_QUERY_RESULTS}} to the following 
 q-files:
 {noformat}
 ppd_multi_insert.q
 ptf_streaming.q
 subquery_exists.q
 subquery_multiinsert.q
 vectorized_ptf.q
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (HIVE-7329) Create SparkWork [Spark Branch]

2014-11-26 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-7329:
--
Comment: was deleted

(was: Hi, Xuefu. I built Hive on Spark (the spark branch of 
https://github.com/apache/hive.git) and Spark (the master branch of 
https://github.com/apache/spark.git); my Spark assembly jar is 
spark-assembly-1.2.0-SNAPSHOT-hadoop2.4.0.jar. I set that jar's path in 
hive-env.sh (HIVE_AUX_JARS_PATH), started Hive, and ran the following commands 
to start a query:
set hive.execution.engine=spark;
set spark.master=spark://:7077;
set spark.eventLog.enabled=true; 
set spark.executor.memory=1024m;
set spark.serializer=org.apache.spark.serializer.KryoSerializer;
but it seems it still uses MR as the query engine. I then attached a remote 
debugger and found it never jumps to the Spark engine.
I did what 
https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started
 describes. Can you tell me what is wrong?)

 Create SparkWork [Spark Branch]
 ---

 Key: HIVE-7329
 URL: https://issues.apache.org/jira/browse/HIVE-7329
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: 0.13.1
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Fix For: spark-branch

 Attachments: HIVE-7329.patch


 This class encapsulates all the work objects that can be executed in a single 
 Spark job.
 NO PRECOMMIT TESTS. This is for spark branch only.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8828) Remove hadoop 20 shims

2014-11-26 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-8828:
---
Status: Open  (was: Patch Available)

 Remove hadoop 20 shims
 --

 Key: HIVE-8828
 URL: https://issues.apache.org/jira/browse/HIVE-8828
 Project: Hive
  Issue Type: Task
  Components: Shims
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: HIVE-8828.1.patch, HIVE-8828.2.patch, HIVE-8828.3.patch, 
 HIVE-8828.4.patch, HIVE-8828.5.patch, HIVE-8828.6.patch, HIVE-8828.7.patch, 
 HIVE-8828.8.patch, HIVE-8828.9.patch, HIVE-8828.patch


 CLEAR LIBRARY CACHE
 See : [mailing list discussion | 
 http://mail-archives.apache.org/mod_mbox/hive-dev/201410.mbox/%3CCABgNGzfSB5VGTecONg0GgLCDdLLFfzLuZvP%2BGSBc0i0joqf3fg%40mail.gmail.com%3E]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8828) Remove hadoop 20 shims

2014-11-26 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-8828:
---
Status: Patch Available  (was: Open)

 Remove hadoop 20 shims
 --

 Key: HIVE-8828
 URL: https://issues.apache.org/jira/browse/HIVE-8828
 Project: Hive
  Issue Type: Task
  Components: Shims
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: HIVE-8828.1.patch, HIVE-8828.10.patch, 
 HIVE-8828.2.patch, HIVE-8828.3.patch, HIVE-8828.4.patch, HIVE-8828.5.patch, 
 HIVE-8828.6.patch, HIVE-8828.7.patch, HIVE-8828.8.patch, HIVE-8828.9.patch, 
 HIVE-8828.patch


 CLEAR LIBRARY CACHE
 See : [mailing list discussion | 
 http://mail-archives.apache.org/mod_mbox/hive-dev/201410.mbox/%3CCABgNGzfSB5VGTecONg0GgLCDdLLFfzLuZvP%2BGSBc0i0joqf3fg%40mail.gmail.com%3E]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8828) Remove hadoop 20 shims

2014-11-26 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-8828:
---
Attachment: HIVE-8828.10.patch

Rebased to trunk.

 Remove hadoop 20 shims
 --

 Key: HIVE-8828
 URL: https://issues.apache.org/jira/browse/HIVE-8828
 Project: Hive
  Issue Type: Task
  Components: Shims
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: HIVE-8828.1.patch, HIVE-8828.10.patch, 
 HIVE-8828.2.patch, HIVE-8828.3.patch, HIVE-8828.4.patch, HIVE-8828.5.patch, 
 HIVE-8828.6.patch, HIVE-8828.7.patch, HIVE-8828.8.patch, HIVE-8828.9.patch, 
 HIVE-8828.patch


 CLEAR LIBRARY CACHE
 See : [mailing list discussion | 
 http://mail-archives.apache.org/mod_mbox/hive-dev/201410.mbox/%3CCABgNGzfSB5VGTecONg0GgLCDdLLFfzLuZvP%2BGSBc0i0joqf3fg%40mail.gmail.com%3E]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8967) Fix bucketmapjoin7.q determinism

2014-11-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226445#comment-14226445
 ] 

Hive QA commented on HIVE-8967:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12683703/HIVE-8967.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6683 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1909/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1909/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1909/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12683703 - PreCommit-HIVE-TRUNK-Build

 Fix bucketmapjoin7.q determinism
 

 Key: HIVE-8967
 URL: https://issues.apache.org/jira/browse/HIVE-8967
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Fix For: 0.15.0

 Attachments: HIVE-8967.patch


 In working on HIVE-8963, we found the output is not deterministic. We can add 
 an order by to make sure the output is stable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8836) Enable automatic tests with remote spark client [Spark Branch]

2014-11-26 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226461#comment-14226461
 ] 

Brock Noland commented on HIVE-8836:


I will change it to two cores and then re-generate the outputs. This should 
allow us to differentiate between failed tests, changed outputs, and mere 
reducer-count changes.

 Enable automatic tests with remote spark client [Spark Branch]
 --

 Key: HIVE-8836
 URL: https://issues.apache.org/jira/browse/HIVE-8836
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Chengxiang Li
Assignee: Rui Li
  Labels: Spark-M3
 Fix For: spark-branch

 Attachments: HIVE-8836.1-spark.patch, HIVE-8836.2-spark.patch, 
 HIVE-8836.3-spark.patch, HIVE-8836.4-spark.patch, HIVE-8836.5-spark.patch, 
 HIVE-8836.6-spark.patch, HIVE-8836.6-spark.patch, HIVE-8836.6-spark.patch


 In a real production environment, the remote Spark client will mostly be used 
 to submit Spark jobs for Hive, so we should enable automatic tests with the 
 remote Spark client to make sure Hive features work with it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8956) Hive hangs while some error/exception happens beyond job execution [Spark Branch]

2014-11-26 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226504#comment-14226504
 ] 

Marcelo Vanzin commented on HIVE-8956:
--

I haven't looked at akka in that much detail to see if there is some API to 
catch those. You can enable akka logging (set {{spark.akka.logLifecycleEvents}} 
to true) and that will print these errors to the logs. Spark tries to serialize 
data before sending it to akka, to try to catch serialization issues, but that 
adds overhead, and it also doesn't help in the deserialization path...
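
For example, from a Hive session this could look like the following (assuming 
Hive on Spark forwards spark.* settings to the Spark context, as in the other 
configuration examples in this digest):
{noformat}
set spark.akka.logLifecycleEvents=true;
{noformat}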

 Hive hangs while some error/exception happens beyond job execution [Spark 
 Branch]
 -

 Key: HIVE-8956
 URL: https://issues.apache.org/jira/browse/HIVE-8956
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Chengxiang Li
Assignee: Rui Li
  Labels: Spark-M3
 Fix For: spark-branch

 Attachments: HIVE-8956.1-spark.patch


 The remote Spark client communicates with the remote Spark context 
 asynchronously. If an error/exception is thrown during job execution in the 
 remote Spark context, it is wrapped and sent back to the remote Spark client; 
 but if an error/exception is thrown outside of job execution, such as a job 
 serialization failure, the remote Spark client never learns what happened in 
 the remote Spark context, and it hangs.
 Setting a timeout on the remote Spark client side may not be a great idea, as 
 we are not sure how long a query will run on the Spark cluster. We need to 
 find a way to check whether the job has failed (over its whole life cycle) in 
 the remote Spark context.
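
As a hedged illustration of the failure mode (plain Java, not the actual 
SparkClient code; every name below is hypothetical): the client-side future is 
only completed from the job-execution callbacks, so an error raised before 
execution leaves it pending forever unless something else completes it:
{code}
import java.util.concurrent.CompletableFuture;

public class RpcHangSketch {
  public static void main(String[] args) throws Exception {
    // Stand-in for the client-side handle of a submitted job.
    CompletableFuture<String> jobResult = new CompletableFuture<>();

    // Normally the remote context completes this from the job execution
    // path via complete(...) or completeExceptionally(...). If the failure
    // happens before execution (e.g. the job fails to serialize), neither
    // call happens and jobResult.get() would block forever. A whole-life-
    // cycle check could complete the future on the remote context's behalf:
    jobResult.completeExceptionally(new IllegalStateException(
        "remote context failed outside job execution"));

    try {
      jobResult.get();
    } catch (Exception e) {
      System.out.println("client unblocked: " + e.getCause().getMessage());
    }
  }
}
{code}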



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8957) Remote spark context needs to clean up itself in case of connection timeout [Spark Branch]

2014-11-26 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226515#comment-14226515
 ] 

Marcelo Vanzin commented on HIVE-8957:
--

I think a fix here will be a little more complicated than that. Let me look at 
the code and think about it.

 Remote spark context needs to clean up itself in case of connection timeout 
 [Spark Branch]
 --

 Key: HIVE-8957
 URL: https://issues.apache.org/jira/browse/HIVE-8957
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Attachments: HIVE-8957.1-spark.patch


 In the current SparkClient implementation (class SparkClientImpl), the 
 constructor does some initialization and in the end waits for the remote 
 driver to connect. In case of timeout, it just throws an exception without 
 cleaning up after itself. The cleanup is necessary to release system resources.
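
A minimal sketch of the idea (member and method names are hypothetical; the 
real class is SparkClientImpl): catch the connect timeout in the constructor, 
release whatever was already initialized, then rethrow:
{code}
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ClientInitSketch {
  private Process driverProcess;   // hypothetical handle to the launched driver
  private AutoCloseable rpcServer; // hypothetical RPC endpoint awaiting the driver

  public ClientInitSketch() throws Exception {
    try {
      waitForDriver(60, TimeUnit.SECONDS);
    } catch (TimeoutException e) {
      cleanup(); // release system resources instead of leaking them
      throw e;
    }
  }

  private void waitForDriver(long timeout, TimeUnit unit) throws TimeoutException {
    // ... launch the remote driver and wait for it to connect ...
    throw new TimeoutException("remote driver did not connect in time");
  }

  private void cleanup() {
    if (driverProcess != null) {
      driverProcess.destroy();
    }
    if (rpcServer != null) {
      try {
        rpcServer.close();
      } catch (Exception ignored) {
        // best-effort cleanup
      }
    }
  }
}
{code}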



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8574) Enhance metrics gathering in Spark Client [Spark Branch]

2014-11-26 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226517#comment-14226517
 ] 

Marcelo Vanzin commented on HIVE-8574:
--

Actually, after a quick look at the code again, this might not be a problem. 
Metrics are kept per-job handle. Job handles are managed by the code submitting 
jobs - leave them for garbage collection and metrics go away.

So unless we're worried about a single job creating so many tasks that it will 
run the driver out of memory with all the metrics data, this shouldn't really 
be an issue.

 Enhance metrics gathering in Spark Client [Spark Branch]
 

 Key: HIVE-8574
 URL: https://issues.apache.org/jira/browse/HIVE-8574
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Marcelo Vanzin
Assignee: Marcelo Vanzin

 The current implementation of metrics gathering in the Spark client is a 
 little hacky. First, it's awkward to use (and the implementation is also 
 pretty ugly). Second, it will just collect metrics indefinitely, so in the 
 long term it turns into a huge memory leak.
 We need a simplified interface and some mechanism for disposing of old 
 metrics.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-6421) abs() should preserve precision/scale of decimal input

2014-11-26 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226551#comment-14226551
 ] 

Jason Dere commented on HIVE-6421:
--

Failure has been occurring in other precommit runs and does not appear to be 
related. [~ashutoshc], does this one look ok?

 abs() should preserve precision/scale of decimal input
 --

 Key: HIVE-6421
 URL: https://issues.apache.org/jira/browse/HIVE-6421
 Project: Hive
  Issue Type: Bug
  Components: UDF
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-6421.1.txt, HIVE-6421.2.patch, HIVE-6421.3.patch


 {noformat}
 hive> describe dec1;
 OK
 c1    decimal(10,2)    None 
 hive> explain select c1, abs(c1) from dec1;
  ...
 Select Operator
   expressions: c1 (type: decimal(10,2)), abs(c1) (type: decimal(38,18))
 {noformat}
 Given that abs() is a GenericUDF it should be possible for the return type 
 precision/scale to match the input precision/scale.
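
As an illustrative sketch (not the actual HIVE-6421 patch; the serde2 factory 
and type-info calls are real Hive APIs, but the class itself is hypothetical 
and the abs computation is elided), initialize() can derive the return 
ObjectInspector from the input's decimal(p,s):
{code}
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.typeinfo.DecimalTypeInfo;
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils;

public class GenericUDFAbsSketch extends GenericUDF {
  @Override
  public ObjectInspector initialize(ObjectInspector[] arguments)
      throws UDFArgumentException {
    // Derive the return type from the input's decimal(p,s) instead of
    // falling back to the default decimal(38,18).
    DecimalTypeInfo inType = (DecimalTypeInfo)
        TypeInfoUtils.getTypeInfoFromObjectInspector(arguments[0]);
    return PrimitiveObjectInspectorFactory
        .getPrimitiveWritableObjectInspector(inType);
  }

  @Override
  public Object evaluate(DeferredObject[] arguments) throws HiveException {
    // The actual abs() computation is elided in this sketch.
    throw new UnsupportedOperationException("sketch only");
  }

  @Override
  public String getDisplayString(String[] children) {
    return "abs(" + children[0] + ")";
  }
}
{code}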



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8967) Fix bucketmapjoin7.q determinism

2014-11-26 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8967:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch committed to trunk and also merged to Spark branch. Thanks, Jimmy.

 Fix bucketmapjoin7.q determinism
 

 Key: HIVE-8967
 URL: https://issues.apache.org/jira/browse/HIVE-8967
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Fix For: 0.15.0

 Attachments: HIVE-8967.patch


 In working on HIVE-8963, we found the output is not determistic. We can add 
 order by to make sure the output fixed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8971) HIVE-8965 exposed some classes which start with Test but are not tests

2014-11-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226579#comment-14226579
 ] 

Hive QA commented on HIVE-8971:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12683767/HIVE-8971.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6683 tests executed
*Failed tests:*
{noformat}
org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1910/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1910/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1910/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12683767 - PreCommit-HIVE-TRUNK-Build

 HIVE-8965 exposed some classes which start with Test but are not tests
 --

 Key: HIVE-8971
 URL: https://issues.apache.org/jira/browse/HIVE-8971
 Project: Hive
  Issue Type: Improvement
Reporter: Brock Noland
Assignee: Brock Noland
 Attachments: HIVE-8971.patch


 From the output here: 
 https://issues.apache.org/jira/browse/HIVE-8836?focusedCommentId=14225742page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14225742
 I've looked at the TestHBase* classes and they are not tests. PTest cannot 
 support classes which start with Test but are not tests.
 {noformat}
 TestAuthorizationApiAuthorizer - did not produce a TEST-*.xml file
 TestGenericUDFOPNumeric - did not produce a TEST-*.xml file
 TestHBaseKeyFactory - did not produce a TEST-*.xml file
 TestHBaseKeyFactory2 - did not produce a TEST-*.xml file
 TestHBaseKeyFactory3 - did not produce a TEST-*.xml file
 TestHBasePredicateDecomposer - did not produce a TEST-*.xml file
 TestTezSessionState - did not produce a TEST-*.xml file
 TestURLHook - did not produce a TEST-*.xml file
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8971) HIVE-8965 exposed some classes which start with Test but are not tests

2014-11-26 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-8971:
---
   Resolution: Fixed
Fix Version/s: 0.15.0
   Status: Resolved  (was: Patch Available)

Committed this as I will be updating the trunk ptest server shortly to take 
advantage of HIVE-8965 in order to improve some of the delays we've seen in 
testing lately.

 HIVE-8965 exposed some classes which start with Test but are not tests
 --

 Key: HIVE-8971
 URL: https://issues.apache.org/jira/browse/HIVE-8971
 Project: Hive
  Issue Type: Improvement
Reporter: Brock Noland
Assignee: Brock Noland
 Fix For: 0.15.0

 Attachments: HIVE-8971.patch


 From the output here: 
 https://issues.apache.org/jira/browse/HIVE-8836?focusedCommentId=14225742page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14225742
 I've looked at the TestHBase* classes and they are not tests. PTest cannot 
 support classes which start with Test but are not tests.
 {noformat}
 TestAuthorizationApiAuthorizer - did not produce a TEST-*.xml file
 TestGenericUDFOPNumeric - did not produce a TEST-*.xml file
 TestHBaseKeyFactory - did not produce a TEST-*.xml file
 TestHBaseKeyFactory2 - did not produce a TEST-*.xml file
 TestHBaseKeyFactory3 - did not produce a TEST-*.xml file
 TestHBasePredicateDecomposer - did not produce a TEST-*.xml file
 TestTezSessionState - did not produce a TEST-*.xml file
 TestURLHook - did not produce a TEST-*.xml file
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7896) orcfiledump should be able to dump data

2014-11-26 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-7896:
-
Status: Patch Available  (was: Open)

 orcfiledump should be able to dump data
 ---

 Key: HIVE-7896
 URL: https://issues.apache.org/jira/browse/HIVE-7896
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Reporter: Alan Gates
Assignee: Alan Gates
 Attachments: HIVE-7896.2.patch, HIVE-7896.patch, alltypes.orc, 
 alltypes2.txt


 The FileDumper utility in orc, exposed as a service as orcfiledump, can print 
 out metadata from Orc files but not the actual data.  Being able to dump the 
 data is also useful in some debugging contexts.
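
For reference, the tool is invoked through the hive script. Assuming the patch 
puts the data dump behind a -d flag (an assumption based on this patch, not a 
documented option yet), usage would look like:
{noformat}
hive --orcfiledump /path/to/file.orc       # metadata only (existing behavior)
hive --orcfiledump -d /path/to/file.orc    # also dump the row data
{noformat}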



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7896) orcfiledump should be able to dump data

2014-11-26 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-7896:
-
Attachment: HIVE-7896.2.patch

Fixed error found by Prasanth and added unit test to confirm it.  Also changed 
the help message per Prasanth's suggestion.

 orcfiledump should be able to dump data
 ---

 Key: HIVE-7896
 URL: https://issues.apache.org/jira/browse/HIVE-7896
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Reporter: Alan Gates
Assignee: Alan Gates
 Attachments: HIVE-7896.2.patch, HIVE-7896.patch, alltypes.orc, 
 alltypes2.txt


 The FileDumper utility in orc, exposed as a service as orcfiledump, can print 
 out metadata from Orc files but not the actual data.  Being able to dump the 
 data is also useful in some debugging contexts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-8974) Upgrade to Calcite 1.0.0-SNAPSHOT (with lots of renames)

2014-11-26 Thread Julian Hyde (JIRA)
Julian Hyde created HIVE-8974:
-

 Summary: Upgrade to Calcite 1.0.0-SNAPSHOT (with lots of renames)
 Key: HIVE-8974
 URL: https://issues.apache.org/jira/browse/HIVE-8974
 Project: Hive
  Issue Type: Task
Reporter: Julian Hyde
Assignee: Gunther Hagleitner


Calcite recently (after 0.9.2, before 1.0.0) re-organized its package structure 
and renamed a lot of classes. CALCITE-296 has the details, including a 
description of the before:after mapping.

This task is to upgrade to the version of Calcite that has the renamed 
packages. There is a 1.0.0-SNAPSHOT in Apache nexus.

Calcite functionality has not changed significantly, so it should be 
straightforward to rename. This task should be completed ASAP, before Calcite 
moves on.
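
For a flavor of the renames (two mappings I am fairly confident of; the full 
before:after list is in CALCITE-296):
{noformat}
org.eigenbase.rel.RelNode              ->  org.apache.calcite.rel.RelNode
net.hydromatic.optiq.tools.Frameworks  ->  org.apache.calcite.tools.Frameworks
{noformat}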



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-8974) Upgrade to Calcite 1.0.0-SNAPSHOT (with lots of renames)

2014-11-26 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-8974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesús Camacho Rodríguez reassigned HIVE-8974:
-

Assignee: Jesús Camacho Rodríguez  (was: Gunther Hagleitner)

 Upgrade to Calcite 1.0.0-SNAPSHOT (with lots of renames)
 

 Key: HIVE-8974
 URL: https://issues.apache.org/jira/browse/HIVE-8974
 Project: Hive
  Issue Type: Task
Reporter: Julian Hyde
Assignee: Jesús Camacho Rodríguez

 Calcite recently (after 0.9.2, before 1.0.0) re-organized its package 
 structure and renamed a lot of classes. CALCITE-296 has the details, 
 including a description of the before:after mapping.
 This task is to upgrade to the version of Calcite that has the renamed 
 packages. There is a 1.0.0-SNAPSHOT in Apache nexus.
 Calcite functionality has not changed significantly, so it should be 
 straightforward to rename. This task should be completed ASAP, before Calcite 
 moves on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8875) hive.optimize.sort.dynamic.partition should be turned off for ACID

2014-11-26 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8875:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

 hive.optimize.sort.dynamic.partition should be turned off for ACID
 --

 Key: HIVE-8875
 URL: https://issues.apache.org/jira/browse/HIVE-8875
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 0.15.0

 Attachments: HIVE-8875.2.patch, HIVE-8875.patch


 Turning this on causes ACID inserts, updates, and deletes to produce 
 non-optimal plans with extra reduce phases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8875) hive.optimize.sort.dynamic.partition should be turned off for ACID

2014-11-26 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8875:
-
Fix Version/s: 0.15.0

 hive.optimize.sort.dynamic.partition should be turned off for ACID
 --

 Key: HIVE-8875
 URL: https://issues.apache.org/jira/browse/HIVE-8875
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 0.15.0

 Attachments: HIVE-8875.2.patch, HIVE-8875.patch


 Turning this on causes ACID inserts, updates, and deletes to produce 
 non-optimal plans with extra reduce phases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8574) Enhance metrics gathering in Spark Client [Spark Branch]

2014-11-26 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226651#comment-14226651
 ] 

Brock Noland commented on HIVE-8574:


bq. So unless we're worried about a single job creating so many tasks that it 
will run the driver out of memory with all the metrics data, this shouldn't 
really be an issue.

Any idea how much memory would be consumed for say 100K tasks?

 Enhance metrics gathering in Spark Client [Spark Branch]
 

 Key: HIVE-8574
 URL: https://issues.apache.org/jira/browse/HIVE-8574
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Marcelo Vanzin
Assignee: Marcelo Vanzin

 The current implementation of metrics gathering in the Spark client is a 
 little hacky. First, it's awkward to use (and the implementation is also 
 pretty ugly). Second, it will just collect metrics indefinitely, so in the 
 long term it turns into a huge memory leak.
 We need a simplified interface and some mechanism for disposing of old 
 metrics.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-6421) abs() should preserve precision/scale of decimal input

2014-11-26 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226659#comment-14226659
 ] 

Ashutosh Chauhan commented on HIVE-6421:


yup. +1

 abs() should preserve precision/scale of decimal input
 --

 Key: HIVE-6421
 URL: https://issues.apache.org/jira/browse/HIVE-6421
 Project: Hive
  Issue Type: Bug
  Components: UDF
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-6421.1.txt, HIVE-6421.2.patch, HIVE-6421.3.patch


 {noformat}
 hive> describe dec1;
 OK
 c1    decimal(10,2)    None 
 hive> explain select c1, abs(c1) from dec1;
  ...
 Select Operator
   expressions: c1 (type: decimal(10,2)), abs(c1) (type: decimal(38,18))
 {noformat}
 Given that abs() is a GenericUDF it should be possible for the return type 
 precision/scale to match the input precision/scale.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8574) Enhance metrics gathering in Spark Client [Spark Branch]

2014-11-26 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226668#comment-14226668
 ] 

Marcelo Vanzin commented on HIVE-8574:
--

Rounding up, each task metrics data structure will take around 256 bytes; for 
100K tasks that is 100,000 x 256 bytes, about 25.6 MB. So ~25MB?

 Enhance metrics gathering in Spark Client [Spark Branch]
 

 Key: HIVE-8574
 URL: https://issues.apache.org/jira/browse/HIVE-8574
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Marcelo Vanzin
Assignee: Marcelo Vanzin

 The current implementation of metrics gathering in the Spark client is a 
 little hacky. First, it's awkward to use (and the implementation is also 
 pretty ugly). Second, it will just collect metrics indefinitely, so in the 
 long term it turns into a huge memory leak.
 We need a simplified interface and some mechanism for disposing of old 
 metrics.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-8975) Possible performance regression on bucket_map_join_tez2.q

2014-11-26 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-8975:
-

 Summary: Possible performance regression on bucket_map_join_tez2.q
 Key: HIVE-8975
 URL: https://issues.apache.org/jira/browse/HIVE-8975
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer, Statistics
Affects Versions: 0.15.0
Reporter: Jesus Camacho Rodriguez


After introducing the identity project removal optimization in HIVE-8435, the 
plan for bucket_map_join_tez2.q running on Tez became sub-optimal. In 
particular, it was previously doing a map-join, and after HIVE-8435 it changed 
to a reduce-join.

The query is the following one:
{noformat}
select a.key, b.key from (select distinct key from tab) a join tab b on b.key = 
a.key
{noformat}

The plan before removing the projections is:
{noformat}
TS[0]-FIL[16]-SEL[1]-GBY[2]-RS[3]-GBY[4]-SEL[5]-RS[8]-JOIN[11]-SEL[12]-FS[13]
TS[6]-FIL[17]-RS[10]-JOIN[11]
{noformat}

And after removing identity projections:
{noformat}
TS[0]-FIL[16]-GBY[2]-RS[3]-GBY[4]-RS[8]-JOIN[11]-SEL[12]-FS[13]
TS[6]-FIL[17]-RS[10]-JOIN[11]
{noformat}

After digging a bit, I realized it is not converting the reduce-join into a 
map-join because stats for GBY\[4\] change if SEL\[5\] is removed; thus the 
optimization does not kick in. 
The reason for the stats change in the GroupBy operator is in [this 
line|https://github.com/apache/hive/blob/6f4365e8a21e7b480bf595d079a71303a50bf1b2/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L633],
 which checks whether the GBY is immediately followed by a RS operator and 
calculates stats differently depending on that.
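
A toy illustration of that sensitivity (the estimation formulas below are 
invented purely for illustration; only the branch-on-immediate-child shape 
mirrors the linked code):
{code}
public class GbyStatsToy {
  // Toy estimator: branch on the immediate child operator, as the linked
  // StatsRulesProcFactory line does. The formulas are made up.
  static long estimateGbyRows(String childOp, long inputRows, long distinctKeys) {
    if (childOp.equals("RS")) {
      return inputRows / 2;                   // "partial aggregation" style guess
    }
    return Math.min(inputRows, distinctKeys); // "final aggregation" style guess
  }

  public static void main(String[] args) {
    // Same GBY input, different immediate child once SEL[5] is removed:
    System.out.println(estimateGbyRows("SEL", 1000, 50)); // before: 50
    System.out.println(estimateGbyRows("RS", 1000, 50));  // after: 500
  }
}
{code}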



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8975) Possible performance regression on bucket_map_join_tez2.q

2014-11-26 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226695#comment-14226695
 ] 

Jesus Camacho Rodriguez commented on HIVE-8975:
---

[~prasanth_j], what do you think?

 Possible performance regression on bucket_map_join_tez2.q
 -

 Key: HIVE-8975
 URL: https://issues.apache.org/jira/browse/HIVE-8975
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer, Statistics
Affects Versions: 0.15.0
Reporter: Jesus Camacho Rodriguez

 After introducing the identity project removal optimization in HIVE-8435, the 
 plan for bucket_map_join_tez2.q running on Tez became sub-optimal. In 
 particular, it was previously doing a map-join, and after HIVE-8435 it changed 
 to a reduce-join.
 The query is the following one:
 {noformat}
 select a.key, b.key from (select distinct key from tab) a join tab b on b.key 
 = a.key
 {noformat}
 The plan before removing the projections is:
 {noformat}
 TS[0]-FIL[16]-SEL[1]-GBY[2]-RS[3]-GBY[4]-SEL[5]-RS[8]-JOIN[11]-SEL[12]-FS[13]
 TS[6]-FIL[17]-RS[10]-JOIN[11]
 {noformat}
 And after removing identity projections:
 {noformat}
 TS[0]-FIL[16]-GBY[2]-RS[3]-GBY[4]-RS[8]-JOIN[11]-SEL[12]-FS[13]
 TS[6]-FIL[17]-RS[10]-JOIN[11]
 {noformat}
 After digging a bit, I realized it is not converting the reduce-join into a 
 map-join because stats for GBY\[4\] change if SEL\[5\] is removed; thus the 
 optimization does not kick in. 
 The reason for the stats change in the GroupBy operator is in [this 
 line|https://github.com/apache/hive/blob/6f4365e8a21e7b480bf595d079a71303a50bf1b2/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L633],
  which checks whether the GBY is immediately followed by a RS operator and 
 calculates stats differently depending on that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8828) Remove hadoop 20 shims

2014-11-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226696#comment-14226696
 ] 

Hive QA commented on HIVE-8828:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12683859/HIVE-8828.10.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1913/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1913/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1913/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-1913/source-prep.txt
+ [[ true == \t\r\u\e ]]
+ rm -rf ivy maven
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
Reverted 'hbase-handler/src/test/results/positive/hbase_custom_key.q.out'
Reverted 'hbase-handler/src/test/results/positive/hbase_custom_key2.q.out'
Reverted 'hbase-handler/src/test/results/positive/hbase_custom_key3.q.out'
Reverted 
'hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseKeyFactory.java'
Reverted 
'hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseKeyFactory2.java'
Reverted 
'hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseKeyFactory3.java'
Reverted 'hbase-handler/src/test/queries/positive/hbase_custom_key.q'
Reverted 'hbase-handler/src/test/queries/positive/hbase_custom_key2.q'
Reverted 'hbase-handler/src/test/queries/positive/hbase_custom_key3.q'
Reverted 
'itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestAuthzApiEmbedAuthorizerInRemote.java'
Reverted 
'itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestAuthorizationApiAuthorizer.java'
Reverted 
'itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestAuthzApiEmbedAuthorizerInEmbed.java'
Reverted 'contrib/src/test/queries/clientpositive/url_hook.q'
Reverted 
'contrib/src/java/org/apache/hadoop/hive/contrib/metastore/hooks/TestURLHook.java'
Reverted 
'ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestTezSessionPool.java'
Reverted 
'ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestTezSessionState.java'
Reverted 
'ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFOPDivide.java'
Reverted 
'ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFOPMod.java'
Reverted 
'ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFOPMultiply.java'
Reverted 
'ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFOPPlus.java'
Reverted 
'ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFOPMinus.java'
Reverted 
'ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFPosMod.java'
Reverted 
'ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFOPNumeric.java'
++ egrep -v '^X|^Performing status on external'
++ awk '{print $2}'
++ svn status --no-ignore
+ rm -rf target datanucleus.log ant/target shims/target shims/0.20/target 
shims/0.20S/target shims/0.23/target shims/aggregator/target 
shims/common/target shims/common-secure/target shims/scheduler/target 
packaging/target hbase-handler/target 
hbase-handler/src/test/org/apache/hadoop/hive/hbase/SampleHBaseKeyFactory.java 
hbase-handler/src/test/org/apache/hadoop/hive/hbase/SampleHBaseKeyFactory2.java 
hbase-handler/src/test/org/apache/hadoop/hive/hbase/SampleHBaseKeyFactory3.java 
testutils/target jdbc/target metastore/target itests/target 

[jira] [Updated] (HIVE-8836) Enable automatic tests with remote spark client [Spark Branch]

2014-11-26 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-8836:
---
Attachment: HIVE-8836.7-spark.patch

 Enable automatic tests with remote spark client [Spark Branch]
 --

 Key: HIVE-8836
 URL: https://issues.apache.org/jira/browse/HIVE-8836
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Chengxiang Li
Assignee: Rui Li
  Labels: Spark-M3
 Fix For: spark-branch

 Attachments: HIVE-8836.1-spark.patch, HIVE-8836.2-spark.patch, 
 HIVE-8836.3-spark.patch, HIVE-8836.4-spark.patch, HIVE-8836.5-spark.patch, 
 HIVE-8836.6-spark.patch, HIVE-8836.6-spark.patch, HIVE-8836.6-spark.patch, 
 HIVE-8836.7-spark.patch


 In a real production environment, the remote Spark client will mostly be used 
 to submit Spark jobs for Hive, so we should enable automatic tests with the 
 remote Spark client to make sure Hive features work with it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8836) Enable automatic tests with remote spark client [Spark Branch]

2014-11-26 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226730#comment-14226730
 ] 

Brock Noland commented on HIVE-8836:


Attached patch has regenerated output for queries which had a different plan 
(number of reducers). It does not update the following:

*Query Result differences*
{noformat}
auto_join_without_localtask.q.out
count.q.out
join_filters_overlap.q.out
limit_pushdown.q.out
mapreduce2.q.out
multi_insert_gby3.q.out
multi_join_union.q.out
ppd_outer_join3.q.out
ptf_decimal.q.out
ptf_general_queries.q.out
smb_mapjoin_1.q.out
smb_mapjoin_2.q.out
smb_mapjoin_4.q.out
smb_mapjoin_5.q.out
smb_mapjoin_8.q.out
stats_counter.q.out
table_access_keys_stats.q.out
uniquejoin.q.out
vector_decimal_aggregate.q.out
vectorization_13.q.out
join_reorder.q.out
outer_join_ppr.q.out
{noformat}

*Failed*
{noformat}
bucketmapjoin1.q.out
groupby_multi_insert_common_distinct.q.out
groupby_multi_single_reducer.q.out
infer_bucket_sort_convert_join.q.out
mapjoin_hook.q.out
smb_mapjoin9
{noformat}



 Enable automatic tests with remote spark client [Spark Branch]
 --

 Key: HIVE-8836
 URL: https://issues.apache.org/jira/browse/HIVE-8836
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Chengxiang Li
Assignee: Rui Li
  Labels: Spark-M3
 Fix For: spark-branch

 Attachments: HIVE-8836.1-spark.patch, HIVE-8836.2-spark.patch, 
 HIVE-8836.3-spark.patch, HIVE-8836.4-spark.patch, HIVE-8836.5-spark.patch, 
 HIVE-8836.6-spark.patch, HIVE-8836.6-spark.patch, HIVE-8836.6-spark.patch, 
 HIVE-8836.7-spark.patch


 In a real production environment, the remote Spark client will mostly be used 
 to submit Spark jobs for Hive, so we should enable automatic tests with the 
 remote Spark client to make sure Hive features work with it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-8836) Enable automatic tests with remote spark client [Spark Branch]

2014-11-26 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226730#comment-14226730
 ] 

Brock Noland edited comment on HIVE-8836 at 11/26/14 7:54 PM:
--

Attached patch has regenerated output for queries which had a different plan 
(number of reducers). It does not update the following:

*Query Result differences*
{noformat}
auto_join_without_localtask.q.out
count.q.out
join_filters_overlap.q.out
limit_pushdown.q.out
mapreduce2.q.out
multi_insert_gby3.q.out
multi_join_union.q.out
ppd_outer_join3.q.out
ptf_decimal.q.out
ptf_general_queries.q.out
smb_mapjoin_1.q.out
smb_mapjoin_2.q.out
smb_mapjoin_4.q.out
smb_mapjoin_5.q.out
smb_mapjoin_8.q.out
stats_counter.q.out
table_access_keys_stats.q.out
uniquejoin.q.out
vector_decimal_aggregate.q.out
vectorization_13.q.out
join_reorder.q.out
outer_join_ppr.q.out
{noformat}

*Failed*
{noformat}
bucketmapjoin1.q.out
groupby_multi_insert_common_distinct.q.out
groupby_multi_single_reducer.q.out
infer_bucket_sort_convert_join.q.out
mapjoin_hook.q.out
smb_mapjoin9
{noformat}




was (Author: brocknoland):
Attached patch has regenerated output for queries which had a different plan 
(number of reducers). It does not update the following:

*Query Result differences*
{noformat}
auto_join_without_localtask.q.out
count.q.out
join_filters_overlap.q.out
limit_pushdown.q.out
mapreduce2.q.out
multi_insert_gby3.q.out
multi_join_union.q.out
ppd_outer_join3.q.out
ptf_decimal.q.out
ptf_general_queries.q.out
smb_mapjoin_1.q.out
smb_mapjoin_2.q.out
smb_mapjoin_4.q.out
smb_mapjoin_5.q.out
smb_mapjoin_8.q.out
stats_counter.q.out
table_access_keys_stats.q.out
uniquejoin.q.out
vector_decimal_aggregate.q.out
vectorization_13.q.out
join_reorder.q.out
outer_join_ppr.q.out

*Failed*
{noformat}
bucketmapjoin1.q.out
groupby_multi_insert_common_distinct.q.out
groupby_multi_single_reducer.q.out
infer_bucket_sort_convert_join.q.out
mapjoin_hook.q.out
smb_mapjoin9
{noformat}



 Enable automatic tests with remote spark client [Spark Branch]
 --

 Key: HIVE-8836
 URL: https://issues.apache.org/jira/browse/HIVE-8836
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Chengxiang Li
Assignee: Rui Li
  Labels: Spark-M3
 Fix For: spark-branch

 Attachments: HIVE-8836.1-spark.patch, HIVE-8836.2-spark.patch, 
 HIVE-8836.3-spark.patch, HIVE-8836.4-spark.patch, HIVE-8836.5-spark.patch, 
 HIVE-8836.6-spark.patch, HIVE-8836.6-spark.patch, HIVE-8836.6-spark.patch, 
 HIVE-8836.7-spark.patch


 In a real production environment, the remote Spark client will mostly be used 
 to submit Spark jobs for Hive, so we should enable automatic tests with the 
 remote Spark client to make sure Hive features work with it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8828) Remove hadoop 20 shims

2014-11-26 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-8828:
---
Status: Patch Available  (was: Open)

 Remove hadoop 20 shims
 --

 Key: HIVE-8828
 URL: https://issues.apache.org/jira/browse/HIVE-8828
 Project: Hive
  Issue Type: Task
  Components: Shims
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: HIVE-8828.1.patch, HIVE-8828.10.patch, 
 HIVE-8828.11.patch, HIVE-8828.2.patch, HIVE-8828.3.patch, HIVE-8828.4.patch, 
 HIVE-8828.5.patch, HIVE-8828.6.patch, HIVE-8828.7.patch, HIVE-8828.8.patch, 
 HIVE-8828.9.patch, HIVE-8828.patch


 CLEAR LIBRARY CACHE
 See : [mailing list discussion | 
 http://mail-archives.apache.org/mod_mbox/hive-dev/201410.mbox/%3CCABgNGzfSB5VGTecONg0GgLCDdLLFfzLuZvP%2BGSBc0i0joqf3fg%40mail.gmail.com%3E]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8828) Remove hadoop 20 shims

2014-11-26 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-8828:
---
Status: Open  (was: Patch Available)

 Remove hadoop 20 shims
 --

 Key: HIVE-8828
 URL: https://issues.apache.org/jira/browse/HIVE-8828
 Project: Hive
  Issue Type: Task
  Components: Shims
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: HIVE-8828.1.patch, HIVE-8828.10.patch, 
 HIVE-8828.11.patch, HIVE-8828.2.patch, HIVE-8828.3.patch, HIVE-8828.4.patch, 
 HIVE-8828.5.patch, HIVE-8828.6.patch, HIVE-8828.7.patch, HIVE-8828.8.patch, 
 HIVE-8828.9.patch, HIVE-8828.patch


 CLEAR LIBRARY CACHE
 See : [mailing list discussion | 
 http://mail-archives.apache.org/mod_mbox/hive-dev/201410.mbox/%3CCABgNGzfSB5VGTecONg0GgLCDdLLFfzLuZvP%2BGSBc0i0joqf3fg%40mail.gmail.com%3E]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8828) Remove hadoop 20 shims

2014-11-26 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-8828:
---
Attachment: HIVE-8828.11.patch

Another rebase after HIVE-8971

 Remove hadoop 20 shims
 --

 Key: HIVE-8828
 URL: https://issues.apache.org/jira/browse/HIVE-8828
 Project: Hive
  Issue Type: Task
  Components: Shims
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: HIVE-8828.1.patch, HIVE-8828.10.patch, 
 HIVE-8828.11.patch, HIVE-8828.2.patch, HIVE-8828.3.patch, HIVE-8828.4.patch, 
 HIVE-8828.5.patch, HIVE-8828.6.patch, HIVE-8828.7.patch, HIVE-8828.8.patch, 
 HIVE-8828.9.patch, HIVE-8828.patch


 CLEAR LIBRARY CACHE
 See : [mailing list discussion | 
 http://mail-archives.apache.org/mod_mbox/hive-dev/201410.mbox/%3CCABgNGzfSB5VGTecONg0GgLCDdLLFfzLuZvP%2BGSBc0i0joqf3fg%40mail.gmail.com%3E]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8828) Remove hadoop 20 shims

2014-11-26 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226742#comment-14226742
 ] 

Brock Noland commented on HIVE-8828:


+1

If the tests look ok we should commit this immediately.

 Remove hadoop 20 shims
 --

 Key: HIVE-8828
 URL: https://issues.apache.org/jira/browse/HIVE-8828
 Project: Hive
  Issue Type: Task
  Components: Shims
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: HIVE-8828.1.patch, HIVE-8828.10.patch, 
 HIVE-8828.11.patch, HIVE-8828.2.patch, HIVE-8828.3.patch, HIVE-8828.4.patch, 
 HIVE-8828.5.patch, HIVE-8828.6.patch, HIVE-8828.7.patch, HIVE-8828.8.patch, 
 HIVE-8828.9.patch, HIVE-8828.patch


 CLEAR LIBRARY CACHE
 See : [mailing list discussion | 
 http://mail-archives.apache.org/mod_mbox/hive-dev/201410.mbox/%3CCABgNGzfSB5VGTecONg0GgLCDdLLFfzLuZvP%2BGSBc0i0joqf3fg%40mail.gmail.com%3E]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-6361) Un-fork Sqlline

2014-11-26 Thread Julian Hyde (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Hyde updated HIVE-6361:
--
Attachment: HIVE-6361.4.patch

 Un-fork Sqlline
 ---

 Key: HIVE-6361
 URL: https://issues.apache.org/jira/browse/HIVE-6361
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Affects Versions: 0.12.0
Reporter: Julian Hyde
Assignee: Julian Hyde
 Attachments: HIVE-6361.2.patch, HIVE-6361.3.patch, HIVE-6361.4.patch, 
 HIVE-6361.patch


 I propose to merge the two development forks of sqlline: Hive's beeline 
 module, and the fork at https://github.com/julianhyde/sqlline.
 How did the forks come about? Hive’s SQL command-line interface Beeline was 
 created by forking Sqlline (see HIVE-987, HIVE-3100), which at the time was 
 a useful but low-activity project languishing on SourceForge without an 
 active owner. Around the same time, Julian Hyde independently started a 
 github repo based on the same code base. Now several projects are using 
 Julian Hyde's sqlline, including Apache Drill, Apache Phoenix, Cascading 
 Lingual and Optiq.
 Merging these two forks will allow us to pool our resources. (Case in point: 
 Drill issue DRILL-327 had already been fixed in a later version of sqlline; 
 it still exists in beeline.)
 I propose the following steps:
 1. Copy Julian Hyde's sqlline as a new Hive module, hive-sqlline.
 2. Port fixes to hive-beeline into hive-sqlline.
 3. Make hive-beeline depend on hive-sqlline, and remove code that is 
 identical. What remains in the hive-beeline module is Beeline.java (a derived 
 class of Sqlline.java) and Hive-specific extensions.
 4. Make the hive-sqlline the official successor to Julian Hyde's sqlline.
 This achieves continuity for Hive’s users, gives the users of the non-Hive 
 sqlline a version with minimal dependencies, unifies the two code lines, and 
 brings everything under the Apache roof.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-6361) Un-fork Sqlline

2014-11-26 Thread Julian Hyde (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Hyde updated HIVE-6361:
--
Affects Version/s: (was: 0.12.0)
   0.14.0
   Status: Patch Available  (was: Open)

 Un-fork Sqlline
 ---

 Key: HIVE-6361
 URL: https://issues.apache.org/jira/browse/HIVE-6361
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Affects Versions: 0.14.0
Reporter: Julian Hyde
Assignee: Julian Hyde
 Attachments: HIVE-6361.2.patch, HIVE-6361.3.patch, HIVE-6361.4.patch, 
 HIVE-6361.patch


 I propose to merge the two development forks of sqlline: Hive's beeline 
 module, and the fork at https://github.com/julianhyde/sqlline.
 How did the forks come about? Hive’s SQL command-line interface Beeline was 
 created by forking Sqlline (see HIVE-987, HIVE-3100), which at the time was 
 a useful but low-activity project languishing on SourceForge without an 
 active owner. Around the same time, Julian Hyde independently started a 
 github repo based on the same code base. Now several projects are using 
 Julian Hyde's sqlline, including Apache Drill, Apache Phoenix, Cascading 
 Lingual and Optiq.
 Merging these two forks will allow us to pool our resources. (Case in point: 
 Drill issue DRILL-327 had already been fixed in a later version of sqlline; 
 it still exists in beeline.)
 I propose the following steps:
 1. Copy Julian Hyde's sqlline as a new Hive module, hive-sqlline.
 2. Port fixes to hive-beeline into hive-sqlline.
 3. Make hive-beeline depend on hive-sqlline, and remove code that is 
 identical. What remains in the hive-beeline module is Beeline.java (a derived 
 class of Sqlline.java) and Hive-specific extensions.
 4. Make the hive-sqlline the official successor to Julian Hyde's sqlline.
 This achieves continuity for Hive’s users, gives the users of the non-Hive 
 sqlline a version with minimal dependencies, unifies the two code lines, and 
 brings everything under the Apache roof.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-6361) Un-fork Sqlline

2014-11-26 Thread Julian Hyde (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Hyde updated HIVE-6361:
--
Status: Open  (was: Patch Available)

 Un-fork Sqlline
 ---

 Key: HIVE-6361
 URL: https://issues.apache.org/jira/browse/HIVE-6361
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Affects Versions: 0.12.0
Reporter: Julian Hyde
Assignee: Julian Hyde
 Attachments: HIVE-6361.2.patch, HIVE-6361.3.patch, HIVE-6361.patch


 I propose to merge the two development forks of sqlline: Hive's beeline 
 module, and the fork at https://github.com/julianhyde/sqlline.
 How did the forks come about? Hive’s SQL command-line interface Beeline was 
 created by forking Sqlline (see HIVE-987, HIVE-3100), which at the time was 
 a useful but low-activity project languishing on SourceForge without an 
 active owner. Around the same time, Julian Hyde independently started a 
 github repo based on the same code base. Now several projects are using 
 Julian Hyde's sqlline, including Apache Drill, Apache Phoenix, Cascading 
 Lingual and Optiq.
 Merging these two forks will allow us to pool our resources. (Case in point: 
 Drill issue DRILL-327 had already been fixed in a later version of sqlline; 
 it still exists in beeline.)
 I propose the following steps:
 1. Copy Julian Hyde's sqlline as a new Hive module, hive-sqlline.
 2. Port fixes to hive-beeline into hive-sqlline.
 3. Make hive-beeline depend on hive-sqlline, and remove code that is 
 identical. What remains in the hive-beeline module is Beeline.java (a derived 
 class of Sqlline.java) and Hive-specific extensions.
 4. Make the hive-sqlline the official successor to Julian Hyde's sqlline.
 This achieves continuity for Hive’s users, gives the users of the non-Hive 
 sqlline a version with minimal dependencies, unifies the two code lines, and 
 brings everything under the Apache roof.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted

2014-11-26 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226794#comment-14226794
 ] 

Alan Gates commented on HIVE-8966:
--

This flush length file should be removed when the batch is closed.  Are you 
closing the transaction batch on a regular basis?
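
For reference, a minimal sketch of the expected client pattern with the 
hcatalog streaming API (the metastore URI, table, and column names below are 
made up); closing the batch is what finalizes the delta's side files:
{code}
import java.util.Arrays;
import org.apache.hive.hcatalog.streaming.DelimitedInputWriter;
import org.apache.hive.hcatalog.streaming.HiveEndPoint;
import org.apache.hive.hcatalog.streaming.StreamingConnection;
import org.apache.hive.hcatalog.streaming.TransactionBatch;

public class StreamingCloseSketch {
  public static void main(String[] args) throws Exception {
    HiveEndPoint endPt = new HiveEndPoint("thrift://metastore:9083",
        "default", "t", Arrays.asList("2014"));
    StreamingConnection conn = endPt.newConnection(true);
    DelimitedInputWriter writer =
        new DelimitedInputWriter(new String[]{"c1", "c2"}, ",", endPt);
    TransactionBatch batch = conn.fetchTransactionBatch(10, writer);
    try {
      while (batch.remainingTransactions() > 0) {
        batch.beginNextTransaction();
        batch.write("a,b".getBytes());
        batch.commit();
      }
    } finally {
      batch.close(); // closing the batch finalizes the delta and its side file
      conn.close();
    }
  }
}
{code}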

 Delta files created by hive hcatalog streaming cannot be compacted
 --

 Key: HIVE-8966
 URL: https://issues.apache.org/jira/browse/HIVE-8966
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.14.0
 Environment: hive
Reporter: Jihong Liu
Assignee: Alan Gates
Priority: Critical

 Hive hcatalog streaming also creates a file like bucket_n_flush_length in 
 each delta directory, where n is the bucket number. But compactor.CompactorMR 
 thinks this file also needs to be compacted. This file of course cannot be 
 compacted, so compactor.CompactorMR does not continue with the compaction. 
 In a test, after the bucket_n_flush_length file was removed, the alter 
 table ... partition ... compact finished successfully. If that file is not 
 deleted, nothing is compacted. 
 This is probably a high-severity bug. Both 0.13 and 0.14 have this issue.
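
A hedged sketch of one possible direction (the suffix test is an assumption 
based on the file name pattern described above, not the actual fix): have the 
compactor skip the side files when listing delta contents:
{code}
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

public class FlushLengthFilterSketch {
  // Accept only real bucket files, skipping streaming side files such as
  // bucket_00000_flush_length.
  public static final PathFilter BUCKET_FILES_ONLY = new PathFilter() {
    @Override
    public boolean accept(Path path) {
      return !path.getName().endsWith("_flush_length");
    }
  };
}
{code}
Listing with fs.listStatus(deltaDir, FlushLengthFilterSketch.BUCKET_FILES_ONLY) 
would then feed only compactable files to compactor.CompactorMR.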



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8797) Simultaneous dynamic inserts can result in partition already exists error

2014-11-26 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226801#comment-14226801
 ] 

Alan Gates commented on HIVE-8797:
--

So are you proposing we change it to 
{code}
   if (CheckJDOException.isJDODataStoreException(e) && tpart == null) {
      // Using utility method above, so that JDODataStoreException doesn't
      // have to be used here. This helps avoid adding jdo dependency for
      // hcatalog client uses
      LOG.debug("Caught JDO exception, trying to alter partition instead");
      tpart = getMSC().getPartitionWithAuthInfo(tbl.getDbName(),
          tbl.getTableName(), pvals, getUserName(), getGroupNames());
      alterPartitionSpec(tbl, partSpec, tpart, inheritTableSpecs, partPath);
{code}

or

{code}
   if (CheckJDOException.isJDODataStoreException(e)) {
      // Using utility method above, so that JDODataStoreException doesn't
      // have to be used here. This helps avoid adding jdo dependency for
      // hcatalog client uses
      LOG.debug("Caught JDO exception, trying to alter partition instead");
      tpart = getMSC().getPartitionWithAuthInfo(tbl.getDbName(),
          tbl.getTableName(), pvals, getUserName(), getGroupNames());
      if (tpart != null) {
        alterPartitionSpec(tbl, partSpec, tpart, inheritTableSpecs, partPath);
      }
{code}
?

 Simultaneous dynamic inserts can result in partition already exists error
 ---

 Key: HIVE-8797
 URL: https://issues.apache.org/jira/browse/HIVE-8797
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
 Attachments: HIVE-8797.2.patch, HIVE-8797.patch


 If two users attempt a dynamic insert into the same new partition at the same 
 time, a possible race condition exists where both will attempt to create the 
 partition and one will fail.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8721) Enable transactional unit tests against other databases

2014-11-26 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226814#comment-14226814
 ] 

Alan Gates commented on HIVE-8721:
--

I'm fine with moving this out of TxnDbUtil and putting it somewhere more 
generic, but I'm not sure where.  It works in TxnDbUtil because the transaction 
tests always start by calling TxnDbUtil.prepDb.  Is there an equivalent place we 
can guarantee gets called first in all unit tests?
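
One conventional option would be a shared base class whose @BeforeClass hook 
runs before any test method. A minimal sketch, assuming such a base class is 
acceptable (the class name is hypothetical; TxnDbUtil.setConfValues and 
TxnDbUtil.prepDb are the real helpers):
{code}
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.txn.TxnDbUtil;
import org.junit.BeforeClass;

// Hypothetical shared base class for transactional unit tests.
public abstract class BaseTxnTest {
  @BeforeClass
  public static void setUpDb() throws Exception {
    TxnDbUtil.setConfValues(new HiveConf()); // point the tests at the test db
    TxnDbUtil.prepDb();                      // create the transaction tables
  }
}
{code}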

 Enable transactional unit tests against other databases
 ---

 Key: HIVE-8721
 URL: https://issues.apache.org/jira/browse/HIVE-8721
 Project: Hive
  Issue Type: Test
  Components: Testing Infrastructure, Transactions
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
 Attachments: HIVE-8721.patch


 Since TxnHandler and subclasses use JDBC to directly connect to the 
 underlying database (rather than relying on DataNucleus) it is important to 
 test that all of the operations work against different database flavors.  An 
 easy way to do this is to enable the unit tests to run against an external 
 database.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8797) Simultaneous dynamic inserts can result in partition already exists error

2014-11-26 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226819#comment-14226819
 ] 

Thejas M Nair commented on HIVE-8797:
-

I am proposing -

{code}
    if (CheckJDOException.isJDODataStoreException(e)) {
      // Using utility method above, so that JDODataStoreException doesn't
      // have to be used here. This helps avoid adding jdo dependency for
      // hcatalog client uses
      LOG.debug("Caught JDO exception, will attempt alter partition instead "
          + "if partition exists now");
      tpart = getMSC().getPartitionWithAuthInfo(tbl.getDbName(),
          tbl.getTableName(), pvals, getUserName(), getGroupNames());
      if (tpart == null) {
        // The exception was not caused by the partition getting created by
        // another call
        throw e;
      }
      alterPartitionSpec(tbl, partSpec, tpart, inheritTableSpecs, partPath);
{code}

 Simultaneous dynamic inserts can result in partition already exists error
 ---

 Key: HIVE-8797
 URL: https://issues.apache.org/jira/browse/HIVE-8797
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
 Attachments: HIVE-8797.2.patch, HIVE-8797.patch


 If two users attempt a dynamic insert into the same new partition at the same 
 time, a possible race condition exists where both will attempt to create the 
 partition and one will fail.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8836) Enable automatic tests with remote spark client [Spark Branch]

2014-11-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226820#comment-14226820
 ] 

Hive QA commented on HIVE-8836:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12683900/HIVE-8836.7-spark.patch

{color:red}ERROR:{color} -1 due to 48 failed/errored test(s), 7177 tests 
executed
*Failed tests:*
{noformat}
TestHS2ImpersonationWithRemoteMS - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join_without_localtask
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin7
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_count
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_custom_input_output_format
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_complex_types_multi_single_reducer
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_multi_insert_common_distinct
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_multi_single_reducer
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_infer_bucket_sort_convert_join
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_filters_overlap
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join_reorder
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_limit_pushdown
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mapjoin_hook
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mapreduce2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_multi_insert_gby3
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_multi_join_union
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_outer_join_ppr
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_parquet_join
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ppd_outer_join3
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ptf_decimal
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_ptf_general_queries
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin9
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_3
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_4
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_5
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_8
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_stats_counter
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_table_access_keys_stats
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_timestamp_1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_timestamp_2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_timestamp_3
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_timestamp_lazy
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_uniquejoin
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vector_between_in
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vector_data_types
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vector_decimal_aggregate
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_13
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_14
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_15
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_16
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_9
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_short_regress
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorized_timestamp_funcs
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/449/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/449/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-449/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 48 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12683900 - PreCommit-HIVE-SPARK-Build


[jira] [Created] (HIVE-8976) Make nine additional tests deterministic

2014-11-26 Thread Brock Noland (JIRA)
Brock Noland created HIVE-8976:
--

 Summary: Make nine additional tests deterministic
 Key: HIVE-8976
 URL: https://issues.apache.org/jira/browse/HIVE-8976
 Project: Hive
  Issue Type: Improvement
Reporter: Brock Noland
Assignee: Brock Noland






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8975) Possible performance regression on bucket_map_join_tez2.q

2014-11-26 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226865#comment-14226865
 ] 

Prasanth J commented on HIVE-8975:
--

[~jcamachorodriguez] I see what the issue is here. That check (RS after GBY) was 
used to determine the map-reduce boundary. The map-side GBY has different stats 
logic than the reduce-side GBY. 
Now, after the identity projection removal optimization,
{code}
TS[0]-FIL[16]-GBY[2]-RS[3]-GBY[4]-RS[8]-JOIN[11]-SEL[12]-FS[13]
TS[6]-FIL[17]-RS[10]-JOIN[11]
{code}

both GBY[2] and GBY[4] are identified as map-side GBYs. I think we need to 
improve that if condition to better differentiate map-side and reduce-side GBYs. 
A somewhat better check would be: if an RS is contained in the upstream 
operators of a GBY, then that GBY is reduce-side. In the above case GBY[4] 
contains RS[3] in its upstream operators. Any thoughts?
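
A minimal sketch of that upstream check, walking the parent chain of the GBY 
(the helper name and placement are hypothetical):
{code}
import java.util.ArrayDeque;
import java.util.Deque;

import org.apache.hadoop.hive.ql.exec.Operator;
import org.apache.hadoop.hive.ql.exec.ReduceSinkOperator;
import org.apache.hadoop.hive.ql.plan.OperatorDesc;

final class GbyPlanUtils {
  // A GBY is reduce-side if any ReduceSinkOperator appears among its
  // (transitive) upstream operators.
  static boolean isReduceSideGBY(Operator<? extends OperatorDesc> gby) {
    Deque<Operator<? extends OperatorDesc>> stack = new ArrayDeque<>();
    if (gby.getParentOperators() != null) {
      stack.addAll(gby.getParentOperators());
    }
    while (!stack.isEmpty()) {
      Operator<? extends OperatorDesc> op = stack.pop();
      if (op instanceof ReduceSinkOperator) {
        return true;
      }
      if (op.getParentOperators() != null) {
        stack.addAll(op.getParentOperators());
      }
    }
    return false;
  }
}
{code}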

 Possible performance regression on bucket_map_join_tez2.q
 -

 Key: HIVE-8975
 URL: https://issues.apache.org/jira/browse/HIVE-8975
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer, Statistics
Affects Versions: 0.15.0
Reporter: Jesus Camacho Rodriguez

 After introducing the identity project removal optimization in HIVE-8435, 
 plan in bucket_map_join_tez2.q that runs on Tez changed to be sub-optimal. In 
 particular, earlier it was doing a map-join and after HIVE-8435 it changed to 
 a reduce-join.
 The query is the following one:
 {noformat}
 select a.key, b.key from (select distinct key from tab) a join tab b on b.key 
 = a.key
 {noformat}
 The plan before removing the projections is:
 {noformat}
 TS[0]-FIL[16]-SEL[1]-GBY[2]-RS[3]-GBY[4]-SEL[5]-RS[8]-JOIN[11]-SEL[12]-FS[13]
 TS[6]-FIL[17]-RS[10]-JOIN[11]
 {noformat}
 And after removing identity projections:
 {noformat}
 TS[0]-FIL[16]-GBY[2]-RS[3]-GBY[4]-RS[8]-JOIN[11]-SEL[12]-FS[13]
 TS[6]-FIL[17]-RS[10]-JOIN[11]
 {noformat}
 After digging a bit, I realized it is not converting the reduce-join into a 
 map-join because stats for GBY\[4\] change if SEL\[5\] is removed; thus the 
 optimization does not kick in. 
 The reason for the stats change in the GroupBy operator is in [this 
 line|https://github.com/apache/hive/blob/6f4365e8a21e7b480bf595d079a71303a50bf1b2/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L633],
  where it is checked whether the GBY is immediately followed by a RS operator 
 or not, and stats are calculated differently depending on it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8326) Using DbTxnManager with concurrency off results in run time error

2014-11-26 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8326:
-
Attachment: HIVE-8326.patch

This patch changes DbTxnManager to check that concurrency is set to true when 
it is handed the config file.
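
A minimal sketch of the kind of check described, assuming the manager validates 
the conf when it receives it (the method shape and error message are 
assumptions, not the actual patch; HIVE_SUPPORT_CONCURRENCY is a real ConfVar):
{code}
import org.apache.hadoop.hive.conf.HiveConf;

final class TxnManagerConfCheck {
  // Hypothetical validation hook invoked when the manager is handed the conf.
  static void validateConf(HiveConf conf) {
    if (!conf.getBoolVar(HiveConf.ConfVars.HIVE_SUPPORT_CONCURRENCY)) {
      throw new RuntimeException("DbTxnManager requires hive.support.concurrency "
          + "to be set to true");
    }
  }
}
{code}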

 Using DbTxnManager with concurrency off results in run time error
 -

 Key: HIVE-8326
 URL: https://issues.apache.org/jira/browse/HIVE-8326
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
 Attachments: HIVE-8326.patch


 Setting
 {code}
 hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
 hive.support.concurrency=false
 {code}
 results in queries failing at runtime with an NPE in DbTxnManager.heartbeat.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8326) Using DbTxnManager with concurrency off results in run time error

2014-11-26 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8326:
-
Status: Patch Available  (was: Open)

 Using DbTxnManager with concurrency off results in run time error
 -

 Key: HIVE-8326
 URL: https://issues.apache.org/jira/browse/HIVE-8326
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
 Attachments: HIVE-8326.patch


 Setting
 {code}
 hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
 hive.support.concurrency=false
 {code}
 results in queries failing at runtime with an NPE in DbTxnManager.heartbeat.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted

2014-11-26 Thread Jihong Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226872#comment-14226872
 ] 

Jihong Liu commented on HIVE-8966:
--

Yes, the transaction batch is closed. I suggest doing either of the following 
two updates, or both:

1. If a file is a non-bucket file, don't try to compact it. In 
org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.java, change the following 
code:
{code}
  private void addFileToMap(Matcher matcher, Path file, boolean sawBase,
      Map<Integer, BucketTracker> splitToBucketMap) {
    if (!matcher.find()) {
      LOG.warn("Found a non-bucket file that we thought matched the bucket pattern! " +
          file.toString());
    }
    ...
{code}
to:
{code}
  private void addFileToMap(Matcher matcher, Path file, boolean sawBase,
      Map<Integer, BucketTracker> splitToBucketMap) {
    if (!matcher.find()) {
      LOG.warn("Found a non-bucket file that we thought matched the bucket pattern! " +
          file.toString());
      return;
    }
{code}

2. Don't use the bucket file pattern to name the flush_length file. In 
org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.java, change the following 
code:
{code}
  static Path getSideFile(Path main) {
    return new Path(main + "_flush_length");
  }
{code}
to:
{code}
  static Path getSideFile(Path main) {
    if (main.toString().startsWith("bucket_")) {
      return new Path("bkt" + main.toString().substring(6) + "_flush_length");
    } else {
      return new Path(main + "_flush_length");
    }
  }
{code}

After making the above updates and re-compiling hive-exec.jar, the compaction 
works fine.


 Delta files created by hive hcatalog streaming cannot be compacted
 --

 Key: HIVE-8966
 URL: https://issues.apache.org/jira/browse/HIVE-8966
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.14.0
 Environment: hive
Reporter: Jihong Liu
Assignee: Alan Gates
Priority: Critical

 hive hcatalog streaming will also create a file like bucket_n_flush_length in 
 each delta directory, where n is the bucket number. But compactor.CompactorMR 
 thinks this file also needs to be compacted. Since this file of course cannot 
 be compacted, compactor.CompactorMR will not continue with the compaction. 
 In a test, after the bucket_n_flush_length file was removed, the alter table 
 partition compact finished successfully. If that file is not deleted, nothing 
 is compacted. 
 This is probably a high-severity bug. Both 0.13 and 0.14 have this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8976) Make nine additional tests deterministic

2014-11-26 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-8976:
---
Attachment: HIVE-8976.patch

 Make nine additional tests deterministic
 

 Key: HIVE-8976
 URL: https://issues.apache.org/jira/browse/HIVE-8976
 Project: Hive
  Issue Type: Improvement
Reporter: Brock Noland
Assignee: Brock Noland
 Attachments: HIVE-8976.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8943) Fix memory limit check for combine nested mapjoins [Spark Branch]

2014-11-26 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-8943:

Attachment: HIVE-8943.2-spark.patch

Giving it another try.

The refactoring of the big-table calculation algorithm made it choose a 
different big table when more than one is available; I tweaked the algorithm to 
choose the same one as before to minimize the diffs.

 Fix memory limit check for combine nested mapjoins [Spark Branch]
 -

 Key: HIVE-8943
 URL: https://issues.apache.org/jira/browse/HIVE-8943
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Szehon Ho
Assignee: Szehon Ho
 Attachments: HIVE-8943.1-spark.patch, HIVE-8943.1-spark.patch, 
 HIVE-8943.2-spark.patch


 It's the opposite problem of what we thought in HIVE-8701.
 SparkMapJoinOptimizer does combine nested mapjoins into one work due to the 
 removal of the RS for the big table. So we need to enhance the check to 
 calculate whether all the MapJoins in that work (spark-stage) will fit into 
 memory; otherwise it might overwhelm memory for that particular spark executor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8976) Make nine additional tests deterministic

2014-11-26 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-8976:
---
Affects Version/s: 0.15.0
   Status: Patch Available  (was: Open)

 Make nine additional tests deterministic
 

 Key: HIVE-8976
 URL: https://issues.apache.org/jira/browse/HIVE-8976
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.15.0
Reporter: Brock Noland
Assignee: Brock Noland
 Attachments: HIVE-8976.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8976) Make nine additional tests deterministic

2014-11-26 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-8976:
---
Description: 
{noformat}
auto_join_without_localtask.q
count.q
limit_pushdown.q
mapreduce2.q
multi_insert_gby3.q
multi_join_union.q
ppd_outer_join3.q
ptf_decimal.q
ptf_general_queries.q
{noformat}

 Make nine additional tests deterministic
 

 Key: HIVE-8976
 URL: https://issues.apache.org/jira/browse/HIVE-8976
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.15.0
Reporter: Brock Noland
Assignee: Brock Noland
 Attachments: HIVE-8976.patch


 {noformat}
 auto_join_without_localtask.q
 count.q
 limit_pushdown.q
 mapreduce2.q
 multi_insert_gby3.q
 multi_join_union.q
 ppd_outer_join3.q
 ptf_decimal.q
 ptf_general_queries.q
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8836) Enable automatic tests with remote spark client [Spark Branch]

2014-11-26 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226883#comment-14226883
 ] 

Brock Noland commented on HIVE-8836:


I am making the following tests deterministic over in HIVE-8976.

{noformat}
auto_join_without_localtask.q
count.q
limit_pushdown.q
mapreduce2.q
multi_insert_gby3.q
multi_join_union.q
ppd_outer_join3.q
ptf_decimal.q
ptf_general_queries.q
{noformat}

 Enable automatic tests with remote spark client [Spark Branch]
 --

 Key: HIVE-8836
 URL: https://issues.apache.org/jira/browse/HIVE-8836
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Chengxiang Li
Assignee: Rui Li
  Labels: Spark-M3
 Fix For: spark-branch

 Attachments: HIVE-8836.1-spark.patch, HIVE-8836.2-spark.patch, 
 HIVE-8836.3-spark.patch, HIVE-8836.4-spark.patch, HIVE-8836.5-spark.patch, 
 HIVE-8836.6-spark.patch, HIVE-8836.6-spark.patch, HIVE-8836.6-spark.patch, 
 HIVE-8836.7-spark.patch


 In a real production environment, the remote spark client will mostly be used 
 to submit spark jobs for Hive, so we should enable automatic tests with the 
 remote spark client to make sure the Hive features work with it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted

2014-11-26 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226890#comment-14226890
 ] 

Alan Gates commented on HIVE-8966:
--

1 might be the right thing to do.  2 breaks backward compatibility.  Before we 
do that, though, I'd like to understand why you still see the flush length files 
hanging around.  In my tests I don't see this issue because the flush length 
file is properly cleaned up.  I want to make sure that its existence doesn't 
mean something else is wrong.

Do you see the flush length files in all delta directories or only the most 
recent?  

 Delta files created by hive hcatalog streaming cannot be compacted
 --

 Key: HIVE-8966
 URL: https://issues.apache.org/jira/browse/HIVE-8966
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.14.0
 Environment: hive
Reporter: Jihong Liu
Assignee: Alan Gates
Priority: Critical

 hive hcatalog streaming will also create a file like bucket_n_flush_length in 
 each delta directory, where n is the bucket number. But compactor.CompactorMR 
 thinks this file also needs to be compacted. Since this file of course cannot 
 be compacted, compactor.CompactorMR will not continue with the compaction. 
 In a test, after the bucket_n_flush_length file was removed, the alter table 
 partition compact finished successfully. If that file is not deleted, nothing 
 is compacted. 
 This is probably a high-severity bug. Both 0.13 and 0.14 have this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7896) orcfiledump should be able to dump data

2014-11-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226891#comment-14226891
 ] 

Hive QA commented on HIVE-7896:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12683890/HIVE-7896.2.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 6684 tests executed
*Failed tests:*
{noformat}
TestMiniTezCliDriver-vectorization_16.q-mapjoin_mapjoin.q-groupby2.q-and-12-more
 - did not produce a TEST-*.xml file
TestParquetDirect - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1914/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1914/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1914/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12683890 - PreCommit-HIVE-TRUNK-Build

 orcfiledump should be able to dump data
 ---

 Key: HIVE-7896
 URL: https://issues.apache.org/jira/browse/HIVE-7896
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Reporter: Alan Gates
Assignee: Alan Gates
 Attachments: HIVE-7896.2.patch, HIVE-7896.patch, alltypes.orc, 
 alltypes2.txt


 The FileDumper utility in orc, exposed as a service as orcfiledump, can print 
 out metadata from Orc files but not the actual data.  Being able to dump the 
 data is also useful in some debugging contexts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-8977) TestParquetDirect should be abstract

2014-11-26 Thread Brock Noland (JIRA)
Brock Noland created HIVE-8977:
--

 Summary: TestParquetDirect should be abstract
 Key: HIVE-8977
 URL: https://issues.apache.org/jira/browse/HIVE-8977
 Project: Hive
  Issue Type: Improvement
Reporter: Brock Noland
Priority: Minor


The class {{TestParquetDirect}} does not contain any tests but starts with 
Test. Thus the build system runs it and expects an output file. We should 
rename the file to {{AbstractTestParquetDirect}} and make the class abstract.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7896) orcfiledump should be able to dump data

2014-11-26 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226898#comment-14226898
 ] 

Prasanth J commented on HIVE-7896:
--

LGTM, +1

 orcfiledump should be able to dump data
 ---

 Key: HIVE-7896
 URL: https://issues.apache.org/jira/browse/HIVE-7896
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Reporter: Alan Gates
Assignee: Alan Gates
 Attachments: HIVE-7896.2.patch, HIVE-7896.patch, alltypes.orc, 
 alltypes2.txt


 The FileDumper utility in orc, exposed as a service as orcfiledump, can print 
 out metadata from Orc files but not the actual data.  Being able to dump the 
 data is also useful in some debugging contexts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7896) orcfiledump should be able to dump data

2014-11-26 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226899#comment-14226899
 ] 

Brock Noland commented on HIVE-7896:


I'll handle the parquet test in HIVE-8977.

 orcfiledump should be able to dump data
 ---

 Key: HIVE-7896
 URL: https://issues.apache.org/jira/browse/HIVE-7896
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Reporter: Alan Gates
Assignee: Alan Gates
 Attachments: HIVE-7896.2.patch, HIVE-7896.patch, alltypes.orc, 
 alltypes2.txt


 The FileDumper utility in orc, exposed as a service as orcfiledump, can print 
 out metadata from Orc files but not the actual data.  Being able to dump the 
 data is also useful in some debugging contexts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8976) Make nine additional tests deterministic

2014-11-26 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-8976:
---
Attachment: HIVE-8976.patch

Including tez tests

 Make nine additional tests deterministic
 

 Key: HIVE-8976
 URL: https://issues.apache.org/jira/browse/HIVE-8976
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.15.0
Reporter: Brock Noland
Assignee: Brock Noland
 Attachments: HIVE-8976.patch, HIVE-8976.patch


 {noformat}
 auto_join_without_localtask.q
 count.q
 limit_pushdown.q
 mapreduce2.q
 multi_insert_gby3.q
 multi_join_union.q
 ppd_outer_join3.q
 ptf_decimal.q
 ptf_general_queries.q
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8977) TestParquetDirect should be abstract

2014-11-26 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-8977:
---
 Assignee: Brock Noland
Affects Version/s: 0.15.0
   Status: Patch Available  (was: Open)

 TestParquetDirect should be abstract
 

 Key: HIVE-8977
 URL: https://issues.apache.org/jira/browse/HIVE-8977
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.15.0
Reporter: Brock Noland
Assignee: Brock Noland
Priority: Minor
 Attachments: HIVE-8977.patch


 The class {{TestParquetDirect}} does not contain any tests but starts with 
 Test. Thus the build system runs it and expects an output file. We should 
 rename the file to {{AbstractTestParquetDirect}} and make the class abstract.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8977) TestParquetDirect should be abstract

2014-11-26 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-8977:
---
Attachment: HIVE-8977.patch

 TestParquetDirect should be abstract
 

 Key: HIVE-8977
 URL: https://issues.apache.org/jira/browse/HIVE-8977
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.15.0
Reporter: Brock Noland
Priority: Minor
 Attachments: HIVE-8977.patch


 The class {{TestParquetDirect}} does not contain any tests but starts with 
 Test. Thus the build system runs it and expects an output file. We should 
 rename the file to {{AbstractTestParquetDirect}} and make the class abstract.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8797) Simultaneous dynamic inserts can result in partition already exists error

2014-11-26 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8797:
-
Status: Open  (was: Patch Available)

Makes sense.  I'll put up a new patch.

 Simultaneous dynamic inserts can result in partition already exists error
 ---

 Key: HIVE-8797
 URL: https://issues.apache.org/jira/browse/HIVE-8797
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
 Attachments: HIVE-8797.2.patch, HIVE-8797.patch


 If two users attempt a dynamic insert into the same new partition at the same 
 time, a possible race condition exists where both will attempt to create the 
 partition and one will fail.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8836) Enable automatic tests with remote spark client [Spark Branch]

2014-11-26 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-8836:
---
Attachment: HIVE-8836.8-spark.patch

The latest patch incorporates my changes from HIVE-8976. We'll commit HIVE-8976 
to trunk, and the same changes will go to the spark branch via the patch on 
this JIRA.

 Enable automatic tests with remote spark client [Spark Branch]
 --

 Key: HIVE-8836
 URL: https://issues.apache.org/jira/browse/HIVE-8836
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Chengxiang Li
Assignee: Rui Li
  Labels: Spark-M3
 Fix For: spark-branch

 Attachments: HIVE-8836.1-spark.patch, HIVE-8836.2-spark.patch, 
 HIVE-8836.3-spark.patch, HIVE-8836.4-spark.patch, HIVE-8836.5-spark.patch, 
 HIVE-8836.6-spark.patch, HIVE-8836.6-spark.patch, HIVE-8836.6-spark.patch, 
 HIVE-8836.7-spark.patch, HIVE-8836.8-spark.patch


 In a real production environment, the remote spark client will mostly be used 
 to submit spark jobs for Hive, so we should enable automatic tests with the 
 remote spark client to make sure the Hive features work with it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted

2014-11-26 Thread Jihong Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226925#comment-14226925
 ] 

Jihong Liu commented on HIVE-8966:
--

That flush_length file is only in the most recent delta. By the way, for 
streaming loads a transaction batch is probably always open, since data keeps 
coming. Is it possible to do compaction in a streaming-load environment? 
Thanks 

 Delta files created by hive hcatalog streaming cannot be compacted
 --

 Key: HIVE-8966
 URL: https://issues.apache.org/jira/browse/HIVE-8966
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.14.0
 Environment: hive
Reporter: Jihong Liu
Assignee: Alan Gates
Priority: Critical

 hive hcatalog streaming will also create a file like bucket_n_flush_length in 
 each delta directory, where n is the bucket number. But compactor.CompactorMR 
 thinks this file also needs to be compacted. Since this file of course cannot 
 be compacted, compactor.CompactorMR will not continue with the compaction. 
 In a test, after the bucket_n_flush_length file was removed, the alter table 
 partition compact finished successfully. If that file is not deleted, nothing 
 is compacted. 
 This is probably a high-severity bug. Both 0.13 and 0.14 have this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8797) Simultaneous dynamic inserts can result in partition already exists error

2014-11-26 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8797:
-
Status: Patch Available  (was: Open)

 Simultaneous dynamic inserts can result in partition already exists error
 ---

 Key: HIVE-8797
 URL: https://issues.apache.org/jira/browse/HIVE-8797
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
 Attachments: HIVE-8797.2.patch, HIVE-8797.3.patch, HIVE-8797.patch


 If two users attempt a dynamic insert into the same new partition at the same 
 time, a possible race condition exists where both will attempt to create the 
 partition and one will fail.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8797) Simultaneous dynamic inserts can result in partition already exists error

2014-11-26 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8797:
-
Attachment: HIVE-8797.3.patch

 Simultaneous dynamic inserts can result in partition already exists error
 ---

 Key: HIVE-8797
 URL: https://issues.apache.org/jira/browse/HIVE-8797
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
 Attachments: HIVE-8797.2.patch, HIVE-8797.3.patch, HIVE-8797.patch


 If two users attempt a dynamic insert into the same new partition at the same 
 time, a possible race condition exists where both will attempt to create the 
 partition and one will fail.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8836) Enable automatic tests with remote spark client [Spark Branch]

2014-11-26 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-8836:
---
Attachment: HIVE-8836.9-spark.patch

v9 of the patch does *not* clear the environment before starting spark-submit, 
as doing so was causing issues with finding java on various machines.

 Enable automatic tests with remote spark client [Spark Branch]
 --

 Key: HIVE-8836
 URL: https://issues.apache.org/jira/browse/HIVE-8836
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Chengxiang Li
Assignee: Rui Li
  Labels: Spark-M3
 Fix For: spark-branch

 Attachments: HIVE-8836.1-spark.patch, HIVE-8836.2-spark.patch, 
 HIVE-8836.3-spark.patch, HIVE-8836.4-spark.patch, HIVE-8836.5-spark.patch, 
 HIVE-8836.6-spark.patch, HIVE-8836.6-spark.patch, HIVE-8836.6-spark.patch, 
 HIVE-8836.7-spark.patch, HIVE-8836.8-spark.patch, HIVE-8836.9-spark.patch


 In a real production environment, the remote spark client will mostly be used 
 to submit spark jobs for Hive, so we should enable automatic tests with the 
 remote spark client to make sure the Hive features work with it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8966) Delta files created by hive hcatalog streaming cannot be compacted

2014-11-26 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226943#comment-14226943
 ] 

Alan Gates commented on HIVE-8966:
--

Ok, that makes sense.  Your current delta has the file because it's still open 
and being written to.  It also explains why my tests don't see it, as they 
don't run long enough; the streaming is always done by the time the compactor 
kicks in.  Why don't you post a patch to this JIRA with the change for 1, and I 
can get that committed.

[~hagleitn], I'd like to put this in 0.14.1 as well as trunk if you're ok with 
it, since it blocks compaction for users using the streaming interface.

 Delta files created by hive hcatalog streaming cannot be compacted
 --

 Key: HIVE-8966
 URL: https://issues.apache.org/jira/browse/HIVE-8966
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.14.0
 Environment: hive
Reporter: Jihong Liu
Assignee: Alan Gates
Priority: Critical

 hive hcatalog streaming will also create a file like bucket_n_flush_length in 
 each delta directory, where n is the bucket number. But compactor.CompactorMR 
 thinks this file also needs to be compacted. Since this file of course cannot 
 be compacted, compactor.CompactorMR will not continue with the compaction. 
 In a test, after the bucket_n_flush_length file was removed, the alter table 
 partition compact finished successfully. If that file is not deleted, nothing 
 is compacted. 
 This is probably a high-severity bug. Both 0.13 and 0.14 have this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8975) Possible performance regression on bucket_map_join_tez2.q

2014-11-26 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226955#comment-14226955
 ] 

Ashutosh Chauhan commented on HIVE-8975:


[~prasanth_j] Instead of trying to determine whether it's running in map or 
reduce, I think the stats logic should really make different stats calculations 
based on the mode the GBY is running in. That mode can be determined via 
GBYDesc.Mode. All we want is an estimate of the # of rows coming out of the GBY, 
and that depends on whether it is a partial aggregation or a full aggregation, 
not on whether it's in map or reduce. Thoughts?
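
A minimal sketch of a mode-based check, using the real GroupByDesc.Mode enum 
(the exact partial/full mapping below is an assumption to be confirmed):
{code}
import org.apache.hadoop.hive.ql.exec.GroupByOperator;
import org.apache.hadoop.hive.ql.plan.GroupByDesc;

final class GbyStatsUtils {
  // Hypothetical helper: treat partial-aggregation modes as map-side-like
  // for the purposes of row-count estimation.
  static boolean isPartialAggregation(GroupByOperator gby) {
    GroupByDesc.Mode mode = gby.getConf().getMode();
    return mode == GroupByDesc.Mode.HASH
        || mode == GroupByDesc.Mode.PARTIAL1
        || mode == GroupByDesc.Mode.PARTIALS;
  }
}
{code}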

 Possible performance regression on bucket_map_join_tez2.q
 -

 Key: HIVE-8975
 URL: https://issues.apache.org/jira/browse/HIVE-8975
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer, Statistics
Affects Versions: 0.15.0
Reporter: Jesus Camacho Rodriguez

 After introducing the identity project removal optimization in HIVE-8435, 
 plan in bucket_map_join_tez2.q that runs on Tez changed to be sub-optimal. In 
 particular, earlier it was doing a map-join and after HIVE-8435 it changed to 
 a reduce-join.
 The query is the following one:
 {noformat}
 select a.key, b.key from (select distinct key from tab) a join tab b on b.key 
 = a.key
 {noformat}
 The plan before removing the projections is:
 {noformat}
 TS[0]-FIL[16]-SEL[1]-GBY[2]-RS[3]-GBY[4]-SEL[5]-RS[8]-JOIN[11]-SEL[12]-FS[13]
 TS[6]-FIL[17]-RS[10]-JOIN[11]
 {noformat}
 And after removing identity projections:
 {noformat}
 TS[0]-FIL[16]-GBY[2]-RS[3]-GBY[4]-RS[8]-JOIN[11]-SEL[12]-FS[13]
 TS[6]-FIL[17]-RS[10]-JOIN[11]
 {noformat}
 After digging a bit, I realized it is not converting the reduce-join into a 
 map-join because stats for GBY\[4\] change if SEL\[5\] is removed; thus the 
 optimization does not kick in. 
 The reason for the stats change in the GroupBy operator is in [this 
 line|https://github.com/apache/hive/blob/6f4365e8a21e7b480bf595d079a71303a50bf1b2/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L633],
  where it is checked whether the GBY is immediately followed by a RS operator 
  or not, and stats are calculated differently depending on it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8977) TestParquetDirect should be abstract

2014-11-26 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226977#comment-14226977
 ] 

Szehon Ho commented on HIVE-8977:
-

+1 pending tests

 TestParquetDirect should be abstract
 

 Key: HIVE-8977
 URL: https://issues.apache.org/jira/browse/HIVE-8977
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.15.0
Reporter: Brock Noland
Assignee: Brock Noland
Priority: Minor
 Attachments: HIVE-8977.patch


 The class {{TestParquetDirect}} does not contain any tests but starts with 
 Test. Thus the build system runs it and expects an output file. We should 
 rename the file to {{AbstractTestParquetDirect}} and make the class abstract.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8924) Investigate test failure for join_empty.q [Spark Branch]

2014-11-26 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-8924:

Attachment: HIVE-8924.3-spark.patch

This changed the diffs of some tests, including those with empty stages.  And 
join_view has a larger diff.  Not sure why, but the plan looks similar and the 
results are still the same.

 Investigate test failure for join_empty.q [Spark Branch]
 

 Key: HIVE-8924
 URL: https://issues.apache.org/jira/browse/HIVE-8924
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Chao
Assignee: Szehon Ho
 Attachments: HIVE-8924-spark.patch, HIVE-8924.2-spark.patch, 
 HIVE-8924.3-spark.patch


 This query has an interesting case where the big table work is empty. Here's 
 the MR plan:
 {noformat}
 STAGE DEPENDENCIES:
   Stage-4 is a root stage
   Stage-3 depends on stages: Stage-4
   Stage-0 depends on stages: Stage-3
 STAGE PLANS:
   Stage: Stage-4
 Map Reduce Local Work
    Alias -> Map Local Tables:
 b 
   Fetch Operator
 limit: -1
    Alias -> Map Local Operator Tree:
 b 
   TableScan
 alias: b
 Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE 
 Column stats: NONE
 Filter Operator
   predicate: UDFToDouble(key) is not null (type: boolean)
   Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 condition expressions:
   0 {key}
   1 {value}
 keys:
   0 UDFToDouble(key) (type: double)
   1 UDFToDouble(key) (type: double)
   Stage: Stage-3
 Map Reduce
   Local Work:
 Map Reduce Local Work
   Stage: Stage-0
 Fetch Operator
   limit: -1
   Processor Tree:
 ListSink
 {noformat}
 The plan for Spark is not correct. We need to investigate the issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8924) Investigate test failure for join_empty.q [Spark Branch]

2014-11-26 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226999#comment-14226999
 ] 

Szehon Ho commented on HIVE-8924:
-

Correction: join_view looks ok, but optimize_nullscan has a larger diff.

 Investigate test failure for join_empty.q [Spark Branch]
 

 Key: HIVE-8924
 URL: https://issues.apache.org/jira/browse/HIVE-8924
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Chao
Assignee: Szehon Ho
 Attachments: HIVE-8924-spark.patch, HIVE-8924.2-spark.patch, 
 HIVE-8924.3-spark.patch


 This query has an interesting case where the big table work is empty. Here's 
 the MR plan:
 {noformat}
 STAGE DEPENDENCIES:
   Stage-4 is a root stage
   Stage-3 depends on stages: Stage-4
   Stage-0 depends on stages: Stage-3
 STAGE PLANS:
   Stage: Stage-4
 Map Reduce Local Work
    Alias -> Map Local Tables:
 b 
   Fetch Operator
 limit: -1
    Alias -> Map Local Operator Tree:
 b 
   TableScan
 alias: b
 Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE 
 Column stats: NONE
 Filter Operator
   predicate: UDFToDouble(key) is not null (type: boolean)
   Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE 
 Column stats: NONE
   HashTable Sink Operator
 condition expressions:
   0 {key}
   1 {value}
 keys:
   0 UDFToDouble(key) (type: double)
   1 UDFToDouble(key) (type: double)
   Stage: Stage-3
 Map Reduce
   Local Work:
 Map Reduce Local Work
   Stage: Stage-0
 Fetch Operator
   limit: -1
   Processor Tree:
 ListSink
 {noformat}
 The plan for Spark is not correct. We need to investigate the issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8797) Simultaneous dynamic inserts can result in partition already exists error

2014-11-26 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14227003#comment-14227003
 ] 

Thejas M Nair commented on HIVE-8797:
-

+1

 Simultaneous dynamic inserts can result in partition already exists error
 ---

 Key: HIVE-8797
 URL: https://issues.apache.org/jira/browse/HIVE-8797
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
 Attachments: HIVE-8797.2.patch, HIVE-8797.3.patch, HIVE-8797.patch


 If two users attempt a dynamic insert into the same new partition at the same 
 time, a possible race condition exists where both will attempt to create the 
 partition and one will fail.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8943) Fix memory limit check for combine nested mapjoins [Spark Branch]

2014-11-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14227004#comment-14227004
 ] 

Hive QA commented on HIVE-8943:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12683931/HIVE-8943.2-spark.patch

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 7180 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_stats
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_stats2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join31
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin7
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/450/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/450/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-450/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12683931 - PreCommit-HIVE-SPARK-Build

 Fix memory limit check for combine nested mapjoins [Spark Branch]
 -

 Key: HIVE-8943
 URL: https://issues.apache.org/jira/browse/HIVE-8943
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Szehon Ho
Assignee: Szehon Ho
 Attachments: HIVE-8943.1-spark.patch, HIVE-8943.1-spark.patch, 
 HIVE-8943.2-spark.patch


 It's the opposite problem of what we thought in HIVE-8701.
 SparkMapJoinOptimizer does combine nested mapjoins into one work due to the 
 removal of the RS for the big table. So we need to enhance the check to 
 calculate whether all the MapJoins in that work (spark-stage) will fit into 
 memory; otherwise it might overwhelm memory for that particular spark executor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8975) Possible performance regression on bucket_map_join_tez2.q

2014-11-26 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14227005#comment-14227005
 ] 

Prasanth J commented on HIVE-8975:
--

[~ashutoshc] What are all the possible modes for map-side and reduce-side? The 
stats calculation also has some logic for hash-aggregation enabled vs. disabled. 
Is it safe to assume that if the mode is HASH/PARTIAL it is map-side, and if the 
mode is FULL then reduce-side?
If so, I can change the logic accordingly without depending on the child/parent 
checks in the operator tree. 

 Possible performance regression on bucket_map_join_tez2.q
 -

 Key: HIVE-8975
 URL: https://issues.apache.org/jira/browse/HIVE-8975
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer, Statistics
Affects Versions: 0.15.0
Reporter: Jesus Camacho Rodriguez

 After introducing the identity project removal optimization in HIVE-8435, 
 plan in bucket_map_join_tez2.q that runs on Tez changed to be sub-optimal. In 
 particular, earlier it was doing a map-join and after HIVE-8435 it changed to 
 a reduce-join.
 The query is the following one:
 {noformat}
 select a.key, b.key from (select distinct key from tab) a join tab b on b.key 
 = a.key
 {noformat}
 The plan before removing the projections is:
 {noformat}
 TS[0]-FIL[16]-SEL[1]-GBY[2]-RS[3]-GBY[4]-SEL[5]-RS[8]-JOIN[11]-SEL[12]-FS[13]
 TS[6]-FIL[17]-RS[10]-JOIN[11]
 {noformat}
 And after removing identity projections:
 {noformat}
 TS[0]-FIL[16]-GBY[2]-RS[3]-GBY[4]-RS[8]-JOIN[11]-SEL[12]-FS[13]
 TS[6]-FIL[17]-RS[10]-JOIN[11]
 {noformat}
 After digging a bit, I realized it is not converting the reduce-join into a 
 map-join because stats for GBY\[4\] change if SEL\[5\] is removed; thus the 
 optimization does not kick in. 
 The reason for the stats change in the GroupBy operator is in [this 
 line|https://github.com/apache/hive/blob/6f4365e8a21e7b480bf595d079a71303a50bf1b2/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L633],
  where it is checked whether the GBY is immediately followed by a RS operator 
  or not, and stats are calculated differently depending on it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8934) Investigate test failure on bucketmapjoin10.q and bucketmapjoin11.q [Spark Branch]

2014-11-26 Thread Chao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao updated HIVE-8934:
---
Attachment: HIVE-8934.3-spark.patch

Regenerated test outputs.

 Investigate test failure on bucketmapjoin10.q and bucketmapjoin11.q [Spark 
 Branch]
 --

 Key: HIVE-8934
 URL: https://issues.apache.org/jira/browse/HIVE-8934
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Chao
Assignee: Chao
 Attachments: HIVE-8934.1-spark.patch, HIVE-8934.2-spark.patch, 
 HIVE-8934.3-spark.patch


 With MapJoin enabled, these two tests will generate incorrect results.
 This seem to be related to the HiveInputFormat that these two are using.
 We need to investigate the issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8828) Remove hadoop 20 shims

2014-11-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14227028#comment-14227028
 ] 

Hive QA commented on HIVE-8828:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12683906/HIVE-8828.11.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6680 tests executed
*Failed tests:*
{noformat}
TestMiniTezCliDriver-vectorization_16.q-mapjoin_mapjoin.q-groupby2.q-and-12-more
 - did not produce a TEST-*.xml file
TestParquetDirect - did not produce a TEST-*.xml file
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1915/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1915/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1915/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12683906 - PreCommit-HIVE-TRUNK-Build

 Remove hadoop 20 shims
 --

 Key: HIVE-8828
 URL: https://issues.apache.org/jira/browse/HIVE-8828
 Project: Hive
  Issue Type: Task
  Components: Shims
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: HIVE-8828.1.patch, HIVE-8828.10.patch, 
 HIVE-8828.11.patch, HIVE-8828.2.patch, HIVE-8828.3.patch, HIVE-8828.4.patch, 
 HIVE-8828.5.patch, HIVE-8828.6.patch, HIVE-8828.7.patch, HIVE-8828.8.patch, 
 HIVE-8828.9.patch, HIVE-8828.patch


 CLEAR LIBRARY CACHE
 See : [mailing list discussion | 
 http://mail-archives.apache.org/mod_mbox/hive-dev/201410.mbox/%3CCABgNGzfSB5VGTecONg0GgLCDdLLFfzLuZvP%2BGSBc0i0joqf3fg%40mail.gmail.com%3E]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8943) Fix memory limit check for combine nested mapjoins [Spark Branch]

2014-11-26 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-8943:

Attachment: HIVE-8943.3-spark.patch

 Fix memory limit check for combine nested mapjoins [Spark Branch]
 -

 Key: HIVE-8943
 URL: https://issues.apache.org/jira/browse/HIVE-8943
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Szehon Ho
Assignee: Szehon Ho
 Attachments: HIVE-8943.1-spark.patch, HIVE-8943.1-spark.patch, 
 HIVE-8943.2-spark.patch, HIVE-8943.3-spark.patch


 It's the opposite problem of what we thought in HIVE-8701.
 SparkMapJoinOptimizer does combine nested mapjoins into one work due to the 
 removal of the RS for the big table. So we need to enhance the check to 
 calculate whether all the MapJoins in that work (spark-stage) will fit into 
 memory; otherwise it might overwhelm memory for that particular spark executor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8943) Fix memory limit check for combine nested mapjoins [Spark Branch]

2014-11-26 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14227032#comment-14227032
 ] 

Szehon Ho commented on HIVE-8943:
-

Forgot to generate the golden files for new tests in CLIDriver.

 Fix memory limit check for combine nested mapjoins [Spark Branch]
 -

 Key: HIVE-8943
 URL: https://issues.apache.org/jira/browse/HIVE-8943
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Szehon Ho
Assignee: Szehon Ho
 Attachments: HIVE-8943.1-spark.patch, HIVE-8943.1-spark.patch, 
 HIVE-8943.2-spark.patch, HIVE-8943.3-spark.patch


 It's the opposite problem of what we thought in HIVE-8701.
 SparkMapJoinOptimizer does combine nested mapjoins into one work due to the 
 removal of the RS for the big table. So we need to enhance the check to 
 calculate whether all the MapJoins in that work (spark-stage) will fit into 
 memory; otherwise it might overwhelm memory for that particular spark executor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8964) Some TestMiniTezCliDriver tests taking two hours

2014-11-26 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14227040#comment-14227040
 ] 

Brock Noland commented on HIVE-8964:


I am pretty sure this is {{lvj_mapjoin.q}} which was added in HIVE-. I've 
excluded that test on the PTest side. We'll see if that helps.

 Some TestMiniTezCliDriver tests taking two hours
 

 Key: HIVE-8964
 URL: https://issues.apache.org/jira/browse/HIVE-8964
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland
Assignee: Gunther Hagleitner
Priority: Blocker

 The test {{TestMiniTezCliDriver}} with the following query files:
 vectorization_16.q,mapjoin_mapjoin.q,groupby2.q,lvj_mapjoin.q,vectorization_5.q,vectorization_pushdown.q,orc_merge_incompat1.q,cbo_gby.q,vectorization_4.q,auto_join0.q,cross_product_check_1.q,vectorization_not.q,update_where_no_match.q,ctas.q,cbo_udf_udaf.q
 is timing out after two hours severely delaying the Hive precommits
 http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1898/failed/TestMiniTezCliDriver-vectorization_16.q-mapjoin_mapjoin.q-groupby2.q-and-12-more/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

