[jira] [Commented] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126674#comment-14126674 ] Rui Li commented on HIVE-8017: -- [~xuefuz], Spark doesn't guarantee order for group-by queries. And the hash codes of BytesWritable and HiveKey are computed differently, so the records produced by the mappers are likely to be partitioned to different reducers. I believe that's why we got different results (differing only in order) after the change. I don't quite get the point of your last comment. I did try adding {{-- SORT_BEFORE_DIFF}} in the q file. It just seems the test framework is a little particular about where we should put it (the test failed to run if I put it before the {{set}} commands in the q file). [~brocknoland] do you have any comments on how to use the {{-- SORT_BEFORE_DIFF}} label? Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch] --- Key: HIVE-8017 URL: https://issues.apache.org/jira/browse/HIVE-8017 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-8017-spark.patch HiveKey should be used as the key type because it holds the hash code for partitioning. While BytesWritable serves partitioning well for simple cases, we have to use {{HiveKey.hashCode}} for more complicated ones, e.g. join, bucketed table, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
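To make the partitioning point above concrete, here is a rough, self-contained sketch (plain Java, not the Spark branch code; the key bytes and the carried hash value are made up for illustration) of why two key types that hash the same bytes differently can send the same rows to different reducers, yielding identical result sets in a different order:
{code:java}
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PartitioningSketch {
  // The usual hash-partitioner formula: reducer = (hash & MAX_VALUE) % numReducers.
  static int partitionFor(int hash, int numReducers) {
    return (hash & Integer.MAX_VALUE) % numReducers;
  }

  public static void main(String[] args) {
    byte[] keyBytes = "key_42".getBytes(StandardCharsets.UTF_8);
    int numReducers = 4;

    // BytesWritable-style hash: derived directly from the raw key bytes.
    int bytesStyleHash = Arrays.hashCode(keyBytes);

    // HiveKey-style hash: computed by the sender (e.g. from the join/bucket columns)
    // and carried alongside the bytes; the value here is arbitrary for the demo.
    int hiveKeyStyleHash = 12345;

    System.out.println("BytesWritable-style hash -> reducer "
        + partitionFor(bytesStyleHash, numReducers));
    System.out.println("HiveKey-style hash       -> reducer "
        + partitionFor(hiveKeyStyleHash, numReducers));
  }
}
{code}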
[jira] [Created] (HIVE-8029) Remove reducers number configure in SparkTask[Spark Branch]
Chengxiang Li created HIVE-8029: --- Summary: Remove reducers number configure in SparkTask[Spark Branch] Key: HIVE-8029 URL: https://issues.apache.org/jira/browse/HIVE-8029 Project: Hive Issue Type: Improvement Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li We do not need duplicated logic to configure the number of reducers in SparkTask, as SetSparkReduceParallelism always sets the number of reducers in the compiler phase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-5871) Use multiple-characters as field delimiter
[ https://issues.apache.org/jira/browse/HIVE-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126682#comment-14126682 ] Lefty Leverenz commented on HIVE-5871: -- Actually you _can_ edit comments, although that's discouraged because preserving the history is important. (A good workaround is to append Edit date: to the comment instead of changing the original text.) The edit function is via a pencil icon in the upper right corner of each comment. But updating the description is also good. If you have wiki update permission, each Hive wiki page has an Edit link with a pencil icon in the upper right corner, next to Share and Tools. If not, you can request permission as described in AboutThisWiki, which you can find on the left side of the Home page. A preliminary guide to editing the wiki is in a comment on HIVE-7142. By the way, that's an edited comment. Here are the links: * [AboutThisWiki -- How to get permission to edit | https://cwiki.apache.org/confluence/display/Hive/AboutThisWiki#AboutThisWiki-Howtogetpermissiontoedit] * [HIVE-7142 comment about how to edit the wiki | https://issues.apache.org/jira/browse/HIVE-7142?focusedCommentId=14096756page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14096756] Use multiple-characters as field delimiter -- Key: HIVE-5871 URL: https://issues.apache.org/jira/browse/HIVE-5871 Project: Hive Issue Type: Improvement Components: Contrib Affects Versions: 0.12.0 Reporter: Rui Li Assignee: Rui Li Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-5871.2.patch, HIVE-5871.3.patch, HIVE-5871.4.patch, HIVE-5871.5.patch, HIVE-5871.6.patch, HIVE-5871.patch By default, Hive only allows users to use a single character as the field delimiter. Although there's RegexSerDe for specifying a multiple-character delimiter, it can be daunting to use, especially for amateurs. In the patch, I add a new SerDe named MultiDelimitSerDe. With MultiDelimitSerDe, users can specify a multiple-character field delimiter when creating tables, in a way most similar to a typical table creation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
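As a rough illustration of how the new SerDe described above might be exercised programmatically — a minimal sketch that assumes the contrib package name and the {{field.delim}} table property; the column names, types and the 3-character delimiter are made up for the example:
{code:java}
import java.util.Properties;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe;
import org.apache.hadoop.hive.serde.serdeConstants;
import org.apache.hadoop.io.Text;

public class MultiDelimitSketch {
  public static void main(String[] args) throws Exception {
    // Table properties a CREATE TABLE ... WITH SERDEPROPERTIES would normally supply.
    Properties tbl = new Properties();
    tbl.setProperty(serdeConstants.LIST_COLUMNS, "id,name");
    tbl.setProperty(serdeConstants.LIST_COLUMN_TYPES, "int,string");
    tbl.setProperty("field.delim", "[|]");   // multi-character field delimiter

    MultiDelimitSerDe serde = new MultiDelimitSerDe();
    serde.initialize(new Configuration(), tbl);

    // One input line whose two columns are separated by the multi-character delimiter.
    Object row = serde.deserialize(new Text("7[|]hive"));
    System.out.println(serde.getObjectInspector().getTypeName() + " -> " + row);
  }
}
{code}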
Re: Review Request 24627: HIVE-7704: Create tez task for fast file merging
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24627/#review52679 --- ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractFileMergeOperator.java https://reviews.apache.org/r/24627/#comment91637 All of this was existing code taken from MergeMapper.java. Anyway, I rewrote the comment in the new patch. Also merged the fixTmpPath and fixTmpPathConcatenate methods into a single method in the new patch. ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractFileMergeOperator.java https://reviews.apache.org/r/24627/#comment91638 Updated in new patch. ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractFileMergeOperator.java https://reviews.apache.org/r/24627/#comment91639 Updated comment in new patch. ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractFileMergeOperator.java https://reviews.apache.org/r/24627/#comment91640 This is all gone in the new patch. ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java https://reviews.apache.org/r/24627/#comment91641 That's Eclipse. I used IntelliJ. Perhaps the two just do the opposite. :) ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java https://reviews.apache.org/r/24627/#comment91642 Fixed it. ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java https://reviews.apache.org/r/24627/#comment91643 If I use the Map interface here, then I need to cast it to LinkedHashMap when I set aliasToWork(). To avoid the cast I am using LinkedHashMap on the LHS. ql/src/java/org/apache/hadoop/hive/ql/exec/OrcFileMergeOperator.java https://reviews.apache.org/r/24627/#comment91644 Fixed it. ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeFileOutputFormat.java https://reviews.apache.org/r/24627/#comment91645 This is a dummy output format which is set in DagUtils/MergeFileTask. I am just using it to make sure the operator pipeline for fast file merge is initialized properly. If the operator pipeline is wrongly initialized with, say, TS -> FS, then FS will get a record writer from this output format, throwing a RuntimeException. If the OFM or RFM operators are initialized then this will never be called. The RCFile and ORC file writers handle output file opening and closing themselves. They do not use the standard record writer interfaces for writing the output. Both RCFile and ORC use custom interfaces for block-level and stripe-level writing respectively. - Prasanth_J On Sept. 6, 2014, 2:03 a.m., Prasanth_J wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24627/ --- (Updated Sept. 6, 2014, 2:03 a.m.) Review request for hive and Gunther Hagleitner. Bugs: HIVE-7704 https://issues.apache.org/jira/browse/HIVE-7704 Repository: hive-git Description --- Currently Tez falls back to an MR task for the merge file task. It will be beneficial to convert the merge file tasks to Tez tasks to make use of the performance gains from Tez.
Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 54e2b18 itests/src/test/resources/testconfiguration.properties 99049ca ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java 6f23575 ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractFileMergeOperator.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java e076683 ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java 7477199 ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 8946221 ql/src/java/org/apache/hadoop/hive/ql/exec/OrcFileMergeOperator.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/RCFileMergeOperator.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/TaskFactory.java 3d74459 ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 2d9b9c3 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java 4ff568d1 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/MergeFileRecordProcessor.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/MergeFileTezProcessor.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/RecordProcessor.java 994721f ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezProcessor.java 831e6a5 ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeFileInputFormat.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeFileMapper.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeFileOutputFormat.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeFileTask.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeFileWork.java PRE-CREATION
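The MergeFileOutputFormat.java entry above is the "dummy output format" discussed in the review comments: the merge operators write their output files themselves, so any request for a record writer can only mean the pipeline was wired incorrectly. A simplified, hypothetical sketch of that fail-fast guard (not the actual patch, which implements Hive's output format interface):
{code:java}
public class DummyMergeOutputFormatSketch {
  // The real class would implement HiveOutputFormat; this sketch only keeps the guard.
  public Object getHiveRecordWriter(Object jobConf, Object finalOutPath) {
    // ORC/RCFile merge operators open and close their output files themselves, so any
    // request for a record writer means the pipeline was set up as TS -> FS instead of
    // TS -> OFM/RFM; failing fast surfaces the misconfiguration immediately.
    throw new RuntimeException(
        "MergeFileOutputFormat should never be asked for a record writer: "
            + "the merge operators manage their own output files");
  }
}
{code}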
[jira] [Commented] (HIVE-5871) Use multiple-characters as field delimiter
[ https://issues.apache.org/jira/browse/HIVE-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126690#comment-14126690 ] Rui Li commented on HIVE-5871: -- Thanks [~leftylev] that's very helpful. However I don't see a pencil icon in the upper right corner of my comments. (There is one in the description though). Wonder if I'm still missing something? Use multiple-characters as field delimiter -- Key: HIVE-5871 URL: https://issues.apache.org/jira/browse/HIVE-5871 Project: Hive Issue Type: Improvement Components: Contrib Affects Versions: 0.12.0 Reporter: Rui Li Assignee: Rui Li Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-5871.2.patch, HIVE-5871.3.patch, HIVE-5871.4.patch, HIVE-5871.5.patch, HIVE-5871.6.patch, HIVE-5871.patch By default, hive only allows user to use single character as field delimiter. Although there's RegexSerDe to specify multiple-character delimiter, it can be daunting to use, especially for amateurs. In the patch, I add a new SerDe named MultiDelimitSerDe. With MultiDelimitSerDe, users can specify a multiple-character field delimiter when creating tables, in a way most similar to typical table creations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7818) Support boolean PPD for ORC
[ https://issues.apache.org/jira/browse/HIVE-7818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126696#comment-14126696 ] Hive QA commented on HIVE-7818: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12667118/HIVE-7818.1.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6186 tests executed *Failed tests:* {noformat} org.apache.hive.service.TestHS2ImpersonationWithRemoteMS.testImpersonation {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/702/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/702/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-702/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12667118 Support boolean PPD for ORC --- Key: HIVE-7818 URL: https://issues.apache.org/jira/browse/HIVE-7818 Project: Hive Issue Type: Improvement Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.14.0 Attachments: HIVE-7818.1.patch Currently ORC does collect stats for boolean fields. However, the boolean stats are not range based; instead, they collect counts of true records. RecordReaderImpl.evaluatePredicate currently only deals with range-based stats; we need to improve it to deal with the boolean stats. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
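To spell out the reasoning in the description above, a hedged sketch (hypothetical helper, not RecordReaderImpl itself) of how a true-count statistic can still drive predicate push-down for boolean columns: a stripe or row group can be skipped whenever the count shows no row can satisfy the predicate.
{code:java}
public class BooleanPpdSketch {
  // Simplified stand-ins for the SARG truth values.
  enum TruthValue { NO, YES_NO }

  // If no row in the stripe/row group can satisfy "col = predicateValue",
  // the reader may skip it; otherwise it must be read.
  static TruthValue evaluate(boolean predicateValue, long trueCount, long rowCount) {
    long matching = predicateValue ? trueCount : rowCount - trueCount;
    return matching == 0 ? TruthValue.NO : TruthValue.YES_NO;
  }

  public static void main(String[] args) {
    System.out.println(evaluate(true, 0, 1000));   // NO     -> skip: no true rows at all
    System.out.println(evaluate(false, 0, 1000));  // YES_NO -> read: every row is false
  }
}
{code}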
[jira] [Updated] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open
[ https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Elser updated HIVE-7950: - Attachment: hive-7950-tez-WIP.diff I took a look at the tez branch to see if I could add more resources to an existing session as you described, [~sershe]. Looking at the javadoc, I feel like this patch should work, but the query still errors out when the map inside the dag fails due to missing classes. I can see that the dag does get the extra jars localized: {noformat} 2014-09-08 23:20:34,823 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.dag.impl.DAGImpl: Added additional resources : [[file:/usr/local/lib/hadoop-2.6.0-SNAPSHOT/yarn/nm-tmp/usercache/jelser/appcache/application_1410243497503_0001/container_1410243497503_0001_01_01/accumulo-fate-1.6.0.jar, file:/usr/local/lib/hadoop-2.6.0-SNAPSHOT/yarn/nm-tmp/usercache/jelser/appcache/application_1410243497503_0001/container_1410243497503_0001_01_01/accumulo-core-1.6.0.jar, file:/usr/local/lib/hadoop-2.6.0-SNAPSHOT/yarn/nm-tmp/usercache/jelser/appcache/application_1410243497503_0001/container_1410243497503_0001_01_01/accumulo-trace-1.6.0.jar, file:/usr/local/lib/hadoop-2.6.0-SNAPSHOT/yarn/nm-tmp/usercache/jelser/appcache/application_1410243497503_0001/container_1410243497503_0001_01_01/accumulo-start-1.6.0.jar, file:/usr/local/lib/hadoop-2.6.0-SNAPSHOT/yarn/nm-tmp/usercache/jelser/appcache/application_1410243497503_0001/container_1410243497503_0001_01_01/zookeeper-3.4.6.jar]] to classpath {noformat} But I'm still getting a NoClassDefFoundException on a class which is in accumulo-core.jar: {noformat} Serialization trace: inputFileFormatClass (org.apache.hadoop.hive.ql.plan.TableDesc) tableDesc (org.apache.hadoop.hive.ql.plan.PartitionDesc) aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:183) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:180) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:172) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:167) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.RuntimeException: org.apache.hive.com.esotericsoftware.kryo.KryoException: java.lang.IllegalArgumentException: Unable to create serializer org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer for class: org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableInputFormat Serialization trace: inputFileFormatClass (org.apache.hadoop.hive.ql.plan.TableDesc) tableDesc (org.apache.hadoop.hive.ql.plan.PartitionDesc) aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork) at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:384) at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:281) at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:73) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:134) ... 12 more Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: java.lang.IllegalArgumentException: Unable to create serializer org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer for class: org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableInputFormat Serialization trace: inputFileFormatClass (org.apache.hadoop.hive.ql.plan.TableDesc) tableDesc (org.apache.hadoop.hive.ql.plan.PartitionDesc) aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork) at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125) at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507) at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694) at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106) at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507) at
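A rough way to see the failure mode above (a hypothetical check, not part of the attached WIP patch): the jars are localized into the container's working directory, but unless they are also added to the task's classloader, Kryo cannot resolve the class named in the serialized MapWork (inputFileFormatClass) and plan deserialization fails.
{code:java}
public class ClasspathSketch {
  public static void main(String[] args) {
    String cls = "org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableInputFormat";
    try {
      Thread.currentThread().getContextClassLoader().loadClass(cls);
      System.out.println(cls + " is visible; plan deserialization should succeed");
    } catch (ClassNotFoundException e) {
      // The jar can sit in the container's local directory (as the DAGImpl log shows)
      // and still be invisible here; Kryo's FieldSerializer then fails exactly as above.
      System.out.println(cls + " is NOT on the task classpath; Kryo will fail as above");
    }
  }
}
{code}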
[jira] [Commented] (HIVE-7704) Create tez task for fast file merging
[ https://issues.apache.org/jira/browse/HIVE-7704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126719#comment-14126719 ] Prasanth J commented on HIVE-7704: -- Addressed Vikram's review comments. Create tez task for fast file merging - Key: HIVE-7704 URL: https://issues.apache.org/jira/browse/HIVE-7704 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-7704.1.patch, HIVE-7704.2.patch, HIVE-7704.3.patch, HIVE-7704.4.patch, HIVE-7704.4.patch, HIVE-7704.5.patch, HIVE-7704.6.patch, HIVE-7704.7.patch, HIVE-7704.8.patch Currently Tez falls back to an MR task for the merge file task. It will be beneficial to convert the merge file tasks to Tez tasks to make use of the performance gains from Tez. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7704) Create tez task for fast file merging
[ https://issues.apache.org/jira/browse/HIVE-7704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-7704: - Attachment: HIVE-7704.8.patch Create tez task for fast file merging - Key: HIVE-7704 URL: https://issues.apache.org/jira/browse/HIVE-7704 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-7704.1.patch, HIVE-7704.2.patch, HIVE-7704.3.patch, HIVE-7704.4.patch, HIVE-7704.4.patch, HIVE-7704.5.patch, HIVE-7704.6.patch, HIVE-7704.7.patch, HIVE-7704.8.patch Currently Tez falls back to an MR task for the merge file task. It will be beneficial to convert the merge file tasks to Tez tasks to make use of the performance gains from Tez. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 24627: HIVE-7704: Create tez task for fast file merging
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24627/ --- (Updated Sept. 9, 2014, 7:32 a.m.) Review request for hive and Gunther Hagleitner. Changes --- Addressed Vikram's review comments. Bugs: HIVE-7704 https://issues.apache.org/jira/browse/HIVE-7704 Repository: hive-git Description --- Currently Tez falls back to an MR task for the merge file task. It will be beneficial to convert the merge file tasks to Tez tasks to make use of the performance gains from Tez. Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 31aeba9 itests/src/test/resources/testconfiguration.properties 99049ca ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java 6f23575 ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractFileMergeOperator.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java e076683 ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java 7477199 ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 8946221 ql/src/java/org/apache/hadoop/hive/ql/exec/OrcFileMergeOperator.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/RCFileMergeOperator.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/TaskFactory.java 3d74459 ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 5bbf3f6 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java 4ff568d1 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/MergeFileRecordProcessor.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/MergeFileTezProcessor.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/tez/RecordProcessor.java 994721f ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezProcessor.java 831e6a5 ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeFileInputFormat.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeFileMapper.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeFileOutputFormat.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeFileTask.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeFileWork.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeInputFormat.java 4651920 ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeMapper.java 6c691b1 ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeOutputFormat.java a3ce699 ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeTask.java c30476b ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeWork.java 9efee3c ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFileMergeMapper.java 13ec642 ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFileStripeMergeInputFormat.java a6c92fb ql/src/java/org/apache/hadoop/hive/ql/io/orc/Writer.java c391b0e ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java 195d60e ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileBlockMergeInputFormat.java 6809c79 ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileMergeMapper.java dee6b1c ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 7129ed8 ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java 11a9419 ql/src/java/org/apache/hadoop/hive/ql/plan/FileMergeDesc.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/plan/OrcFileMergeDesc.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/plan/RCFileMergeDesc.java PRE-CREATION ql/src/test/queries/clientpositive/list_bucket_dml_8.q 9e81b8d ql/src/test/queries/clientpositive/orc_merge1.q ee65b98 ql/src/test/queries/clientpositive/orc_merge5.q PRE-CREATION 
ql/src/test/queries/clientpositive/orc_merge6.q PRE-CREATION ql/src/test/queries/clientpositive/orc_merge7.q PRE-CREATION ql/src/test/results/clientpositive/infer_bucket_sort_dyn_part.q.out 11c7578 ql/src/test/results/clientpositive/list_bucket_dml_10.q.out 8de452f ql/src/test/results/clientpositive/list_bucket_dml_4.q.out b1c060e ql/src/test/results/clientpositive/list_bucket_dml_6.q.out 3450d63 ql/src/test/results/clientpositive/list_bucket_dml_7.q.out f6a4cb5 ql/src/test/results/clientpositive/list_bucket_dml_9.q.out 796c7af ql/src/test/results/clientpositive/merge_dynamic_partition4.q.out 0899648 ql/src/test/results/clientpositive/merge_dynamic_partition5.q.out 0653469 ql/src/test/results/clientpositive/orc_createas1.q.out 993c853 ql/src/test/results/clientpositive/orc_merge1.q.out 7f88125 ql/src/test/results/clientpositive/orc_merge3.q.out 258f538 ql/src/test/results/clientpositive/orc_merge5.q.out PRE-CREATION ql/src/test/results/clientpositive/orc_merge6.q.out PRE-CREATION ql/src/test/results/clientpositive/orc_merge7.q.out PRE-CREATION ql/src/test/results/clientpositive/rcfile_createas1.q.out cdfa036
[jira] [Commented] (HIVE-7777) add CSV support for Serde
[ https://issues.apache.org/jira/browse/HIVE-7777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126737#comment-14126737 ] Hive QA commented on HIVE-7777: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12667327/HIVE-7777.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6188 tests executed *Failed tests:* {noformat} org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection org.apache.hive.service.TestHS2ImpersonationWithRemoteMS.testImpersonation {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/703/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/703/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-703/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12667327 add CSV support for Serde - Key: HIVE-7777 URL: https://issues.apache.org/jira/browse/HIVE-7777 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Ferdinand Xu Assignee: Ferdinand Xu Attachments: HIVE-7777.patch, csv-serde-master.zip There is no official support for a CSV SerDe in Hive, while there is an open source project on GitHub (https://github.com/ogrodnek/csv-serde). CSV is a very frequently used data format. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Timeline for release of Hive 0.14
Please include https://issues.apache.org/jira/browse/HIVE-7694 as well. It is currently under review by Amareshwari and should be done in the next couple of days. Thanks Suma On Mon, Sep 8, 2014 at 5:44 PM, Alan Gates ga...@hortonworks.com wrote: I'll review that. I just need the time to test it against mysql, oracle, and hopefully sqlserver. But I think we can do this post branch if we need to, as it's a bug fix rather than a feature. Alan. Damien Carol dca...@blitzbs.com September 8, 2014 at 3:19 Same request for https://issues.apache.org/jira/browse/HIVE-7689 I already provided a patch, re-based it many times and I'm waiting for a review. Regards, On 08/09/2014 12:08, amareshwarisr . wrote: amareshwarisr . amareshw...@gmail.com September 8, 2014 at 3:08 Would like to include https://issues.apache.org/jira/browse/HIVE-2390 and https://issues.apache.org/jira/browse/HIVE-7936. I can review and merge them. Thanks Amareshwari Vikram Dixit vik...@hortonworks.com September 5, 2014 at 17:53 Hi Folks, I am going to start consolidating the items mentioned in this list and create a wiki page to track it. I will wait till the end of next week to create the branch taking into account Ashutosh's request. Thanks Vikram. On Fri, Sep 5, 2014 at 5:39 PM, Ashutosh Chauhan hashut...@apache.org wrote: Ashutosh Chauhan hashut...@apache.org September 5, 2014 at 17:39 Vikram, Some of us are working on stabilizing the cbo branch and trying to get it merged into trunk. We feel we are close. May I request to defer cutting the branch for a few more days? Folks interested in this can track our progress here: https://issues.apache.org/jira/browse/HIVE-7946 Thanks, Ashutosh On Fri, Aug 22, 2014 at 4:09 PM, Lars Francke lars.fran...@gmail.com wrote: Lars Francke lars.fran...@gmail.com August 22, 2014 at 16:09 Thank you for volunteering to do the release. I think a 0.14 release is a good idea. I have a couple of issues I'd like to get in too: * Either HIVE-7107[0] (Fix an issue in the HiveServer1 JDBC driver) or HIVE-6977[1] (Delete HiveServer1). The former needs a review, the latter a patch. * HIVE-6123[2] Checkstyle in Maven needs a review. HIVE-7622[3] and HIVE-7543[4] are waiting for any reviews or comments on my previous thread[5]. I'd still appreciate any helpers for reviews or even just comments. I'd feel very sad if I had done all that work for nothing. Hoping this thread gives me a wider audience. Both patches fix up issues that should have been caught in earlier reviews as they are almost all Checkstyle or other style violations, but they make for huge patches. I could also create hundreds of small issues or stop doing these things entirely [0] https://issues.apache.org/jira/browse/HIVE-7107 [1] https://issues.apache.org/jira/browse/HIVE-6977 [2] https://issues.apache.org/jira/browse/HIVE-6123 [3] https://issues.apache.org/jira/browse/HIVE-7622 [4] https://issues.apache.org/jira/browse/HIVE-7543 On Fri, Aug 22, 2014 at 11:01 PM, John Pullokkaran
Re: Review Request 25468: HIVE-7777: add CSVSerde support
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25468/#review52688 --- Looks good apart from minor comments. Maybe add a test for the Serialization part? https://issues.apache.org/jira/browse/HIVE-5976 integration might be nice: STORED AS CSV. Unfortunately there's no documentation yet so I'm not sure if it's feasible. serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java https://reviews.apache.org/r/25468/#comment91646 This comment doesn't add value so I suggest removing it. serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java https://reviews.apache.org/r/25468/#comment91647 * Constants is deprecated. Use serdeConstants instead * Exceeds maximum line length (100 chars) serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java https://reviews.apache.org/r/25468/#comment91648 Unused serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java https://reviews.apache.org/r/25468/#comment91651 Missing spaces around operators serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java https://reviews.apache.org/r/25468/#comment91650 2 x Unnecessary this serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java https://reviews.apache.org/r/25468/#comment91653 Missing spaces around operators serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java https://reviews.apache.org/r/25468/#comment91669 I suggest moving these properties to Constants somewhere serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java https://reviews.apache.org/r/25468/#comment91649 Method declared final in final class serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java https://reviews.apache.org/r/25468/#comment91657 long line serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java https://reviews.apache.org/r/25468/#comment91656 Missing spaces serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java https://reviews.apache.org/r/25468/#comment91659 I don't quite get this comment. Looking at the two CSVReader constructors they seem to do the same in this case. From how I understand it this if-statement is not needed. Same for the newWriter method. Maybe I'm missing something? serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java https://reviews.apache.org/r/25468/#comment91660 Missing @Override annotation serde/src/test/org/apache/hadoop/hive/serde2/TestCSVSerde.java https://reviews.apache.org/r/25468/#comment91661 Can be private too serde/src/test/org/apache/hadoop/hive/serde2/TestCSVSerde.java https://reviews.apache.org/r/25468/#comment91662 Properties.put should not be used. Use setProperty instead. Also Constants == deprecated - Lars Francke On Sept. 9, 2014, 2:16 a.m., cheng xu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25468/ --- (Updated Sept. 9, 2014, 2:16 a.m.) Review request for hive. Bugs: HIVE-7777 https://issues.apache.org/jira/browse/HIVE-7777 Repository: hive-git Description --- HIVE-7777: add CSVSerde support Diffs - serde/pom.xml f8bcc830cfb298d739819db8fbaa2f98f221ccf3 serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/TestCSVSerde.java PRE-CREATION Diff: https://reviews.apache.org/r/25468/diff/ Testing --- Unit test Thanks, cheng xu
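For context on the newReader question above, a hedged sketch of the pattern being discussed (method and parameter names are assumed from the csv-serde project the patch is based on, not taken from the patch itself): the branch only matters if a configured escape character should override OpenCSV's default.
{code:java}
import java.io.Reader;
import au.com.bytecode.opencsv.CSVReader;

public class CsvReaderSketch {
  // Pick a CSVReader constructor depending on whether an escape char was configured.
  // The review's point: if the default escape is acceptable, the branch collapses
  // to a single constructor call.
  static CSVReader newReader(Reader in, char separator, char quote, Character escape) {
    return escape == null
        ? new CSVReader(in, separator, quote)          // fall back to OpenCSV defaults
        : new CSVReader(in, separator, quote, escape); // honour the configured escape
  }
}
{code}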
[jira] [Commented] (HIVE-7689) Enable Postgres as METASTORE back-end
[ https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126766#comment-14126766 ] Damien Carol commented on HIVE-7689: My bad. It's a new method {{findColumnsWithStats}} added in {{CompactionTxnHandler}}. I wrote: {code:java} if (ci.partName == null) s += " AND " + TxnDbUtil.getEscape("PARTITION_NAME", identifierQuoteString) + "='" + ci.partName + "'"; {code} Instead of {code:java} if (ci.partName != null) s += " AND " + TxnDbUtil.getEscape("PARTITION_NAME", identifierQuoteString) + "='" + ci.partName + "'"; {code} This produces: {noformat} 2014-09-09 10:34:12,818 ERROR [nc-h04-22]: compactor.Worker (Worker.java:run(165)) - Caught an exception in the main loop of compactor worker nc-h04-22, exiting MetaException(message:Unable to connect to transaction database org.postgresql.util.PSQLException: ERROR: column PARTITION_NAME does not exist Position: 104 at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2096) at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1829) at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257) at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:510) at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:372) at org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:252) at com.jolbox.bonecp.StatementHandle.executeQuery(StatementHandle.java:464) at org.apache.hadoop.hive.metastore.txn.CompactionTxnHandler.findColumnsWithStats(CompactionTxnHandler.java:628) at org.apache.hadoop.hive.ql.txn.compactor.Worker.run(Worker.java:140) ) at org.apache.hadoop.hive.metastore.txn.CompactionTxnHandler.findColumnsWithStats(CompactionTxnHandler.java:645) at org.apache.hadoop.hive.ql.txn.compactor.Worker.run(Worker.java:140) {noformat} This error breaks all compactions. Version 4 of the patch was OK, but version 5 introduced a new bug. Fixed with the new patch now. Enable Postgres as METASTORE back-end - Key: HIVE-7689 URL: https://issues.apache.org/jira/browse/HIVE-7689 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.14.0 Reporter: Damien Carol Assignee: Damien Carol Priority: Minor Labels: metastore, postgres Fix For: 0.14.0 Attachments: HIVE-7689.5.patch, HIVE-7889.1.patch, HIVE-7889.2.patch, HIVE-7889.3.patch, HIVE-7889.4.patch I maintain a few patches to make the Metastore work with a Postgres back end in our production environment. The main goal of this JIRA is to push these patches upstream. This patch enables LOCKS, COMPACTION and fixes errors in STATS on a Postgres metastore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7689) Enable Postgres as METASTORE back-end
[ https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126772#comment-14126772 ] Damien Carol commented on HIVE-7689: Verified with our streaming benchmark on real hardware and with : {noformat} mvn -B -o test -Phadoop-2 -Dtest=TestWorker {noformat} Enable Postgres as METASTORE back-end - Key: HIVE-7689 URL: https://issues.apache.org/jira/browse/HIVE-7689 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.14.0 Reporter: Damien Carol Assignee: Damien Carol Priority: Minor Labels: metastore, postgres Fix For: 0.14.0 Attachments: HIVE-7689.5.patch, HIVE-7889.1.patch, HIVE-7889.2.patch, HIVE-7889.3.patch, HIVE-7889.4.patch I maintain few patches to make Metastore works with Postgres back end in our production environment. The main goal of this JIRA is to push upstream these patches. This patch enable LOCKS, COMPACTION and fix error in STATS on postgres metastore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7689) Enable Postgres as METASTORE back-end
[ https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Carol updated HIVE-7689: --- Attachment: HIVE-7689.6.patch Enable Postgres as METASTORE back-end - Key: HIVE-7689 URL: https://issues.apache.org/jira/browse/HIVE-7689 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.14.0 Reporter: Damien Carol Assignee: Damien Carol Priority: Minor Labels: metastore, postgres Fix For: 0.14.0 Attachments: HIVE-7689.5.patch, HIVE-7689.6.patch, HIVE-7889.1.patch, HIVE-7889.2.patch, HIVE-7889.3.patch, HIVE-7889.4.patch I maintain few patches to make Metastore works with Postgres back end in our production environment. The main goal of this JIRA is to push upstream these patches. This patch enable LOCKS, COMPACTION and fix error in STATS on postgres metastore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 24602: HIVE-7689 : Enable Postgres as METASTORE back-end
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24602/ --- (Updated Sept. 9, 2014, 8:55 a.m.) Review request for hive. Changes --- Rebased on the latest trunk and fixed a test failure. Bugs: HIVE-7689 https://issues.apache.org/jira/browse/HIVE-7689 Repository: hive-git Description --- I maintain a few patches to make the Metastore work with a Postgres back end in our production environment. The main goal of this JIRA is to push these patches upstream. This patch enables these features: * LOCKS on the Postgres metastore * COMPACTION on the Postgres metastore * TRANSACTION on the Postgres metastore * fix for the metastore update script for Postgres Diffs (updated) - metastore/scripts/upgrade/postgres/hive-txn-schema-0.13.0.postgres.sql 2ebd3b0 metastore/src/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java d3aa66f metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnDbUtil.java df183a0 metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java f1697bb ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java 264052f ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsAggregator.java b074ca9 ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsPublisher.java 5e317ab ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsUtils.java 4625d27 Diff: https://reviews.apache.org/r/24602/diff/ Testing --- Using the patched version in production. Enables concurrency with DbTxnManager. Thanks, Damien Carol
[jira] [Updated] (HIVE-2390) Expand support for union types
[ https://issues.apache.org/jira/browse/HIVE-2390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated HIVE-2390: -- Resolution: Fixed Release Note: Adds UnionType support in LazyBinarySerde Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I just committed this. Thanks Suma! Expand support for union types -- Key: HIVE-2390 URL: https://issues.apache.org/jira/browse/HIVE-2390 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Jakob Homan Assignee: Suma Shivaprasad Labels: uniontype Fix For: 0.14.0 Attachments: HIVE-2390.1.patch, HIVE-2390.patch When the union type was introduced, full support for it wasn't provided. For instance, when working with a union that gets passed to LazyBinarySerde: {noformat}Caused by: java.lang.RuntimeException: Unrecognized type: UNION at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:468) at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serializeStruct(LazyBinarySerDe.java:230) at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:184) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7694) SMB join on tables differing by number of sorted by columns with same join prefix fails
[ https://issues.apache.org/jira/browse/HIVE-7694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126785#comment-14126785 ] Amareshwari Sriramadasu commented on HIVE-7694: --- +1 Code changes look fine to me. [~suma.shivaprasad], Can you rebase the patch? Also run tests once again as the last test build was having test failures. Make sure failed tests are not failing on your local machine before submitting again SMB join on tables differing by number of sorted by columns with same join prefix fails --- Key: HIVE-7694 URL: https://issues.apache.org/jira/browse/HIVE-7694 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.1 Reporter: Suma Shivaprasad Assignee: Suma Shivaprasad Fix For: 0.14.0 Attachments: HIVE-7694.1.patch, HIVE-7694.patch For eg: If two tables T1 sorted by (a, b, c) clustered by a and T2 sorted by (a) and clustered by (a) are joined, the following exception is seen {noformat} 14/08/11 09:09:38 ERROR ql.Driver: FAILED: IndexOutOfBoundsException Index: 1, Size: 1 java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 at java.util.ArrayList.RangeCheck(ArrayList.java:547) at java.util.ArrayList.get(ArrayList.java:322) at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.checkSortColsAndJoinCols(AbstractSMBJoinProc.java:378) at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.isEligibleForBucketSortMergeJoin(AbstractSMBJoinProc.java:352) at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.canConvertBucketMapJoinToSMBJoin(AbstractSMBJoinProc.java:119) at org.apache.hadoop.hive.ql.optimizer.SortedMergeBucketMapjoinProc.process(SortedMergeBucketMapjoinProc.java:51) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:132) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109) at org.apache.hadoop.hive.ql.optimizer.SortedMergeBucketMapJoinOptimizer.transform(SortedMergeBucketMapJoinOptimizer.java:109) at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:146) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9305) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:64) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:393) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
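To make the failure above easier to follow: checkSortColsAndJoinCols walks the join columns and indexes into a table's sort columns, so a table sorted by fewer columns than its join partner trips the IndexOutOfBoundsException. Below is a simplified, hypothetical guard (not the actual AbstractSMBJoinProc code) capturing the intended check — the join keys only need to be a common prefix of each table's sort columns, and the size comparison also avoids indexing past the shorter list.
{code:java}
import java.util.Arrays;
import java.util.List;

public class SortPrefixSketch {
  // True when the join keys form a prefix of the table's sort columns.
  static boolean joinKeysArePrefixOfSortCols(List<String> sortCols, List<String> joinCols) {
    if (sortCols.size() < joinCols.size()) {
      return false;   // cannot be a prefix; also prevents the IndexOutOfBoundsException
    }
    for (int i = 0; i < joinCols.size(); i++) {
      if (!sortCols.get(i).equalsIgnoreCase(joinCols.get(i))) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<String> joinCols = Arrays.asList("a");
    // T1 sorted by (a, b, c) and T2 sorted by (a), joined on a: both pass the check.
    System.out.println(joinKeysArePrefixOfSortCols(Arrays.asList("a", "b", "c"), joinCols));
    System.out.println(joinKeysArePrefixOfSortCols(Arrays.asList("a"), joinCols));
  }
}
{code}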
Re: Review Request 24630: HIVE-7694 - SMB joins on tables differing by number of sorted by columns but same sort prefix and join keys fail
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24630/#review52695 --- Ship it! Ship It! - Amareshwari Sriramadasu On Sept. 8, 2014, 5:25 p.m., Suma Shivaprasad wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24630/ --- (Updated Sept. 8, 2014, 5:25 p.m.) Review request for hive, Amareshwari Sriramadasu, Brock Noland, Gunther Hagleitner, and Navis Ryu. Bugs: HIVE-7694 https://issues.apache.org/jira/browse/HIVE-7694 Repository: hive-git Description --- For eg: If two tables T1 sorted by (a, b, c) clustered by a and T2 sorted by (a) and clustered by (a) are joined, an exception is seen as reported in https://issues.apache.org/jira/browse/HIVE-7694 Diffs - ql/src/java/org/apache/hadoop/hive/ql/optimizer/AbstractSMBJoinProc.java 0b7b1a3 ql/src/test/queries/clientpositive/sort_merge_join_desc_8.q PRE-CREATION ql/src/test/results/clientpositive/sort_merge_join_desc_8.q.out PRE-CREATION Diff: https://reviews.apache.org/r/24630/diff/ Testing --- sort_merge_join_desc_8.q added for testing the above cases Thanks, Suma Shivaprasad
[jira] [Updated] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-8017: - Attachment: HIVE-8017.2-spark.patch This patch fixes some failed qfile tests caused by the last patch. Two qtests are not fixed: {{optimize_nullscan.q}} and {{union_remove_25.q}}. For {{optimize_nullscan.q}}, I checked the corresponding MR output and found the operator tree in the new output file is more similar to the one in the MR version output. Besides, this failure is of age 2, so I guess it's not related to the patch here. For {{union_remove_25.q}}, the only diff is the total size of {{outputTbl2}} (6812 vs. 6826). I checked the MR version and the total size is also 6812. I'm not sure what causes this difference. Maybe we need to do more tests for partitioned tables. [~xuefuz] do you have any idea on this? Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch] --- Key: HIVE-8017 URL: https://issues.apache.org/jira/browse/HIVE-8017 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-8017-spark.patch, HIVE-8017.2-spark.patch HiveKey should be used as the key type because it holds the hash code for partitioning. While BytesWritable serves partitioning well for simple cases, we have to use {{HiveKey.hashCode}} for more complicated ones, e.g. join, bucketed table, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8029) Remove reducers number configure in SparkTask[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-8029: Attachment: HIVE-8029.1-spark.patch Remove reducers number configure in SparkTask[Spark Branch] --- Key: HIVE-8029 URL: https://issues.apache.org/jira/browse/HIVE-8029 Project: Hive Issue Type: Improvement Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M4 Attachments: HIVE-8029.1-spark.patch We do not need duplicated logic to configure reducers number in SparkTask, as SetSparkReduceParallelism would always set reducers number in compiler phase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8029) Remove reducers number configure in SparkTask[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-8029: Status: Patch Available (was: Open) Remove reducers number configure in SparkTask[Spark Branch] --- Key: HIVE-8029 URL: https://issues.apache.org/jira/browse/HIVE-8029 Project: Hive Issue Type: Improvement Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M4 Attachments: HIVE-8029.1-spark.patch We do not need duplicated logic to configure reducers number in SparkTask, as SetSparkReduceParallelism would always set reducers number in compiler phase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7156) Group-By operator stat-annotation only uses distinct approx to generate rollups
[ https://issues.apache.org/jira/browse/HIVE-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126795#comment-14126795 ] Hive QA commented on HIVE-7156: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12667341/HIVE-7156.1.patch {color:red}ERROR:{color} -1 due to 178 failed/errored test(s), 6186 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join0 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join10 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join11 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join15 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join16 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join18 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join18_multi_distinct org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join20 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join22 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join24 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join26 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join27 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join30 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join31 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join32 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_smb_mapjoin_14 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_10 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_11 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_14 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_15 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_6 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_7 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_9 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_binarysortable_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_map_join_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_map_join_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_6 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_7 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin10 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin11 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin9 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_combine2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer10 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer11 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer14 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer15 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer6 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer7 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer8
[jira] [Commented] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126833#comment-14126833 ] Hive QA commented on HIVE-8017: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12667378/HIVE-8017.2-spark.patch {color:red}ERROR:{color} -1 due to 18 failed/errored test(s), 6343 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_having org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_limit_pushdown org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_merge1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_merge2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_19 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_groupby1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_having org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_limit_pushdown org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_merge1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_merge2 org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_25 org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1 org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/119/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/119/console Test logs: http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-119/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 18 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12667378 Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch] --- Key: HIVE-8017 URL: https://issues.apache.org/jira/browse/HIVE-8017 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-8017-spark.patch, HIVE-8017.2-spark.patch HiveKey should be used as the key type because it holds the hash code for partitioning. While BytesWritable serves partitioning well for simple cases, we have to use {{HiveKey.hashCode}} for more complicated ones, e.g. join, bucketed table, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7694) SMB join on tables differing by number of sorted by columns with same join prefix fails
[ https://issues.apache.org/jira/browse/HIVE-7694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suma Shivaprasad updated HIVE-7694: --- Attachment: HIVE-7694.2.patch SMB join on tables differing by number of sorted by columns with same join prefix fails --- Key: HIVE-7694 URL: https://issues.apache.org/jira/browse/HIVE-7694 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.1 Reporter: Suma Shivaprasad Assignee: Suma Shivaprasad Fix For: 0.14.0 Attachments: HIVE-7694.1.patch, HIVE-7694.2.patch, HIVE-7694.patch For eg: If two tables T1 sorted by (a, b, c) clustered by a and T2 sorted by (a) and clustered by (a) are joined, the following exception is seen {noformat} 14/08/11 09:09:38 ERROR ql.Driver: FAILED: IndexOutOfBoundsException Index: 1, Size: 1 java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 at java.util.ArrayList.RangeCheck(ArrayList.java:547) at java.util.ArrayList.get(ArrayList.java:322) at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.checkSortColsAndJoinCols(AbstractSMBJoinProc.java:378) at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.isEligibleForBucketSortMergeJoin(AbstractSMBJoinProc.java:352) at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.canConvertBucketMapJoinToSMBJoin(AbstractSMBJoinProc.java:119) at org.apache.hadoop.hive.ql.optimizer.SortedMergeBucketMapjoinProc.process(SortedMergeBucketMapjoinProc.java:51) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:132) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109) at org.apache.hadoop.hive.ql.optimizer.SortedMergeBucketMapJoinOptimizer.transform(SortedMergeBucketMapJoinOptimizer.java:109) at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:146) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9305) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:64) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:393) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7892) Thrift Set type not working with Hive
[ https://issues.apache.org/jira/browse/HIVE-7892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Mittal updated HIVE-7892: Attachment: HIVE-7892.patch.txt Attaching patch that resolves the issue. The approach taken here is to essentially map thrift Set type to hive Array type (thrift List type already maps to hive Array). Since both List and Set are essentially collections, we can simply leverage the existing Array type, instead of exposing a new complex type at hive level. Thrift Set type not working with Hive - Key: HIVE-7892 URL: https://issues.apache.org/jira/browse/HIVE-7892 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Satish Mittal Assignee: Satish Mittal Attachments: HIVE-7892.patch.txt Thrift supports List, Map and Struct complex types, which get mapped to Array, Map and Struct complex types in Hive respectively. However thrift Set type doesn't seem to be working. Here is an example thrift struct: {noformat} namespace java sample.thrift struct setrow { 1: required set<i32> ids, 2: required string name, } {noformat} A Hive table is created with ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.thrift.ThriftDeserializer' WITH SERDEPROPERTIES ('serialization.class'='sample.thrift.setrow', 'serialization.format'='org.apache.thrift.protocol.TBinaryProtocol'). Describing the table shows: {noformat} hive> describe settable; OK ids struct from deserializer name string from deserializer {noformat} Issuing a select query on set column throws SemanticException: {noformat} hive> select ids from settable; FAILED: SemanticException java.lang.IllegalArgumentException: Error: name expected at the position 7 of 'struct' but '' is found. {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
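A minimal sketch of the mapping idea described in the comment above, assuming nothing about the actual patch: when reflecting over the Thrift-generated class, a java.util.Set field can be routed through the same object-inspector path as java.util.List, so it surfaces as a Hive ARRAY.

{code}
// Hedged illustration only (hypothetical helper, not the attached HIVE-7892.patch.txt):
// treat Set like List when deciding which Hive type a reflected field maps to.
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.List;
import java.util.Set;

public class ThriftCollectionMapping {
  /** True if the reflected field type should surface as a Hive ARRAY. */
  static boolean mapsToHiveArray(Type fieldType) {
    if (fieldType instanceof ParameterizedType) {
      Type raw = ((ParameterizedType) fieldType).getRawType();
      // set<i32> and list<i32> both become ARRAY<int>; only the element type matters.
      return raw == List.class || raw == Set.class;
    }
    return false;
  }
}
{code}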
Review Request 25473: Thrift Set type not working with Hive
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25473/ --- Review request for hive, Amareshwari Sriramadasu, Ashutosh Chauhan, and Navis Ryu. Bugs: HIVE-7892 https://issues.apache.org/jira/browse/HIVE-7892 Repository: hive-git Description --- Thrift supports List, Map and Struct complex types, which get mapped to Array, Map and Struct complex types in Hive respectively. However thrift Set type doesn't get mapped to any Hive type, and hence doesn't work with ThriftDeserializer serde. Diffs - serde/if/test/complex.thrift 308b64c serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde2/thrift/test/SetIntString.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorFactory.java 9a226b3 serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/StandardListObjectInspector.java 6eb8803 serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/TestThriftObjectInspectors.java 5f692fb Diff: https://reviews.apache.org/r/25473/diff/ Testing --- 1) Added Unit test along with the fix. 2) Manually tested by creating a table with ThriftDeserializer serde and having thrift set columns: a) described the table b) issued query to select the set column Thanks, Satish Mittal
[jira] [Commented] (HIVE-7892) Thrift Set type not working with Hive
[ https://issues.apache.org/jira/browse/HIVE-7892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126866#comment-14126866 ] Satish Mittal commented on HIVE-7892: - Review: https://reviews.apache.org/r/25473/ Thrift Set type not working with Hive - Key: HIVE-7892 URL: https://issues.apache.org/jira/browse/HIVE-7892 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Satish Mittal Assignee: Satish Mittal Attachments: HIVE-7892.patch.txt Thrift supports List, Map and Struct complex types, which get mapped to Array, Map and Struct complex types in Hive respectively. However thrift Set type doesn't seem to be working. Here is an example thrift struct: {noformat} namespace java sample.thrift struct setrow { 1: required seti32 ids, 2: required string name, } {noformat} A Hive table is created with ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.thrift.ThriftDeserializer' WITH SERDEPROPERTIES ('serialization.class'='sample.thrift.setrow', 'serialization.format'='org.apache.thrift.protocol.TBinaryProtocol'). Describing the table shows: {noformat} hive describe settable; OK ids structfrom deserializer namestringfrom deserializer {noformat} Issuing a select query on set column throws SemanticException: {noformat} hive select ids from settable; FAILED: SemanticException java.lang.IllegalArgumentException: Error: name expected at the position 7 of 'struct' but '' is found. {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7892) Thrift Set type not working with Hive
[ https://issues.apache.org/jira/browse/HIVE-7892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Mittal updated HIVE-7892: Status: Patch Available (was: Open) Thrift Set type not working with Hive - Key: HIVE-7892 URL: https://issues.apache.org/jira/browse/HIVE-7892 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Satish Mittal Assignee: Satish Mittal Attachments: HIVE-7892.patch.txt Thrift supports List, Map and Struct complex types, which get mapped to Array, Map and Struct complex types in Hive respectively. However thrift Set type doesn't seem to be working. Here is an example thrift struct: {noformat} namespace java sample.thrift struct setrow { 1: required seti32 ids, 2: required string name, } {noformat} A Hive table is created with ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.thrift.ThriftDeserializer' WITH SERDEPROPERTIES ('serialization.class'='sample.thrift.setrow', 'serialization.format'='org.apache.thrift.protocol.TBinaryProtocol'). Describing the table shows: {noformat} hive describe settable; OK ids structfrom deserializer namestringfrom deserializer {noformat} Issuing a select query on set column throws SemanticException: {noformat} hive select ids from settable; FAILED: SemanticException java.lang.IllegalArgumentException: Error: name expected at the position 7 of 'struct' but '' is found. {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8029) Remove reducers number configure in SparkTask[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126899#comment-14126899 ] Hive QA commented on HIVE-8029: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12667379/HIVE-8029.1-spark.patch {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 6343 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection org.apache.hive.service.TestHS2ImpersonationWithRemoteMS.testImpersonation {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/120/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/120/console Test logs: http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-120/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12667379 Remove reducers number configure in SparkTask[Spark Branch] --- Key: HIVE-8029 URL: https://issues.apache.org/jira/browse/HIVE-8029 Project: Hive Issue Type: Improvement Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M4 Attachments: HIVE-8029.1-spark.patch We do not need duplicated logic to configure reducers number in SparkTask, as SetSparkReduceParallelism would always set reducers number in compiler phase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8030) NullPointerException on getSchemas
Shiv Prakash created HIVE-8030: -- Summary: NullPointerException on getSchemas Key: HIVE-8030 URL: https://issues.apache.org/jira/browse/HIVE-8030 Project: Hive Issue Type: Bug Components: Database/Schema, JDBC Affects Versions: 0.13.1 Environment: Linux (Ubuntu 12.04) Reporter: Shiv Prakash Fix For: 0.13.1 java.lang.NullPointerException at java.util.ArrayList.init(ArrayList.java:151) at org.apache.hadoop.hive.jdbc.HiveMetaDataResultSet.init(HiveMetaDataResultSet.java:32) at org.apache.hadoop.hive.jdbc.HiveDatabaseMetaData$3.init(HiveDatabaseMetaData.java:482) at org.apache.hadoop.hive.jdbc.HiveDatabaseMetaData.getSchemas(HiveDatabaseMetaData.java:481) at org.apache.hadoop.hive.jdbc.HiveDatabaseMetaData.getSchemas(HiveDatabaseMetaData.java:476) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:622) at org.pentaho.hadoop.shim.common.DriverProxyInvocationChain$DatabaseMetaDataInvocationHandler.invoke(DriverProxyInvocationChain.java:368) at com.sun.proxy.$Proxy20.getSchemas(Unknown Source) at org.pentaho.di.core.database.Database.getSchemas(Database.java:3857) at org.pentaho.di.ui.trans.steps.tableoutput.TableOutputDialog.getSchemaNames(TableOutputDialog.java:1036) at org.pentaho.di.ui.trans.steps.tableoutput.TableOutputDialog.access$2400(TableOutputDialog.java:94) at org.pentaho.di.ui.trans.steps.tableoutput.TableOutputDialog$24.widgetSelected(TableOutputDialog.java:863) at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source) at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source) at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source) at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source) at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source) at org.pentaho.di.ui.trans.steps.tableoutput.TableOutputDialog.open(TableOutputDialog.java:884) at org.pentaho.di.ui.spoon.delegates.SpoonStepsDelegate.editStep(SpoonStepsDelegate.java:124) at org.pentaho.di.ui.spoon.Spoon.editStep(Spoon.java:8648) at org.pentaho.di.ui.spoon.trans.TransGraph.editStep(TransGraph.java:3020) at org.pentaho.di.ui.spoon.trans.TransGraph.mouseDoubleClick(TransGraph.java:737) at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source) at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source) at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source) at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source) at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source) at org.pentaho.di.ui.spoon.Spoon.readAndDispatch(Spoon.java:1297) at org.pentaho.di.ui.spoon.Spoon.waitForDispose(Spoon.java:7801) at org.pentaho.di.ui.spoon.Spoon.start(Spoon.java:9130) at org.pentaho.di.ui.spoon.Spoon.main(Spoon.java:638) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:622) at org.pentaho.commons.launcher.Launcher.main(Launcher.java:151) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7946) CBO: Merge CBO changes to Trunk
[ https://issues.apache.org/jira/browse/HIVE-7946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126921#comment-14126921 ] Hive QA commented on HIVE-7946: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12667347/HIVE-7946.5.patch {color:red}ERROR:{color} -1 due to 261 failed/errored test(s), 5557 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver_accumulo_predicate_pushdown org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_allcolref_in_udf org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_coltype org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_analyze_table_null_partition org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_filter org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_limit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_select org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_table org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_union org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ansi_sql_arithmetic org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_explain org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_nulls org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_reordering_values org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_without_localtask org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_partitioned_native org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_groupby org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_char_cast org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_combine2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_constprog2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_count org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_create_func1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_create_genericudaf org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_create_union_table org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_create_view_partitioned org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cross_join org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cross_product_check_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cross_product_check_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_decimal_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_decimal_precision org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_decimal_udf org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_distinct_stats org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_04_evolved_parts org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_explain_dependency org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_explain_logical org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_fetch_aggregation org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_filter_join_breaktask org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_filter_join_breaktask2 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby2_limit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_distinct_samekey org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_resolution org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_10 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_11 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_1_23 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_9 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_skew_1_23 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_having org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_having2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_empty org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_file_format org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables_compact org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_multiple org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_partitioned org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_self_join
[jira] [Commented] (HIVE-8030) NullPointerException on getSchemas
[ https://issues.apache.org/jira/browse/HIVE-8030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126924#comment-14126924 ] Lars Francke commented on HIVE-8030: This looks very similar to HIVE-8030. I'm looking at the current code and all the ArrayList creations are guarded by null checks. Are you sure that you are using Hive 0.13.1? The line numbers don't seem to match up either. NullPointerException on getSchemas -- Key: HIVE-8030 URL: https://issues.apache.org/jira/browse/HIVE-8030 Project: Hive Issue Type: Bug Components: Database/Schema, JDBC Affects Versions: 0.13.1 Environment: Linux (Ubuntu 12.04) Reporter: Shiv Prakash Labels: hadoop Fix For: 0.13.1 java.lang.NullPointerException at java.util.ArrayList.init(ArrayList.java:151) at org.apache.hadoop.hive.jdbc.HiveMetaDataResultSet.init(HiveMetaDataResultSet.java:32) at org.apache.hadoop.hive.jdbc.HiveDatabaseMetaData$3.init(HiveDatabaseMetaData.java:482) at org.apache.hadoop.hive.jdbc.HiveDatabaseMetaData.getSchemas(HiveDatabaseMetaData.java:481) at org.apache.hadoop.hive.jdbc.HiveDatabaseMetaData.getSchemas(HiveDatabaseMetaData.java:476) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:622) at org.pentaho.hadoop.shim.common.DriverProxyInvocationChain$DatabaseMetaDataInvocationHandler.invoke(DriverProxyInvocationChain.java:368) at com.sun.proxy.$Proxy20.getSchemas(Unknown Source) at org.pentaho.di.core.database.Database.getSchemas(Database.java:3857) at org.pentaho.di.ui.trans.steps.tableoutput.TableOutputDialog.getSchemaNames(TableOutputDialog.java:1036) at org.pentaho.di.ui.trans.steps.tableoutput.TableOutputDialog.access$2400(TableOutputDialog.java:94) at org.pentaho.di.ui.trans.steps.tableoutput.TableOutputDialog$24.widgetSelected(TableOutputDialog.java:863) at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source) at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source) at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source) at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source) at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source) at org.pentaho.di.ui.trans.steps.tableoutput.TableOutputDialog.open(TableOutputDialog.java:884) at org.pentaho.di.ui.spoon.delegates.SpoonStepsDelegate.editStep(SpoonStepsDelegate.java:124) at org.pentaho.di.ui.spoon.Spoon.editStep(Spoon.java:8648) at org.pentaho.di.ui.spoon.trans.TransGraph.editStep(TransGraph.java:3020) at org.pentaho.di.ui.spoon.trans.TransGraph.mouseDoubleClick(TransGraph.java:737) at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source) at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source) at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source) at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source) at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source) at org.pentaho.di.ui.spoon.Spoon.readAndDispatch(Spoon.java:1297) at org.pentaho.di.ui.spoon.Spoon.waitForDispose(Spoon.java:7801) at org.pentaho.di.ui.spoon.Spoon.start(Spoon.java:9130) at org.pentaho.di.ui.spoon.Spoon.main(Spoon.java:638) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:622) at org.pentaho.commons.launcher.Launcher.main(Launcher.java:151) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
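For context on the null-check observation above, this is the pattern being referred to, sketched with made-up names: java.util.ArrayList's copy constructor throws NullPointerException when handed a null collection, so the metadata result list has to be guarded before construction.

{code}
// Hedged sketch with hypothetical names; not the actual HiveMetaDataResultSet code.
import java.util.ArrayList;
import java.util.List;

public class SchemaListGuard {
  static List<String> toResultList(List<String> schemasFromServer) {
    // new ArrayList<>(null) is exactly the NPE in the stack trace above,
    // so fall back to an empty list when the server returns nothing.
    return schemasFromServer == null
        ? new ArrayList<String>()
        : new ArrayList<String>(schemasFromServer);
  }
}
{code}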
[jira] [Assigned] (HIVE-7776) enable sample10.q.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li reassigned HIVE-7776: --- Assignee: Chengxiang Li enable sample10.q.[Spark Branch] Key: HIVE-7776 URL: https://issues.apache.org/jira/browse/HIVE-7776 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li sample10.q contains a dynamic partition operation; this qtest should be enabled once Hive on Spark supports dynamic partitioning. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7704) Create tez task for fast file merging
[ https://issues.apache.org/jira/browse/HIVE-7704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126985#comment-14126985 ] Hive QA commented on HIVE-7704: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12667365/HIVE-7704.8.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6193 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_merge1 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/707/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/707/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-707/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12667365 Create tez task for fast file merging - Key: HIVE-7704 URL: https://issues.apache.org/jira/browse/HIVE-7704 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-7704.1.patch, HIVE-7704.2.patch, HIVE-7704.3.patch, HIVE-7704.4.patch, HIVE-7704.4.patch, HIVE-7704.5.patch, HIVE-7704.6.patch, HIVE-7704.7.patch, HIVE-7704.8.patch Currently tez falls back to MR task for merge file task. It will beneficial to convert the merge file tasks to tez task to make use of the performance gains from tez. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-7834) Use min, max and NDV from the stats to better estimate many to many vs one to many inner joins
[ https://issues.apache.org/jira/browse/HIVE-7834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar resolved HIVE-7834. --- Resolution: Duplicate Use min, max and NDV from the stats to better estimate many to many vs one to many inner joins -- Key: HIVE-7834 URL: https://issues.apache.org/jira/browse/HIVE-7834 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 0.13.1 Reporter: Mostafa Mokhtar Fix For: 0.14.0 I noticed that the estimate number of rows in Map joins is higher after the join than before the join that is with column stats fetch ON or OFF. TPC-DS Q55 was a good example for that, the issue is that the current statistics provide us enough information that we can estimate with strong confidence that the joins are one to many and not many to many. Joining store_sales x item on ss_item_sk = i_item_sk, we know that the NDV, min and max values for both join columns match while the row counts are different this pattern indicates a PK/FK relationship between store_sales and item. Yet when a filter is applied on item and reduces the number of rows from 462K to 7K we estimate a many to many join between the filtered item and store_sales and as a result the estimate number of rows coming out of the join is off by several orders of magnitude. Available information from the stats {code} Table Join column NDV from describe NDV actual min max item i_item_sk 439,501 462,000 1 462,000 date_dim d_date_sk 65,332 73,049 2,415,022 2,488,070 store_sales ss_item_sk 439,501 462,000 1 462,000 store_sales ss_sold_date_sk 2,226 1,823 2,450,816 2,452,642 {code} Same thing applies to store_sales and date_dim but with a caveat that the NDV , min and max values don't match where date_dim has a bigger domain and accordingly a higher NDV count. For joining store_sales and item on on ss_item_sk = i_item_sk since both columns have the same NDV, min and max values we can safely conclude that selectivity on item will translate to similar selectivity on store_sales. This is not the case for joining store_sales and date_dim on ss_sold_date_sk = d_date_sk since the domain of d_date_sk is much bigger than that of ss_sold_date_sk, differences in domain need to be taken into account when inferring selectivity onto store_sales. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8031) CBO should use per column join selectivity not NDV when applying exponential backoff.
Mostafa Mokhtar created HIVE-8031: - Summary: CBO should use per column join selectivity not NDV when applying exponential backoff. Key: HIVE-8031 URL: https://issues.apache.org/jira/browse/HIVE-8031 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 0.13.1 Reporter: Mostafa Mokhtar Assignee: Laljo John Pullokkaran Fix For: 0.14.0 Simplify predicates for disjunctive predicates so that can get pushed down to the scan. For TPC-DS query 13 we push down predicates in the following form where c_martial_status in ('M','D','U') etc.. {code} select avg(ss_quantity) ,avg(ss_ext_sales_price) ,avg(ss_ext_wholesale_cost) ,sum(ss_ext_wholesale_cost) from store_sales ,store ,customer_demographics ,household_demographics ,customer_address ,date_dim where store.s_store_sk = store_sales.ss_store_sk and store_sales.ss_sold_date_sk = date_dim.d_date_sk and date_dim.d_year = 2001 and((store_sales.ss_hdemo_sk=household_demographics.hd_demo_sk and customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk and customer_demographics.cd_marital_status = 'M' and customer_demographics.cd_education_status = '4 yr Degree' and store_sales.ss_sales_price between 100.00 and 150.00 and household_demographics.hd_dep_count = 3 )or (store_sales.ss_hdemo_sk=household_demographics.hd_demo_sk and customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk and customer_demographics.cd_marital_status = 'D' and customer_demographics.cd_education_status = 'Primary' and store_sales.ss_sales_price between 50.00 and 100.00 and household_demographics.hd_dep_count = 1 ) or (store_sales.ss_hdemo_sk=household_demographics.hd_demo_sk and customer_demographics.cd_demo_sk = ss_cdemo_sk and customer_demographics.cd_marital_status = 'U' and customer_demographics.cd_education_status = 'Advanced Degree' and store_sales.ss_sales_price between 150.00 and 200.00 and household_demographics.hd_dep_count = 1 )) and((store_sales.ss_addr_sk = customer_address.ca_address_sk and customer_address.ca_country = 'United States' and customer_address.ca_state in ('KY', 'GA', 'NM') and store_sales.ss_net_profit between 100 and 200 ) or (store_sales.ss_addr_sk = customer_address.ca_address_sk and customer_address.ca_country = 'United States' and customer_address.ca_state in ('MT', 'OR', 'IN') and store_sales.ss_net_profit between 150 and 300 ) or (store_sales.ss_addr_sk = customer_address.ca_address_sk and customer_address.ca_country = 'United States' and customer_address.ca_state in ('WI', 'MO', 'WV') and store_sales.ss_net_profit between 50 and 250 )) ; {code} This is the plan currently generated without any predicate simplification {code} STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Tez Edges: Map 7 - Map 8 (BROADCAST_EDGE) Map 8 - Map 5 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE) Reducer 2 - Map 1 (SIMPLE_EDGE), Map 4 (BROADCAST_EDGE), Map 7 (SIMPLE_EDGE) Reducer 3 - Reducer 2 (SIMPLE_EDGE) DagName: mmokhtar_20140828155050_7059c24b-501b-4683-86c0-4f3c023f0b0e:1 Vertices: Map 1 Map Operator Tree: TableScan alias: customer_address Statistics: Num rows: 4000 Data size: 40595195284 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: ca_address_sk (type: int), ca_state (type: string), ca_country (type: string) outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 4000 Data size: 40595195284 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator sort order: Statistics: Num rows: 4000 Data size: 40595195284 Basic stats: COMPLETE Column stats: NONE value 
expressions: _col0 (type: int), _col1 (type: string), _col2 (type: string) Execution mode: vectorized Map 4 Map Operator Tree: TableScan alias: date_dim filterExpr: ((d_year = 2001) and d_date_sk is not null) (type: boolean) Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: ((d_year = 2001) and d_date_sk is not null) (type: boolean) Statistics: Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: d_date_sk (type: int) outputColumnNames: _col0
[jira] [Updated] (HIVE-8031) CBO should use per column join selectivity not NDV when applying exponential backoff.
[ https://issues.apache.org/jira/browse/HIVE-8031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-8031: -- Assignee: Harish Butani (was: Laljo John Pullokkaran) CBO should use per column join selectivity not NDV when applying exponential backoff. - Key: HIVE-8031 URL: https://issues.apache.org/jira/browse/HIVE-8031 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 0.14.0, 0.13.1 Reporter: Mostafa Mokhtar Assignee: Harish Butani Fix For: 0.14.0 Simplify predicates for disjunctive predicates so that can get pushed down to the scan. For TPC-DS query 13 we push down predicates in the following form where c_martial_status in ('M','D','U') etc.. {code} select avg(ss_quantity) ,avg(ss_ext_sales_price) ,avg(ss_ext_wholesale_cost) ,sum(ss_ext_wholesale_cost) from store_sales ,store ,customer_demographics ,household_demographics ,customer_address ,date_dim where store.s_store_sk = store_sales.ss_store_sk and store_sales.ss_sold_date_sk = date_dim.d_date_sk and date_dim.d_year = 2001 and((store_sales.ss_hdemo_sk=household_demographics.hd_demo_sk and customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk and customer_demographics.cd_marital_status = 'M' and customer_demographics.cd_education_status = '4 yr Degree' and store_sales.ss_sales_price between 100.00 and 150.00 and household_demographics.hd_dep_count = 3 )or (store_sales.ss_hdemo_sk=household_demographics.hd_demo_sk and customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk and customer_demographics.cd_marital_status = 'D' and customer_demographics.cd_education_status = 'Primary' and store_sales.ss_sales_price between 50.00 and 100.00 and household_demographics.hd_dep_count = 1 ) or (store_sales.ss_hdemo_sk=household_demographics.hd_demo_sk and customer_demographics.cd_demo_sk = ss_cdemo_sk and customer_demographics.cd_marital_status = 'U' and customer_demographics.cd_education_status = 'Advanced Degree' and store_sales.ss_sales_price between 150.00 and 200.00 and household_demographics.hd_dep_count = 1 )) and((store_sales.ss_addr_sk = customer_address.ca_address_sk and customer_address.ca_country = 'United States' and customer_address.ca_state in ('KY', 'GA', 'NM') and store_sales.ss_net_profit between 100 and 200 ) or (store_sales.ss_addr_sk = customer_address.ca_address_sk and customer_address.ca_country = 'United States' and customer_address.ca_state in ('MT', 'OR', 'IN') and store_sales.ss_net_profit between 150 and 300 ) or (store_sales.ss_addr_sk = customer_address.ca_address_sk and customer_address.ca_country = 'United States' and customer_address.ca_state in ('WI', 'MO', 'WV') and store_sales.ss_net_profit between 50 and 250 )) ; {code} This is the plan currently generated without any predicate simplification {code} STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Tez Edges: Map 7 - Map 8 (BROADCAST_EDGE) Map 8 - Map 5 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE) Reducer 2 - Map 1 (SIMPLE_EDGE), Map 4 (BROADCAST_EDGE), Map 7 (SIMPLE_EDGE) Reducer 3 - Reducer 2 (SIMPLE_EDGE) DagName: mmokhtar_20140828155050_7059c24b-501b-4683-86c0-4f3c023f0b0e:1 Vertices: Map 1 Map Operator Tree: TableScan alias: customer_address Statistics: Num rows: 4000 Data size: 40595195284 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: ca_address_sk (type: int), ca_state (type: string), ca_country (type: string) outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 4000 Data size: 40595195284 Basic stats: 
COMPLETE Column stats: NONE Reduce Output Operator sort order: Statistics: Num rows: 4000 Data size: 40595195284 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: int), _col1 (type: string), _col2 (type: string) Execution mode: vectorized Map 4 Map Operator Tree: TableScan alias: date_dim filterExpr: ((d_year = 2001) and d_date_sk is not null) (type: boolean) Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: NONE Filter Operator
[jira] [Updated] (HIVE-8031) CBO should use per column join selectivity not NDV when applying exponential backoff.
[ https://issues.apache.org/jira/browse/HIVE-8031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-8031: -- Description: (was: Simplify predicates for disjunctive predicates so that can get pushed down to the scan. For TPC-DS query 13 we push down predicates in the following form where c_martial_status in ('M','D','U') etc.. {code} select avg(ss_quantity) ,avg(ss_ext_sales_price) ,avg(ss_ext_wholesale_cost) ,sum(ss_ext_wholesale_cost) from store_sales ,store ,customer_demographics ,household_demographics ,customer_address ,date_dim where store.s_store_sk = store_sales.ss_store_sk and store_sales.ss_sold_date_sk = date_dim.d_date_sk and date_dim.d_year = 2001 and((store_sales.ss_hdemo_sk=household_demographics.hd_demo_sk and customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk and customer_demographics.cd_marital_status = 'M' and customer_demographics.cd_education_status = '4 yr Degree' and store_sales.ss_sales_price between 100.00 and 150.00 and household_demographics.hd_dep_count = 3 )or (store_sales.ss_hdemo_sk=household_demographics.hd_demo_sk and customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk and customer_demographics.cd_marital_status = 'D' and customer_demographics.cd_education_status = 'Primary' and store_sales.ss_sales_price between 50.00 and 100.00 and household_demographics.hd_dep_count = 1 ) or (store_sales.ss_hdemo_sk=household_demographics.hd_demo_sk and customer_demographics.cd_demo_sk = ss_cdemo_sk and customer_demographics.cd_marital_status = 'U' and customer_demographics.cd_education_status = 'Advanced Degree' and store_sales.ss_sales_price between 150.00 and 200.00 and household_demographics.hd_dep_count = 1 )) and((store_sales.ss_addr_sk = customer_address.ca_address_sk and customer_address.ca_country = 'United States' and customer_address.ca_state in ('KY', 'GA', 'NM') and store_sales.ss_net_profit between 100 and 200 ) or (store_sales.ss_addr_sk = customer_address.ca_address_sk and customer_address.ca_country = 'United States' and customer_address.ca_state in ('MT', 'OR', 'IN') and store_sales.ss_net_profit between 150 and 300 ) or (store_sales.ss_addr_sk = customer_address.ca_address_sk and customer_address.ca_country = 'United States' and customer_address.ca_state in ('WI', 'MO', 'WV') and store_sales.ss_net_profit between 50 and 250 )) ; {code} This is the plan currently generated without any predicate simplification {code} STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Tez Edges: Map 7 - Map 8 (BROADCAST_EDGE) Map 8 - Map 5 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE) Reducer 2 - Map 1 (SIMPLE_EDGE), Map 4 (BROADCAST_EDGE), Map 7 (SIMPLE_EDGE) Reducer 3 - Reducer 2 (SIMPLE_EDGE) DagName: mmokhtar_20140828155050_7059c24b-501b-4683-86c0-4f3c023f0b0e:1 Vertices: Map 1 Map Operator Tree: TableScan alias: customer_address Statistics: Num rows: 4000 Data size: 40595195284 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: ca_address_sk (type: int), ca_state (type: string), ca_country (type: string) outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 4000 Data size: 40595195284 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator sort order: Statistics: Num rows: 4000 Data size: 40595195284 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: int), _col1 (type: string), _col2 (type: string) Execution mode: vectorized Map 4 Map Operator Tree: TableScan alias: date_dim filterExpr: ((d_year = 
2001) and d_date_sk is not null) (type: boolean) Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: ((d_year = 2001) and d_date_sk is not null) (type: boolean) Statistics: Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: d_date_sk (type: int) outputColumnNames: _col0 Statistics: Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: _col0 (type: int) sort order: +
[jira] [Updated] (HIVE-8031) CBO should use per column join selectivity not NDV when applying exponential backoff.
[ https://issues.apache.org/jira/browse/HIVE-8031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-8031: -- Description: Currently CBO uses NDV not join selectivity in computeInnerJoinSelectivity which results in in-accurate estimate number of rows. I looked at the plan for TPC-DS Q17 after the latest set of changes and I am concerned that the estimate of rows for the join of store_sales and store_returns is so low, as you can see the estimate is 8461 rows for joining 1.2795706667449066E8 with 1.2922108035889767E7. {code} HiveJoinRel(condition=[AND(=($130, $3), =($129, $15))], joinType=[inner]): rowcount = 1079.1345153548855, cumulative cost = {8.271845957931738E10 rows, 0.0 cpu, 0.0 io}, id = 517 HiveJoinRel(condition=[=($0, $38)], joinType=[inner]): rowcount = 6.669190301841249E7, cumulative cost = {4.300510912631623E10 rows, 0.0 cpu, 0.0 io}, id = 402 HiveTableScanRel(table=[[catalog_sales]]): rowcount = 4.3005109025E10, cumulative cost = {0}, id = 2 HiveFilterRel(condition=[in($15, '2000Q1', '2000Q2', '2000Q3')]): rowcount = 101.31622746185853, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 181 HiveTableScanRel(table=[[d3]]): rowcount = 73049.0, cumulative cost = {0}, id = 3 HiveJoinRel(condition=[AND(AND(=($3, $61), =($2, $60)), =($9, $67))], joinType=[inner]): rowcount = 8461.27236667537, cumulative cost = {8.26517592150266E10 rows, 0.0 cpu, 0.0 io}, id = 515 HiveJoinRel(condition=[=($27, $0)], joinType=[inner]): rowcount = 1.2795706667449066E8, cumulative cost = {8.251088004031622E10 rows, 0.0 cpu, 0.0 io}, id = 417 HiveTableScanRel(table=[[store_sales]]): rowcount = 8.2510879939E10, cumulative cost = {0}, id = 5 HiveFilterRel(condition=[=($15, '2000Q1')]): rowcount = 101.31622746185853, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 173 HiveTableScanRel(table=[[d1]]): rowcount = 73049.0, cumulative cost = {0}, id = 0 HiveJoinRel(condition=[=($0, $24)], joinType=[inner]): rowcount = 1.2922108035889767E7, cumulative cost = {8.332595810316228E9 rows, 0.0 cpu, 0.0 io}, id = 424 HiveTableScanRel(table=[[store_returns]]): rowcount = 8.332595709E9, cumulative cost = {0}, id = 7 HiveFilterRel(condition=[in($15, '2000Q1', '2000Q2', '2000Q3')]): rowcount = 101.31622746185853, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 177 HiveTableScanRel(table=[[d2]]): rowcount = 73049.0, cumulative cost = {0}, id = 1 {code} CBO should use per column join selectivity not NDV when applying exponential backoff. - Key: HIVE-8031 URL: https://issues.apache.org/jira/browse/HIVE-8031 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 0.14.0, 0.13.1 Reporter: Mostafa Mokhtar Assignee: Harish Butani Fix For: 0.14.0 Currently CBO uses NDV not join selectivity in computeInnerJoinSelectivity which results in in-accurate estimate number of rows. I looked at the plan for TPC-DS Q17 after the latest set of changes and I am concerned that the estimate of rows for the join of store_sales and store_returns is so low, as you can see the estimate is 8461 rows for joining 1.2795706667449066E8 with 1.2922108035889767E7. 
{code} HiveJoinRel(condition=[AND(=($130, $3), =($129, $15))], joinType=[inner]): rowcount = 1079.1345153548855, cumulative cost = {8.271845957931738E10 rows, 0.0 cpu, 0.0 io}, id = 517 HiveJoinRel(condition=[=($0, $38)], joinType=[inner]): rowcount = 6.669190301841249E7, cumulative cost = {4.300510912631623E10 rows, 0.0 cpu, 0.0 io}, id = 402 HiveTableScanRel(table=[[catalog_sales]]): rowcount = 4.3005109025E10, cumulative cost = {0}, id = 2 HiveFilterRel(condition=[in($15, '2000Q1', '2000Q2', '2000Q3')]): rowcount = 101.31622746185853, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 181 HiveTableScanRel(table=[[d3]]): rowcount = 73049.0, cumulative cost = {0}, id = 3 HiveJoinRel(condition=[AND(AND(=($3, $61), =($2, $60)), =($9, $67))], joinType=[inner]): rowcount = 8461.27236667537, cumulative cost = {8.26517592150266E10 rows, 0.0 cpu, 0.0 io}, id = 515 HiveJoinRel(condition=[=($27, $0)], joinType=[inner]): rowcount = 1.2795706667449066E8, cumulative cost = {8.251088004031622E10 rows, 0.0 cpu, 0.0 io}, id = 417 HiveTableScanRel(table=[[store_sales]]): rowcount = 8.2510879939E10, cumulative cost = {0}, id = 5
[jira] [Updated] (HIVE-8031) CBO should use per column join selectivity not NDV when applying exponential backoff.
[ https://issues.apache.org/jira/browse/HIVE-8031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-8031: -- Affects Version/s: 0.14.0 CBO should use per column join selectivity not NDV when applying exponential backoff. - Key: HIVE-8031 URL: https://issues.apache.org/jira/browse/HIVE-8031 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 0.14.0, 0.13.1 Reporter: Mostafa Mokhtar Assignee: Harish Butani Fix For: 0.14.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
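To see why the 8461-row estimate quoted in the HIVE-8031 description looks suspicious, here is the textbook inner-join estimate sketched with illustrative numbers. This is a hedged simplification, not Hive's computeInnerJoinSelectivity; the JIRA's point is precisely that per-column join selectivity, rather than raw NDV with exponential backoff, should drive the calculation.

{code}
// Hedged sketch, not Hive's estimator: the classic estimate divides the cross
// product by the larger NDV of the join columns. Numbers below are illustrative.
public class JoinEstimateSketch {
  static double innerJoinRows(double leftRows, double rightRows,
                              double leftNdv, double rightNdv) {
    double selectivity = 1.0 / Math.max(leftNdv, rightNdv);
    return leftRows * rightRows * selectivity;
  }

  public static void main(String[] args) {
    // Roughly the store_sales x store_returns shape discussed above (~1.28E8 x ~1.29E7 rows).
    // With a single join key and NDVs near the row counts, the estimate stays in the
    // millions rather than collapsing to a few thousand rows.
    System.out.printf("%.3e%n", innerJoinRows(1.28e8, 1.29e7, 1.28e8, 1.29e7));
  }
}
{code}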
[jira] [Commented] (HIVE-8029) Remove reducers number configure in SparkTask[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127051#comment-14127051 ] Brock Noland commented on HIVE-8029: +1 Remove reducers number configure in SparkTask[Spark Branch] --- Key: HIVE-8029 URL: https://issues.apache.org/jira/browse/HIVE-8029 Project: Hive Issue Type: Improvement Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M4 Attachments: HIVE-8029.1-spark.patch We do not need duplicated logic to configure reducers number in SparkTask, as SetSparkReduceParallelism would always set reducers number in compiler phase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8032) Fix TestSparkCliDriver = optimize_nullscan.q
Brock Noland created HIVE-8032: -- Summary: Fix TestSparkCliDriver = optimize_nullscan.q Key: HIVE-8032 URL: https://issues.apache.org/jira/browse/HIVE-8032 Project: Hive Issue Type: Sub-task Reporter: Brock Noland -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8032) Fix TestSparkCliDriver = optimize_nullscan.q
[ https://issues.apache.org/jira/browse/HIVE-8032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-8032: --- Description: It's been failing lately, perhaps since the last merge from trunk. Fix TestSparkCliDriver = optimize_nullscan.q - Key: HIVE-8032 URL: https://issues.apache.org/jira/browse/HIVE-8032 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland It's been failing lately, perhaps since the last merge from trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127054#comment-14127054 ] Brock Noland commented on HIVE-8017: I don't think nullscan is related as it's been failing for other runs. I created HIVE-8032 to fix that. Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch] --- Key: HIVE-8017 URL: https://issues.apache.org/jira/browse/HIVE-8017 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-8017-spark.patch, HIVE-8017.2-spark.patch HiveKey should be used as the key type because it holds the hash code for partitioning. While BytesWritable serves partitioning well for simple cases, we have to use {{HiveKey.hashCode}} for more complicated ones, e.g. join, bucketed table, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8029) Remove reducers number configure in SparkTask[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127055#comment-14127055 ] Brock Noland commented on HIVE-8029: I don't think nullscan is related as it's been failing for other runs. I created HIVE-8032 to fix that. Remove reducers number configure in SparkTask[Spark Branch] --- Key: HIVE-8029 URL: https://issues.apache.org/jira/browse/HIVE-8029 Project: Hive Issue Type: Improvement Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M4 Attachments: HIVE-8029.1-spark.patch We do not need duplicated logic to configure reducers number in SparkTask, as SetSparkReduceParallelism would always set reducers number in compiler phase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8029) Remove reducers number configure in SparkTask [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-8029: --- Summary: Remove reducers number configure in SparkTask [Spark Branch] (was: Remove reducers number configure in SparkTask[Spark Branch]) Remove reducers number configure in SparkTask [Spark Branch] Key: HIVE-8029 URL: https://issues.apache.org/jira/browse/HIVE-8029 Project: Hive Issue Type: Improvement Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M4 Attachments: HIVE-8029.1-spark.patch We do not need duplicated logic to configure reducers number in SparkTask, as SetSparkReduceParallelism would always set reducers number in compiler phase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8029) Remove reducers number configure in SparkTask [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-8029: --- Resolution: Fixed Fix Version/s: spark-branch Status: Resolved (was: Patch Available) Thank you Chengxiang! I have committed this to spark. Note in the commit I actually said Rui since I was just reviewing HIVE-8017. I apologize for this mistake, but since the JIRA is assigned to you, you will still get the appropriate accreditation for the patch. Remove reducers number configure in SparkTask [Spark Branch] Key: HIVE-8029 URL: https://issues.apache.org/jira/browse/HIVE-8029 Project: Hive Issue Type: Improvement Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M4 Fix For: spark-branch Attachments: HIVE-8029.1-spark.patch We do not need duplicated logic to configure reducers number in SparkTask, as SetSparkReduceParallelism would always set reducers number in compiler phase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8023) Code in HIVE-6380 eats exceptions
[ https://issues.apache.org/jira/browse/HIVE-8023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-8023: --- Resolution: Fixed Fix Version/s: 0.14.0 Assignee: Jason Dere Status: Resolved (was: Patch Available) Thank you Jason! I have committed this to trunk. Code in HIVE-6380 eats exceptions - Key: HIVE-8023 URL: https://issues.apache.org/jira/browse/HIVE-8023 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Jason Dere Fix For: 0.14.0 Attachments: HIVE-8023.1.patch This code eats the stack trace {noformat} LOG.error("Unable to load resources for " + dbName + "." + fName + ": " + e); {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
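The problem with the quoted call is that concatenating the exception into the message logs only e.toString() and discards the stack trace. A minimal sketch of the fix pattern follows, reusing the names from the snippet above; the surrounding class is hypothetical, only the logging idiom is the point.

{code}
// Hedged sketch of the fix pattern; the class and method are made up.
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class ResourceLoadLogging {
  private static final Log LOG = LogFactory.getLog(ResourceLoadLogging.class);

  void reportFailure(String dbName, String fName, Exception e) {
    // Before: LOG.error("Unable to load resources for " + dbName + "." + fName + ": " + e);
    // After: pass the exception as the throwable argument so the stack trace is kept.
    LOG.error("Unable to load resources for " + dbName + "." + fName, e);
  }
}
{code}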
[jira] [Updated] (HIVE-8012) TestHiveServer2Concurrency is not implemented
[ https://issues.apache.org/jira/browse/HIVE-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-8012: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Thank you Jason! I have committed to trunk. TestHiveServer2Concurrency is not implemented - Key: HIVE-8012 URL: https://issues.apache.org/jira/browse/HIVE-8012 Project: Hive Issue Type: Bug Reporter: Jason Dere Assignee: Jason Dere Fix For: 0.14.0 Attachments: HIVE-8012.1.patch {code} @Test public void test() { fail("Not yet implemented"); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 25468: HIVE-7777: add CSVSerde support
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25468/#review52723 --- Great work! serde/pom.xml https://reviews.apache.org/r/25468/#comment91700 These should only be indented by two spaces, not four. Have you tried submitting an MR job on a cluster with this patch? The reason I ask is that I think the serde must be in here: https://github.com/apache/hive/blob/trunk/ql/pom.xml#L563 for it to be available to MR jobs. serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java https://reviews.apache.org/r/25468/#comment91701 I think we should call this OpenCSVSerde since it's based on OpenCSV and I believe we might see multiple implementations of CSVSerde. I think we should extend AbstractSerDe as that is what all the new Serdes are supposed to be doing. - Brock Noland On Sept. 9, 2014, 2:16 a.m., cheng xu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25468/ --- (Updated Sept. 9, 2014, 2:16 a.m.) Review request for hive. Bugs: HIVE-7777 https://issues.apache.org/jira/browse/HIVE-7777 Repository: hive-git Description --- HIVE-7777: add CSVSerde support Diffs - serde/pom.xml f8bcc830cfb298d739819db8fbaa2f98f221ccf3 serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/TestCSVSerde.java PRE-CREATION Diff: https://reviews.apache.org/r/25468/diff/ Testing --- Unit test Thanks, cheng xu
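A rough skeleton of what the two review suggestions would look like together is sketched below. This is hypothetical, not the patch under review; the method bodies are placeholders, and it assumes the AbstractSerDe contract of that era (initialize/serialize/deserialize/getObjectInspector/getSerializedClass/getSerDeStats).

{code}
// Hypothetical skeleton only: rename CSVSerde to OpenCSVSerde and extend
// AbstractSerDe. Bodies are stubs; a real implementation would wrap OpenCSV.
package org.apache.hadoop.hive.serde2;

import java.util.Properties;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

public class OpenCSVSerde extends AbstractSerDe {

  @Override
  public void initialize(Configuration conf, Properties tbl) throws SerDeException {
    // read column names plus separator/quote/escape SerDe properties here
  }

  @Override
  public Class<? extends Writable> getSerializedClass() {
    return Text.class;             // CSV rows travel as text lines
  }

  @Override
  public Writable serialize(Object obj, ObjectInspector oi) throws SerDeException {
    return new Text();             // placeholder: format the row with the OpenCSV writer
  }

  @Override
  public Object deserialize(Writable blob) throws SerDeException {
    return null;                   // placeholder: parse the line with the OpenCSV reader
  }

  @Override
  public ObjectInspector getObjectInspector() throws SerDeException {
    return null;                   // placeholder: a struct inspector over string columns
  }

  @Override
  public SerDeStats getSerDeStats() {
    return null;                   // no statistics tracked in this sketch
  }
}
{code}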
[jira] [Comment Edited] (HIVE-5871) Use multiple-characters as field delimiter
[ https://issues.apache.org/jira/browse/HIVE-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127076#comment-14127076 ] Brock Noland edited comment on HIVE-5871 at 9/9/14 3:11 PM: I think for Hive, you need committer privs to get the ability to edit comments. Let me see if we can relax this. was (Author: brocknoland): I think for Hive, you need committer privs to get the ability to edit comments. Let me see if we can relax this. TEST EDIT. Use multiple-characters as field delimiter -- Key: HIVE-5871 URL: https://issues.apache.org/jira/browse/HIVE-5871 Project: Hive Issue Type: Improvement Components: Contrib Affects Versions: 0.12.0 Reporter: Rui Li Assignee: Rui Li Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-5871.2.patch, HIVE-5871.3.patch, HIVE-5871.4.patch, HIVE-5871.5.patch, HIVE-5871.6.patch, HIVE-5871.patch By default, hive only allows user to use single character as field delimiter. Although there's RegexSerDe to specify multiple-character delimiter, it can be daunting to use, especially for amateurs. In the patch, I add a new SerDe named MultiDelimitSerDe. With MultiDelimitSerDe, users can specify a multiple-character field delimiter when creating tables, in a way most similar to typical table creations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-5871) Use multiple-characters as field delimiter
[ https://issues.apache.org/jira/browse/HIVE-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127076#comment-14127076 ] Brock Noland commented on HIVE-5871: I think for Hive, you need committer privs to get the ability to edit comments. Let me see if we can relax this. TEST EDIT. Use multiple-characters as field delimiter -- Key: HIVE-5871 URL: https://issues.apache.org/jira/browse/HIVE-5871 Project: Hive Issue Type: Improvement Components: Contrib Affects Versions: 0.12.0 Reporter: Rui Li Assignee: Rui Li Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-5871.2.patch, HIVE-5871.3.patch, HIVE-5871.4.patch, HIVE-5871.5.patch, HIVE-5871.6.patch, HIVE-5871.patch By default, hive only allows user to use single character as field delimiter. Although there's RegexSerDe to specify multiple-character delimiter, it can be daunting to use, especially for amateurs. In the patch, I add a new SerDe named MultiDelimitSerDe. With MultiDelimitSerDe, users can specify a multiple-character field delimiter when creating tables, in a way most similar to typical table creations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7689) Enable Postgres as METASTORE back-end
[ https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127079#comment-14127079 ] Hive QA commented on HIVE-7689: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12667375/HIVE-7689.6.patch {color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 6185 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testDDLExclusive org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testDDLShared org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testDelete org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testJoin org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testReadWrite org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testRollback org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testSingleReadMultiPartition org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testSingleReadPartition org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testSingleReadTable org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testSingleWritePartition org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testSingleWriteTable org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testUpdate org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/708/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/708/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-708/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 13 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12667375 Enable Postgres as METASTORE back-end - Key: HIVE-7689 URL: https://issues.apache.org/jira/browse/HIVE-7689 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.14.0 Reporter: Damien Carol Assignee: Damien Carol Priority: Minor Labels: metastore, postgres Fix For: 0.14.0 Attachments: HIVE-7689.5.patch, HIVE-7689.6.patch, HIVE-7889.1.patch, HIVE-7889.2.patch, HIVE-7889.3.patch, HIVE-7889.4.patch I maintain few patches to make Metastore works with Postgres back end in our production environment. The main goal of this JIRA is to push upstream these patches. This patch enable LOCKS, COMPACTION and fix error in STATS on postgres metastore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127087#comment-14127087 ] Brock Noland commented on HIVE-8017: bq. do you have any comments as how to use the {{-- SORT_BEFORE_DIFF}} label? I am surprised the query failed if that was placed before the set commands. I think the big item is to ensure that the comment flag {{--}} is before the text {{SORT_BEFORE_DIFF}}. The code which implements this is here: https://github.com/apache/hive/blob/trunk/itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java#L416 Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch] --- Key: HIVE-8017 URL: https://issues.apache.org/jira/browse/HIVE-8017 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-8017-spark.patch, HIVE-8017.2-spark.patch HiveKey should be used as the key type because it holds the hash code for partitioning. While BytesWritable serves partitioning well for simple cases, we have to use {{HiveKey.hashCode}} for more complicated ones, e.g. join, bucketed table, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
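A minimal sketch of such a label check, assuming the label is recognized as a {{--}} comment line anywhere in the .q file (illustrative only, not the QTestUtil code linked above; the class and method names are hypothetical):
{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class QFileLabelCheck {
  // Returns true if any line of the .q file is a "--" comment containing the label,
  // e.g. "-- SORT_BEFORE_DIFF". This mirrors the idea of the check linked above,
  // but is only an illustrative sketch, not the real implementation.
  static boolean hasLabel(String qFile, String label) throws IOException {
    List<String> lines = Files.readAllLines(Paths.get(qFile));
    for (String line : lines) {
      String trimmed = line.trim();
      if (trimmed.startsWith("--") && trimmed.contains(label)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) throws IOException {
    System.out.println(hasLabel(args[0], "SORT_BEFORE_DIFF"));
  }
}
{code}
A check of this shape would pick up the label regardless of where it sits relative to the {{set}} commands; if the real check is position-sensitive, that would explain the failure Rui observed.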
[jira] [Updated] (HIVE-6147) Support avro data stored in HBase columns
[ https://issues.apache.org/jira/browse/HIVE-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-6147: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Thank you Swarnim! I have committed this to trunk! Support avro data stored in HBase columns - Key: HIVE-6147 URL: https://issues.apache.org/jira/browse/HIVE-6147 Project: Hive Issue Type: Improvement Components: HBase Handler Affects Versions: 0.12.0, 0.13.0 Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni Fix For: 0.14.0 Attachments: HIVE-6147.1.patch.txt, HIVE-6147.2.patch.txt, HIVE-6147.3.patch.txt, HIVE-6147.3.patch.txt, HIVE-6147.4.patch.txt, HIVE-6147.5.patch.txt, HIVE-6147.6.patch.txt Presently, the HBase Hive integration supports querying only primitive data types in columns. It would be nice to be able to store and query Avro objects in HBase columns by making them visible as structs to Hive. This will allow Hive to perform ad hoc analysis of HBase data which can be deeply structured. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-8033) StorageBasedAuthorizationProvider too restrictive on insert/select
Alan Gates created HIVE-8033: Summary: StorageBasedAuthorizationProvider too restrictive on insert/select Key: HIVE-8033 URL: https://issues.apache.org/jira/browse/HIVE-8033 Project: Hive Issue Type: Bug Components: Authorization Affects Versions: 0.13.1 Reporter: Alan Gates When doing {code} insert into table foo select * from bar {code} StorageBasedAuth checks that the user has write permissions on bar. It only needs read permission on bar for this operation. To reproduce: # As user1, create a table bar with file permissions set to world readable but not writable. # As user2, create table foo. # Confirm that user2 can read from bar: select count(*) from bar; # As user2: insert into foo select * from bar; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
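The expected behavior can be summarized as mapping how a table is used in the query to the HDFS permission that should be demanded; a rough sketch using Hadoop's {{FsAction}} (the {{TableUsage}} enum and helper below are hypothetical, not the actual StorageBasedAuthorizationProvider code):
{code}
import org.apache.hadoop.fs.permission.FsAction;

public class RequiredAccess {
  enum TableUsage { INPUT, OUTPUT }   // hypothetical: how the table appears in the query

  // For "insert into table foo select * from bar", bar is INPUT (should need READ only)
  // and foo is OUTPUT (needs WRITE). The reported bug is that WRITE is also demanded
  // on the INPUT table.
  static FsAction requiredAction(TableUsage usage) {
    return usage == TableUsage.INPUT ? FsAction.READ : FsAction.WRITE;
  }

  public static void main(String[] args) {
    System.out.println("bar -> " + requiredAction(TableUsage.INPUT));   // READ
    System.out.println("foo -> " + requiredAction(TableUsage.OUTPUT));  // WRITE
  }
}
{code}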
[jira] [Commented] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127157#comment-14127157 ] Xuefu Zhang commented on HIVE-8017: --- Hi Rui, {quote} Have you tried – SORT_BEFORE_DIFF, which is to sort the query result, which is different from SORT_BEFORE_DIFF. It's supposed to make query result sorted w/o an order by. {quote} This comment indeeded made no sense. Sorry for the confusion, I meant -- SORT_QUERY_RESULTS for the first -- SORT_BEFORE_DIFF. So my comment should really go like this: {bold} Have you tried – SORT_QUERY_RESULTS, which is to sort the query result, which is different from SORT_BEFORE_DIFF. It's supposed to make query result sorted w/o an order by. {bold} SORT_BEFORE_DIFF sorts the output files before making diff. It's less reliable, and sometimes the diff doesn't tell what's wrong. Thus, I think we should prefer SORT_QUERY_RESULTS when query output can diff in order. Neither do I know why SORT_BEFORE_DIFF has to come after set commands in your case, but it seems the usage of it in the .q files is not consistent on that front. Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch] --- Key: HIVE-8017 URL: https://issues.apache.org/jira/browse/HIVE-8017 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-8017-spark.patch, HIVE-8017.2-spark.patch HiveKey should be used as the key type because it holds the hash code for partitioning. While BytesWritable serves partitioning well for simple cases, we have to use {{HiveKey.hashCode}} for more complicated ones, e.g. join, bucketed table, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127157#comment-14127157 ] Xuefu Zhang edited comment on HIVE-8017 at 9/9/14 4:17 PM: --- Hi Rui, {quote} Have you try – SORT_BEFORE_DIFF, which is to sort the query result, which is different from SORT_BEFORE_DIFF. It's supposed to make query result sorted w/o an order by. {quote} This comment indeeded made no sense. Sorry for the confusion, I meant -- SORT_QUERY_RESULTS for the first -- SORT_BEFORE_DIFF. So my comment should really go like this: {quote} Have you tried – SORT_QUERY_RESULTS, which is to sort the query result, which is different from SORT_BEFORE_DIFF. It's supposed to make query result sorted w/o an order by. {quote} SORT_BEFORE_DIFF sorts the output files before making diff. It's less reliable, and sometimes the diff doesn't tell what's wrong. Thus, I think we should prefer SORT_QUERY_RESULTS when query output can diff in order. Neither do I know why SORT_BEFORE_DIFF has to come after set commands in your case, but it seems the usage of it in the .q files is not consistent on that front. was (Author: xuefuz): Hi Rui, {quote} Have you tried – SORT_BEFORE_DIFF, which is to sort the query result, which is different from SORT_BEFORE_DIFF. It's supposed to make query result sorted w/o an order by. {quote} This comment indeeded made no sense. Sorry for the confusion, I meant -- SORT_QUERY_RESULTS for the first -- SORT_BEFORE_DIFF. So my comment should really go like this: {bold} Have you tried – SORT_QUERY_RESULTS, which is to sort the query result, which is different from SORT_BEFORE_DIFF. It's supposed to make query result sorted w/o an order by. {bold} SORT_BEFORE_DIFF sorts the output files before making diff. It's less reliable, and sometimes the diff doesn't tell what's wrong. Thus, I think we should prefer SORT_QUERY_RESULTS when query output can diff in order. Neither do I know why SORT_BEFORE_DIFF has to come after set commands in your case, but it seems the usage of it in the .q files is not consistent on that front. Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch] --- Key: HIVE-8017 URL: https://issues.apache.org/jira/browse/HIVE-8017 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-8017-spark.patch, HIVE-8017.2-spark.patch HiveKey should be used as the key type because it holds the hash code for partitioning. While BytesWritable serves partitioning well for simple cases, we have to use {{HiveKey.hashCode}} for more complicated ones, e.g. join, bucketed table, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8017) Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127184#comment-14127184 ] Xuefu Zhang commented on HIVE-8017: --- {quote} For union_remove_25.q, the only diff is the total size of outputTbl2 (6812 - 6826). I checked the MR version and the total size is also 6812. I'm not sure what causes this difference. {quote} I have no clue either, but I had the same observation before. Nevertheless, this should be okay since the output data matches. It doesn't seem worth the time to drill down, at least for now. Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch] --- Key: HIVE-8017 URL: https://issues.apache.org/jira/browse/HIVE-8017 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-8017-spark.patch, HIVE-8017.2-spark.patch HiveKey should be used as the key type because it holds the hash code for partitioning. While BytesWritable serves partitioning well for simple cases, we have to use {{HiveKey.hashCode}} for more complicated ones, e.g. join, bucketed table, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-5545) HCatRecord getInteger method returns String when used on Partition columns of type INT
[ https://issues.apache.org/jira/browse/HIVE-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127198#comment-14127198 ] Rishav Rohit commented on HIVE-5545: [~eugene.koifman] Attached is the error stack- {quote} 13/10/11 21:06:03 INFO mapred.JobClient: Task Id : attempt_201310112040_0005_m_00_2, Status : FAILED java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer at org.apache.hcatalog.data.HCatRecord.getInteger(HCatRecord.java:84) at com.test.hcatalog.testMapper.map(testMapper.java:25) at com.test.hcatalog.testMapper.map(testMapper.java:1) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:416) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121) at org.apache.hadoop.mapred.Child.main(Child.java:249) {quote} HCatRecord getInteger method returns String when used on Partition columns of type INT -- Key: HIVE-5545 URL: https://issues.apache.org/jira/browse/HIVE-5545 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.11.0 Environment: hadoop-1.0.3 Reporter: Rishav Rohit HCatRecord getInteger method returns String when used on Partition columns of type INT. java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7892) Thrift Set type not working with Hive
[ https://issues.apache.org/jira/browse/HIVE-7892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127199#comment-14127199 ] Hive QA commented on HIVE-7892: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12667390/HIVE-7892.patch.txt {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6186 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_convert_enum_to_string org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/709/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/709/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-709/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12667390 Thrift Set type not working with Hive - Key: HIVE-7892 URL: https://issues.apache.org/jira/browse/HIVE-7892 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Satish Mittal Assignee: Satish Mittal Attachments: HIVE-7892.patch.txt Thrift supports List, Map and Struct complex types, which get mapped to Array, Map and Struct complex types in Hive respectively. However thrift Set type doesn't seem to be working. Here is an example thrift struct: {noformat} namespace java sample.thrift struct setrow { 1: required set<i32> ids, 2: required string name, } {noformat} A Hive table is created with ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.thrift.ThriftDeserializer' WITH SERDEPROPERTIES ('serialization.class'='sample.thrift.setrow', 'serialization.format'='org.apache.thrift.protocol.TBinaryProtocol'). Describing the table shows: {noformat} hive> describe settable; OK ids struct<> from deserializer name string from deserializer {noformat} Issuing a select query on set column throws SemanticException: {noformat} hive> select ids from settable; FAILED: SemanticException java.lang.IllegalArgumentException: Error: name expected at the position 7 of 'struct<>' but '>' is found. {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-5545) HCatRecord getInteger method returns String when used on Partition columns of type INT
[ https://issues.apache.org/jira/browse/HIVE-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127202#comment-14127202 ] Rishav Rohit commented on HIVE-5545: testMapper.java:25 is {quote} Integer year = new Integer(value.getInteger(year, schema)); {quote} HCatRecord getInteger method returns String when used on Partition columns of type INT -- Key: HIVE-5545 URL: https://issues.apache.org/jira/browse/HIVE-5545 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.11.0 Environment: hadoop-1.0.3 Reporter: Rishav Rohit HCatRecord getInteger method returns String when used on Partition columns of type INT. java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer -- This message was sent by Atlassian JIRA (v6.3.4#6332)
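One defensive workaround until the underlying bug is fixed is to read the partition column through the untyped accessor and convert explicitly; a sketch assuming the usual {{HCatRecord.get(String, HCatSchema)}} accessor (the helper class below is hypothetical, and the field name "year" is taken from the snippet above):
{code}
import org.apache.hcatalog.data.HCatRecord;
import org.apache.hcatalog.data.schema.HCatSchema;

public class PartitionColumnRead {
  // Reads an int-typed column that may be handed back as a String when it is
  // a partition column (the behavior described in this issue). Illustrative only.
  static int readIntColumn(HCatRecord record, String field, HCatSchema schema)
      throws Exception {
    Object raw = record.get(field, schema);
    if (raw instanceof Number) {
      return ((Number) raw).intValue();
    }
    return Integer.parseInt(raw.toString());
  }
}
{code}
In the mapper above this would be used as {{int year = readIntColumn(value, "year", schema);}} instead of calling {{getInteger}} directly.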
[jira] [Commented] (HIVE-7405) Vectorize GROUP BY on the Reduce-Side (Part 1 – Basic)
[ https://issues.apache.org/jira/browse/HIVE-7405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127225#comment-14127225 ] Ashutosh Chauhan commented on HIVE-7405: Do we really need AggregreateMapReduceUsage enum? Seems like GroupbyDesc.Mode can be used instead as follows: AggregreateMapReduceUsage.MAP - Mode.Hash AggregreateMapReduceUsage.REDUCE - Mode.MergePartial AggregreateMapReduceUsage.MAP_REDUCE - Mode.all_other If possible, we should reuse GroupbyDesc.Mode, otherwise these modes can be mixed and matched and will lead to explosion of combinations. Vectorize GROUP BY on the Reduce-Side (Part 1 – Basic) -- Key: HIVE-7405 URL: https://issues.apache.org/jira/browse/HIVE-7405 Project: Hive Issue Type: Sub-task Components: Vectorization Reporter: Matt McCline Assignee: Matt McCline Attachments: HIVE-7405.1.patch, HIVE-7405.2.patch, HIVE-7405.3.patch, HIVE-7405.4.patch, HIVE-7405.5.patch, HIVE-7405.6.patch, HIVE-7405.7.patch, HIVE-7405.8.patch, HIVE-7405.9.patch, HIVE-7405.91.patch, HIVE-7405.92.patch, HIVE-7405.93.patch, HIVE-7405.94.patch, HIVE-7405.95.patch, HIVE-7405.96.patch, HIVE-7405.97.patch, HIVE-7405.98.patch, HIVE-7405.99.patch, HIVE-7405.991.patch, HIVE-7405.994.patch, HIVE-7405.995.patch Vectorize the basic case that does not have any count distinct aggregation. Add a 4th processing mode in VectorGroupByOperator for reduce where each input VectorizedRowBatch has only values for one key at a time. Thus, the values in the batch can be aggregated quickly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
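A toy sketch of the suggested direction, deriving the map/reduce usage from a single mode enum instead of carrying a second one (the enums below are local stand-ins for illustration, not the actual {{GroupByDesc.Mode}} or the enum from the patch):
{code}
public class GroupByModeMapping {
  // Local stand-ins for the GroupByDesc.Mode values named in the comment above;
  // the real enum lives in org.apache.hadoop.hive.ql.plan.GroupByDesc.
  enum Mode { HASH, MERGEPARTIAL, PARTIAL1, FINAL, COMPLETE }

  enum Usage { MAP, REDUCE, MAP_REDUCE }   // the parallel enum the comment suggests dropping

  // Deriving the usage from the existing mode avoids two enums that could be
  // mixed and matched inconsistently.
  static Usage usageFor(Mode mode) {
    switch (mode) {
      case HASH:          return Usage.MAP;
      case MERGEPARTIAL:  return Usage.REDUCE;
      default:            return Usage.MAP_REDUCE;
    }
  }

  public static void main(String[] args) {
    for (Mode m : Mode.values()) {
      System.out.println(m + " -> " + usageFor(m));
    }
  }
}
{code}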
[jira] [Commented] (HIVE-5871) Use multiple-characters as field delimiter
[ https://issues.apache.org/jira/browse/HIVE-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127242#comment-14127242 ] Lefty Leverenz commented on HIVE-5871: -- Thanks for that explanation, [~brocknoland]. Live and learn. [~lirui], since I have edit permission you can tell me what to change and I'll do it for you. That will help avoid confusion for JIRA surfers. Use multiple-characters as field delimiter -- Key: HIVE-5871 URL: https://issues.apache.org/jira/browse/HIVE-5871 Project: Hive Issue Type: Improvement Components: Contrib Affects Versions: 0.12.0 Reporter: Rui Li Assignee: Rui Li Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-5871.2.patch, HIVE-5871.3.patch, HIVE-5871.4.patch, HIVE-5871.5.patch, HIVE-5871.6.patch, HIVE-5871.patch By default, hive only allows user to use single character as field delimiter. Although there's RegexSerDe to specify multiple-character delimiter, it can be daunting to use, especially for amateurs. In the patch, I add a new SerDe named MultiDelimitSerDe. With MultiDelimitSerDe, users can specify a multiple-character field delimiter when creating tables, in a way most similar to typical table creations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7689) Enable Postgres as METASTORE back-end
[ https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Carol updated HIVE-7689: --- Attachment: HIVE-7689.7.patch Fix error in unit tests Enable Postgres as METASTORE back-end - Key: HIVE-7689 URL: https://issues.apache.org/jira/browse/HIVE-7689 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.14.0 Reporter: Damien Carol Assignee: Damien Carol Priority: Minor Labels: metastore, postgres Fix For: 0.14.0 Attachments: HIVE-7689.5.patch, HIVE-7689.6.patch, HIVE-7689.7.patch, HIVE-7889.1.patch, HIVE-7889.2.patch, HIVE-7889.3.patch, HIVE-7889.4.patch I maintain few patches to make Metastore works with Postgres back end in our production environment. The main goal of this JIRA is to push upstream these patches. This patch enable LOCKS, COMPACTION and fix error in STATS on postgres metastore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7989) Optimize Windowing function performance for row frames
[ https://issues.apache.org/jira/browse/HIVE-7989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127282#comment-14127282 ] Ankit Kamboj commented on HIVE-7989: Looks like the tests that failed are not due to the patch itself (ptf-windowing tests are part of ql module). Could somebody take a quick look and advise? Optimize Windowing function performance for row frames -- Key: HIVE-7989 URL: https://issues.apache.org/jira/browse/HIVE-7989 Project: Hive Issue Type: Improvement Components: PTF-Windowing Affects Versions: 0.13.0 Reporter: Ankit Kamboj Attachments: HIVE-7989.patch To find aggregate value for each row, current windowing function implementation creates a new aggregation buffer for each row, iterates over all the rows in respective window frame, puts them in buffer and then finds the aggregated value. This causes bottleneck for partitions with huge number of rows because this process runs in n-square complexity (n being rows in a partition) for each partition. So, if there are multiple partitions in a dataset, each with millions of rows, aggregation for all rows will take days to finish. There is scope of optimization for row frames, for following cases: a) For UNBOUNDED PRECEDING start and bounded end: Instead of iterating on window frame again for each row, we can slide the end one row at a time and aggregate, since we know the start is fixed for each row. This will have running time linear to the size of partition. b) For bounded start and UNBOUNDED FOLLOWING end: Instead of iterating on window frame again for each row, we can slide the start one row at a time and aggregate in reverse, since we know the end is fixed for each row. This will have running time linear to the size of partition. Also, In general for both row and value frames, we don't need to iterate over the range and re-create aggregation buffer if the start as well as end remain same. Instead, can re-use the previously created aggregation buffer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
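The proposed optimization for UNBOUNDED PRECEDING frames amounts to keeping one running aggregation buffer and extending it by a single row per output row, rather than re-aggregating the whole frame; a self-contained sketch contrasting the quadratic recompute with the linear sliding version for SUM (plain Java, not the PTF operator code):
{code}
public class RunningFrameSum {
  // O(n^2): re-aggregate the frame [0..i] for every row, as the current
  // implementation effectively does.
  static long[] naive(long[] col) {
    long[] out = new long[col.length];
    for (int i = 0; i < col.length; i++) {
      long sum = 0;
      for (int j = 0; j <= i; j++) {
        sum += col[j];
      }
      out[i] = sum;
    }
    return out;
  }

  // O(n): slide the frame end by one row and reuse the running buffer, which is
  // the optimization proposed for UNBOUNDED PRECEDING ... CURRENT ROW frames.
  static long[] sliding(long[] col) {
    long[] out = new long[col.length];
    long sum = 0;
    for (int i = 0; i < col.length; i++) {
      sum += col[i];
      out[i] = sum;
    }
    return out;
  }

  public static void main(String[] args) {
    long[] col = {3, 1, 4, 1, 5};
    System.out.println(java.util.Arrays.toString(naive(col)));    // [3, 4, 8, 9, 14]
    System.out.println(java.util.Arrays.toString(sliding(col)));  // same values, one pass
  }
}
{code}
The bounded-start/UNBOUNDED FOLLOWING case is symmetric: fix the end, slide the start backwards, and aggregate in reverse.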
Re: Review Request 25468: HIVE-7777: add CSVSerde support
On Sept. 9, 2014, 8:49 a.m., Lars Francke wrote: serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java, line 31 https://reviews.apache.org/r/25468/diff/1/?file=683467#file683467line31 This comment doesn't add value so I suggest removing it. Or you could expand the comment. - Lefty --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25468/#review52688 --- On Sept. 9, 2014, 2:16 a.m., cheng xu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25468/ --- (Updated Sept. 9, 2014, 2:16 a.m.) Review request for hive. Bugs: HIVE- https://issues.apache.org/jira/browse/HIVE- Repository: hive-git Description --- HIVE-: add CSVSerde support Diffs - serde/pom.xml f8bcc830cfb298d739819db8fbaa2f98f221ccf3 serde/src/java/org/apache/hadoop/hive/serde2/CSVSerde.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/TestCSVSerde.java PRE-CREATION Diff: https://reviews.apache.org/r/25468/diff/ Testing --- Unit test Thanks, cheng xu
[jira] [Updated] (HIVE-7936) Support for handling Thrift Union types
[ https://issues.apache.org/jira/browse/HIVE-7936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suma Shivaprasad updated HIVE-7936: --- Attachment: HIVE-7936.patch Support for handling Thrift Union types Key: HIVE-7936 URL: https://issues.apache.org/jira/browse/HIVE-7936 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.13.1 Reporter: Suma Shivaprasad Assignee: Suma Shivaprasad Attachments: HIVE-7936.patch Currently hive does not support thrift unions through ThriftDeserializer. Need to add support for the same -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7936) Support for handling Thrift Union types
[ https://issues.apache.org/jira/browse/HIVE-7936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suma Shivaprasad updated HIVE-7936: --- Fix Version/s: 0.14.0 Status: Patch Available (was: In Progress) Support for handling Thrift Union types Key: HIVE-7936 URL: https://issues.apache.org/jira/browse/HIVE-7936 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.13.1 Reporter: Suma Shivaprasad Assignee: Suma Shivaprasad Fix For: 0.14.0 Attachments: HIVE-7936.patch Currently hive does not support thrift unions through ThriftDeserializer. Need to add support for the same -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7100) Users of hive should be able to specify skipTrash when dropping tables.
[ https://issues.apache.org/jira/browse/HIVE-7100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127298#comment-14127298 ] Mithun Radhakrishnan commented on HIVE-7100: [~dbsalti], [~xuefuz], I agree. Another JIRA for dropPartitions(). Users of hive should be able to specify skipTrash when dropping tables. --- Key: HIVE-7100 URL: https://issues.apache.org/jira/browse/HIVE-7100 Project: Hive Issue Type: Improvement Affects Versions: 0.13.0 Reporter: Ravi Prakash Assignee: Jayesh Attachments: HIVE-7100.1.patch, HIVE-7100.2.patch, HIVE-7100.3.patch, HIVE-7100.4.patch, HIVE-7100.5.patch, HIVE-7100.8.patch, HIVE-7100.patch Users of our clusters are often running up against their quota limits because of Hive tables. When they drop tables, they have to then manually delete the files from HDFS using skipTrash. This is cumbersome and unnecessary. We should enable users to skipTrash directly when dropping tables. We should also be able to provide this functionality without polluting SQL syntax. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7694) SMB join on tables differing by number of sorted by columns with same join prefix fails
[ https://issues.apache.org/jira/browse/HIVE-7694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127310#comment-14127310 ] Hive QA commented on HIVE-7694: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12667389/HIVE-7694.2.patch {color:green}SUCCESS:{color} +1 6193 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/710/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/710/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-710/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12667389 SMB join on tables differing by number of sorted by columns with same join prefix fails --- Key: HIVE-7694 URL: https://issues.apache.org/jira/browse/HIVE-7694 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.1 Reporter: Suma Shivaprasad Assignee: Suma Shivaprasad Fix For: 0.14.0 Attachments: HIVE-7694.1.patch, HIVE-7694.2.patch, HIVE-7694.patch For eg: If two tables T1 sorted by (a, b, c) clustered by a and T2 sorted by (a) and clustered by (a) are joined, the following exception is seen {noformat} 14/08/11 09:09:38 ERROR ql.Driver: FAILED: IndexOutOfBoundsException Index: 1, Size: 1 java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 at java.util.ArrayList.RangeCheck(ArrayList.java:547) at java.util.ArrayList.get(ArrayList.java:322) at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.checkSortColsAndJoinCols(AbstractSMBJoinProc.java:378) at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.isEligibleForBucketSortMergeJoin(AbstractSMBJoinProc.java:352) at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.canConvertBucketMapJoinToSMBJoin(AbstractSMBJoinProc.java:119) at org.apache.hadoop.hive.ql.optimizer.SortedMergeBucketMapjoinProc.process(SortedMergeBucketMapjoinProc.java:51) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:132) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109) at org.apache.hadoop.hive.ql.optimizer.SortedMergeBucketMapJoinOptimizer.transform(SortedMergeBucketMapJoinOptimizer.java:109) at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:146) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9305) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:64) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:393) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7100) Users of hive should be able to specify skipTrash when dropping tables.
[ https://issues.apache.org/jira/browse/HIVE-7100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127313#comment-14127313 ] Xuefu Zhang commented on HIVE-7100: --- [~dbsalti] Could you please update RB with your latest patch? Users of hive should be able to specify skipTrash when dropping tables. --- Key: HIVE-7100 URL: https://issues.apache.org/jira/browse/HIVE-7100 Project: Hive Issue Type: Improvement Affects Versions: 0.13.0 Reporter: Ravi Prakash Assignee: Jayesh Attachments: HIVE-7100.1.patch, HIVE-7100.2.patch, HIVE-7100.3.patch, HIVE-7100.4.patch, HIVE-7100.5.patch, HIVE-7100.8.patch, HIVE-7100.patch Users of our clusters are often running up against their quota limits because of Hive tables. When they drop tables, they have to then manually delete the files from HDFS using skipTrash. This is cumbersome and unnecessary. We should enable users to skipTrash directly when dropping tables. We should also be able to provide this functionality without polluting SQL syntax. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 23352: Support non-constant expressions for MAP type indices.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/23352/#review52751 --- ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java https://reviews.apache.org/r/23352/#comment91752 I think you can use FunctionRegistry.implicitConvertable() here rather than having to create a new method. ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java https://reviews.apache.org/r/23352/#comment91809 could we also use implicitConvertable() here? - Jason Dere On July 9, 2014, 6:57 a.m., Navis Ryu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/23352/ --- (Updated July 9, 2014, 6:57 a.m.) Review request for hive. Bugs: HIVE-7325 https://issues.apache.org/jira/browse/HIVE-7325 Repository: hive-git Description --- Here is my sample: {code} CREATE TABLE RECORD(RecordID string, BatchDate string, Country string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES (hbase.columns.mapping = :key,D:BatchDate,D:Country) TBLPROPERTIES (hbase.table.name = RECORD); CREATE TABLE KEY_RECORD(KeyValue String, RecordId mapstring,string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES (hbase.columns.mapping = :key, K:) TBLPROPERTIES (hbase.table.name = KEY_RECORD); {code} The following join statement doesn't work. {code} SELECT a.*, b.* from KEY_RECORD a join RECORD b WHERE a.RecordId[b.RecordID] is not null; {code} FAILED: SemanticException 2:16 Non-constant expression for map indexes not supported. Error encountered near token 'RecordID' Diffs - ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 9889cfe ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java e44f5ae ql/src/test/queries/clientpositive/array_map_access_nonconstant.q PRE-CREATION ql/src/test/queries/negative/invalid_list_index.q c40f079 ql/src/test/queries/negative/invalid_list_index2.q 99d0b3d ql/src/test/queries/negative/invalid_map_index2.q 5828f07 ql/src/test/results/clientpositive/array_map_access_nonconstant.q.out PRE-CREATION ql/src/test/results/compiler/errors/invalid_list_index.q.out a4179cd ql/src/test/results/compiler/errors/invalid_list_index2.q.out aaa9455 ql/src/test/results/compiler/errors/invalid_map_index2.q.out edc9bda serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorUtils.java 5ccacf1 Diff: https://reviews.apache.org/r/23352/diff/ Testing --- Thanks, Navis Ryu
[jira] [Commented] (HIVE-7325) Support non-constant expressions for MAP type indices.
[ https://issues.apache.org/jira/browse/HIVE-7325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127330#comment-14127330 ] Jason Dere commented on HIVE-7325: -- Couple of comments on RB Support non-constant expressions for MAP type indices. -- Key: HIVE-7325 URL: https://issues.apache.org/jira/browse/HIVE-7325 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Mala Chikka Kempanna Assignee: Navis Fix For: 0.14.0 Attachments: HIVE-7325.1.patch.txt, HIVE-7325.2.patch.txt Here is my sample: {code} CREATE TABLE RECORD(RecordID string, BatchDate string, Country string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES (hbase.columns.mapping = :key,D:BatchDate,D:Country) TBLPROPERTIES (hbase.table.name = RECORD); CREATE TABLE KEY_RECORD(KeyValue String, RecordId mapstring,string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES (hbase.columns.mapping = :key, K:) TBLPROPERTIES (hbase.table.name = KEY_RECORD); {code} The following join statement doesn't work. {code} SELECT a.*, b.* from KEY_RECORD a join RECORD b WHERE a.RecordId[b.RecordID] is not null; {code} FAILED: SemanticException 2:16 Non-constant expression for map indexes not supported. Error encountered near token 'RecordID' -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7818) Support boolean PPD for ORC
[ https://issues.apache.org/jira/browse/HIVE-7818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127361#comment-14127361 ] Prasanth J commented on HIVE-7818: -- [~daijy] I will take a look at it sometime today and will post review. Meanwhile, can you please post the patch in review board? Support boolean PPD for ORC --- Key: HIVE-7818 URL: https://issues.apache.org/jira/browse/HIVE-7818 Project: Hive Issue Type: Improvement Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.14.0 Attachments: HIVE-7818.1.patch Currently ORC does collect stats for boolean field. However, the boolean stats is not range based, instead, it collects counts of true records. RecordReaderImpl.evaluatePredicate currently only deals with range based stats, we need to improve it to deal with the boolean stats. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
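Since the boolean column statistics carry a count of true values rather than a min/max range, the predicate decision reduces to comparing that count with the number of values; a hedged sketch of that decision (simplified stand-ins, not the actual RecordReaderImpl.evaluatePredicate code):
{code}
public class BooleanPpdSketch {
  // Simplified stand-ins: NO means the row group can be skipped, YES_NO means it must be read.
  enum TruthValue { YES_NO, NO }

  // trueCount: number of true values recorded in the boolean statistics;
  // valueCount: total non-null values; literal: the constant in "col = literal".
  static TruthValue evaluateEquals(long trueCount, long valueCount, boolean literal) {
    if (literal && trueCount == 0) {
      return TruthValue.NO;               // no trues at all -> "col = true" matches nothing
    }
    if (!literal && trueCount == valueCount) {
      return TruthValue.NO;               // all trues -> "col = false" matches nothing
    }
    return TruthValue.YES_NO;             // otherwise the row group may contain matches
  }

  public static void main(String[] args) {
    System.out.println(evaluateEquals(0, 1000, true));     // NO: safe to skip
    System.out.println(evaluateEquals(1000, 1000, false)); // NO: safe to skip
    System.out.println(evaluateEquals(42, 1000, true));    // YES_NO: must read
  }
}
{code}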
Re: Review Request 25178: Add DROP TABLE PURGE
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/25178/ --- (Updated Sept. 9, 2014, 6:51 p.m.) Review request for hive and Xuefu Zhang. Changes --- latest patch from HIVE-7100 includes documentation updates and responses to RB comments Repository: hive-git Description --- Add PURGE option to DROP TABLE command to skip saving table data to the trash Diffs (updated) - hcatalog/core/src/test/java/org/apache/hive/hcatalog/mapreduce/TestHCatPartitionPublish.java be7134f hcatalog/webhcat/svr/src/test/java/org/apache/hive/hcatalog/templeton/tool/TestTempletonUtils.java af952f2 itests/hive-unit/src/test/java/org/apache/hive/jdbc/miniHS2/TestHiveServer2.java da51a55 metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 9489949 metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java a94a7a3 metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreFsImpl.java cff0718 metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java cbdba30 metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreFS.java a141793 metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java 613b709 ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java cd017d8 ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java e387b8f ql/src/java/org/apache/hadoop/hive/ql/metadata/SessionHiveMetaStoreClient.java 4cf98d8 ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java f31a409 ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g 32db0c7 ql/src/java/org/apache/hadoop/hive/ql/plan/DropTableDesc.java ba30e1f ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHive.java 406aae9 ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHiveRemote.java 1a5ba87 ql/src/test/queries/clientpositive/drop_table_purge.q PRE-CREATION ql/src/test/results/clientpositive/drop_table_purge.q.out PRE-CREATION Diff: https://reviews.apache.org/r/25178/diff/ Testing --- added code test and added QL test. Tests passed in CI, but other, unrelated tests failed. Thanks, david seraf
[jira] [Commented] (HIVE-7100) Users of hive should be able to specify skipTrash when dropping tables.
[ https://issues.apache.org/jira/browse/HIVE-7100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127387#comment-14127387 ] david serafini commented on HIVE-7100: -- done. HIVE-7100.8.patch uploaded to RB. Users of hive should be able to specify skipTrash when dropping tables. --- Key: HIVE-7100 URL: https://issues.apache.org/jira/browse/HIVE-7100 Project: Hive Issue Type: Improvement Affects Versions: 0.13.0 Reporter: Ravi Prakash Assignee: Jayesh Attachments: HIVE-7100.1.patch, HIVE-7100.2.patch, HIVE-7100.3.patch, HIVE-7100.4.patch, HIVE-7100.5.patch, HIVE-7100.8.patch, HIVE-7100.patch Users of our clusters are often running up against their quota limits because of Hive tables. When they drop tables, they have to then manually delete the files from HDFS using skipTrash. This is cumbersome and unnecessary. We should enable users to skipTrash directly when dropping tables. We should also be able to provide this functionality without polluting SQL syntax. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open
[ https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127407#comment-14127407 ] Josh Elser commented on HIVE-7950: -- Ok, I figured a bit more out here. I believe that the AM *is* correctly getting the extra jars from the storage handler as expected. The subsequent errors are coming from the containers that are started to actually run the DAG (rather than the coordination from the tez AM). The interesting part is that the patch (HIVE-7950-1.diff) which starts a brand new Session will result in a successful query. It seems like maybe Tez isn't passing along the extra resources we added to the running session (AM) in Hive along to the DAG containers to actually run the query. I have no idea at this point if this is a problem in how hive is using tez or if it's a bug in tez itself... StorageHandler resources aren't added to Tez Session if already Session is already Open --- Key: HIVE-7950 URL: https://issues.apache.org/jira/browse/HIVE-7950 Project: Hive Issue Type: Bug Components: StorageHandler, Tez Reporter: Josh Elser Assignee: Josh Elser Fix For: 0.14.0 Attachments: HIVE-7950-1.diff, hive-7950-tez-WIP.diff Was trying to run some queries using the AccumuloStorageHandler when using the Tez execution engine. Some things that classes which were added to tmpjars weren't making it into the container. When a Tez Session is already open, as is the normal case when simply using the `hive` command, the resources aren't added. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open
[ https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127417#comment-14127417 ] Gopal V commented on HIVE-7950: --- Can you post the sequence of things you are doing? As in, how is the JARs getting added - is it an explicit ADD JAR? Tez-0.5.0-rc1 had an issue we tackled with an API change to ship JARs differently between the AM and tasks. StorageHandler resources aren't added to Tez Session if already Session is already Open --- Key: HIVE-7950 URL: https://issues.apache.org/jira/browse/HIVE-7950 Project: Hive Issue Type: Bug Components: StorageHandler, Tez Reporter: Josh Elser Assignee: Josh Elser Fix For: 0.14.0 Attachments: HIVE-7950-1.diff, hive-7950-tez-WIP.diff Was trying to run some queries using the AccumuloStorageHandler when using the Tez execution engine. Some things that classes which were added to tmpjars weren't making it into the container. When a Tez Session is already open, as is the normal case when simply using the `hive` command, the resources aren't added. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open
[ https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127440#comment-14127440 ] Josh Elser commented on HIVE-7950: -- Sure thing, [~gopalv]. I don't actually have to do any extra {{ADD JAR}} commands. The AccumuloStorageHandler constructs a list of jars that need to be passed along to the execution engine (via tmpjars in the Hadoop configuration). With the 'yarn' execution.engine, this works just fine -- the resources are localized and added to the Map/Reduce containers and things are great. When I try to run with 'tez', there are a few issues. The first is that, if there is already a TezSessionState that was already open'ed (e.g. like what is done when I just open the hive shell), it will have been started without those extra 'tmpjars' resources from the StorageHandler and the query will fail because we need those jars. Sergey mentioned that Tez 0.5.0 had a new method that would allow more resources to be added to an already started TezClient ({{TezClient#addAppMasterLocalFiles(Map<String, LocalResource>)}}). Implementing this (in the hive-7950-tez-WIP.diff attachment) appears to have successfully added the extra jars from the StorageHandler to the DAGAppMaster, but the containers started to actually run the query are missing those extra jars. Does that make sense? StorageHandler resources aren't added to Tez Session if already Session is already Open --- Key: HIVE-7950 URL: https://issues.apache.org/jira/browse/HIVE-7950 Project: Hive Issue Type: Bug Components: StorageHandler, Tez Reporter: Josh Elser Assignee: Josh Elser Fix For: 0.14.0 Attachments: HIVE-7950-1.diff, hive-7950-tez-WIP.diff Was trying to run some queries using the AccumuloStorageHandler when using the Tez execution engine. Some things that classes which were added to tmpjars weren't making it into the container. When a Tez Session is already open, as is the normal case when simply using the `hive` command, the resources aren't added. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-6550) SemanticAnalyzer.reset() doesn't clear all the state
[ https://issues.apache.org/jira/browse/HIVE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-6550: --- Status: Patch Available (was: Open) forgot to submit patch SemanticAnalyzer.reset() doesn't clear all the state Key: HIVE-6550 URL: https://issues.apache.org/jira/browse/HIVE-6550 Project: Hive Issue Type: Bug Affects Versions: 0.13.1, 0.13.0, 0.12.0 Reporter: Laljo John Pullokkaran Assignee: Sergey Shelukhin Attachments: HIVE-6550.01.patch, HIVE-6550.02.patch, HIVE-6550.03.patch, HIVE-6550.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7689) Enable Postgres as METASTORE back-end
[ https://issues.apache.org/jira/browse/HIVE-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127442#comment-14127442 ] Hive QA commented on HIVE-7689: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12667443/HIVE-7689.7.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6192 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority org.apache.hadoop.hive.ql.txn.compactor.TestCompactor.testStatsAfterCompactionPartTbl {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/711/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/711/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-711/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12667443 Enable Postgres as METASTORE back-end - Key: HIVE-7689 URL: https://issues.apache.org/jira/browse/HIVE-7689 Project: Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.14.0 Reporter: Damien Carol Assignee: Damien Carol Priority: Minor Labels: metastore, postgres Fix For: 0.14.0 Attachments: HIVE-7689.5.patch, HIVE-7689.6.patch, HIVE-7689.7.patch, HIVE-7889.1.patch, HIVE-7889.2.patch, HIVE-7889.3.patch, HIVE-7889.4.patch I maintain few patches to make Metastore works with Postgres back end in our production environment. The main goal of this JIRA is to push upstream these patches. This patch enable LOCKS, COMPACTION and fix error in STATS on postgres metastore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7946) CBO: Merge CBO changes to Trunk
[ https://issues.apache.org/jira/browse/HIVE-7946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127447#comment-14127447 ] Sergey Shelukhin commented on HIVE-7946: stats_noscan_1 and about 10-15 more tests might be fixed by HIVE-6550 CBO: Merge CBO changes to Trunk --- Key: HIVE-7946 URL: https://issues.apache.org/jira/browse/HIVE-7946 Project: Hive Issue Type: Bug Components: CBO Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Attachments: HIVE-7946.1.patch, HIVE-7946.2.patch, HIVE-7946.3.patch, HIVE-7946.4.patch, HIVE-7946.5.patch, HIVE-7946.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-7946) CBO: Merge CBO changes to Trunk
[ https://issues.apache.org/jira/browse/HIVE-7946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127447#comment-14127447 ] Sergey Shelukhin edited comment on HIVE-7946 at 9/9/14 7:36 PM: stats_noscan_1 and about 10-15 more tests might be fixed by HIVE-6550 (when that is in) was (Author: sershe): stats_noscan_1 and about 10-15 more tests might be fixed by HIVE-6550 CBO: Merge CBO changes to Trunk --- Key: HIVE-7946 URL: https://issues.apache.org/jira/browse/HIVE-7946 Project: Hive Issue Type: Bug Components: CBO Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Attachments: HIVE-7946.1.patch, HIVE-7946.2.patch, HIVE-7946.3.patch, HIVE-7946.4.patch, HIVE-7946.5.patch, HIVE-7946.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-6866) Hive server2 jdbc driver connection leak with namenode
[ https://issues.apache.org/jira/browse/HIVE-6866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127451#comment-14127451 ] Ankita Bakshi commented on HIVE-6866: - We are facing same issue in production. We are using CDH4.4 with Apache Hive 0.12. Is there a workaround for this issue other than restarting hiveserver2? Hive server2 jdbc driver connection leak with namenode -- Key: HIVE-6866 URL: https://issues.apache.org/jira/browse/HIVE-6866 Project: Hive Issue Type: Bug Affects Versions: 0.11.0 Reporter: Shengjun Xin 1. Set 'ipc.client.connection.maxidletime' to 360 in core-site.xml and start hive-server2. 2. Connect hive server2 repetitively in a while true loop. 3. The tcp connection number will increase until out of memory, it seems that hive server2 will not close the connection until the time out, the error message is as the following: {code} 2014-03-18 23:30:36,873 ERROR ql.Driver (SessionState.java:printError(386)) - FAILED: RuntimeException java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: hdm1.hadoop.local/192.168.2.101; destination host is: hdm1.hadoop.local:8020; java.lang.RuntimeException: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: hdm1.hadoop.local/192.168.2.101; destination host is: hdm1.hadoop.local:8020; at org.apache.hadoop.hive.ql.Context.getScratchDir(Context.java:190) at org.apache.hadoop.hive.ql.Context.getMRScratchDir(Context.java:231) at org.apache.hadoop.hive.ql.Context.getMRTmpFileURI(Context.java:288) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1274) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1059) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8676) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:278) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:433) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902) at org.apache.hive.service.cli.operation.SQLOperation.run(SQLOperation.java:95) at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatement(HiveSessionImpl.java:181) at org.apache.hive.service.cli.CLIService.executeStatement(CLIService.java:148) at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:203) at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1133) at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1118) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:40) at org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:37) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478) at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:524) at org.apache.hive.service.auth.TUGIContainingProcessor.process(TUGIContainingProcessor.java:37) at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: hdm1.hadoop.local/192.168.2.101; destination host is: hdm1.hadoop.local:8020; at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:761) at org.apache.hadoop.ipc.Client.call(Client.java:1239) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202) at com.sun.proxy.$Proxy11.mkdirs(Unknown Source) at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at
[jira] [Created] (HIVE-8034) Don't add colon when no port is specified
Brock Noland created HIVE-8034: -- Summary: Don't add colon when no port is specified Key: HIVE-8034 URL: https://issues.apache.org/jira/browse/HIVE-8034 Project: Hive Issue Type: Bug Reporter: Brock Noland -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8034) Don't add colon when no port is specified
[ https://issues.apache.org/jira/browse/HIVE-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-8034: --- Description: In HIVE-4910 we added a {{:}} even if there was no port due to HADOOP-9776. Now that this is fixed I think we should fix ours as well. Don't add colon when no port is specified - Key: HIVE-8034 URL: https://issues.apache.org/jira/browse/HIVE-8034 Project: Hive Issue Type: Bug Reporter: Brock Noland In HIVE-4910 we added a {{:}} even if there was no port due to HADOOP-9776. Now that this is fixed I think we should fix ours as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8034) Don't add colon when no port is specified
[ https://issues.apache.org/jira/browse/HIVE-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-8034: --- Assignee: Brock Noland Status: Patch Available (was: Open) Don't add colon when no port is specified - Key: HIVE-8034 URL: https://issues.apache.org/jira/browse/HIVE-8034 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-8034.1.patch In HIVE-4910 we added a {{:}} even if there was no port due to HADOOP-9776. Now that this is fixed I think we should fix ours as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8034) Don't add colon when no port is specified
[ https://issues.apache.org/jira/browse/HIVE-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-8034: --- Attachment: HIVE-8034.1.patch Don't add colon when no port is specified - Key: HIVE-8034 URL: https://issues.apache.org/jira/browse/HIVE-8034 Project: Hive Issue Type: Bug Reporter: Brock Noland Attachments: HIVE-8034.1.patch In HIVE-4910 we added a {{:}} even if there was no port due to HADOOP-9776. Now that this is fixed I think we should fix ours as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7405) Vectorize GROUP BY on the Reduce-Side (Part 1 – Basic)
[ https://issues.apache.org/jira/browse/HIVE-7405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-7405: --- Status: In Progress (was: Patch Available) Vectorize GROUP BY on the Reduce-Side (Part 1 – Basic) -- Key: HIVE-7405 URL: https://issues.apache.org/jira/browse/HIVE-7405 Project: Hive Issue Type: Sub-task Components: Vectorization Reporter: Matt McCline Assignee: Matt McCline Attachments: HIVE-7405.1.patch, HIVE-7405.2.patch, HIVE-7405.3.patch, HIVE-7405.4.patch, HIVE-7405.5.patch, HIVE-7405.6.patch, HIVE-7405.7.patch, HIVE-7405.8.patch, HIVE-7405.9.patch, HIVE-7405.91.patch, HIVE-7405.92.patch, HIVE-7405.93.patch, HIVE-7405.94.patch, HIVE-7405.95.patch, HIVE-7405.96.patch, HIVE-7405.97.patch, HIVE-7405.98.patch, HIVE-7405.99.patch, HIVE-7405.991.patch, HIVE-7405.994.patch, HIVE-7405.995.patch Vectorize the basic case that does not have any count distinct aggregation. Add a 4th processing mode in VectorGroupByOperator for reduce where each input VectorizedRowBatch has only values for one key at a time. Thus, the values in the batch can be aggregated quickly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7405) Vectorize GROUP BY on the Reduce-Side (Part 1 – Basic)
[ https://issues.apache.org/jira/browse/HIVE-7405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-7405: --- Attachment: HIVE-7405.996.patch Vectorize GROUP BY on the Reduce-Side (Part 1 – Basic) -- Key: HIVE-7405 URL: https://issues.apache.org/jira/browse/HIVE-7405 Project: Hive Issue Type: Sub-task Components: Vectorization Reporter: Matt McCline Assignee: Matt McCline Attachments: HIVE-7405.1.patch, HIVE-7405.2.patch, HIVE-7405.3.patch, HIVE-7405.4.patch, HIVE-7405.5.patch, HIVE-7405.6.patch, HIVE-7405.7.patch, HIVE-7405.8.patch, HIVE-7405.9.patch, HIVE-7405.91.patch, HIVE-7405.92.patch, HIVE-7405.93.patch, HIVE-7405.94.patch, HIVE-7405.95.patch, HIVE-7405.96.patch, HIVE-7405.97.patch, HIVE-7405.98.patch, HIVE-7405.99.patch, HIVE-7405.991.patch, HIVE-7405.994.patch, HIVE-7405.995.patch, HIVE-7405.996.patch Vectorize the basic case that does not have any count distinct aggregation. Add a 4th processing mode in VectorGroupByOperator for reduce where each input VectorizedRowBatch has only values for one key at a time. Thus, the values in the batch can be aggregated quickly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7405) Vectorize GROUP BY on the Reduce-Side (Part 1 – Basic)
[ https://issues.apache.org/jira/browse/HIVE-7405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-7405: --- Status: Patch Available (was: In Progress) Vectorize GROUP BY on the Reduce-Side (Part 1 – Basic) -- Key: HIVE-7405 URL: https://issues.apache.org/jira/browse/HIVE-7405 Project: Hive Issue Type: Sub-task Components: Vectorization Reporter: Matt McCline Assignee: Matt McCline Attachments: HIVE-7405.1.patch, HIVE-7405.2.patch, HIVE-7405.3.patch, HIVE-7405.4.patch, HIVE-7405.5.patch, HIVE-7405.6.patch, HIVE-7405.7.patch, HIVE-7405.8.patch, HIVE-7405.9.patch, HIVE-7405.91.patch, HIVE-7405.92.patch, HIVE-7405.93.patch, HIVE-7405.94.patch, HIVE-7405.95.patch, HIVE-7405.96.patch, HIVE-7405.97.patch, HIVE-7405.98.patch, HIVE-7405.99.patch, HIVE-7405.991.patch, HIVE-7405.994.patch, HIVE-7405.995.patch, HIVE-7405.996.patch Vectorize the basic case that does not have any count distinct aggregation. Add a 4th processing mode in VectorGroupByOperator for reduce where each input VectorizedRowBatch has only values for one key at a time. Thus, the values in the batch can be aggregated quickly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7950) StorageHandler resources aren't added to Tez Session if Session is already Open
[ https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127479#comment-14127479 ] Gopal V commented on HIVE-7950: --- This is related to TEZ-1469, which should provide guidance. For reference, take a look at some of my example code: https://github.com/t3rmin4t0r/tez-broadcast-example/blob/master/src/main/java/org/notmysock/tez/BroadcastTest.java#L203 (lines 203 and 195). StorageHandler resources aren't added to Tez Session if Session is already Open --- Key: HIVE-7950 URL: https://issues.apache.org/jira/browse/HIVE-7950 Project: Hive Issue Type: Bug Components: StorageHandler, Tez Reporter: Josh Elser Assignee: Josh Elser Fix For: 0.14.0 Attachments: HIVE-7950-1.diff, hive-7950-tez-WIP.diff Was trying to run some queries using the AccumuloStorageHandler when using the Tez execution engine. Noticed that classes which were added to tmpjars weren't making it into the container. When a Tez Session is already open, as is the normal case when simply using the `hive` command, the resources aren't added. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7950) StorageHandler resources aren't added to Tez Session if Session is already Open
[ https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127485#comment-14127485 ] Josh Elser commented on HIVE-7950: -- You're the man. That was exactly what I needed. I completely missed that I needed to add the resources to the DAG as well. I'll clean up my changes and post an updated patch here later today after I poke/prod it some more. StorageHandler resources aren't added to Tez Session if Session is already Open --- Key: HIVE-7950 URL: https://issues.apache.org/jira/browse/HIVE-7950 Project: Hive Issue Type: Bug Components: StorageHandler, Tez Reporter: Josh Elser Assignee: Josh Elser Fix For: 0.14.0 Attachments: HIVE-7950-1.diff, hive-7950-tez-WIP.diff Was trying to run some queries using the AccumuloStorageHandler when using the Tez execution engine. Noticed that classes which were added to tmpjars weren't making it into the container. When a Tez Session is already open, as is the normal case when simply using the `hive` command, the resources aren't added. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
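For readers following along: the gist of the pointer above is that the storage handler jars need to be registered as task local resources on the DAG itself, not only with the already-open Tez session. The sketch below illustrates that idea against the Tez 0.5-era client API as best understood here ({{DAG.addTaskLocalFiles}} plus YARN {{LocalResource}}); it is not the actual HIVE-7950 patch, and exact method names should be verified against the Tez version in use.
{code}
// Hedged sketch: attach jars to a Tez DAG as task local resources so that
// containers localize them even when the Tez session was opened before the
// storage handler jars were known. Assumes the Tez 0.5-era client API
// (DAG.addTaskLocalFiles) and Hadoop 2.x YARN records; verify against the
// versions actually in use.
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.yarn.api.records.LocalResource;
import org.apache.hadoop.yarn.api.records.LocalResourceType;
import org.apache.hadoop.yarn.api.records.LocalResourceVisibility;
import org.apache.hadoop.yarn.util.ConverterUtils;
import org.apache.tez.dag.api.DAG;

public class DagJarResources {

    // Build a LocalResource entry for a jar that is already on HDFS.
    static LocalResource jarResource(FileSystem fs, Path jar) throws Exception {
        FileStatus status = fs.getFileStatus(jar);
        return LocalResource.newInstance(
            ConverterUtils.getYarnUrlFromPath(jar),
            LocalResourceType.FILE,
            LocalResourceVisibility.APPLICATION,
            status.getLen(),
            status.getModificationTime());
    }

    // Register every jar with the DAG, keyed by file name, before submission.
    static void addJarsToDag(DAG dag, FileSystem fs, Path... jars) throws Exception {
        Map<String, LocalResource> resources = new HashMap<>();
        for (Path jar : jars) {
            resources.put(jar.getName(), jarResource(fs, jar));
        }
        dag.addTaskLocalFiles(resources);
    }
}
{code}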
[jira] [Commented] (HIVE-8024) Find out whether it's possible to remove UnionOperator from original operator tree [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127504#comment-14127504 ] Chao commented on HIVE-8024: There's a problem again. Suppose the operator tree is the following:
{code}
TS_0   TS_2
   \   /
  UNION_3
     |
   SEL_4
{code}
After removing the UNION operator, it will look like this:
{code}
TS_0   TS_2
   \   /
   SEL_4
{code}
(Again, I ignored some operators, but you get the idea.) Then, we could have MapWork 1 start from {{TS_0}} and MapWork 2 start from {{TS_2}}. Now, when MapWork 1 initializes itself, it will initialize the operator tree, starting from {{TS_0}} and going down the tree. When it gets to {{SEL_4}}, it will not be able to initialize it, because not all of {{SEL_4}}'s parents are initialized at that point. Hence, the execution will fail. Find out whether it's possible to remove UnionOperator from original operator tree [Spark Branch] - Key: HIVE-8024 URL: https://issues.apache.org/jira/browse/HIVE-8024 Project: Hive Issue Type: Task Components: Spark Reporter: Chao Assignee: Chao Currently, after operator tree is processed, the generated works with union operators will go through {{GenSparkUtils::removeUnionOperators}}, which will clone the original operator plan associated with the work, and remove union operators in it. This caused some issues as seen, for example, in HIVE-7870. This JIRA is created to find out whether it's possible to just remove the union operators in the original plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
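To make the initialization problem above concrete, here is a self-contained toy model in plain Java (placeholder classes, not Hive's Operator API). With the union removed, {{SEL_4}} has two parents that live in different MapWorks, so walking down from {{TS_0}} alone can never satisfy the rule that an operator initializes only after all of its parents have.
{code}
// Toy model of the failure mode described above; all classes are placeholders.
import java.util.ArrayList;
import java.util.List;

public class InitOrderSketch {

    static class Op {
        final String name;
        final List<Op> parents = new ArrayList<>();
        final List<Op> children = new ArrayList<>();
        boolean initialized;

        Op(String name) { this.name = name; }

        void connectTo(Op child) {
            children.add(child);
            child.parents.add(this);
        }

        // The rule the comment relies on: an operator may initialize
        // only once every one of its parents has already been initialized.
        void initialize() {
            for (Op p : parents) {
                if (!p.initialized) {
                    throw new IllegalStateException(
                        name + " cannot initialize: parent " + p.name + " not ready");
                }
            }
            initialized = true;
            for (Op c : children) {
                c.initialize();
            }
        }
    }

    public static void main(String[] args) {
        Op ts0 = new Op("TS_0"), ts2 = new Op("TS_2"), sel4 = new Op("SEL_4");
        ts0.connectTo(sel4);
        ts2.connectTo(sel4);   // second parent, owned by a different MapWork

        // MapWork 1 walks down from TS_0 only; TS_2 belongs to MapWork 2 and is
        // never touched here, so the walk fails when it reaches SEL_4.
        ts0.initialize();      // throws IllegalStateException at SEL_4
    }
}
{code}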
[jira] [Commented] (HIVE-8024) Find out whether it's possible to remove UnionOperator from original operator tree [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127509#comment-14127509 ] Chao commented on HIVE-8024: Closing this JIRA, as this approach doesn't seem to work, or at least is hard to implement. [~xuefuz] is writing a design doc and we'll continue the discussion there. Find out whether it's possible to remove UnionOperator from original operator tree [Spark Branch] - Key: HIVE-8024 URL: https://issues.apache.org/jira/browse/HIVE-8024 Project: Hive Issue Type: Task Components: Spark Reporter: Chao Assignee: Chao Currently, after operator tree is processed, the generated works with union operators will go through {{GenSparkUtils::removeUnionOperators}}, which will clone the original operator plan associated with the work, and remove union operators in it. This caused some issues as seen, for example, in HIVE-7870. This JIRA is created to find out whether it's possible to just remove the union operators in the original plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-8024) Find out whether it's possible to remove UnionOperator from original operator tree [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao resolved HIVE-8024. Resolution: Done Find out whether it's possible to remove UnionOperator from original operator tree [Spark Branch] - Key: HIVE-8024 URL: https://issues.apache.org/jira/browse/HIVE-8024 Project: Hive Issue Type: Task Components: Spark Reporter: Chao Assignee: Chao Currently, after operator tree is processed, the generated works with union operators will go through {{GenSparkUtils::removeUnionOperators}}, which will clone the original operator plan associated with the work, and remove union operators in it. This caused some issues as seen, for example, in HIVE-7870. This JIRA is created to find out whether it's possible to just remove the union operators in the original plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8019) Missing commit from trunk: `export/import statement update`
[ https://issues.apache.org/jira/browse/HIVE-8019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-8019: Status: Patch Available (was: Open) Missing commit from trunk: `export/import statement update` Key: HIVE-8019 URL: https://issues.apache.org/jira/browse/HIVE-8019 Project: Hive Issue Type: Bug Components: Import/Export Affects Versions: 0.14.0 Reporter: Mohit Sabharwal Assignee: Thejas M Nair Priority: Blocker Attachments: HIVE-8019.1.patch Noticed that commit 1882de7810fc55a2466dd4cbe74ed67bb41cb667 exists in the 0.13 branch, but not in trunk. https://github.com/apache/hive/commit/1882de7810fc55a2466dd4cbe74ed67bb41cb667
{code}
(trunk) $ git branch -a --contains 1882de7810fc55a2466dd4cbe74ed67bb41cb667
remotes/origin/branch-0.13
{code}
I looked through some of the changes in this commit and don't see those in trunk. Nor do I see a commit that reverts these changes in trunk. [~thejas], should we port this over to trunk? Thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)