[jira] [Commented] (HIVE-11467) WriteBuffers rounding wbSize to next power of 2 may cause OOM

2015-08-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14681255#comment-14681255
 ] 

Hive QA commented on HIVE-11467:




{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12749692/HIVE-11467.04.patch

{color:green}SUCCESS:{color} +1 9348 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4913/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4913/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4913/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12749692 - PreCommit-HIVE-TRUNK-Build

 WriteBuffers rounding wbSize to next power of 2 may cause OOM
 -

 Key: HIVE-11467
 URL: https://issues.apache.org/jira/browse/HIVE-11467
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.2.0, 2.0.0
Reporter: Wei Zheng
Assignee: Wei Zheng
 Attachments: HIVE-11467.01.patch, HIVE-11467.02.patch, 
 HIVE-11467.03.patch, HIVE-11467.04.patch


 If the wbSize passed to the WriteBuffers constructor is not a power of 2, it 
 will first be rounded up to the next power of 2:
 {code}
   public WriteBuffers(int wbSize, long maxSize) {
     this.wbSize = Integer.bitCount(wbSize) == 1 ? wbSize : (Integer.highestOneBit(wbSize) << 1);
     this.wbSizeLog2 = 31 - Integer.numberOfLeadingZeros(this.wbSize);
     this.offsetMask = this.wbSize - 1;
     this.maxSize = maxSize;
     writePos.bufferIndex = -1;
     nextBufferToWrite();
   }
 {code}
 That may break the existing memory-consumption assumptions for map join and 
 potentially cause an OOM.
 The solution is to pass a power-of-2 value as wbSize from upstream during 
 hashtable creation, avoiding this late expansion.
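
The rounding described above is easy to reproduce in isolation. The sketch below mirrors the constructor's size computation (a standalone illustration, not the actual Hive class) and shows how far a non-power-of-2 request can overshoot:

```java
public class WbSizeRounding {
    // Mirrors the rounding in the WriteBuffers constructor quoted above:
    // a size that is not a power of 2 is rounded UP to the next power of 2.
    static int roundUp(int wbSize) {
        return Integer.bitCount(wbSize) == 1
                ? wbSize
                : (Integer.highestOneBit(wbSize) << 1);
    }

    public static void main(String[] args) {
        // A 10 MB request silently becomes a 16 MB allocation per buffer,
        // a 60% overshoot of whatever memory budget sized the original value.
        int requested = 10 * 1024 * 1024;
        System.out.println(roundUp(requested)); // 16777216 (16 MB)
    }
}
```

In the worst case (a request just above a power of 2) the allocation nearly doubles, which is why sizing the request as a power of 2 upstream avoids the surprise.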



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11442) Remove commons-configuration.jar from Hive distribution

2015-08-10 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-11442:
--
Attachment: HIVE-11442.3.patch

commons-configuration.jar is needed for testing, so it should only be removed 
from the packaging.

 Remove commons-configuration.jar from Hive distribution
 ---

 Key: HIVE-11442
 URL: https://issues.apache.org/jira/browse/HIVE-11442
 Project: Hive
  Issue Type: Improvement
  Components: Build Infrastructure
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 1.3.0, 2.0.0

 Attachments: HIVE-11442.1.patch, HIVE-11442.2.patch, 
 HIVE-11442.3.patch


 Some customers report version conflicts with the commons-configuration.jar 
 bundled with Hive. Actually, commons-configuration.jar is not needed by Hive; 
 it is a transitive dependency of Hadoop/Accumulo. Users should be able to 
 pick up those jars from Hadoop at runtime. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11376) CombineHiveInputFormat is falling back to HiveInputFormat in case codecs are found for one of the input files

2015-08-10 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-11376:
--
Labels: TODOC2.0  (was: )

 CombineHiveInputFormat is falling back to HiveInputFormat in case codecs are 
 found for one of the input files
 -

 Key: HIVE-11376
 URL: https://issues.apache.org/jira/browse/HIVE-11376
 Project: Hive
  Issue Type: Bug
Reporter: Rajat Khandelwal
Assignee: Rajat Khandelwal
  Labels: TODOC2.0
 Fix For: 2.0.0

 Attachments: HIVE-11376.02.patch


 https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java#L379
 This is the exact code snippet:
 {noformat}
 // Since there is no easy way of knowing whether MAPREDUCE-1597 is present in the tree or not,
 // we use a configuration variable for the same
 if (this.mrwork != null && !this.mrwork.getHadoopSupportsSplittable()) {
   // The following code should be removed, once
   // https://issues.apache.org/jira/browse/MAPREDUCE-1597 is fixed.
   // Hadoop does not handle non-splittable files correctly for CombineFileInputFormat,
   // so don't use CombineFileInputFormat for non-splittable files
   // ie, don't combine if inputformat is a TextInputFormat and has compression turned on
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11376) CombineHiveInputFormat is falling back to HiveInputFormat in case codecs are found for one of the input files

2015-08-10 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14681285#comment-14681285
 ] 

Lefty Leverenz commented on HIVE-11376:
---

Doc note:  This removes *hive.hadoop.supports.splittable.combineinputformat* 
from HiveConf.java, so the wikidoc needs a "Removed In: 2.0.0 with 
HIVE-11376" bullet for the parameter.

* [Configuration Properties -- 
hive.hadoop.supports.splittable.combineinputformat | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.hadoop.supports.splittable.combineinputformat]

 CombineHiveInputFormat is falling back to HiveInputFormat in case codecs are 
 found for one of the input files
 -

 Key: HIVE-11376
 URL: https://issues.apache.org/jira/browse/HIVE-11376
 Project: Hive
  Issue Type: Bug
Reporter: Rajat Khandelwal
Assignee: Rajat Khandelwal
  Labels: TODOC2.0
 Fix For: 2.0.0

 Attachments: HIVE-11376.02.patch


 https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java#L379
 This is the exact code snippet:
 {noformat}
 // Since there is no easy way of knowing whether MAPREDUCE-1597 is present in the tree or not,
 // we use a configuration variable for the same
 if (this.mrwork != null && !this.mrwork.getHadoopSupportsSplittable()) {
   // The following code should be removed, once
   // https://issues.apache.org/jira/browse/MAPREDUCE-1597 is fixed.
   // Hadoop does not handle non-splittable files correctly for CombineFileInputFormat,
   // so don't use CombineFileInputFormat for non-splittable files
   // ie, don't combine if inputformat is a TextInputFormat and has compression turned on
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11103) Add banker's rounding BROUND UDF

2015-08-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14680179#comment-14680179
 ] 

Hive QA commented on HIVE-11103:




{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12749540/HIVE-11103.4.patch

{color:green}SUCCESS:{color} +1 9354 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4902/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4902/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4902/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12749540 - PreCommit-HIVE-TRUNK-Build

 Add banker's rounding BROUND UDF
 

 Key: HIVE-11103
 URL: https://issues.apache.org/jira/browse/HIVE-11103
 Project: Hive
  Issue Type: New Feature
  Components: UDF
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
 Attachments: HIVE-11103.1.patch, HIVE-11103.1.patch, 
 HIVE-11103.2.patch, HIVE-11103.4.patch


 Banker's rounding: on a .5 tie, the value is rounded to the nearest even 
 digit. Also known as Gaussian rounding and, in German, mathematische Rundung.
 Example:
 {code}
              2 digits           2 digits
 Unrounded    Standard rounding  Gaussian rounding
   54.1754      54.18              54.18
  343.2050     343.21             343.20
 +106.2038    +106.20            +106.20
 =========    ======             ======
  503.5842     503.59             503.58
 {code}
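
The table's two modes correspond to `RoundingMode.HALF_UP` and `RoundingMode.HALF_EVEN` in `java.math.BigDecimal`. The sketch below reproduces the table's values; it only illustrates the semantics and is not the BROUND UDF implementation:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class BroundDemo {
    // Round a decimal string to two places with the given mode.
    static BigDecimal round2(String value, RoundingMode mode) {
        return new BigDecimal(value).setScale(2, mode);
    }

    public static void main(String[] args) {
        for (String v : new String[]{"54.1754", "343.2050", "106.2038"}) {
            System.out.println(v + " -> half-up " + round2(v, RoundingMode.HALF_UP)
                    + ", half-even " + round2(v, RoundingMode.HALF_EVEN));
        }
        // Only the exact .5 tie differs: 343.2050 -> 343.21 (half-up)
        // but 343.20 (half-even, since the kept digit 0 is already even).
    }
}
```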



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11504) Predicate pushing down doesn't work for float type for Parquet

2015-08-10 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-11504:

Attachment: HIVE-11504.1.patch

 Predicate pushing down doesn't work for float type for Parquet
 --

 Key: HIVE-11504
 URL: https://issues.apache.org/jira/browse/HIVE-11504
 Project: Hive
  Issue Type: Sub-task
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu
 Attachments: HIVE-11504.1.patch, HIVE-11504.patch


 Predicate builder should use PrimitiveTypeName type in parquet side to 
 construct predicate leaf instead of the type provided by PredicateLeaf.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-6892) Permission inheritance issues

2015-08-10 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-6892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14680220#comment-14680220
 ] 

Andrés Cordero commented on HIVE-6892:
--

Can some changes be made to [Permission Inheritance in 
Hive|https://cwiki.apache.org/confluence/display/Hive/Permission+Inheritance+in+Hive]?
I've seen some behavior that doesn't match what the doc claims. Namely:
* Group isn't inherited when the flag is off; "already done by HDFS for new 
directories" implies that it shouldn't matter.
* Extended ACLs are not inherited, they are cloned, which means that default 
ACLs don't propagate down as default+access (the HDFS way), but default only 
(that is, default for directories and nothing for files). "Extended Acl's 
are taken from parent" in the first paragraph already implies this, but it's 
still rather ambiguous (especially with the text below containing the same 
"already done by HDFS" wording).

 Permission inheritance issues
 -

 Key: HIVE-6892
 URL: https://issues.apache.org/jira/browse/HIVE-6892
 Project: Hive
  Issue Type: Bug
  Components: Security
Affects Versions: 0.13.0
Reporter: Szehon Ho
Assignee: Szehon Ho

 *HDFS Background*
 * When a file or directory is created, its owner is the user identity of the 
 client process, and its group is inherited from parent (the BSD rule).  
 Permissions are taken from default umask.  Extended Acl's are taken from 
 parent unless they are set explicitly.
 *Goals*
 To reduce need to set fine-grain file security props after every operation, 
 users may want the following Hive warehouse file/dir to auto-inherit security 
 properties from their directory parents:
 * Directories created by new database/table/partition/bucket
 * Files added to tables via load/insert
 * Table directories exported/imported  (open question of whether exported 
 table inheriting perm from new parent needs another flag)
 What may be inherited:
 * Basic file permission
 * Groups (already done by HDFS for new directories)
 * Extended ACL's (already done by HDFS for new directories)
 *Behavior*
 * When hive.warehouse.subdir.inherit.perms flag is enabled in Hive, Hive 
 will try to do all above inheritances.  In the future, we can add more flags 
 for more finer-grained control.
 * Failure by Hive to inherit will not cause operation to fail.  Rule of thumb 
 of when security-prop inheritance will happen is the following:
 ** To run chmod, a user must be the owner of the file, or else a super-user.
 ** To run chgrp, a user must be the owner of files, or else a super-user.
 ** Hence, user that hive runs as (either 'hive' or the logged-in user in case 
 of impersonation), must be super-user or owner of the file whose security 
 properties are going to be changed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11398) Parse wide OR and wide AND trees to flat OR/AND trees

2015-08-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14680293#comment-14680293
 ] 

Hive QA commented on HIVE-11398:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12749542/HIVE-11398.4.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9347 tests executed
*Failed tests:*
{noformat}
TestCustomAuthentication - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_multi_single_reducer3
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4903/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4903/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4903/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12749542 - PreCommit-HIVE-TRUNK-Build

 Parse wide OR and wide AND trees to flat OR/AND trees
 -

 Key: HIVE-11398
 URL: https://issues.apache.org/jira/browse/HIVE-11398
 Project: Hive
  Issue Type: New Feature
  Components: Logical Optimizer, UDF
Affects Versions: 1.3.0, 2.0.0
Reporter: Gopal V
Assignee: Jesus Camacho Rodriguez
 Attachments: HIVE-11398.2.patch, HIVE-11398.3.patch, 
 HIVE-11398.4.patch, HIVE-11398.patch


 Deep trees of AND/OR are hard to traverse particularly when they are merely 
 the same structure in nested form as a version of the operator that takes an 
 arbitrary number of args.
 One potential way to convert the DFS searches into a simpler BFS search is to 
 introduce a new Operator pair named ALL and ANY.
 ALL(A, B, C, D, E) represents AND(AND(AND(AND(E, D), C), B), A)
 ANY(A, B, C, D, E) represents OR(OR(OR(OR(E, D), C),B),A)
 The SemanticAnalyser would be responsible for generating these operators and 
 this would mean that the depth and complexity of traversals for the simplest 
 case of wide AND/OR trees would be trivial.
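
As a sketch of the proposed flattening (with a hypothetical `Expr` tree type, not Hive's actual AST classes), collapsing a one-operator chain into the flat child list of a single n-ary ANY/ALL node could look like:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FlattenOr {
    // Minimal expression tree: op is "AND"/"OR", or null for a leaf.
    static class Expr {
        final String op;
        final List<Expr> children;
        final String leaf;
        Expr(String op, List<Expr> children) { this.op = op; this.children = children; this.leaf = null; }
        Expr(String leaf) { this.op = null; this.children = List.of(); this.leaf = leaf; }
    }

    // Collapse a deep chain like OR(OR(OR(E, D), C), B) into one flat child list.
    static List<String> flatten(Expr e, String op, List<String> out) {
        if (e.op == null) {
            out.add(e.leaf);
        } else if (e.op.equals(op)) {
            for (Expr c : e.children) {
                flatten(c, op, out);
            }
        } else {
            // A mixed subtree (e.g. an AND nested under an OR) stays as one child.
            out.add(e.op + "(...)");
        }
        return out;
    }

    public static void main(String[] args) {
        // OR(OR(OR(E, D), C), B)  ->  ANY(E, D, C, B)
        Expr tree = new Expr("OR", Arrays.asList(
                new Expr("OR", Arrays.asList(
                        new Expr("OR", Arrays.asList(new Expr("E"), new Expr("D"))),
                        new Expr("C"))),
                new Expr("B")));
        System.out.println("ANY" + flatten(tree, "OR", new ArrayList<>()));
    }
}
```

After this pass, a visitor sees one node with n children instead of a chain of depth n, which is what makes wide AND/OR traversal trivial.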



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8285) Reference equality is used on boolean values in PartitionPruner#removeTruePredciates()

2015-08-10 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HIVE-8285:
-
Description: 
{code}
  if (e.getTypeInfo() == TypeInfoFactory.booleanTypeInfo
      && eC.getValue() == Boolean.TRUE) {
{code}

equals() should be used in the above comparison.

  was:
{code}
  if (e.getTypeInfo() == TypeInfoFactory.booleanTypeInfo
      && eC.getValue() == Boolean.TRUE) {
{code}
equals() should be used in the above comparison.


 Reference equality is used on boolean values in 
 PartitionPruner#removeTruePredciates()
 --

 Key: HIVE-8285
 URL: https://issues.apache.org/jira/browse/HIVE-8285
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Ted Yu
Priority: Minor
 Attachments: HIVE-8285.patch


 {code}
   if (e.getTypeInfo() == TypeInfoFactory.booleanTypeInfo
       && eC.getValue() == Boolean.TRUE) {
 {code}
 equals() should be used in the above comparison.
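
For background on why `==` is unsafe here: autoboxing and `Boolean.valueOf` return the cached `Boolean.TRUE` instance, but a `Boolean` produced any other way (e.g. `new Boolean`, which is deprecated for exactly this reason) is a distinct object, so the reference comparison silently fails:

```java
public class BooleanEquality {
    public static void main(String[] args) {
        Boolean cached = Boolean.valueOf(true); // canonical, same object as Boolean.TRUE
        Boolean distinct = new Boolean(true);   // separate instance, e.g. from deserialization

        System.out.println(cached == Boolean.TRUE);        // true: cached instance
        System.out.println(distinct == Boolean.TRUE);      // false: reference comparison fails
        System.out.println(Boolean.TRUE.equals(distinct)); // true: value comparison is safe
    }
}
```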



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11506) Casting varchar/char type to string cannot be vectorized

2015-08-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14680425#comment-14680425
 ] 

Hive QA commented on HIVE-11506:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12749551/HIVE-11506.1.patch.txt

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 9347 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_char_mapjoin1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_varchar_mapjoin1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_char_mapjoin1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_varchar_mapjoin1
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4904/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4904/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4904/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12749551 - PreCommit-HIVE-TRUNK-Build

 Casting varchar/char type to string cannot be vectorized
 

 Key: HIVE-11506
 URL: https://issues.apache.org/jira/browse/HIVE-11506
 Project: Hive
  Issue Type: Improvement
  Components: Vectorization
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-11506.1.patch.txt


 It's not defined in the vectorization context.
 {code}
 explain 
 select cast(cast(cstring1 as varchar(10)) as string) x from alltypesorc order 
 by x;
 {code}
 The mapper is not vectorized due to this exception:
 {noformat}
 2015-08-10 17:02:08,003 INFO  [main]: physical.Vectorizer 
 (Vectorizer.java:validateExprNodeDesc(1299)) - Failed to vectorize
 org.apache.hadoop.hive.ql.metadata.HiveException: Unhandled cast input type: 
 varchar(10)
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getCastToString(VectorizationContext.java:1543)
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getGenericUDFBridgeVectorExpression(VectorizationContext.java:1379)
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getGenericUdfVectorExpression(VectorizationContext.java:1177)
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getVectorExpression(VectorizationContext.java:440)
 at 
 org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateExprNodeDesc(Vectorizer.java:1293)
 at 
 org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateExprNodeDesc(Vectorizer.java:1284)
 at 
 org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateSelectOperator(Vectorizer.java:1116)
 at 
 org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateMapWorkOperator(Vectorizer.java:906)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11398) Parse wide OR and wide AND trees to flat OR/AND trees

2015-08-10 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14680536#comment-14680536
 ] 

Gopal V commented on HIVE-11398:


Patch LGTM - +1

The last test-failure seems to be an expected OR rotation due to the traversal 
order.

groupby_multi_single_reducer3.q.out

{code}
<<<<<<< HEAD
 predicate: ((((key + key) = 400) or (((key - 100) = 500) and value is not 
 null)) or ((((key + key) = 200) or ((key - 100) = 100)) or ((key = 300) and 
 value is not null))) (type: boolean)
=======
 predicate: ((((key + key) = 200) or ((key - 100) = 100) or ((key = 300) and 
 value is not null)) or (((key + key) = 400) or (((key - 100) = 500) and 
 value is not null))) (type: boolean)
{code}

 Parse wide OR and wide AND trees to flat OR/AND trees
 -

 Key: HIVE-11398
 URL: https://issues.apache.org/jira/browse/HIVE-11398
 Project: Hive
  Issue Type: New Feature
  Components: Logical Optimizer, UDF
Affects Versions: 1.3.0, 2.0.0
Reporter: Gopal V
Assignee: Jesus Camacho Rodriguez
 Attachments: HIVE-11398.2.patch, HIVE-11398.3.patch, 
 HIVE-11398.4.patch, HIVE-11398.5.patch, HIVE-11398.patch


 Deep trees of AND/OR are hard to traverse particularly when they are merely 
 the same structure in nested form as a version of the operator that takes an 
 arbitrary number of args.
 One potential way to convert the DFS searches into a simpler BFS search is to 
 introduce a new Operator pair named ALL and ANY.
 ALL(A, B, C, D, E) represents AND(AND(AND(AND(E, D), C), B), A)
 ANY(A, B, C, D, E) represents OR(OR(OR(OR(E, D), C),B),A)
 The SemanticAnalyser would be responsible for generating these operators and 
 this would mean that the depth and complexity of traversals for the simplest 
 case of wide AND/OR trees would be trivial.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11466) HIVE-10166 generates more data on hive.log causing Jenkins to fill all the disk.

2015-08-10 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14680562#comment-14680562
 ] 

Thejas M Nair commented on HIVE-11466:
--

[~prasanth_j] [~spena] [~xuefuz] [~jdere] [~csun]
Thanks for the great team work!

 HIVE-10166 generates more data on hive.log causing Jenkins to fill all the 
 disk.
 

 Key: HIVE-11466
 URL: https://issues.apache.org/jira/browse/HIVE-11466
 Project: Hive
  Issue Type: Bug
Reporter: Sergio Peña
Assignee: Xuefu Zhang
 Fix For: spark-branch, 2.0.0

 Attachments: HIVE-11466.1.patch, HIVE-11466.patch


 An issue with the HIVE-10166 patch is that it increases the size of hive.log, 
 causing Jenkins to fail because it runs out of disk space.
 Here are the hive.log sizes when running TestJdbcWithMiniHS2 before the patch, 
 with the patch, and after other commits:
 {noformat}
 BEFORE HIVE-10166
 13M Aug  5 11:57 ./hive-unit/target/tmp/log/hive.log
 WITH HIVE-10166
 2.4G Aug  5 12:07 ./hive-unit/target/tmp/log/hive.log
 CURRENT HEAD
 3.2G Aug  5 12:36 ./hive-unit/target/tmp/log/hive.log
 {noformat}
 This is just a single test, but on Jenkins, hive.log grows to more than 13G.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11387) CBO: Calcite Operator To Hive Operator (Calcite Return Path) : fix reduce_deduplicate optimization

2015-08-10 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-11387:
---
Attachment: HIVE-11387.07.patch

Resubmitting the patch, as all the previously failing tests pass on my Mac.

 CBO: Calcite Operator To Hive Operator (Calcite Return Path) : fix 
 reduce_deduplicate optimization
 --

 Key: HIVE-11387
 URL: https://issues.apache.org/jira/browse/HIVE-11387
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Attachments: HIVE-11387.01.patch, HIVE-11387.02.patch, 
 HIVE-11387.03.patch, HIVE-11387.04.patch, HIVE-11387.05.patch, 
 HIVE-11387.06.patch, HIVE-11387.07.patch


 The main problem is that, due to the return path, we may now have 
 {{(RS1-GBY2)\-(RS3-GBY4)}} when map.aggr=false, i.e., no map-side aggregation. 
 However, in the non-return path, it is treated as {{(RS1)-(GBY2-RS3-GBY4)}}. 
 The issue is that the return path does not take this setting into account.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-4734) Use custom ObjectInspectors for AvroSerde

2015-08-10 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14680395#comment-14680395
 ] 

Anthony Hsu commented on HIVE-4734:
---

Any updates on this patch? I'd love to see this committed, too! :-)

 Use custom ObjectInspectors for AvroSerde
 -

 Key: HIVE-4734
 URL: https://issues.apache.org/jira/browse/HIVE-4734
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Mark Wagner
Assignee: Mark Wagner
  Labels: Avro, AvroSerde, Performance
 Attachments: HIVE-4734.1.patch, HIVE-4734.2.patch, HIVE-4734.3.patch, 
 HIVE-4734.4.patch, HIVE-4734.5.patch


 Currently, the AvroSerde recursively copies all fields of a record from the 
 GenericRecord to a List row object and provides the standard 
 ObjectInspectors. Performance can be improved by providing ObjectInspectors 
 to the Avro record itself.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8458) Potential null dereference in Utilities#clearWork()

2015-08-10 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HIVE-8458:
-
Description: 
{code}
Path mapPath = getPlanPath(conf, MAP_PLAN_NAME);
Path reducePath = getPlanPath(conf, REDUCE_PLAN_NAME);

// if the plan path hasn't been initialized just return, nothing to clean.
if (mapPath == null && reducePath == null) {
  return;
}

try {
  FileSystem fs = mapPath.getFileSystem(conf);
{code}

If mapPath is null but reducePath is not null, getFileSystem() call would 
produce NPE

  was:
{code}
Path mapPath = getPlanPath(conf, MAP_PLAN_NAME);
Path reducePath = getPlanPath(conf, REDUCE_PLAN_NAME);

// if the plan path hasn't been initialized just return, nothing to clean.
if (mapPath == null && reducePath == null) {
  return;
}

try {
  FileSystem fs = mapPath.getFileSystem(conf);
{code}
If mapPath is null but reducePath is not null, getFileSystem() call would 
produce NPE


 Potential null dereference in Utilities#clearWork()
 ---

 Key: HIVE-8458
 URL: https://issues.apache.org/jira/browse/HIVE-8458
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.1
Reporter: Ted Yu
Assignee: skrho
Priority: Minor
 Attachments: HIVE-8458_001.patch


 {code}
 Path mapPath = getPlanPath(conf, MAP_PLAN_NAME);
 Path reducePath = getPlanPath(conf, REDUCE_PLAN_NAME);
 // if the plan path hasn't been initialized just return, nothing to clean.
 if (mapPath == null && reducePath == null) {
   return;
 }
 try {
   FileSystem fs = mapPath.getFileSystem(conf);
 {code}
 If mapPath is null but reducePath is not null, getFileSystem() call would 
 produce NPE
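
A minimal sketch of the null-safe fix, with a stand-in `Path` class since the Hadoop types are not shown here: dereference whichever plan path is actually present instead of always `mapPath`.

```java
public class ClearWorkGuard {
    // Stand-in for org.apache.hadoop.fs.Path, just enough for the sketch.
    static class Path {
        final String name;
        Path(String name) { this.name = name; }
        String getFileSystem() { return "fs-for-" + name; }
    }

    // Null-safe variant of the pattern in the snippet above.
    static String resolveFs(Path mapPath, Path reducePath) {
        if (mapPath == null && reducePath == null) {
            return null; // nothing to clean
        }
        // Pick whichever path is non-null before dereferencing.
        Path p = (mapPath != null) ? mapPath : reducePath;
        return p.getFileSystem();
    }

    public static void main(String[] args) {
        // mapPath == null but reducePath != null: the original code would NPE here.
        System.out.println(resolveFs(null, new Path("reduce-plan")));
    }
}
```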



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8343) Return value from BlockingQueue.offer() is not checked in DynamicPartitionPruner

2015-08-10 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HIVE-8343:
-
Description: 
In addEvent() and processVertex(), there are calls such as the following:
{code}
  queue.offer(event);
{code}
The return value should be checked. If false is returned, event would not have 
been queued.
Take a look at line 328 in:
http://fuseyism.com/classpath/doc/java/util/concurrent/LinkedBlockingQueue-source.html

  was:
In addEvent() and processVertex(), there is call such as the following:
{code}
  queue.offer(event);
{code}

The return value should be checked. If false is returned, event would not have 
been queued.
Take a look at line 328 in:
http://fuseyism.com/classpath/doc/java/util/concurrent/LinkedBlockingQueue-source.html


 Return value from BlockingQueue.offer() is not checked in 
 DynamicPartitionPruner
 

 Key: HIVE-8343
 URL: https://issues.apache.org/jira/browse/HIVE-8343
 Project: Hive
  Issue Type: Bug
Reporter: Ted Yu
Assignee: JongWon Park
Priority: Minor
 Attachments: HIVE-8343.patch


 In addEvent() and processVertex(), there are calls such as the following:
 {code}
   queue.offer(event);
 {code}
 The return value should be checked. If false is returned, event would not 
 have been queued.
 Take a look at line 328 in:
 http://fuseyism.com/classpath/doc/java/util/concurrent/LinkedBlockingQueue-source.html
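
The hazard is easy to demonstrate with a bounded `LinkedBlockingQueue`: on a full queue, `offer()` returns false and the element is silently dropped, whereas `put()` blocks until there is room.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class OfferCheck {
    public static void main(String[] args) throws InterruptedException {
        // Bounded queue of capacity 1, like a queue that has filled up under load.
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(1);

        System.out.println(queue.offer("first"));  // true: accepted
        System.out.println(queue.offer("second")); // false: full, element silently dropped

        // Checking the return value (or using the blocking put()) avoids losing events.
        if (!queue.offer("second")) {
            queue.take();        // here a consumer would normally drain the queue
            queue.put("second"); // put() blocks until space is available
        }
        System.out.println(queue.peek()); // second
    }
}
```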



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-11465) CBO: Calcite Operator To Hive Operator (Calcite Return Path): fix stringToMap

2015-08-10 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong resolved HIVE-11465.

Resolution: Fixed

resolved by HIVE-11436

 CBO: Calcite Operator To Hive Operator (Calcite Return Path): fix stringToMap
 -

 Key: HIVE-11465
 URL: https://issues.apache.org/jira/browse/HIVE-11465
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong

 Right now str_to_map('a=1 b=2 c=3', ' ', '=') will generate a=null, 
 b=null, 2=null, etc, rather than a=1, b=2, etc.
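
For reference, the expected semantics can be sketched as follows (a plain-Java illustration of the intended behavior, not Hive's actual UDF implementation): split entries on the first delimiter, then split each entry into key and value on the second.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StrToMap {
    // Sketch of str_to_map(text, delimiter1, delimiter2) semantics.
    static Map<String, String> strToMap(String text, String d1, String d2) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String entry : text.split(d1)) {
            // Split into at most two parts: key and value.
            String[] kv = entry.split(d2, 2);
            result.put(kv[0], kv.length > 1 ? kv[1] : null);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(strToMap("a=1 b=2 c=3", " ", "=")); // {a=1, b=2, c=3}
    }
}
```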



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11398) Parse wide OR and wide AND trees to flat OR/AND trees

2015-08-10 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-11398:
---
Attachment: HIVE-11398.5.patch

 Parse wide OR and wide AND trees to flat OR/AND trees
 -

 Key: HIVE-11398
 URL: https://issues.apache.org/jira/browse/HIVE-11398
 Project: Hive
  Issue Type: New Feature
  Components: Logical Optimizer, UDF
Affects Versions: 1.3.0, 2.0.0
Reporter: Gopal V
Assignee: Jesus Camacho Rodriguez
 Attachments: HIVE-11398.2.patch, HIVE-11398.3.patch, 
 HIVE-11398.4.patch, HIVE-11398.5.patch, HIVE-11398.patch


 Deep trees of AND/OR are hard to traverse particularly when they are merely 
 the same structure in nested form as a version of the operator that takes an 
 arbitrary number of args.
 One potential way to convert the DFS searches into a simpler BFS search is to 
 introduce a new Operator pair named ALL and ANY.
 ALL(A, B, C, D, E) represents AND(AND(AND(AND(E, D), C), B), A)
 ANY(A, B, C, D, E) represents OR(OR(OR(OR(E, D), C),B),A)
 The SemanticAnalyser would be responsible for generating these operators and 
 this would mean that the depth and complexity of traversals for the simplest 
 case of wide AND/OR trees would be trivial.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11480) CBO: Calcite Operator To Hive Operator (Calcite Return Path): char/varchar as input to GenericUDAF

2015-08-10 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-11480:
---
Attachment: HIVE-11480.03.patch

Re-uploading the patch for a QA run, as all the tests passed on my laptop.

 CBO: Calcite Operator To Hive Operator (Calcite Return Path): char/varchar as 
 input to GenericUDAF 
 ---

 Key: HIVE-11480
 URL: https://issues.apache.org/jira/browse/HIVE-11480
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Attachments: HIVE-11480.01.patch, HIVE-11480.02.patch, 
 HIVE-11480.03.patch


 Some of the UDAFs cannot deal with char/varchar correctly when the return path 
 is on, for example udaf_number_format.q.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9073) NPE when using custom windowing UDAFs

2015-08-10 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-9073:
-
Affects Version/s: 0.14.0
   1.0.0

 NPE when using custom windowing UDAFs
 -

 Key: HIVE-9073
 URL: https://issues.apache.org/jira/browse/HIVE-9073
 Project: Hive
  Issue Type: Bug
  Components: UDF
Affects Versions: 0.14.0, 1.0.0
Reporter: Jason Dere
Assignee: Jason Dere
 Fix For: 1.2.0

 Attachments: HIVE-9073.1.patch, HIVE-9073.2.patch, HIVE-9073.2.patch, 
 HIVE-9073.3.patch


 From the hive-user email group:
 {noformat}
 While executing a simple select query using a custom windowing UDAF I created 
 I am constantly running into this error.
  
 Error: java.lang.RuntimeException: Error in configuring object
 at 
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
 at 
 org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
 at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
 at 
 org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:409)
 at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
 Caused by: java.lang.reflect.InvocationTargetException
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
 ... 9 more
 Caused by: java.lang.RuntimeException: Reduce operator initialization failed
 at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:173)
 ... 14 more
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.exec.FunctionRegistry.getFunctionInfo(FunctionRegistry.java:647)
 at 
 org.apache.hadoop.hive.ql.exec.FunctionRegistry.getWindowFunctionInfo(FunctionRegistry.java:1875)
 at 
 org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.streamingPossible(WindowingTableFunction.java:150)
 at 
 org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.setCanAcceptInputAsStream(WindowingTableFunction.java:221)
 at 
 org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.initializeStreaming(WindowingTableFunction.java:266)
 at 
 org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.initializeStreaming(PTFOperator.java:292)
 at 
 org.apache.hadoop.hive.ql.exec.PTFOperator.initializeOp(PTFOperator.java:86)
 at 
 org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
 at 
 org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:460)
 at 
 org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:416)
 at 
 org.apache.hadoop.hive.ql.exec.ExtractOperator.initializeOp(ExtractOperator.java:40)
 at 
 org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
 at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:166)
 ... 14 more
  
 Just wanted to check if any of you have faced this earlier. Also, when I try 
 to run the custom UDAF on another server it works fine. The only difference I 
 can see is that the hive version I am using on my local machine, where it is 
 working, is 0.13.1, and on the other machine it is 0.13.0, where I see the 
 above mentioned error. I am not sure if this was a bug which was fixed in the 
 later release but I just wanted to confirm the same.
 {noformat}





[jira] [Commented] (HIVE-11477) CBO inserts a UDF cast for integer type promotion (only for negative numbers)

2015-08-10 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680472#comment-14680472
 ] 

Pengcheng Xiong commented on HIVE-11477:


May need more work. For example, in input_part6.q, we have {code}SELECT x.* 
FROM SRCPART x WHERE x.ds = '2008-04-08' LIMIT 10{code} and in 
union_remove_6_subq, we have {code} explain
select avg(c) from(
  SELECT count(1)-200 as c from src
  UNION ALL
  SELECT count(1) as c from src
)subq {code}

 CBO inserts a UDF cast for integer type promotion (only for negative numbers)
 -

 Key: HIVE-11477
 URL: https://issues.apache.org/jira/browse/HIVE-11477
 Project: Hive
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Prasanth Jayachandran
Assignee: Pengcheng Xiong
Priority: Critical
 Attachments: HIVE-11477.01.patch, HIVE-11477.02.patch


 When CBO is enabled, filters which compare tinyint or smallint columns with 
 constant integer types will insert a UDFToInteger cast for the columns. When 
 CBO is disabled, there is no such UDF. This behaviour breaks the ORC predicate 
 pushdown feature, as ORC ignores UDFs in the filters.
 In the following examples column t is tinyint
 {code:title=Explain for select count(*) from orc_ppd where t  -127; (CBO 
 OFF)}
 Filter Operator [FIL_9]
predicate:(t = 125) (type: boolean)
Statistics:Num rows: 1050 Data size: 611757 Basic 
 stats: COMPLETE Column stats: NONE
TableScan [TS_0]
   alias:orc_ppd
   Statistics:Num rows: 2100 Data size: 1223514 
 Basic stats: COMPLETE Column stats: NONE
 {code}
 {code:title=Explain for select count(*) from orc_ppd where t  -127; (CBO ON)}
 Filter Operator [FIL_10]
predicate:(UDFToInteger(t)  -127) (type: boolean)
Statistics:Num rows: 700 Data size: 407838 Basic 
 stats: COMPLETE Column stats: NONE
TableScan [TS_0]
   alias:orc_ppd
   Statistics:Num rows: 2100 Data size: 1223514 
 Basic stats: COMPLETE Column stats: NONE
 {code}
 CBO does not insert such cast for non-negative numbers
 {code:title=Explain for select count(*) from orc_ppd where t  127; (CBO ON)}
 Filter Operator [FIL_10]
predicate:(t  127) (type: boolean)
Statistics:Num rows: 700 Data size: 407838 Basic 
 stats: COMPLETE Column stats: NONE
TableScan [TS_0]
   alias:orc_ppd
   Statistics:Num rows: 2100 Data size: 1223514 
 Basic stats: COMPLETE Column stats: NONE
 {code}





[jira] [Updated] (HIVE-8282) Potential null deference in ConvertJoinMapJoin#convertJoinBucketMapJoin()

2015-08-10 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HIVE-8282:
-
Description: 
In convertJoinMapJoin():
{code}
for (Operator<? extends OperatorDesc> parentOp : 
joinOp.getParentOperators()) {
  if (parentOp instanceof MuxOperator) {
return null;
  }
}
{code}
NPE would result if convertJoinMapJoin() returns null:

{code}
MapJoinOperator mapJoinOp = convertJoinMapJoin(joinOp, context, 
bigTablePosition);
MapJoinDesc joinDesc = mapJoinOp.getConf();
{code}


  was:
In convertJoinMapJoin():
{code}
for (Operator<? extends OperatorDesc> parentOp : 
joinOp.getParentOperators()) {
  if (parentOp instanceof MuxOperator) {
return null;
  }
}
{code}
NPE would result if convertJoinMapJoin() returns null:
{code}
MapJoinOperator mapJoinOp = convertJoinMapJoin(joinOp, context, 
bigTablePosition);
MapJoinDesc joinDesc = mapJoinOp.getConf();
{code}



 Potential null deference in ConvertJoinMapJoin#convertJoinBucketMapJoin()
 -

 Key: HIVE-8282
 URL: https://issues.apache.org/jira/browse/HIVE-8282
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Ted Yu
Priority: Minor
 Attachments: HIVE-8282.patch


 In convertJoinMapJoin():
 {code}
 for (Operator<? extends OperatorDesc> parentOp : 
 joinOp.getParentOperators()) {
   if (parentOp instanceof MuxOperator) {
 return null;
   }
 }
 {code}
 NPE would result if convertJoinMapJoin() returns null:
 {code}
 MapJoinOperator mapJoinOp = convertJoinMapJoin(joinOp, context, 
 bigTablePosition);
 MapJoinDesc joinDesc = mapJoinOp.getConf();
 {code}
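
The defensive pattern for the report above can be sketched in plain Java. The names and return types here are simplified stand-ins, not the real ConvertJoinMapJoin signatures:

```java
public class NullGuardDemo {
    // Stand-in for convertJoinMapJoin(): mirrors the early "return null"
    // taken when a MuxOperator parent is found.
    static String convertJoinMapJoin(boolean hasMuxParent) {
        if (hasMuxParent) return null;
        return "mapJoinOp";
    }

    // The caller must check for null BEFORE dereferencing the result,
    // otherwise mapJoinOp.getConf() throws an NPE on the Mux path.
    static String tryConvert(boolean hasMuxParent) {
        String mapJoinOp = convertJoinMapJoin(hasMuxParent);
        if (mapJoinOp == null) {
            return "join-unchanged"; // conversion did not apply; bail out cleanly
        }
        return mapJoinOp + ".getConf()";
    }

    public static void main(String[] args) {
        System.out.println(tryConvert(true));   // join-unchanged
        System.out.println(tryConvert(false));  // mapJoinOp.getConf()
    }
}
```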





[jira] [Updated] (HIVE-8342) Potential null dereference in ColumnTruncateMapper#jobClose()

2015-08-10 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HIVE-8342:
-
Description: 
{code}
Utilities.mvFileToFinalPath(outputPath, job, success, LOG, dynPartCtx, null,
  reporter);
{code}

Utilities.mvFileToFinalPath() calls createEmptyBuckets() where conf is 
dereferenced:
{code}
boolean isCompressed = conf.getCompressed();
TableDesc tableInfo = conf.getTableInfo();
{code}

  was:
{code}
Utilities.mvFileToFinalPath(outputPath, job, success, LOG, dynPartCtx, null,
  reporter);
{code}
Utilities.mvFileToFinalPath() calls createEmptyBuckets() where conf is 
dereferenced:
{code}
boolean isCompressed = conf.getCompressed();
TableDesc tableInfo = conf.getTableInfo();
{code}


 Potential null dereference in ColumnTruncateMapper#jobClose()
 -

 Key: HIVE-8342
 URL: https://issues.apache.org/jira/browse/HIVE-8342
 Project: Hive
  Issue Type: Bug
Reporter: Ted Yu
Assignee: skrho
Priority: Minor
 Attachments: HIVE-8342_001.patch, HIVE-8342_002.patch


 {code}
 Utilities.mvFileToFinalPath(outputPath, job, success, LOG, dynPartCtx, 
 null,
   reporter);
 {code}
 Utilities.mvFileToFinalPath() calls createEmptyBuckets() where conf is 
 dereferenced:
 {code}
 boolean isCompressed = conf.getCompressed();
 TableDesc tableInfo = conf.getTableInfo();
 {code}





[jira] [Commented] (HIVE-11467) WriteBuffers rounding wbSize to next power of 2 may cause OOM

2015-08-10 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680435#comment-14680435
 ] 

Wei Zheng commented on HIVE-11467:
--

[~sershe] The test failures are due to a customized wbsize setting (not a power 
of 2), which MapJoinBytesTableContainer didn't enforce. Since WriteBuffers has 
a number of consumers, such as MapJoinBytesTableContainer, 
HybridHashTableContainer, VectorMapJoinFastKeyStore and 
VectorMapJoinFastValueStore, I would say we'd better keep the rounding logic in 
the WriteBuffers cstr. What do you think? Hybrid is the only exception, in that 
it does the rounding by itself.

 WriteBuffers rounding wbSize to next power of 2 may cause OOM
 -

 Key: HIVE-11467
 URL: https://issues.apache.org/jira/browse/HIVE-11467
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.2.0, 2.0.0
Reporter: Wei Zheng
Assignee: Wei Zheng
 Attachments: HIVE-11467.01.patch, HIVE-11467.02.patch, 
 HIVE-11467.03.patch


 If the wbSize passed to the WriteBuffers cstr is not a power of 2, it will 
 first be rounded up to the next power of 2:
 {code}
   public WriteBuffers(int wbSize, long maxSize) {
 this.wbSize = Integer.bitCount(wbSize) == 1 ? wbSize : 
 (Integer.highestOneBit(wbSize) << 1);
 this.wbSizeLog2 = 31 - Integer.numberOfLeadingZeros(this.wbSize);
 this.offsetMask = this.wbSize - 1;
 this.maxSize = maxSize;
 writePos.bufferIndex = -1;
 nextBufferToWrite();
   }
 {code}
 That may break existing memory consumption assumptions for mapjoin, and 
 potentially cause OOM.
 The solution will be to pass a power-of-2 number as wbSize from upstream 
 during hashtable creation, to avoid this late expansion.
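
A standalone sketch of the rounding in the constructor above, showing how a non-power-of-2 request overshoots; the 10 MB figure is just an example, not from the issue:

```java
public class WbSizeDemo {
    // Mirrors the rounding in the WriteBuffers constructor: a non-power-of-2
    // size is bumped UP to the next power of 2, which can nearly double the
    // per-buffer allocation.
    static int roundedWbSize(int wbSize) {
        return Integer.bitCount(wbSize) == 1 ? wbSize
                : (Integer.highestOneBit(wbSize) << 1);
    }

    public static void main(String[] args) {
        int requested = 10 * 1024 * 1024;       // 10 MB, not a power of 2
        int actual = roundedWbSize(requested);
        System.out.println(actual);             // 16777216 (16 MB)
        // The extra 6 MB per write buffer is allocation the mapjoin memory
        // budget never accounted for, hence the possible OOM. Passing a
        // power of 2 from the caller avoids the expansion, e.g. by rounding
        // DOWN instead:
        System.out.println(Integer.highestOneBit(requested)); // 8388608 (8 MB)
    }
}
```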





[jira] [Commented] (HIVE-6892) Permission inheritance issues

2015-08-10 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680496#comment-14680496
 ] 

Szehon Ho commented on HIVE-6892:
-

The second point might be a valid change, to mimic the HDFS way instead of 
cloning the extended ACLs.  I don't have bandwidth to make the change at the 
moment; someone else can feel free to take a stab (looks like HIVE-11481).  It 
would be more complex: we would have to traverse the tree and essentially copy 
the HDFS logic for extended ACLs for the 'default' group.

I have not investigated enough to comment on the first point.

 Permission inheritance issues
 -

 Key: HIVE-6892
 URL: https://issues.apache.org/jira/browse/HIVE-6892
 Project: Hive
  Issue Type: Bug
  Components: Security
Affects Versions: 0.13.0
Reporter: Szehon Ho
Assignee: Szehon Ho

 *HDFS Background*
 * When a file or directory is created, its owner is the user identity of the 
 client process, and its group is inherited from parent (the BSD rule).  
 Permissions are taken from default umask.  Extended Acl's are taken from 
 parent unless they are set explicitly.
 *Goals*
 To reduce the need to set fine-grained file security props after every 
 operation, users may want the following Hive warehouse files/dirs to 
 auto-inherit security properties from their directory parents:
 * Directories created by new database/table/partition/bucket
 * Files added to tables via load/insert
 * Table directories exported/imported  (open question of whether exported 
 table inheriting perm from new parent needs another flag)
 What may be inherited:
 * Basic file permission
 * Groups (already done by HDFS for new directories)
 * Extended ACL's (already done by HDFS for new directories)
 *Behavior*
 * When the hive.warehouse.subdir.inherit.perms flag is enabled in Hive, Hive 
 will try to do all of the above inheritances.  In the future, we can add more 
 flags for finer-grained control.
 * Failure by Hive to inherit will not cause the operation to fail.  The rule 
 of thumb for when security-prop inheritance will happen is the following:
 ** To run chmod, a user must be the owner of the file, or else a super-user.
 ** To run chgrp, a user must be the owner of files, or else a super-user.
 ** Hence, user that hive runs as (either 'hive' or the logged-in user in case 
 of impersonation), must be super-user or owner of the file whose security 
 properties are going to be changed.





[jira] [Commented] (HIVE-11340) Create ORC based table using like clause doesn't copy compression property

2015-08-10 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680308#comment-14680308
 ] 

Chao Sun commented on HIVE-11340:
-

+1

 Create ORC based table using like clause doesn't copy compression property
 --

 Key: HIVE-11340
 URL: https://issues.apache.org/jira/browse/HIVE-11340
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Affects Versions: 0.14.0, 1.0.0, 1.2.0
Reporter: Gaurav Kohli
Assignee: Yongzhi Chen
Priority: Minor
 Attachments: HIVE-11340.1.patch, HIVE-11340.2.patch


 I found an issue in the “create table like” clause: it is not copying the 
 table properties from an ORC file format based table.
 Steps to reproduce:
 Step1 :
 {code}
 create table orc_table (
 time string)
 stored as ORC tblproperties ('orc.compress'='SNAPPY');
 {code}
 Step 2:
 {code} 
 create table orc_table_using_like like orc_table;
 {code}
 Step 3:
 {code}
 show create table orc_table_using_like;  
 {code}
 Result:
 {code}
 createtab_stmt
 CREATE TABLE `orc_table_using_like`(
   `time` string)
 ROW FORMAT SERDE 
   'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
 STORED AS INPUTFORMAT 
   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
 OUTPUTFORMAT 
   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
 LOCATION
   'hdfs://nameservice1/user/hive/warehouse/gkohli.db/orc_table_using_like'
 TBLPROPERTIES (
   'transient_lastDdlTime'='1437578939')
 {code}
 Issue:  'orc.compress'='SNAPPY' property is missing
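
A hedged sketch of the fix idea: when handling CREATE TABLE ... LIKE, copy format-related properties from the source table instead of starting from an empty map, while skipping per-table bookkeeping. The method name and property-map shape are invented for illustration and are not Hive's actual DDL code:

```java
import java.util.HashMap;
import java.util.Map;

public class CreateLikeDemo {
    // Copy table properties from the LIKE source, dropping bookkeeping keys
    // (such as transient_lastDdlTime) that must not be inherited.
    static Map<String, String> propsForLikeTable(Map<String, String> source) {
        Map<String, String> target = new HashMap<>();
        for (Map.Entry<String, String> e : source.entrySet()) {
            if (e.getKey().equals("transient_lastDdlTime")) continue;
            target.put(e.getKey(), e.getValue());
        }
        return target;
    }

    public static void main(String[] args) {
        Map<String, String> src = new HashMap<>();
        src.put("orc.compress", "SNAPPY");
        src.put("transient_lastDdlTime", "1437578939");
        // With the sketch above, orc.compress would survive the LIKE copy.
        System.out.println(propsForLikeTable(src)); // {orc.compress=SNAPPY}
    }
}
```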





[jira] [Updated] (HIVE-11171) Join reordering algorithm might introduce projects between joins

2015-08-10 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-11171:
-
Attachment: HIVE-11171.branch-1.patch

Some spark qtest changes I was able to regenerate on branch-1 (they match the 
original master patch). The rest of the tests I was not able to repro; they are 
possibly from a different jira.

 Join reordering algorithm might introduce projects between joins
 

 Key: HIVE-11171
 URL: https://issues.apache.org/jira/browse/HIVE-11171
 Project: Hive
  Issue Type: Bug
  Components: CBO
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
 Fix For: 1.3.0, 2.0.0, 1.2.2

 Attachments: HIVE-11171.01.patch, HIVE-11171.02.patch, 
 HIVE-11171.03.patch, HIVE-11171.5.patch, HIVE-11171.branch-1.patch, 
 HIVE-11171.branch-1.patch, HIVE-11171.patch, HIVE-11171.patch


 Join reordering algorithm might introduce projects between joins which causes 
 multijoin optimization in SemanticAnalyzer to not kick in.





[jira] [Commented] (HIVE-11502) Map side aggregation is extremely slow

2015-08-10 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680637#comment-14680637
 ] 

Gopal V commented on HIVE-11502:


[~ychena]: I've linked the issue to the known issue in HADOOP-12217.

Is it possible that you're testing hive against different versions of Hadoop 
between 0.13 and 1.2?

 Map side aggregation is extremely slow
 --

 Key: HIVE-11502
 URL: https://issues.apache.org/jira/browse/HIVE-11502
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer, Physical Optimizer
Affects Versions: 1.2.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen

 For a query like the following:
 {noformat}
 create table tbl2 as 
 select col1, max(col2) as col2 
 from tbl1 group by col1;
 {noformat}
 If the column for group by has many different values (for example 40) and 
 is of type double, the map side aggregation is very slow. I ran the query for 
 more than 3 hours, after which I had to kill it.
 The same query can finish in 7 seconds if I turn off map side aggregation by:
 {noformat}
 set hive.map.aggr = false;
 {noformat}





[jira] [Commented] (HIVE-11502) Map side aggregation is extremely slow

2015-08-10 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680740#comment-14680740
 ] 

Gopal V commented on HIVE-11502:


A custom hashcode can be used internally to Hive (i.e. group-by etc.), but not 
externally to Hive (bucketing into HDFS, results of hash() functions), because 
that would break external assumptions in a non-backwards-compatible way.

The reason shuffle + merge is more uniform is that it starts using [murmur 
hashes|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java#L366]
 for the UNIFORM trait RS instead of the builtin writable hash funcs (which are 
skewed).
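
The skew is easy to reproduce in isolation. The sketch below compares a writable-style hash (XOR-folding the IEEE-754 bits, as Java's Double.hashCode does) against a murmur3-style finalizer; it is a simplified stand-in for the ReduceSinkOperator code linked above, not the actual Hive path:

```java
import java.util.HashSet;
import java.util.Set;

public class HashSkewDemo {
    // Murmur3 32-bit finalizer ("fmix32"), used here as a stand-in for the
    // murmur hashing the UNIFORM trait RS switches to.
    static int mix(int h) {
        h ^= h >>> 16; h *= 0x85ebca6b;
        h ^= h >>> 13; h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Count how many of `buckets` partitions receive at least one of the
    // doubles 1.0..n under the chosen hash.
    static int bucketsUsed(int n, int buckets, boolean useMix) {
        Set<Integer> used = new HashSet<>();
        for (int k = 1; k <= n; k++) {
            long bits = Double.doubleToLongBits((double) k);
            int h = (int) (bits ^ (bits >>> 32)); // builtin writable-style hash
            if (useMix) h = mix(h);
            used.add(Math.floorMod(h, buckets));
        }
        return used.size();
    }

    public static void main(String[] args) {
        // Integer-valued doubles have all-zero low hash bits under the XOR
        // fold, so all 1024 distinct keys collapse into ONE of 1024 buckets;
        // the mixed hash spreads them out.
        System.out.println("builtin buckets used: " + bucketsUsed(1024, 1024, false));
        System.out.println("murmur  buckets used: " + bucketsUsed(1024, 1024, true));
    }
}
```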

You will probably notice that using a vectorized input format like ORC would 
not have the issue you're hitting, since the vector transform inside the 
operator pipeline gives hive the opportunity to use per-operator specific 
optimizations.

 Map side aggregation is extremely slow
 --

 Key: HIVE-11502
 URL: https://issues.apache.org/jira/browse/HIVE-11502
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer, Physical Optimizer
Affects Versions: 1.2.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen

 For a query like the following:
 {noformat}
 create table tbl2 as 
 select col1, max(col2) as col2 
 from tbl1 group by col1;
 {noformat}
 If the column for group by has many different values (for example 40) and 
 is of type double, the map side aggregation is very slow. I ran the query for 
 more than 3 hours, after which I had to kill it.
 The same query can finish in 7 seconds if I turn off map side aggregation by:
 {noformat}
 set hive.map.aggr = false;
 {noformat}





[jira] [Commented] (HIVE-11504) Predicate pushing down doesn't work for float type for Parquet

2015-08-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680749#comment-14680749
 ] 

Hive QA commented on HIVE-11504:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12749615/HIVE-11504.1.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9348 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.ql.io.parquet.read.TestParquetFilterPredicate.testFilterColumnsThatDoNoExistOnSchema
org.apache.hadoop.hive.ql.io.parquet.read.TestParquetFilterPredicate.testFilterFloatColumns
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4908/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4908/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4908/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12749615 - PreCommit-HIVE-TRUNK-Build

 Predicate pushing down doesn't work for float type for Parquet
 --

 Key: HIVE-11504
 URL: https://issues.apache.org/jira/browse/HIVE-11504
 Project: Hive
  Issue Type: Sub-task
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu
 Attachments: HIVE-11504.1.patch, HIVE-11504.patch


 The predicate builder should use the PrimitiveTypeName type on the parquet 
 side to construct the predicate leaf, instead of the type provided by 
 PredicateLeaf.





[jira] [Commented] (HIVE-11511) Output the message of orcfiledump when ORC files are not specified

2015-08-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680606#comment-14680606
 ] 

Hive QA commented on HIVE-11511:




{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12749598/HIVE-11511.1.patch

{color:green}SUCCESS:{color} +1 9347 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4907/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4907/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4907/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12749598 - PreCommit-HIVE-TRUNK-Build

 Output the message of orcfiledump when ORC files are not specified
 --

 Key: HIVE-11511
 URL: https://issues.apache.org/jira/browse/HIVE-11511
 Project: Hive
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Shinichi Yamashita
Assignee: Shinichi Yamashita
  Labels: orcfile
 Attachments: HIVE-11511.1.patch


 When I execute the orcfiledump command without specifying an ORC file, no 
 message is output and the return value is 0.
 {code}
 [root@hive hive]# /usr/local/hive/bin/hive --orcfiledump
 [root@hive hive]# echo $?
 0
 {code}
 Given this behavior, I will modify the command to output an error message.





[jira] [Updated] (HIVE-11405) Add early termination for recursion in StatsRulesProcFactory$FilterStatsRule.evaluateExpression for OR expression

2015-08-10 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-11405:
-
Attachment: HIVE-11405-branch-1.patch

 Add early termination for recursion in 
 StatsRulesProcFactory$FilterStatsRule.evaluateExpression  for OR expression
 --

 Key: HIVE-11405
 URL: https://issues.apache.org/jira/browse/HIVE-11405
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Prasanth Jayachandran
 Fix For: 1.3.0, 2.0.0

 Attachments: HIVE-11405-branch-1.patch, HIVE-11405.1.patch, 
 HIVE-11405.2.patch, HIVE-11405.2.patch, HIVE-11405.2.patch, 
 HIVE-11405.2.patch, HIVE-11405.patch


 Thanks to [~gopalv] for uncovering this issue as part of HIVE-11330.  Quoting 
 him,
 The recursion protection works well with an AND expr, but it doesn't work 
 against
 (OR a=1 (OR a=2 (OR a=3 (OR ...)
 since the rows will never be reduced during recursion due to the nature of 
 the OR.
 We need to execute a short-circuit to satisfy the OR properly - no case which 
 matches a=1 qualifies for the rest of the filters.
 Recursion should pass in the numRows - branch1Rows for the branch-2.
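
The suggested short-circuit can be sketched as follows; the selectivity numbers and the method shape are illustrative, not the actual StatsRulesProcFactory code:

```java
public class OrStatsDemo {
    // Evaluate OR branches left to right, but estimate each later branch
    // against only the rows NOT already matched by earlier branches
    // (numRows - branch1Rows), instead of the full numRows every time.
    static long evaluateOr(long numRows, double[] selectivities) {
        long remaining = numRows;
        long matched = 0;
        for (double sel : selectivities) {
            long branchRows = (long) (remaining * sel);
            matched += branchRows;
            remaining -= branchRows;   // rows matching a=1 can't also match a=2
            if (remaining <= 0) break; // early termination for deep OR chains
        }
        return matched;
    }

    public static void main(String[] args) {
        // Without the fix, each of 4 branches would be estimated against all
        // 1000 rows; with it, the estimates shrink as rows are consumed.
        System.out.println(evaluateOr(1000, new double[]{0.5, 0.5, 0.5, 0.5}));
    }
}
```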





[jira] [Commented] (HIVE-11171) Join reordering algorithm might introduce projects between joins

2015-08-10 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680643#comment-14680643
 ] 

Prasanth Jayachandran commented on HIVE-11171:
--

[~jcamachorodriguez] I reverted the patch and reapplied the new branch-1 that 
contains some spark test diffs.

 Join reordering algorithm might introduce projects between joins
 

 Key: HIVE-11171
 URL: https://issues.apache.org/jira/browse/HIVE-11171
 Project: Hive
  Issue Type: Bug
  Components: CBO
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
 Fix For: 1.3.0, 2.0.0, 1.2.2

 Attachments: HIVE-11171.01.patch, HIVE-11171.02.patch, 
 HIVE-11171.03.patch, HIVE-11171.5.patch, HIVE-11171.branch-1.patch, 
 HIVE-11171.branch-1.patch, HIVE-11171.patch, HIVE-11171.patch


 Join reordering algorithm might introduce projects between joins which causes 
 multijoin optimization in SemanticAnalyzer to not kick in.





[jira] [Commented] (HIVE-11295) LLAP: clean up ORC dependencies on object pools

2015-08-10 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680771#comment-14680771
 ] 

Prasanth Jayachandran commented on HIVE-11295:
--

+1

 LLAP: clean up ORC dependencies on object pools
 ---

 Key: HIVE-11295
 URL: https://issues.apache.org/jira/browse/HIVE-11295
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-11295.01.patch, HIVE-11295.02.patch, 
 HIVE-11295.patch


 Before there's storage API module, we can clean some things up
 NO PRECOMMIT TESTS





[jira] [Updated] (HIVE-11438) Join a ACID table with non-ACID table fail with MR on 1.0.0

2015-08-10 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-11438:
--
Attachment: test.log

I cannot get a precommit test run against branch-1.0. Running the tests 
locally, I see 52 test failures on branch-1.0 even without my patch. With the 
patch, I get the same result, so those failures are not related. Attaching the 
test log.

Patch committed to the 1.0 branch.

 Join a ACID table with non-ACID table fail with MR on 1.0.0
 ---

 Key: HIVE-11438
 URL: https://issues.apache.org/jira/browse/HIVE-11438
 Project: Hive
  Issue Type: Bug
  Components: Query Processor, Transactions
Affects Versions: 1.0.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 1.0.1

 Attachments: HIVE-11438.1-branch-1.0.patch, HIVE-11438.1.patch, 
 HIVE-11438.2-branch-1.0.patch, test.log


 The following script fails in MR mode:
 Preparation:
 {code}
 CREATE TABLE orc_update_table (k1 INT, f1 STRING, op_code STRING) 
 CLUSTERED BY (k1) INTO 2 BUCKETS 
 STORED AS ORC TBLPROPERTIES('transactional'='true'); 
 INSERT INTO TABLE orc_update_table VALUES (1, 'a', 'I');
 CREATE TABLE orc_table (k1 INT, f1 STRING) 
 CLUSTERED BY (k1) SORTED BY (k1) INTO 2 BUCKETS 
 STORED AS ORC; 
 INSERT OVERWRITE TABLE orc_table VALUES (1, 'x');
 {code}
 Then run the following script:
 {code}
 SET hive.execution.engine=mr; 
 SET hive.auto.convert.join=false; 
 SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
 SELECT t1.*, t2.* FROM orc_table t1 
 JOIN orc_update_table t2 ON t1.k1=t2.k1 ORDER BY t1.k1;
 {code}
 Stack:
 {code}
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:265)
   at 
 org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:272)
   at 
 org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:509)
   at 
 org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:624)
   at 
 org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:616)
   at 
 org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492)
   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:585)
   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:580)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
   at 
 org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:580)
   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:571)
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:429)
   at 
 org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
   at 
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1606)
   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1367)
   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1179)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1006)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:996)
   at 
 org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:247)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:199)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:410)
   at 
 org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:783)
   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:616)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
 Job Submission failed with exception 

[jira] [Commented] (HIVE-11505) Disabling llap cache allocate direct is not honored anymore

2015-08-10 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680777#comment-14680777
 ] 

Prasanth Jayachandran commented on HIVE-11505:
--

I ran llap locally with hive.llap.io.cache.direct set to false.

 Disabling llap cache allocate direct is not honored anymore
 ---

 Key: HIVE-11505
 URL: https://issues.apache.org/jira/browse/HIVE-11505
 Project: Hive
  Issue Type: Sub-task
Affects Versions: llap
Reporter: Prasanth Jayachandran
Assignee: Sergey Shelukhin

 ORC refactorings probably broke something. I disabled cache direct allocation, 
 but I am still getting this exception:
 {code}
 Caused by: java.lang.UnsatisfiedLinkError: 
 org.apache.hadoop.io.compress.zlib.ZlibDecompressor.$$YJP$$init(I)J
   at 
 org.apache.hadoop.io.compress.zlib.ZlibDecompressor.$$YJP$$init(Native Method)
   at 
 org.apache.hadoop.io.compress.zlib.ZlibDecompressor.init(ZlibDecompressor.java)
   at 
 org.apache.hadoop.io.compress.zlib.ZlibDecompressor.init(ZlibDecompressor.java:115)
   at 
 org.apache.hadoop.io.compress.zlib.ZlibDecompressor$ZlibDirectDecompressor.init(ZlibDecompressor.java:358)
   at 
 org.apache.hadoop.hive.shims.ZeroCopyShims.getDirectDecompressor(ZeroCopyShims.java:114)
   at 
 org.apache.hadoop.hive.shims.Hadoop23Shims.getDirectDecompressor(Hadoop23Shims.java:975)
   at 
 org.apache.hadoop.hive.ql.io.orc.ZlibCodec.directDecompress(ZlibCodec.java:128)
   at 
 org.apache.hadoop.hive.ql.io.orc.ZlibCodec.decompress(ZlibCodec.java:84)
   at 
 org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.decompressChunk(EncodedReaderImpl.java:1128)
   at 
 org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedStream(EncodedReaderImpl.java:780)
   at 
 org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:467)
   at 
 org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:355)
   at 
 org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:70)
   at 
 org.apache.hadoop.hive.common.CallableWithNdc.call(CallableWithNdc.java:37)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11511) Output the message of orcfiledump when ORC files are not specified

2015-08-10 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680840#comment-14680840
 ] 

Alan Gates commented on HIVE-11511:
---

In general looks good.  We should avoid the System.exit call and use return 
instead.  We keep the System.exits out in case we're called by another tool.  I 
can just change that in the patch when I commit it.

 Output the message of orcfiledump when ORC files are not specified
 --

 Key: HIVE-11511
 URL: https://issues.apache.org/jira/browse/HIVE-11511
 Project: Hive
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Shinichi Yamashita
Assignee: Shinichi Yamashita
  Labels: orcfile
 Attachments: HIVE-11511.1.patch


 When I execute the orcfiledump command without specifying an ORC file, no 
 message is output and the return value is 0.
 {code}
 [root@hive hive]# /usr/local/hive/bin/hive --orcfiledump
 [root@hive hive]# echo $?
 0
 {code}
 To fix this behavior, I will modify it to output an error message.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11502) Map side aggregation is extremely slow

2015-08-10 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680783#comment-14680783
 ] 

Yongzhi Chen commented on HIVE-11502:
-

[~gopalv], I have confirmed that HIVE-7041 caused the regression. The Hadoop 
bug has been there for a long time; after Hive switched to using Hadoop's 
hashcode, we inherited Hadoop's bug. Thanks for finding the root cause by 
pointing out the Hadoop bug.

After I added the following code in 
serde/src/java/org/apache/hadoop/hive/serde2/io/DoubleWritable.java
{noformat}
@Override
public int hashCode() {
  long v = Double.doubleToLongBits(super.get());
  return (int) (v ^ (v >>> 32));
}
{noformat}
The group by query can finish in 15 seconds. 

So next step is, how do we fix the issue now? 
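The effect of the missing high-bit fold can be illustrated with a small, self-contained sketch (class and method names here are mine, for illustration only; this is not Hive's or Hadoop's actual code). Integral doubles have all-zero low 32 bits in their IEEE-754 encoding, so a hash that simply truncates the long bit pattern collapses every such key into one bucket:

```java
public class DoubleHashDemo {
    // Buggy variant: truncation drops the high 32 bits, where the entropy
    // of integral doubles lives (their low 32 bits are all zero).
    static int badHash(double d) {
        return (int) Double.doubleToLongBits(d);
    }

    // Fixed variant, matching the hashCode patch above: fold the high
    // 32 bits into the low 32 bits before truncating.
    static int goodHash(double d) {
        long v = Double.doubleToLongBits(d);
        return (int) (v ^ (v >>> 32));
    }

    public static void main(String[] args) {
        java.util.Set<Integer> bad = new java.util.HashSet<>();
        java.util.Set<Integer> good = new java.util.HashSet<>();
        for (int i = 1; i <= 1000; i++) {
            bad.add(badHash((double) i));
            good.add(goodHash((double) i));
        }
        // prints: bad distinct=1 good distinct=1000
        System.out.println("bad distinct=" + bad.size()
            + " good distinct=" + good.size());
    }
}
```

With the buggy hash, a map-side aggregation on such a double key degenerates into one giant hash chain, which matches the multi-hour runtime reported above.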


 Map side aggregation is extremely slow
 --

 Key: HIVE-11502
 URL: https://issues.apache.org/jira/browse/HIVE-11502
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer, Physical Optimizer
Affects Versions: 1.2.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen

 For the query as following:
 {noformat}
 create table tbl2 as 
 select col1, max(col2) as col2 
 from tbl1 group by col1;
 {noformat}
 If the column for group by has many different values (for example 40) and 
 it is of type double, the map-side aggregation is very slow. I ran the query, 
 which took more than 3 hours; after 3 hours, I had to kill the query.
 The same query can finish in 7 seconds, if I turn off map side aggregation by:
 {noformat}
 set hive.map.aggr = false;
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11295) LLAP: clean up ORC dependencies on object pools

2015-08-10 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11295:

Attachment: HIVE-11295.02.patch

Fix a small bug and some bad renames (pascal-case variables due to bulk replace)

 LLAP: clean up ORC dependencies on object pools
 ---

 Key: HIVE-11295
 URL: https://issues.apache.org/jira/browse/HIVE-11295
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-11295.01.patch, HIVE-11295.02.patch, 
 HIVE-11295.patch


 Before there's storage API module, we can clean some things up
 NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11505) Disabling llap cache allocate direct is not honored anymore

2015-08-10 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680757#comment-14680757
 ] 

Sergey Shelukhin commented on HIVE-11505:
-

I cannot repro this, the test (that relies on non-direct alloc) passes.
Where are you getting this error and how do you disable direct allocation?

 Disabling llap cache allocate direct is not honored anymore
 ---

 Key: HIVE-11505
 URL: https://issues.apache.org/jira/browse/HIVE-11505
 Project: Hive
  Issue Type: Sub-task
Affects Versions: llap
Reporter: Prasanth Jayachandran
Assignee: Sergey Shelukhin

 ORC refactorings probably broke something. I disabled cache direct allocation, 
 but I am still getting this exception:
 {code}
 Caused by: java.lang.UnsatisfiedLinkError: 
 org.apache.hadoop.io.compress.zlib.ZlibDecompressor.$$YJP$$init(I)J
   at 
 org.apache.hadoop.io.compress.zlib.ZlibDecompressor.$$YJP$$init(Native Method)
   at 
 org.apache.hadoop.io.compress.zlib.ZlibDecompressor.init(ZlibDecompressor.java)
   at 
 org.apache.hadoop.io.compress.zlib.ZlibDecompressor.init(ZlibDecompressor.java:115)
   at 
 org.apache.hadoop.io.compress.zlib.ZlibDecompressor$ZlibDirectDecompressor.init(ZlibDecompressor.java:358)
   at 
 org.apache.hadoop.hive.shims.ZeroCopyShims.getDirectDecompressor(ZeroCopyShims.java:114)
   at 
 org.apache.hadoop.hive.shims.Hadoop23Shims.getDirectDecompressor(Hadoop23Shims.java:975)
   at 
 org.apache.hadoop.hive.ql.io.orc.ZlibCodec.directDecompress(ZlibCodec.java:128)
   at 
 org.apache.hadoop.hive.ql.io.orc.ZlibCodec.decompress(ZlibCodec.java:84)
   at 
 org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.decompressChunk(EncodedReaderImpl.java:1128)
   at 
 org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedStream(EncodedReaderImpl.java:780)
   at 
 org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:467)
   at 
 org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:355)
   at 
 org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:70)
   at 
 org.apache.hadoop.hive.common.CallableWithNdc.call(CallableWithNdc.java:37)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11457) Vectorization: Improve SIMD JIT in GenVectorCode StringExpr instrinsics

2015-08-10 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-11457:
---
Fix Version/s: 1.3.0

 Vectorization: Improve SIMD JIT in GenVectorCode StringExpr instrinsics 
 

 Key: HIVE-11457
 URL: https://issues.apache.org/jira/browse/HIVE-11457
 Project: Hive
  Issue Type: Improvement
  Components: Vectorization
Affects Versions: 2.0.0
Reporter: Gopal V
Assignee: Gopal V
 Fix For: 1.3.0, 2.0.0

 Attachments: HIVE-11457.1.patch, HIVE-11457.1.patch, 
 string-intrinsic-sse.png


 With HIVE-11406, the Vectorization codegen generates a new and specialized 
 fast-path for equality (and non equality), which removed the ordering and 
 comparison constraints in the old codepath.
 The equality operation can be much more pipeline and cache line efficient by 
 keeping on comparing even when an inequality has been detected.
 Optimize the single loop into a pair of loops, to allow the Vectorization 
 codegen to use tighter loops that the JIT superword optimization can 
 understand.
 !string-intrinsic-sse.png!
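The "keep comparing even when an inequality has been detected" idea can be sketched as follows (a hypothetical illustration of the technique, not the actual GenVectorCode output): replacing the data-dependent early-exit branch with an XOR/OR accumulation gives the JIT a tight, branch-free loop that its superword optimization can vectorize.

```java
public class BranchFreeEquals {
    // Conventional equality: exits at the first mismatch, but the branch
    // inside the loop defeats SIMD code generation.
    static boolean equalEarlyExit(byte[] a, byte[] b) {
        if (a.length != b.length) return false;
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) return false;  // data-dependent branch
        }
        return true;
    }

    // Branch-free equality: keeps comparing past the first mismatch,
    // accumulating differences; the loop body is pure arithmetic.
    static boolean equalBranchFree(byte[] a, byte[] b) {
        if (a.length != b.length) return false;
        int diff = 0;
        for (int i = 0; i < a.length; i++) {
            diff |= a[i] ^ b[i];             // no branch; SIMD-friendly
        }
        return diff == 0;
    }

    public static void main(String[] args) {
        byte[] x = "hello".getBytes();
        byte[] y = "hellp".getBytes();
        // prints: true false
        System.out.println(equalEarlyExit(x, x.clone()) + " "
            + equalBranchFree(x, y));
    }
}
```

The trade-off is doing slightly more work per call in exchange for far better pipeline and cache-line behavior, which is usually a win for short, fixed-length comparisons.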



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10289) Support filter on non-first partition key and non-string partition key

2015-08-10 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680867#comment-14680867
 ] 

Sergey Shelukhin commented on HIVE-10289:
-

Hi. I am not sure what this patch is using (it's too big and no RB or 
description ;)), but HBase has a built-in serialization helper for sorted, 
multi-type keys called OrderedBytes: 
https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/util/OrderedBytes.html 
and HBASE-8201
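The property both BinarySortableSerDe and HBase's OrderedBytes provide can be shown with a minimal sketch (this is illustrative code with made-up names, not either library's implementation): an order-preserving encoding, where unsigned byte-wise comparison of the encoded form matches the numeric order of the original values.

```java
public class SortableEncodingDemo {
    // Big-endian bytes with the sign bit flipped, so negative values sort
    // below positive ones under unsigned byte comparison.
    static byte[] encode(int v) {
        int u = v ^ Integer.MIN_VALUE;  // flip the sign bit
        return new byte[] {
            (byte) (u >>> 24), (byte) (u >>> 16), (byte) (u >>> 8), (byte) u
        };
    }

    // Lexicographic comparison treating each byte as unsigned, which is
    // what a byte-ordered store like HBase does with row keys.
    static int compareUnsigned(byte[] a, byte[] b) {
        for (int i = 0; i < a.length; i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) return d;
        }
        return 0;
    }

    public static void main(String[] args) {
        int[] keys = { -5, -1, 0, 3, 100 };
        for (int i = 0; i + 1 < keys.length; i++) {
            // Encoded order agrees with numeric order for each adjacent pair.
            System.out.println(keys[i] + " < " + keys[i + 1] + " : "
                + (compareUnsigned(encode(keys[i]), encode(keys[i + 1])) < 0));
        }
    }
}
```

A plain delimited-string encoding lacks this property (e.g. "10" sorts before "9" as strings), which is exactly the limitation the issue description calls out.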

 Support filter on non-first partition key and non-string partition key
 --

 Key: HIVE-10289
 URL: https://issues.apache.org/jira/browse/HIVE-10289
 Project: Hive
  Issue Type: Sub-task
  Components: HBase Metastore, Metastore
Affects Versions: hbase-metastore-branch
Reporter: Daniel Dai
Assignee: Daniel Dai
 Attachments: HIVE-10289.1.patch


 Currently, partition filtering only handles the first partition key, and the 
 type of this partition key must be string. To remove this 
 limitation, several improvements are required:
 1. Change the serialization format for partition keys. Currently, partition keys 
 are serialized into a delimited string, which sorts in string order without 
 regard to the actual type of the partition key. We use BinarySortableSerDe 
 for this purpose.
 2. For filter conditions not on the initial partition keys, push them into an HBase 
 RowFilter. The RowFilter will deserialize the partition key and evaluate the 
 filter condition.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9024) NullPointerException when starting webhcat server if templeton.hive.properties is not set

2015-08-10 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-9024:
-
Fix Version/s: 1.0.2

 NullPointerException when starting webhcat server if 
 templeton.hive.properties is not set
 -

 Key: HIVE-9024
 URL: https://issues.apache.org/jira/browse/HIVE-9024
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Affects Versions: 0.14.0
Reporter: Na Yang
Assignee: Na Yang
 Fix For: 1.1.0, 1.0.2

 Attachments: HIVE-9024.patch


 If templeton.hive.properties is not set, when starting webhcat server, the 
 following NullPointerException is thrown and webhcat server could not start:
 {noformat}
 Exception in thread "main" java.lang.NullPointerException
 at 
 org.apache.hive.hcatalog.templeton.AppConfig.hiveProps(AppConfig.java:318)
 at 
 org.apache.hive.hcatalog.templeton.AppConfig.handleHiveProperties(AppConfig.java:194)
 at 
 org.apache.hive.hcatalog.templeton.AppConfig.init(AppConfig.java:175)
 at 
 org.apache.hive.hcatalog.templeton.AppConfig.init(AppConfig.java:155)
 at org.apache.hive.hcatalog.templeton.Main.loadConfig(Main.java:96)
 at org.apache.hive.hcatalog.templeton.Main.init(Main.java:80)
 at org.apache.hive.hcatalog.templeton.Main.init(Main.java:75)
 at org.apache.hive.hcatalog.templeton.Main.main(Main.java:267)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-10289) Support filter on non-first partition key and non-string partition key

2015-08-10 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680867#comment-14680867
 ] 

Sergey Shelukhin edited comment on HIVE-10289 at 8/10/15 10:18 PM:
---

Hi. I am not sure what this patch is using (it's too big and no RB or 
description ;)), 
but HBase has a built-in serialization helper for sorted, multi-type keys 
called OrderedBytes: 
https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/util/OrderedBytes.html 
and HBASE-8201
I think we should use that


was (Author: sershe):
Hi. I am not sure what this patch is using (it's too big and no RB or 
description ;)), but HBase has a built-in serialization helper for sorted, 
multi-type keys called OrderedBytes: 
https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/util/OrderedBytes.html 
and HBASE-8201
I think we should use that

 Support filter on non-first partition key and non-string partition key
 --

 Key: HIVE-10289
 URL: https://issues.apache.org/jira/browse/HIVE-10289
 Project: Hive
  Issue Type: Sub-task
  Components: HBase Metastore, Metastore
Affects Versions: hbase-metastore-branch
Reporter: Daniel Dai
Assignee: Daniel Dai
 Attachments: HIVE-10289.1.patch


 Currently, partition filtering only handles the first partition key, and the 
 type of this partition key must be string. To remove this 
 limitation, several improvements are required:
 1. Change the serialization format for partition keys. Currently, partition keys 
 are serialized into a delimited string, which sorts in string order without 
 regard to the actual type of the partition key. We use BinarySortableSerDe 
 for this purpose.
 2. For filter conditions not on the initial partition keys, push them into an HBase 
 RowFilter. The RowFilter will deserialize the partition key and evaluate the 
 filter condition.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9024) NullPointerException when starting webhcat server if templeton.hive.properties is not set

2015-08-10 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680869#comment-14680869
 ] 

Jason Dere commented on HIVE-9024:
--

Added this fix to branch-1.0

 NullPointerException when starting webhcat server if 
 templeton.hive.properties is not set
 -

 Key: HIVE-9024
 URL: https://issues.apache.org/jira/browse/HIVE-9024
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Affects Versions: 0.14.0
Reporter: Na Yang
Assignee: Na Yang
 Fix For: 1.1.0, 1.0.2

 Attachments: HIVE-9024.patch


 If templeton.hive.properties is not set, when starting webhcat server, the 
 following NullPointerException is thrown and webhcat server could not start:
 {noformat}
 Exception in thread "main" java.lang.NullPointerException
 at 
 org.apache.hive.hcatalog.templeton.AppConfig.hiveProps(AppConfig.java:318)
 at 
 org.apache.hive.hcatalog.templeton.AppConfig.handleHiveProperties(AppConfig.java:194)
 at 
 org.apache.hive.hcatalog.templeton.AppConfig.init(AppConfig.java:175)
 at 
 org.apache.hive.hcatalog.templeton.AppConfig.init(AppConfig.java:155)
 at org.apache.hive.hcatalog.templeton.Main.loadConfig(Main.java:96)
 at org.apache.hive.hcatalog.templeton.Main.init(Main.java:80)
 at org.apache.hive.hcatalog.templeton.Main.init(Main.java:75)
 at org.apache.hive.hcatalog.templeton.Main.main(Main.java:267)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-10289) Support filter on non-first partition key and non-string partition key

2015-08-10 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680867#comment-14680867
 ] 

Sergey Shelukhin edited comment on HIVE-10289 at 8/10/15 10:18 PM:
---

Hi. I am not sure what this patch is using (it's too big and no RB or 
description ;)), but HBase has a built-in serialization helper for sorted, 
multi-type keys called OrderedBytes: 
https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/util/OrderedBytes.html 
and HBASE-8201
I think we should use that


was (Author: sershe):
Hi. I am not sure what this patch is using (it's too big and no RB or 
description ;)), but HBase has a built-in serialization helper for sorted, 
multi-type keys called OrderedBytes: 
https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/util/OrderedBytes.html 
and HBASE-8201

 Support filter on non-first partition key and non-string partition key
 --

 Key: HIVE-10289
 URL: https://issues.apache.org/jira/browse/HIVE-10289
 Project: Hive
  Issue Type: Sub-task
  Components: HBase Metastore, Metastore
Affects Versions: hbase-metastore-branch
Reporter: Daniel Dai
Assignee: Daniel Dai
 Attachments: HIVE-10289.1.patch


 Currently, partition filtering only handles the first partition key, and the 
 type of this partition key must be string. To remove this 
 limitation, several improvements are required:
 1. Change the serialization format for partition keys. Currently, partition keys 
 are serialized into a delimited string, which sorts in string order without 
 regard to the actual type of the partition key. We use BinarySortableSerDe 
 for this purpose.
 2. For filter conditions not on the initial partition keys, push them into an HBase 
 RowFilter. The RowFilter will deserialize the partition key and evaluate the 
 filter condition.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11502) Map side aggregation is extremely slow

2015-08-10 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680870#comment-14680870
 ] 

Gopal V commented on HIVE-11502:


bq. So next step is, how do we fix the issue now?

Easiest would be to use vectorization, which doesn't need any Writables in the 
inner loop.

The vector hashcode for doubles would automatically be very similar to your 
impl (from Arrays.hashCode(double[]))

{code}

for (double element : a) {
    long bits = Double.doubleToLongBits(element);
    result = 31 * result + (int)(bits ^ (bits >>> 32));
}
return result;
{code}

 Map side aggregation is extremely slow
 --

 Key: HIVE-11502
 URL: https://issues.apache.org/jira/browse/HIVE-11502
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer, Physical Optimizer
Affects Versions: 1.2.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen

 For the query as following:
 {noformat}
 create table tbl2 as 
 select col1, max(col2) as col2 
 from tbl1 group by col1;
 {noformat}
 If the column for group by has many different values (for example 40) and 
 it is of type double, the map-side aggregation is very slow. I ran the query, 
 which took more than 3 hours; after 3 hours, I had to kill the query.
 The same query can finish in 7 seconds, if I turn off map side aggregation by:
 {noformat}
 set hive.map.aggr = false;
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11398) Parse wide OR and wide AND trees to flat OR/AND trees

2015-08-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680884#comment-14680884
 ] 

Hive QA commented on HIVE-11398:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12749637/HIVE-11398.5.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9305 tests executed
*Failed tests:*
{noformat}
TestMiniSparkOnYarnCliDriver - did not produce a TEST-*.xml file
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4909/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4909/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4909/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12749637 - PreCommit-HIVE-TRUNK-Build

 Parse wide OR and wide AND trees to flat OR/AND trees
 -

 Key: HIVE-11398
 URL: https://issues.apache.org/jira/browse/HIVE-11398
 Project: Hive
  Issue Type: New Feature
  Components: Logical Optimizer, UDF
Affects Versions: 1.3.0, 2.0.0
Reporter: Gopal V
Assignee: Jesus Camacho Rodriguez
 Attachments: HIVE-11398.2.patch, HIVE-11398.3.patch, 
 HIVE-11398.4.patch, HIVE-11398.5.patch, HIVE-11398.patch


 Deep trees of AND/OR are hard to traverse, particularly when they are merely 
 the nested form of an operator that takes an 
 arbitrary number of args.
 One potential way to convert the DFS searches into a simpler BFS search is to 
 introduce a new Operator pair named ALL and ANY.
 ALL(A, B, C, D, E) represents AND(AND(AND(AND(E, D), C), B), A)
 ANY(A, B, C, D, E) represents OR(OR(OR(OR(E, D), C),B),A)
 The SemanticAnalyser would be responsible for generating these operators and 
 this would mean that the depth and complexity of traversals for the simplest 
 case of wide AND/OR trees would be trivial.
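The flattening described above can be sketched in a few lines (class, field, and method names here are mine, not Hive's SemanticAnalyzer code): an iterative walk collects every operand reachable through nested AND nodes into one n-ary ALL node, so AND(AND(AND(A, B), C), D) becomes ALL(A, B, C, D).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class FlattenDemo {
    static class Node {
        final String op;             // "AND", "OR", or a leaf predicate name
        final List<Node> children;
        Node(String op, Node... kids) { this.op = op; this.children = List.of(kids); }
    }

    // Collect all non-AND descendants reachable through AND nodes only,
    // turning a deep binary tree into one flat n-ary node.
    static Node flattenAnd(Node root) {
        List<Node> flat = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            if (n.op.equals("AND")) {
                n.children.forEach(stack::push);  // descend through nested ANDs
            } else {
                flat.add(n);                      // keep leaves / other operators
            }
        }
        return new Node("ALL", flat.toArray(new Node[0]));
    }

    public static void main(String[] args) {
        Node a = new Node("A"), b = new Node("B"), c = new Node("C"), d = new Node("D");
        Node deep = new Node("AND", new Node("AND", new Node("AND", a, b), c), d);
        Node flat = flattenAnd(deep);
        // prints: ALL with 4 children
        System.out.println(flat.op + " with " + flat.children.size() + " children");
    }
}
```

After this rewrite, a traversal that had to recurse to the tree's full depth only needs to scan one node's child list.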



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8680) Set Max Message for Binary Thrift endpoints

2015-08-10 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-8680:
-
Fix Version/s: 1.0.2

 Set Max Message for Binary Thrift endpoints
 ---

 Key: HIVE-8680
 URL: https://issues.apache.org/jira/browse/HIVE-8680
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland
Assignee: Brock Noland
  Labels: TODOC15
 Fix For: 1.1.0, 1.0.2

 Attachments: HIVE-8680.patch, HIVE-8680.patch


 Thrift has a configuration option to restrict incoming message size. If we 
 configure this we'll stop OOM'ing when someone sends us an HTTP request.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7997) Potential null pointer reference in ObjectInspectorUtils#compareTypes()

2015-08-10 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680919#comment-14680919
 ] 

Jason Dere commented on HIVE-7997:
--

Adding fix to branch-1.0

 Potential null pointer reference in ObjectInspectorUtils#compareTypes()
 ---

 Key: HIVE-7997
 URL: https://issues.apache.org/jira/browse/HIVE-7997
 Project: Hive
  Issue Type: Bug
  Components: Types
Reporter: Ted Yu
Assignee: Navis
 Fix For: 1.1.0, 1.0.2

 Attachments: HIVE-7997.1.patch.txt


 {code}
   if (childFieldsList1 == null && childFieldsList2 == null) {
     return true;
   }
   if (childFieldsList1.size() != childFieldsList2.size()) {
     return false;
   }
 {code}
 If either childFieldsList1 or childFieldsList2 is null but not both, the 
 second if statement would throw an NPE.
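The null-safe pattern the fix needs can be sketched as follows (a minimal illustration with hypothetical names, not the ObjectInspectorUtils code itself): handle the "exactly one side is null" case before dereferencing either list.

```java
import java.util.List;

public class NullSafeCompareDemo {
    static boolean sameSize(List<?> a, List<?> b) {
        if (a == null && b == null) return true;
        if (a == null || b == null) return false;  // the missing guard
        return a.size() == b.size();               // both non-null here
    }

    public static void main(String[] args) {
        System.out.println(sameSize(null, null));             // true
        System.out.println(sameSize(null, List.of(1)));       // false, no NPE
        System.out.println(sameSize(List.of(1), List.of(2))); // true
    }
}
```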



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11387) CBO: Calcite Operator To Hive Operator (Calcite Return Path) : fix reduce_deduplicate optimization

2015-08-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680993#comment-14680993
 ] 

Hive QA commented on HIVE-11387:




{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12749640/HIVE-11387.07.patch

{color:green}SUCCESS:{color} +1 9347 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4910/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4910/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4910/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12749640 - PreCommit-HIVE-TRUNK-Build

 CBO: Calcite Operator To Hive Operator (Calcite Return Path) : fix 
 reduce_deduplicate optimization
 --

 Key: HIVE-11387
 URL: https://issues.apache.org/jira/browse/HIVE-11387
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Attachments: HIVE-11387.01.patch, HIVE-11387.02.patch, 
 HIVE-11387.03.patch, HIVE-11387.04.patch, HIVE-11387.05.patch, 
 HIVE-11387.06.patch, HIVE-11387.07.patch


 The main problem is that, due to the return path, we may now have 
 {{(RS1-GBY2)\-(RS3-GBY4)}} when map.aggr=false, i.e., no map aggr. However, 
 in the non-return path, it will be treated as {{(RS1)-(GBY2-RS3-GBY4)}}. The 
 problem is that the return path does not take the setting into account.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10289) Support filter on non-first partition key and non-string partition key

2015-08-10 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680877#comment-14680877
 ] 

Sergey Shelukhin commented on HIVE-10289:
-

although I guess BSSD might be ok too...

 Support filter on non-first partition key and non-string partition key
 --

 Key: HIVE-10289
 URL: https://issues.apache.org/jira/browse/HIVE-10289
 Project: Hive
  Issue Type: Sub-task
  Components: HBase Metastore, Metastore
Affects Versions: hbase-metastore-branch
Reporter: Daniel Dai
Assignee: Daniel Dai
 Attachments: HIVE-10289.1.patch


 Currently, partition filtering only handles the first partition key, and the 
 type of this partition key must be string. To remove this 
 limitation, several improvements are required:
 1. Change the serialization format for partition keys. Currently, partition keys 
 are serialized into a delimited string, which sorts in string order without 
 regard to the actual type of the partition key. We use BinarySortableSerDe 
 for this purpose.
 2. For filter conditions not on the initial partition keys, push them into an HBase 
 RowFilter. The RowFilter will deserialize the partition key and evaluate the 
 filter condition.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7271) Speed up unit tests

2015-08-10 Thread Shannon Ladymon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shannon Ladymon updated HIVE-7271:
--
Labels:   (was: TODOC14)

 Speed up unit tests
 ---

 Key: HIVE-7271
 URL: https://issues.apache.org/jira/browse/HIVE-7271
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Fix For: 0.14.0

 Attachments: HIVE-7271.1.patch, HIVE-7271.2.patch, HIVE-7271.3.patch, 
 HIVE-7271.4.patch, HIVE-7271.5.patch, HIVE-7271.6.patch, HIVE-7271.7.patch


 Did some experiments to see if there's a way to speed up unit tests. 
 TestCliDriver seemed to take a lot of time just spinning up/tearing down 
 JVMs. I was also curious to see if running everything on a ram disk would 
 help.
 Results (I ran tests up to authorization_2):
 - Current setup: 40 minutes
 - Single JVM (not using child JVM to run all queries): 8 minutes
 - Single JVM + ram disk: 7 minutes
 So the ram disk didn't help that much. But running tests in a single JVM seems 
 worth doing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8518) Compile time skew join optimization returns duplicated results

2015-08-10 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680886#comment-14680886
 ] 

Jason Dere commented on HIVE-8518:
--

Included this fix to branch-1.0

 Compile time skew join optimization returns duplicated results
 --

 Key: HIVE-8518
 URL: https://issues.apache.org/jira/browse/HIVE-8518
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer
Affects Versions: 0.14.0
Reporter: Rui Li
Assignee: Rui Li
 Fix For: 1.1.0

 Attachments: HIVE-8518.1.patch


 Compile time skew join optimization clones the join operator tree and unions 
 the results.
 The problem here is that we don't properly insert the predicate for the 
 cloned join (relying on an assert statement).
 To reproduce the issue, run the simple query:
 {code}select * from tbl1 join tbl2 on tbl1.key=tbl2.key;{code}
 And suppose there's some skew in tbl1 (specify skew with CREATE or ALTER 
 statement).
 Duplicated results will be returned if you set 
 hive.optimize.skewjoin.compiletime=true.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10062) HiveOnTez: Union followed by Multi-GB followed by Multi-insert loses data

2015-08-10 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-10062:
---
Attachment: HIVE-10062.branch-1.patch

 HiveOnTez: Union followed by Multi-GB followed by Multi-insert loses data
 -

 Key: HIVE-10062
 URL: https://issues.apache.org/jira/browse/HIVE-10062
 Project: Hive
  Issue Type: Bug
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
Priority: Critical
 Fix For: 1.2.0

 Attachments: HIVE-10062.01.patch, HIVE-10062.02.patch, 
 HIVE-10062.03.patch, HIVE-10062.04.patch, HIVE-10062.05.patch, 
 HIVE-10062.branch-1.patch


 In the q.test environment with the src table, execute the following query: 
 {code}
 CREATE TABLE DEST1(key STRING, value STRING) STORED AS TEXTFILE;
 CREATE TABLE DEST2(key STRING, val1 STRING, val2 STRING) STORED AS TEXTFILE;
 FROM (select 'tst1' as key, cast(count(1) as string) as value from src s1
  UNION all 
   select s2.key as key, s2.value as value from src s2) unionsrc
 INSERT OVERWRITE TABLE DEST1 SELECT unionsrc.key, COUNT(DISTINCT 
 SUBSTR(unionsrc.value,5)) GROUP BY unionsrc.key
 INSERT OVERWRITE TABLE DEST2 SELECT unionsrc.key, unionsrc.value, 
 COUNT(DISTINCT SUBSTR(unionsrc.value,5)) 
 GROUP BY unionsrc.key, unionsrc.value;
 select * from DEST1;
 select * from DEST2;
 {code}
 DEST1 and DEST2 should both have 310 rows. However, DEST2 only has 1 row: 
 tst1 500 1



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11514) Vectorized version of auto_sortmerge_join_1.q fails during execution with NPE

2015-08-10 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-11514:

Attachment: auto_sortmerge_join_1.q

 Vectorized version of auto_sortmerge_join_1.q fails during execution with NPE
 -

 Key: HIVE-11514
 URL: https://issues.apache.org/jira/browse/HIVE-11514
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline
Priority: Critical
 Attachments: auto_sortmerge_join_1.q


 Query from auto_sortmerge_join_1.q:
 {code}
 select count(*) FROM bucket_big a JOIN bucket_small b ON a.key = b.key
 {code}
 generates stack trace:
 {code}
 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.ql.exec.vector.VectorMapJoinOperator.initializeOp(VectorMapJoinOperator.java:177)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:362)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:481)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:438)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:131)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7997) Potential null pointer reference in ObjectInspectorUtils#compareTypes()

2015-08-10 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-7997:
-
Fix Version/s: 1.0.2

 Potential null pointer reference in ObjectInspectorUtils#compareTypes()
 ---

 Key: HIVE-7997
 URL: https://issues.apache.org/jira/browse/HIVE-7997
 Project: Hive
  Issue Type: Bug
  Components: Types
Reporter: Ted Yu
Assignee: Navis
 Fix For: 1.1.0, 1.0.2

 Attachments: HIVE-7997.1.patch.txt


 {code}
   if (childFieldsList1 == null && childFieldsList2 == null) {
 return true;
   }
   if (childFieldsList1.size() != childFieldsList2.size()) {
 return false;
   }
 {code}
 If either childFieldsList1 or childFieldsList2 is null but not both, the 
 second if statement throws a NullPointerException.
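The null-safe fix direction can be sketched as follows. This is a minimal illustration, not the actual Hive patch; the method name `sameSize` and the standalone class are illustrative only:

```java
import java.util.List;

class NullSafeSizeCompare {
    // Null-safe version of the comparison quoted above: return false when
    // exactly one list is null instead of dereferencing it.
    static boolean sameSize(List<?> childFieldsList1, List<?> childFieldsList2) {
        if (childFieldsList1 == null && childFieldsList2 == null) {
            return true;
        }
        if (childFieldsList1 == null || childFieldsList2 == null) {
            return false; // exactly one is null: unequal, and no NPE
        }
        return childFieldsList1.size() == childFieldsList2.size();
    }
}
```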



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8889) JDBC Driver ResultSet.getXXXXXX(String columnLabel) methods Broken

2015-08-10 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-8889:
-
Fix Version/s: 1.0.2

Included this fix to branch-1.0

 JDBC Driver ResultSet.getXXXXXX(String columnLabel) methods Broken
 --

 Key: HIVE-8889
 URL: https://issues.apache.org/jira/browse/HIVE-8889
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.1
Reporter: G Lingle
Assignee: Chaoyu Tang
Priority: Critical
 Fix For: 1.1.0, 1.0.2

 Attachments: HIVE-8889.1.patch, HIVE-8889.2.patch, HIVE-8889.patch


 Using hive-jdbc-0.13.1-cdh5.2.0.jar.
 All of the get-by-column-label methods of HiveBaseResultSet are now broken.  
 They don't take just the column label as they should.  Instead you have to 
 pass in "table name.column name".  This requirement doesn't conform to the 
 Java ResultSet API, which specifies:
 "columnLabel - the label for the column specified with the SQL AS clause. If 
 the SQL AS clause was not specified, then the label is the name of the column"
 Looking at the code, it seems that the problem is that findColumn() method is 
 looking in normalizedColumnNames instead of the columnNames.
 BTW, another annoying issue with the code is that the SQLException thrown 
 gives no indication of what the problem is.  It should at least say in the 
 description string that the column name wasn't found.
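The fix direction described above can be sketched as follows. This is a hypothetical illustration, not the HiveBaseResultSet internals: it matches a bare columnLabel against normalized "table.column" names by also comparing the part after the last dot, and names the missing label in the exception message:

```java
import java.util.List;

class QualifiedColumnLookup {
    // Hypothetical sketch: accept either the full "table.column" name or
    // the bare column label, case-insensitively, and report the label in
    // the error message when no match is found.
    static int findColumn(List<String> normalizedColumnNames, String columnLabel) {
        for (int i = 0; i < normalizedColumnNames.size(); i++) {
            String full = normalizedColumnNames.get(i);
            String bare = full.substring(full.lastIndexOf('.') + 1);
            if (full.equalsIgnoreCase(columnLabel) || bare.equalsIgnoreCase(columnLabel)) {
                return i + 1; // JDBC column indexes are 1-based
            }
        }
        throw new IllegalArgumentException(
            "Could not find column '" + columnLabel + "' in " + normalizedColumnNames);
    }
}
```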



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11500) implement file footer / splits cache in HBase metastore

2015-08-10 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11500:

Description: 
We need to cache file metadata (e.g. ORC file footers) for split generation 
(which, on FSes that support fileId, will be valid permanently and only needs 
to be removed lazily when ORC file is erased or compacted), and potentially 
even some information about splits (e.g. grouping based on location that would 
be good for some short time), in HBase metastore.
-It should be queryable by table. Partition predicate pushdown should be 
supported. If bucket pruning is added, that too.- 

In later phases, it would be nice to save the (first category above) results of 
expensive work done by jobs, e.g. data size after decompression/decoding per 
column, etc. to avoid surprises when ORC encoding is very good, or very bad. 
Perhaps it can even be lazily generated. Here's a pony: 

  was:
We need to cache file metadata (e.g. ORC file footers) for split generation 
(which, on FSes that support fileId, will be valid permanently and only needs 
to be removed lazily when ORC file is erased or compacted), and potentially 
even some information about splits (e.g. grouping based on location that would 
be good for some short time), in HBase metastore.
It should be queryable by table. Partition predicate pushdown should be 
supported. If bucket pruning is added, that too. 

In later phases, it would be nice to save the (first category above) results of 
expensive work done by jobs, e.g. data size after decompression/decoding per 
column, etc. to avoid surprises when ORC encoding is very good, or very bad. 
Perhaps it can even be lazily generated. Here's a pony: 


 implement file footer / splits cache in HBase metastore
 ---

 Key: HIVE-11500
 URL: https://issues.apache.org/jira/browse/HIVE-11500
 Project: Hive
  Issue Type: Sub-task
  Components: Metastore
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin

 We need to cache file metadata (e.g. ORC file footers) for split generation 
 (which, on FSes that support fileId, will be valid permanently and only needs 
 to be removed lazily when ORC file is erased or compacted), and potentially 
 even some information about splits (e.g. grouping based on location that 
 would be good for some short time), in HBase metastore.
 -It should be queryable by table. Partition predicate pushdown should be 
 supported. If bucket pruning is added, that too.- 
 In later phases, it would be nice to save the (first category above) results 
 of expensive work done by jobs, e.g. data size after decompression/decoding 
 per column, etc. to avoid surprises when ORC encoding is very good, or very 
 bad. Perhaps it can even be lazily generated. Here's a pony: 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11500) implement file footer / splits cache in HBase metastore

2015-08-10 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11500:

Description: 
We need to cache file metadata (e.g. ORC file footers) for split generation 
(which, on FSes that support fileId, will be valid permanently and only needs 
to be removed lazily when ORC file is erased or compacted), and potentially 
even some information about splits (e.g. grouping based on location that would 
be good for some short time), in HBase metastore.
-It should be queryable by table. Partition predicate pushdown should be 
supported. If bucket pruning is added, that too.- Given that we cannot cache 
file lists (we have to check FS for new/changed files anyway), and the 
difficulty of passing data about partitions/etc. to split generation 
compared to paths, we will probably just filter by paths and fileIds. It might 
be different for splits.

In later phases, it would be nice to save the (first category above) results of 
expensive work done by jobs, e.g. data size after decompression/decoding per 
column, etc. to avoid surprises when ORC encoding is very good, or very bad. 
Perhaps it can even be lazily generated. Here's a pony: 

  was:
We need to cache file metadata (e.g. ORC file footers) for split generation 
(which, on FSes that support fileId, will be valid permanently and only needs 
to be removed lazily when ORC file is erased or compacted), and potentially 
even some information about splits (e.g. grouping based on location that would 
be good for some short time), in HBase metastore.
-It should be queryable by table. Partition predicate pushdown should be 
supported. If bucket pruning is added, that too.- Given that we cannot cache 
file lists (we have to check FS for new/changed files anyway), and the 
difficulty of passing data about partitions/etc. to split generation 
compared to paths, we will probably just filter by fileId.

In later phases, it would be nice to save the (first category above) results of 
expensive work done by jobs, e.g. data size after decompression/decoding per 
column, etc. to avoid surprises when ORC encoding is very good, or very bad. 
Perhaps it can even be lazily generated. Here's a pony: 


 implement file footer / splits cache in HBase metastore
 ---

 Key: HIVE-11500
 URL: https://issues.apache.org/jira/browse/HIVE-11500
 Project: Hive
  Issue Type: Sub-task
  Components: Metastore
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin

 We need to cache file metadata (e.g. ORC file footers) for split generation 
 (which, on FSes that support fileId, will be valid permanently and only needs 
 to be removed lazily when ORC file is erased or compacted), and potentially 
 even some information about splits (e.g. grouping based on location that 
 would be good for some short time), in HBase metastore.
 -It should be queryable by table. Partition predicate pushdown should be 
 supported. If bucket pruning is added, that too.- Given that we cannot cache 
 file lists (we have to check FS for new/changed files anyway), and the 
 difficulty of passing data about partitions/etc. to split generation 
 compared to paths, we will probably just filter by paths and fileIds. It 
 might be different for splits.
 In later phases, it would be nice to save the (first category above) results 
 of expensive work done by jobs, e.g. data size after decompression/decoding 
 per column, etc. to avoid surprises when ORC encoding is very good, or very 
 bad. Perhaps it can even be lazily generated. Here's a pony: 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8874) Error Accessing HBase from Hive via Oozie on Kerberos 5.0.1 cluster

2015-08-10 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680940#comment-14680940
 ] 

Jason Dere commented on HIVE-8874:
--

Included this fix to branch-1.0

 Error Accessing HBase from Hive via Oozie on Kerberos 5.0.1 cluster
 ---

 Key: HIVE-8874
 URL: https://issues.apache.org/jira/browse/HIVE-8874
 Project: Hive
  Issue Type: Bug
  Components: HBase Handler
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
 Fix For: 1.1.0, 1.0.2

 Attachments: HIVE-8874.1.patch


 A Hive action workflow on a secure cluster that does an INSERT INTO a regular 
 table FROM an HBase table as part of its script will reproduce the issue. It 
 can also be reproduced on a Hive 0.13 cluster. 
 {noformat}
 10309 [main] ERROR org.apache.hadoop.hive.ql.Driver  - FAILED: 
 SemanticException Error while configuring input job properties
   org.apache.hadoop.hive.ql.parse.SemanticException: Error while configuring 
 input job properties
   at 
 org.apache.hadoop.hive.ql.optimizer.SimpleFetchOptimizer.transform(SimpleFetchOptimizer.java:94)
   at 
 org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:146)
   at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9261)
   at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:206)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:434)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:332)
   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:988)
   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1053)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:924)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:914)
   at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:269)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:221)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:431)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:367)
   at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:464)
   at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:474)
   at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:756)
   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:694)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:633)
   at org.apache.oozie.action.hadoop.HiveMain.runHive(HiveMain.java:323)
   at org.apache.oozie.action.hadoop.HiveMain.run(HiveMain.java:284)
   at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:39)
   at org.apache.oozie.action.hadoop.HiveMain.main(HiveMain.java:66)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:227)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
   Caused by: java.lang.IllegalStateException: Error while configuring input 
 job properties
   at 
 org.apache.hadoop.hive.hbase.HBaseStorageHandler.configureTableJobProperties(HBaseStorageHandler.java:343)
   at 
 org.apache.hadoop.hive.hbase.HBaseStorageHandler.configureInputJobProperties(HBaseStorageHandler.java:279)
   at 
 org.apache.hadoop.hive.ql.plan.PlanUtils.configureJobPropertiesForStorageHandler(PlanUtils.java:804)
   at 
 org.apache.hadoop.hive.ql.plan.PlanUtils.configureInputJobPropertiesForStorageHandler(PlanUtils.java:774)
   at 
 org.apache.hadoop.hive.ql.optimizer.SimpleFetchOptimizer$FetchData.convertToWork(SimpleFetchOptimizer.java:241)
   at 
 org.apache.hadoop.hive.ql.optimizer.SimpleFetchOptimizer$FetchData.access$000(SimpleFetchOptimizer.java:207)
   at 
 org.apache.hadoop.hive.ql.optimizer.SimpleFetchOptimizer.optimize(SimpleFetchOptimizer.java:112)
   at 
 org.apache.hadoop.hive.ql.optimizer.SimpleFetchOptimizer.transform(SimpleFetchOptimizer.java:83)
   ... 35 more
   Caused by: org.apache.hadoop.hbase.security.AccessDeniedException: 
 org.apache.hadoop.hbase.security.AccessDeniedException: 

[jira] [Updated] (HIVE-8874) Error Accessing HBase from Hive via Oozie on Kerberos 5.0.1 cluster

2015-08-10 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-8874:
-
Fix Version/s: 1.0.2

 Error Accessing HBase from Hive via Oozie on Kerberos 5.0.1 cluster
 ---

 Key: HIVE-8874
 URL: https://issues.apache.org/jira/browse/HIVE-8874
 Project: Hive
  Issue Type: Bug
  Components: HBase Handler
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
 Fix For: 1.1.0, 1.0.2

 Attachments: HIVE-8874.1.patch


 A Hive action workflow on a secure cluster that does an INSERT INTO a regular 
 table FROM an HBase table as part of its script will reproduce the issue. It 
 can also be reproduced on a Hive 0.13 cluster. 
 {noformat}
 10309 [main] ERROR org.apache.hadoop.hive.ql.Driver  - FAILED: 
 SemanticException Error while configuring input job properties
   org.apache.hadoop.hive.ql.parse.SemanticException: Error while configuring 
 input job properties
   at 
 org.apache.hadoop.hive.ql.optimizer.SimpleFetchOptimizer.transform(SimpleFetchOptimizer.java:94)
   at 
 org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:146)
   at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9261)
   at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:206)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:434)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:332)
   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:988)
   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1053)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:924)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:914)
   at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:269)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:221)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:431)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:367)
   at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:464)
   at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:474)
   at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:756)
   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:694)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:633)
   at org.apache.oozie.action.hadoop.HiveMain.runHive(HiveMain.java:323)
   at org.apache.oozie.action.hadoop.HiveMain.run(HiveMain.java:284)
   at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:39)
   at org.apache.oozie.action.hadoop.HiveMain.main(HiveMain.java:66)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:227)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
   Caused by: java.lang.IllegalStateException: Error while configuring input 
 job properties
   at 
 org.apache.hadoop.hive.hbase.HBaseStorageHandler.configureTableJobProperties(HBaseStorageHandler.java:343)
   at 
 org.apache.hadoop.hive.hbase.HBaseStorageHandler.configureInputJobProperties(HBaseStorageHandler.java:279)
   at 
 org.apache.hadoop.hive.ql.plan.PlanUtils.configureJobPropertiesForStorageHandler(PlanUtils.java:804)
   at 
 org.apache.hadoop.hive.ql.plan.PlanUtils.configureInputJobPropertiesForStorageHandler(PlanUtils.java:774)
   at 
 org.apache.hadoop.hive.ql.optimizer.SimpleFetchOptimizer$FetchData.convertToWork(SimpleFetchOptimizer.java:241)
   at 
 org.apache.hadoop.hive.ql.optimizer.SimpleFetchOptimizer$FetchData.access$000(SimpleFetchOptimizer.java:207)
   at 
 org.apache.hadoop.hive.ql.optimizer.SimpleFetchOptimizer.optimize(SimpleFetchOptimizer.java:112)
   at 
 org.apache.hadoop.hive.ql.optimizer.SimpleFetchOptimizer.transform(SimpleFetchOptimizer.java:83)
   ... 35 more
   Caused by: org.apache.hadoop.hbase.security.AccessDeniedException: 
 org.apache.hadoop.hbase.security.AccessDeniedException: Token generation only 
 allowed for Kerberos authenticated clients
  

[jira] [Updated] (HIVE-8330) HiveResultSet.findColumn() parameters are case sensitive

2015-08-10 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-8330:
-
Fix Version/s: 1.0.2

Included this fix to branch-1.0

 HiveResultSet.findColumn() parameters are case sensitive
 

 Key: HIVE-8330
 URL: https://issues.apache.org/jira/browse/HIVE-8330
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.1
Reporter: Sergio Peña
Assignee: Sergio Peña
 Fix For: 1.1.0, 1.0.2

 Attachments: HIVE-8330.1.patch, HIVE-8330.2.patch, HIVE-8330.3.patch, 
 HIVE-8330.4.patch


 Look at the following code:
 {noformat}
 Class.forName("org.apache.hive.jdbc.HiveDriver");
 Connection db = null;
 Statement stmt = null;
 ResultSet rs = null;
 try {
     db = DriverManager.getConnection("jdbc:hive2://localhost:1/default", "hive", "");
     stmt = db.createStatement();
     rs = stmt.executeQuery("SELECT * FROM sample_07 limit 1");
     ResultSetMetaData metaData = rs.getMetaData();
     for (int i = 1; i <= metaData.getColumnCount(); i++) {
         System.out.println("Column " + i + ": " + metaData.getColumnName(i));
     }
     while (rs.next()) {
         System.out.println(rs.findColumn("code"));
     }
 } finally {
     DbUtils.closeQuietly(db, stmt, rs);
 }
 {noformat}
 Above program will generate following result on my cluster:
 {noformat}
 Column 1: code
 Column 2: description
 Column 3: total_emp
 Column 4: salary
 1
 {noformat}
 However, if the last print statement is changed as follows (using uppercase 
 characters):
 {noformat}
 System.out.println(rs.findColumn("Code"));
 {noformat}
 The program will fail at exactly that line. The same happens if the column 
 name is changed to "CODE".
 Based on the JDBC ResultSet documentation, this method should be case 
 insensitive:
 "Column names used as input to getter methods are case insensitive"
 http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html
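The required behavior can be sketched as a minimal case-insensitive lookup. The class and parameter names here are illustrative, not the HiveBaseResultSet implementation:

```java
import java.util.List;

class CaseInsensitiveFindColumn {
    // Minimal sketch of what the JDBC spec requires: resolve the label
    // against the column names ignoring case.
    static int findColumn(List<String> columnNames, String columnLabel) {
        for (int i = 0; i < columnNames.size(); i++) {
            if (columnNames.get(i).equalsIgnoreCase(columnLabel)) {
                return i + 1; // JDBC column indexes are 1-based
            }
        }
        throw new IllegalArgumentException("Column not found: " + columnLabel);
    }
}
```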



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11513) AvroLazyObjectInspector could handle empty data better

2015-08-10 Thread Swarnim Kulkarni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Swarnim Kulkarni updated HIVE-11513:

Attachment: HIVE-11513.1.patch.txt

RB: https://reviews.apache.org/r/37329/

 AvroLazyObjectInspector could handle empty data better
 --

 Key: HIVE-11513
 URL: https://issues.apache.org/jira/browse/HIVE-11513
 Project: Hive
  Issue Type: Improvement
Reporter: Swarnim Kulkarni
Assignee: Swarnim Kulkarni
 Attachments: HIVE-11513.1.patch.txt


 Currently in the AvroLazyObjectInspector, it looks like we only handle the 
 case when the data sent to deserialize is null[1]. It would be nice to 
 handle the case when it is empty as well.
 [1] 
 https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroLazyObjectInspector.java#L226-L228
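The suggested improvement amounts to a guard like the one below. This is a hypothetical sketch, not the AvroLazyObjectInspector code; the class and method names are invented for illustration:

```java
class AvroDataGuard {
    // Treat an empty byte array the same as null before attempting Avro
    // deserialization, per the improvement suggested above.
    static boolean shouldSkipDeserialization(byte[] data) {
        return data == null || data.length == 0;
    }
}
```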



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11295) LLAP: clean up ORC dependencies on object pools

2015-08-10 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11295:

Attachment: HIVE-11295.01.patch

Fixed method names

 LLAP: clean up ORC dependencies on object pools
 ---

 Key: HIVE-11295
 URL: https://issues.apache.org/jira/browse/HIVE-11295
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-11295.01.patch, HIVE-11295.patch


 Before there's storage API module, we can clean some things up



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11295) LLAP: clean up ORC dependencies on object pools

2015-08-10 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11295:

Description: 
Before there's storage API module, we can clean some things up

NO PRECOMMIT TESTS

  was:Before there's storage API module, we can clean some things up


 LLAP: clean up ORC dependencies on object pools
 ---

 Key: HIVE-11295
 URL: https://issues.apache.org/jira/browse/HIVE-11295
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-11295.01.patch, HIVE-11295.patch


 Before there's storage API module, we can clean some things up
 NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7214) Support predicate pushdown for complex data types in ORCFile

2015-08-10 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-7214:
-
Labels: ORC  (was: )

 Support predicate pushdown for complex data types in ORCFile
 

 Key: HIVE-7214
 URL: https://issues.apache.org/jira/browse/HIVE-7214
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Reporter: Rohini Palaniswamy
  Labels: ORC

 Currently ORCFile does not support predicate pushdown for complex datatypes 
 like map, array and struct while Parquet does. Came across this during 
 discussion of PIG-3760. Our users have a lot of map and struct (tuple in pig) 
 columns and most of the filter conditions are on them. Would be great to have 
 support added for them in ORC



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7214) Support predicate pushdown for complex data types in ORCFile

2015-08-10 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-7214:
-
Component/s: File Formats

 Support predicate pushdown for complex data types in ORCFile
 

 Key: HIVE-7214
 URL: https://issues.apache.org/jira/browse/HIVE-7214
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Reporter: Rohini Palaniswamy

 Currently ORCFile does not support predicate pushdown for complex datatypes 
 like map, array and struct while Parquet does. Came across this during 
 discussion of PIG-3760. Our users have a lot of map and struct (tuple in pig) 
 columns and most of the filter conditions are on them. Would be great to have 
 support added for them in ORC



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11493) Predicate with integer column equals double evaluates to false

2015-08-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681178#comment-14681178
 ] 

Hive QA commented on HIVE-11493:




{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12749659/HIVE-11493.02.patch

{color:green}SUCCESS:{color} +1 9348 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4912/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4912/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4912/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12749659 - PreCommit-HIVE-TRUNK-Build

 Predicate with integer column equals double evaluates to false
 --

 Key: HIVE-11493
 URL: https://issues.apache.org/jira/browse/HIVE-11493
 Project: Hive
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Prasanth Jayachandran
Assignee: Pengcheng Xiong
Priority: Blocker
 Attachments: HIVE-11493.01.patch, HIVE-11493.02.patch


 Filters where an integer column equals a double constant evaluate to false 
 every time; a negative double constant works fine.
 {code:title=explain select * from orc_ppd where t = 10.0;}
 OK
 Stage-0
Fetch Operator
   limit:-1
   Select Operator [SEL_2]
  
 outputColumnNames:[_col0,_col1,_col2,_col3,_col4,_col5,_col6,_col7,_col8,_col9,_col10,_col11,_col12,_col13]
  Filter Operator [FIL_1]
 predicate:false (type: boolean)
 TableScan [TS_0]
alias:orc_ppd
 {code}
 {code:title=explain select * from orc_ppd where t = -10.0;}
 OK
 Stage-0
Fetch Operator
   limit:-1
   Select Operator [SEL_2]
  
 outputColumnNames:[_col0,_col1,_col2,_col3,_col4,_col5,_col6,_col7,_col8,_col9,_col10,_col11,_col12,_col13]
  Filter Operator [FIL_1]
 predicate:(t = (- 10.0)) (type: boolean)
 TableScan [TS_0]
alias:orc_ppd
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11498) HIVE Authorization v2 should not check permission for dummy entity

2015-08-10 Thread Dong Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681180#comment-14681180
 ] 

Dong Chen commented on HIVE-11498:
--

[~dapengsun] Thanks for your contribution! I have committed this to master, 
branch-1, and branch-1.2.

 HIVE Authorization v2 should not check permission for dummy entity
 --

 Key: HIVE-11498
 URL: https://issues.apache.org/jira/browse/HIVE-11498
 Project: Hive
  Issue Type: Bug
  Components: Authorization
Affects Versions: 1.2.0, 1.3.0, 2.0.0
Reporter: Dapeng Sun
Assignee: Dapeng Sun
 Fix For: 1.3.0, 1.2.1, 2.0.0

 Attachments: HIVE-11498.001.patch, HIVE-11498.002.patch, 
 HIVE-11498.003.patch


 For queries like {{SELECT 1+1;}}, the target table and database are set to 
 {{_dummy_database}} and {{_dummy_table}}; authorization should skip these 
 kinds of databases and tables.
 Authz v1 already skips them:
 eg1. [Source code at 
 github|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/Driver.java#L600]
 {noformat}
 for (WriteEntity write : outputs) {
 if (write.isDummy() || write.isPathType()) {
   continue;
 }
 {noformat}
 eg2. [Source code at 
 github|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/Driver.java#L633]
 {noformat}
 for (ReadEntity read : inputs) {
 if (read.isDummy() || read.isPathType()) {
   continue;
 }
...
 }
 {noformat}
 ...
 This patch will fix authz v2.
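The v1 skip logic quoted above can be sketched as a standalone filter. The `Entity` class and method names below are simplified stand-ins for Hive's ReadEntity/WriteEntity API, not the real classes:

```java
import java.util.ArrayList;
import java.util.List;

public class DummyEntityFilter {
    // Simplified stand-in for Hive's ReadEntity/WriteEntity.
    static class Entity {
        final String name;
        final boolean dummy;
        Entity(String name, boolean dummy) { this.name = name; this.dummy = dummy; }
        boolean isDummy() { return dummy; }
    }

    // Mirror of the v1 loop: drop dummy entities before any permission check.
    static List<Entity> authorizable(List<Entity> entities) {
        List<Entity> result = new ArrayList<>();
        for (Entity e : entities) {
            if (e.isDummy()) {
                continue; // _dummy_database/_dummy_table need no permission check
            }
            result.add(e);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Entity> inputs = new ArrayList<>();
        inputs.add(new Entity("_dummy_table", true));
        inputs.add(new Entity("real_table", false));
        // Only the real table survives filtering.
        System.out.println(authorizable(inputs).size()); // prints 1
    }
}
```

The patch applies the same kind of guard inside the v2 authorization path.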



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11493) Predicate with integer column equals double evaluates to false

2015-08-10 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681182#comment-14681182
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-11493:
--

+1 for patch 2.

 Predicate with integer column equals double evaluates to false
 --

 Key: HIVE-11493
 URL: https://issues.apache.org/jira/browse/HIVE-11493
 Project: Hive
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Prasanth Jayachandran
Assignee: Pengcheng Xiong
Priority: Blocker
 Attachments: HIVE-11493.01.patch, HIVE-11493.02.patch


 Filters comparing an integer column to a double constant evaluate to false 
 every time. Negative double constants work fine.
 {code:title=explain select * from orc_ppd where t = 10.0;}
 OK
 Stage-0
Fetch Operator
   limit:-1
   Select Operator [SEL_2]
  
 outputColumnNames:[_col0,_col1,_col2,_col3,_col4,_col5,_col6,_col7,_col8,_col9,_col10,_col11,_col12,_col13]
  Filter Operator [FIL_1]
 predicate:false (type: boolean)
 TableScan [TS_0]
alias:orc_ppd
 {code}
 {code:title=explain select * from orc_ppd where t = -10.0;}
 OK
 Stage-0
Fetch Operator
   limit:-1
   Select Operator [SEL_2]
  
 outputColumnNames:[_col0,_col1,_col2,_col3,_col4,_col5,_col6,_col7,_col8,_col9,_col10,_col11,_col12,_col13]
  Filter Operator [FIL_1]
 predicate:(t = (- 10.0)) (type: boolean)
 TableScan [TS_0]
alias:orc_ppd
 {code}
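The intended folding behavior can be illustrated outside Hive. This is a sketch of the representability check such a fix would need, not Hive's actual constant-propagation code; the method name is hypothetical:

```java
public class ConstantFoldSketch {
    // Folding "tinyintCol = doubleConst" is only safe when the double constant
    // is exactly representable in the column's type. If it is, the predicate
    // should be kept (with the constant narrowed); if not, no tinyint value
    // can ever equal it, so the predicate is constant-false.
    static Boolean foldTinyintEqualsDouble(double constant) {
        byte narrowed = (byte) constant;
        if ((double) narrowed == constant) {
            return null; // representable: keep "t = narrowed" for row evaluation
        }
        return Boolean.FALSE; // not representable: fold to constant-false
    }

    public static void main(String[] args) {
        System.out.println(foldTinyintEqualsDouble(10.0));  // null: keep predicate
        System.out.println(foldTinyintEqualsDouble(-10.0)); // null: keep predicate
        System.out.println(foldTinyintEqualsDouble(10.5));  // false: fold away
    }
}
```

The bug report above shows the positive case (`t = 10.0`) being incorrectly folded to `false`, while the negative case survives as a real predicate.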



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11498) HIVE Authorization v2 should not check permission for dummy entity

2015-08-10 Thread Dong Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681187#comment-14681187
 ] 

Dong Chen commented on HIVE-11498:
--

Thanks for your review on this patch! [~thejas]

 HIVE Authorization v2 should not check permission for dummy entity
 --

 Key: HIVE-11498
 URL: https://issues.apache.org/jira/browse/HIVE-11498
 Project: Hive
  Issue Type: Bug
  Components: Authorization
Affects Versions: 1.2.0, 1.3.0, 2.0.0
Reporter: Dapeng Sun
Assignee: Dapeng Sun
 Fix For: 1.3.0, 1.2.1, 2.0.0

 Attachments: HIVE-11498.001.patch, HIVE-11498.002.patch, 
 HIVE-11498.003.patch


 For queries like {{SELECT 1+1;}}, the target table and database are set to 
 {{_dummy_database}} and {{_dummy_table}}; authorization should skip these 
 kinds of databases and tables.
 Authz v1 already skips them.
 eg1. [Source code at 
 github|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/Driver.java#L600]
 {noformat}
 for (WriteEntity write : outputs) {
 if (write.isDummy() || write.isPathType()) {
   continue;
 }
 {noformat}
 eg2. [Source code at 
 github|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/Driver.java#L633]
 {noformat}
 for (ReadEntity read : inputs) {
 if (read.isDummy() || read.isPathType()) {
   continue;
 }
...
 }
 {noformat}
 ...
 This patch will fix authz v2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11462) GenericUDFStruct should constant fold at compile time

2015-08-10 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-11462:
---
Attachment: HIVE-11462.3.patch

To work around Kryo StdInstantiatorStrategy issues, the patch now prevents 
folding deeper than 1 level.

Updated golden files.


 GenericUDFStruct should constant fold at compile time
 -

 Key: HIVE-11462
 URL: https://issues.apache.org/jira/browse/HIVE-11462
 Project: Hive
  Issue Type: Bug
  Components: UDF
Affects Versions: 2.0.0
Reporter: Gopal V
Assignee: Gopal V
 Attachments: HIVE-11462.1.patch, HIVE-11462.2.patch, 
 HIVE-11462.3.patch, HIVE-11462.WIP.patch


 HIVE-11428 introduces a constant Struct Object, which is available for the 
 runtime operators to assume as a constant parameter.
 This operator isn't constant folded during compilation, since the UDF returns 
 a complex type, which is logged as a warning by the constant propagation layer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11498) HIVE Authorization v2 should not check permission for dummy entity

2015-08-10 Thread Dapeng Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681190#comment-14681190
 ] 

Dapeng Sun commented on HIVE-11498:
---

Thank [~thejas] and [~dongc] for your review.

 HIVE Authorization v2 should not check permission for dummy entity
 --

 Key: HIVE-11498
 URL: https://issues.apache.org/jira/browse/HIVE-11498
 Project: Hive
  Issue Type: Bug
  Components: Authorization
Affects Versions: 1.2.0, 1.3.0, 2.0.0
Reporter: Dapeng Sun
Assignee: Dapeng Sun
 Fix For: 1.3.0, 1.2.1, 2.0.0

 Attachments: HIVE-11498.001.patch, HIVE-11498.002.patch, 
 HIVE-11498.003.patch


 For queries like {{SELECT 1+1;}}, the target table and database are set to 
 {{_dummy_database}} and {{_dummy_table}}; authorization should skip these 
 kinds of databases and tables.
 Authz v1 already skips them.
 eg1. [Source code at 
 github|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/Driver.java#L600]
 {noformat}
 for (WriteEntity write : outputs) {
 if (write.isDummy() || write.isPathType()) {
   continue;
 }
 {noformat}
 eg2. [Source code at 
 github|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/Driver.java#L633]
 {noformat}
 for (ReadEntity read : inputs) {
 if (read.isDummy() || read.isPathType()) {
   continue;
 }
...
 }
 {noformat}
 ...
 This patch will fix authz v2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11461) Transform flat AND/OR into IN struct clause

2015-08-10 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680651#comment-14680651
 ] 

Gopal V commented on HIVE-11461:


[~jcamachorodriguez]: the PreOrderOnceWalker improves the performance of the 
optimizer significantly. Patch LGTM - the early exit makes it fast for the miss 
cases as well.

+1 to the patch, golden file updates after HIVE-11398 goes in.

 Transform flat AND/OR into IN struct clause
 ---

 Key: HIVE-11461
 URL: https://issues.apache.org/jira/browse/HIVE-11461
 Project: Hive
  Issue Type: Bug
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
 Attachments: HIVE-11461.1.patch, HIVE-11461.2.patch, HIVE-11461.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11502) Map side aggregation is extremely slow

2015-08-10 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680720#comment-14680720
 ] 

Yongzhi Chen commented on HIVE-11502:
-

[~gopalv], I checked the related hadoop code between the two versions used by 
0.13 and 1.2; there is no change on the hadoop side for DoubleWritable.
I think the regression may be related to HIVE-7041, which switched from Hive's 
own DoubleWritable to Hadoop's. But simply reverting the change causes 
exceptions; I am still looking at it. 
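One way such a DoubleWritable swap could slow a hash table down, sketched here as an illustration rather than a confirmed diagnosis of this regression: a hashCode built from only the low 32 bits of the IEEE-754 encoding collapses every integral-valued double into a single bucket, while mixing the high bits back in (as Hive's own DoubleWritable did) spreads the keys:

```java
public class DoubleHashDemo {
    // Low 32 bits of the IEEE-754 bit pattern. For integral doubles like
    // 1.0, 2.0, 10.0 the fraction fits entirely in the high mantissa bits,
    // so the low 32 bits are all zero and every such key hashes to 0 --
    // a hash aggregation degenerates into one long collision chain.
    static int lowBitsHash(double d) {
        return (int) Double.doubleToLongBits(d);
    }

    // XOR-folding the high word into the low word distributes integral keys.
    static int mixedHash(double d) {
        long bits = Double.doubleToLongBits(d);
        return (int) (bits ^ (bits >>> 32));
    }

    public static void main(String[] args) {
        for (double d : new double[] {1.0, 2.0, 10.0, 12345.0}) {
            System.out.println(d + " lowBits=" + lowBitsHash(d)
                + " mixed=" + mixedHash(d)); // lowBits is 0 for all of these
        }
    }
}
```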

 Map side aggregation is extremely slow
 --

 Key: HIVE-11502
 URL: https://issues.apache.org/jira/browse/HIVE-11502
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer, Physical Optimizer
Affects Versions: 1.2.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen

 For the query as following:
 {noformat}
 create table tbl2 as 
 select col1, max(col2) as col2 
 from tbl1 group by col1;
 {noformat}
 If the column used for group by has many different values (for example 40) and 
 is of type double, map side aggregation is very slow. I ran the query, which 
 took more than 3 hours; after 3 hours, I had to kill it.
 The same query can finish in 7 seconds, if I turn off map side aggregation by:
 {noformat}
 set hive.map.aggr = false;
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11467) WriteBuffers rounding wbSize to next power of 2 may cause OOM

2015-08-10 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-11467:
-
Attachment: HIVE-11467.04.patch

OK, updated the patch.

 WriteBuffers rounding wbSize to next power of 2 may cause OOM
 -

 Key: HIVE-11467
 URL: https://issues.apache.org/jira/browse/HIVE-11467
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.2.0, 2.0.0
Reporter: Wei Zheng
Assignee: Wei Zheng
 Attachments: HIVE-11467.01.patch, HIVE-11467.02.patch, 
 HIVE-11467.03.patch, HIVE-11467.04.patch


 If the wbSize passed to the WriteBuffers constructor is not a power of 2, it 
 is first rounded up to the next power of 2:
 {code}
   public WriteBuffers(int wbSize, long maxSize) {
 this.wbSize = Integer.bitCount(wbSize) == 1 ? wbSize : 
 (Integer.highestOneBit(wbSize) << 1);
 this.wbSizeLog2 = 31 - Integer.numberOfLeadingZeros(this.wbSize);
 this.offsetMask = this.wbSize - 1;
 this.maxSize = maxSize;
 writePos.bufferIndex = -1;
 nextBufferToWrite();
   }
 {code}
 That may break the existing memory-consumption assumptions for mapjoin and 
 potentially cause OOM.
 The solution is to pass a power-of-2 number as wbSize from upstream during 
 hashtable creation, to avoid this late expansion.
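The rounding expression from the constructor can be exercised in isolation; for a request just above a power of 2, the allocated size nearly doubles, which is the memory-assumption break described above:

```java
public class WbSizeRounding {
    // Same rounding as the WriteBuffers constructor: exact powers of 2 are
    // kept, anything else is rounded up to the next power of 2.
    static int roundWbSize(int wbSize) {
        return Integer.bitCount(wbSize) == 1
            ? wbSize
            : (Integer.highestOneBit(wbSize) << 1);
    }

    public static void main(String[] args) {
        System.out.println(roundWbSize(1 << 20));       // 1048576: unchanged
        System.out.println(roundWbSize((1 << 20) + 1)); // 2097152: nearly doubled
        System.out.println(roundWbSize(3 << 19));       // 1572864 -> 2097152
    }
}
```

Passing an already-power-of-2 wbSize from the hashtable-creation code keeps the rounding a no-op, which is the fix this patch takes.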



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7224) Set incremental printing to true by default in Beeline

2015-08-10 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681110#comment-14681110
 ] 

Thejas M Nair commented on HIVE-7224:
-

[~vgumashta] can you please rebase ?


 Set incremental printing to true by default in Beeline
 --

 Key: HIVE-7224
 URL: https://issues.apache.org/jira/browse/HIVE-7224
 Project: Hive
  Issue Type: Bug
  Components: Clients, JDBC
Affects Versions: 0.13.0
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta
  Labels: TODOC1.2
 Attachments: HIVE-7224.1.patch


 See HIVE-7221.
 By default beeline tries to buffer the entire output relation before printing 
 it on stdout. This can cause OOM when the output relation is large. However, 
 beeline has the option of incremental prints. We should keep that as the 
 default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7224) Set incremental printing to true by default in Beeline

2015-08-10 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-7224:
--
Affects Version/s: 1.1.0
   1.0.0
   1.2.0

 Set incremental printing to true by default in Beeline
 --

 Key: HIVE-7224
 URL: https://issues.apache.org/jira/browse/HIVE-7224
 Project: Hive
  Issue Type: Bug
  Components: Clients, JDBC
Affects Versions: 0.13.0, 1.0.0, 1.2.0, 1.1.0
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta
 Attachments: HIVE-7224.1.patch


 See HIVE-7221.
 By default beeline tries to buffer the entire output relation before printing 
 it on stdout. This can cause OOM when the output relation is large. However, 
 beeline has the option of incremental prints. We should keep that as the 
 default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7224) Set incremental printing to true by default in Beeline

2015-08-10 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-7224:
--
Labels:   (was: TODOC1.2)

 Set incremental printing to true by default in Beeline
 --

 Key: HIVE-7224
 URL: https://issues.apache.org/jira/browse/HIVE-7224
 Project: Hive
  Issue Type: Bug
  Components: Clients, JDBC
Affects Versions: 0.13.0, 1.0.0, 1.2.0, 1.1.0
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta
 Attachments: HIVE-7224.1.patch


 See HIVE-7221.
 By default beeline tries to buffer the entire output relation before printing 
 it on stdout. This can cause OOM when the output relation is large. However, 
 beeline has the option of incremental prints. We should keep that as the 
 default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11500) implement file footer / splits cache in HBase metastore

2015-08-10 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11500:

Attachment: HBase metastore split cache.pdf

Attaching the doc

 implement file footer / splits cache in HBase metastore
 ---

 Key: HIVE-11500
 URL: https://issues.apache.org/jira/browse/HIVE-11500
 Project: Hive
  Issue Type: Sub-task
  Components: Metastore
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HBase metastore split cache.pdf


 We need to cache file metadata (e.g. ORC file footers) for split generation 
 (which, on FSes that support fileId, will be valid permanently and only needs 
 to be removed lazily when ORC file is erased or compacted), and potentially 
 even some information about splits (e.g. grouping based on location that 
 would be good for some short time), in HBase metastore.
 -It should be queryable by table. Partition predicate pushdown should be 
 supported. If bucket pruning is added, that too.- Given that we cannot cache 
 file lists (we have to check FS for new/changed files anyway), and the 
 difficulty of passing of data about partitions/etc. to split generation 
 compared to paths, we will probably just filter by paths and fileIds. It 
 might be different for splits
 In later phases, it would be nice to save the (first category above) results 
 of expensive work done by jobs, e.g. data size after decompression/decoding 
 per column, etc. to avoid surprises when ORC encoding is very good, or very 
 bad. Perhaps it can even be lazily generated. Here's a pony: 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11500) implement file footer / splits cache in HBase metastore

2015-08-10 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11500:

Attachment: (was: HBase metastore split cache.pdf)

 implement file footer / splits cache in HBase metastore
 ---

 Key: HIVE-11500
 URL: https://issues.apache.org/jira/browse/HIVE-11500
 Project: Hive
  Issue Type: Sub-task
  Components: Metastore
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HBase metastore split cache.pdf


 We need to cache file metadata (e.g. ORC file footers) for split generation 
 (which, on FSes that support fileId, will be valid permanently and only needs 
 to be removed lazily when ORC file is erased or compacted), and potentially 
 even some information about splits (e.g. grouping based on location that 
 would be good for some short time), in HBase metastore.
 -It should be queryable by table. Partition predicate pushdown should be 
 supported. If bucket pruning is added, that too.- Given that we cannot cache 
 file lists (we have to check FS for new/changed files anyway), and the 
 difficulty of passing of data about partitions/etc. to split generation 
 compared to paths, we will probably just filter by paths and fileIds. It 
 might be different for splits
 In later phases, it would be nice to save the (first category above) results 
 of expensive work done by jobs, e.g. data size after decompression/decoding 
 per column, etc. to avoid surprises when ORC encoding is very good, or very 
 bad. Perhaps it can even be lazily generated. Here's a pony: 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11500) implement file footer / splits cache in HBase metastore

2015-08-10 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11500:

Attachment: HBase metastore split cache.pdf

 implement file footer / splits cache in HBase metastore
 ---

 Key: HIVE-11500
 URL: https://issues.apache.org/jira/browse/HIVE-11500
 Project: Hive
  Issue Type: Sub-task
  Components: Metastore
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HBase metastore split cache.pdf


 We need to cache file metadata (e.g. ORC file footers) for split generation 
 (which, on FSes that support fileId, will be valid permanently and only needs 
 to be removed lazily when ORC file is erased or compacted), and potentially 
 even some information about splits (e.g. grouping based on location that 
 would be good for some short time), in HBase metastore.
 -It should be queryable by table. Partition predicate pushdown should be 
 supported. If bucket pruning is added, that too.- Given that we cannot cache 
 file lists (we have to check FS for new/changed files anyway), and the 
 difficulty of passing of data about partitions/etc. to split generation 
 compared to paths, we will probably just filter by paths and fileIds. It 
 might be different for splits
 In later phases, it would be nice to save the (first category above) results 
 of expensive work done by jobs, e.g. data size after decompression/decoding 
 per column, etc. to avoid surprises when ORC encoding is very good, or very 
 bad. Perhaps it can even be lazily generated. Here's a pony: 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9377) UDF in_file() in WHERE predicate causes NPE.

2015-08-10 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-9377:
-
Fix Version/s: 1.0.2

Including fix to branch-1.0

 UDF in_file() in WHERE predicate causes NPE.
 

 Key: HIVE-9377
 URL: https://issues.apache.org/jira/browse/HIVE-9377
 Project: Hive
  Issue Type: Bug
  Components: UDF
Affects Versions: 0.14.0
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
 Fix For: 1.1.0, 1.0.2

 Attachments: HIVE-9377.1.patch


 Consider the following query:
 {code:sql}
 SELECT foo, bar from mythdb.foobar where in_file( bar, '/tmp/bar_list.txt' );
 {code}
 Using {{in_file()}} in a WHERE predicate causes the following NPE:
 {noformat}
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getWritableConstantValue(ObjectInspectorUtils.java:1041)
   at 
 org.apache.hadoop.hive.ql.udf.generic.GenericUDFInFile.getRequiredFiles(GenericUDFInFile.java:93)
   at 
 org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.isDeterministicUdf(ConstantPropagateProcFactory.java:303)
   at 
 org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.foldExpr(ConstantPropagateProcFactory.java:226)
   at 
 org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory.access$000(ConstantPropagateProcFactory.java:92)
   at 
 org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory$ConstantPropagateFilterProc.process(ConstantPropagateProcFactory.java:623)
   at 
 org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
   at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
   at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78)
   at 
 org.apache.hadoop.hive.ql.optimizer.ConstantPropagate$ConstantPropagateWalker.walk(ConstantPropagate.java:147)
   at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109)
   at 
 org.apache.hadoop.hive.ql.optimizer.ConstantPropagate.transform(ConstantPropagate.java:117)
   at 
 org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:177)
   at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10032)
   at 
 org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:189)
   at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:224)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:420)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:306)
   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1108)
   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1156)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1045)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1035)
   at 
 org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:206)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:158)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:369)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:304)
   at 
 org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:701)
   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:674)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
 {noformat}
 I have a tentative fix I need advice on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-11461) Transform flat AND/OR into IN struct clause

2015-08-10 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner resolved HIVE-11461.
---
Resolution: Fixed

Failures unrelated. Committed to master. Thank you [~jcamachorodriguez]!

 Transform flat AND/OR into IN struct clause
 ---

 Key: HIVE-11461
 URL: https://issues.apache.org/jira/browse/HIVE-11461
 Project: Hive
  Issue Type: Bug
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
 Attachments: HIVE-11461.1.patch, HIVE-11461.2.patch, HIVE-11461.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (HIVE-11461) Transform flat AND/OR into IN struct clause

2015-08-10 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner reopened HIVE-11461:
---

Updated wrong jira. My bad.

 Transform flat AND/OR into IN struct clause
 ---

 Key: HIVE-11461
 URL: https://issues.apache.org/jira/browse/HIVE-11461
 Project: Hive
  Issue Type: Bug
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
 Attachments: HIVE-11461.1.patch, HIVE-11461.2.patch, HIVE-11461.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11502) Map side aggregation is extremely slow

2015-08-10 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681015#comment-14681015
 ] 

Yongzhi Chen commented on HIVE-11502:
-

[~gopalv], thanks for the workaround. But I am afraid some users do not want to 
change their input format, and this HashMap may affect mapjoin too. We helped a 
user work around this map-side aggregation issue by setting hive.map.aggr = 
false; after that, the simple group-by test case performed very well, but a 
more complicated join query with the group by as a subquery got stuck on 
mapjoin. So we had to let the user turn off mapjoin with set 
hive.auto.convert.join=false. The performance hit from this bug is really 
severe: without the workarounds, none of the queries can finish in several 
hours. So I think we have to fix it. 

 Map side aggregation is extremely slow
 --

 Key: HIVE-11502
 URL: https://issues.apache.org/jira/browse/HIVE-11502
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer, Physical Optimizer
Affects Versions: 1.2.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen

 For the query as following:
 {noformat}
 create table tbl2 as 
 select col1, max(col2) as col2 
 from tbl1 group by col1;
 {noformat}
 If the column used for group by has many different values (for example 40) and 
 is of type double, map side aggregation is very slow. I ran the query, which 
 took more than 3 hours; after 3 hours, I had to kill it.
 The same query can finish in 7 seconds, if I turn off map side aggregation by:
 {noformat}
 set hive.map.aggr = false;
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11515) Still some possible race condition in DynamicPartitionPruner

2015-08-10 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-11515:
-
Attachment: HIVE-11515.1.patch.txt

 Still some possible race condition in DynamicPartitionPruner
 

 Key: HIVE-11515
 URL: https://issues.apache.org/jira/browse/HIVE-11515
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-11515.1.patch.txt


 Even after HIVE-9976, I could see a race condition in DPP sometimes. It is 
 hard to reproduce, but it seems related to the fact that prune() is called 
 from a thread pool: with some delay in the queue, events from fast tasks 
 arrive before prune() is called.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11515) Still some possible race condition in DynamicPartitionPruner

2015-08-10 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-11515:
-
Description: Even after HIVE-9976, I could see race condition in DPP 
sometimes. Hard to reproduce but it seemed related to the fact that prune() is 
called by thread-pool. With some delay in queue, events from fast tasks are 
arrived before prune() is called.  (was: Even after HIVE-9976, I could see race 
condition in DPP sometimes. Hard to reproduce but it seemed related to the fact 
that init() is called by thread-pool. With some delay in queue, events from 
fast tasks are arrived before init() is called.)

 Still some possible race condition in DynamicPartitionPruner
 

 Key: HIVE-11515
 URL: https://issues.apache.org/jira/browse/HIVE-11515
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-11515.1.patch.txt


 Even after HIVE-9976, I could see a race condition in DPP sometimes. It is 
 hard to reproduce, but it seems related to the fact that prune() is called 
 from a thread pool: with some delay in the queue, events from fast tasks 
 arrive before prune() is called.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10631) create_table_core method has invalid update for Fast Stats

2015-08-10 Thread Aaron Tokhy (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681038#comment-14681038
 ] 

Aaron Tokhy commented on HIVE-10631:


Reading more about hive.stats.reliable, it did not appear appropriate to use it 
in this case; instead, it would be better to defer stats calculation for 
partitioned tables to when partitions are being added to a table (MSCK/ALTER 
TABLE), rather than at table creation (CREATE [EXTERNAL] TABLE).

 create_table_core method has invalid update for Fast Stats
 --

 Key: HIVE-10631
 URL: https://issues.apache.org/jira/browse/HIVE-10631
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.13.0, 1.0.0
Reporter: Dongwook Kwon
Priority: Minor

 HiveMetaStore.create_table_core calls 
 MetaStoreUtils.updateUnpartitionedTableStatsFast when hive.stats.autogather 
 is on; however, for a partitioned table this 
 updateUnpartitionedTableStatsFast call scans the warehouse dir and doesn't 
 seem to use the result. 
 Fast Stats was implemented by HIVE-3959
 https://github.com/apache/hive/blob/branch-1.0/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L1363
 From create_table_core method
 {code}
 if (HiveConf.getBoolVar(hiveConf, 
 HiveConf.ConfVars.HIVESTATSAUTOGATHER) &&
 !MetaStoreUtils.isView(tbl)) {
   if (tbl.getPartitionKeysSize() == 0)  { // Unpartitioned table
 MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, 
 madeDir);
   } else { // Partitioned table with no partitions.
 MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, 
 true);
   }
 }
 {code}
 Particularly Line 1363: // Partitioned table with no partitions.
 {code}
 MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true);
 {code}
 This call ends up calling Warehouse.getFileStatusesForUnpartitionedTable and 
 doing nothing in MetaStoreUtils.updateUnpartitionedTableStatsFast, because 
 the newDir flag is always true.
 The impact of this bug is minor with an HDFS warehouse location 
 (hive.metastore.warehouse.dir), but it could be big with an S3 warehouse 
 location, especially for large existing partitions.
 The impact is also heightened with HIVE-6727: when the warehouse location is 
 S3, it can recursively scan the wrong S3 directory and do nothing with the 
 result. I will add more detail of the cases in comments
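A minimal sketch of the intended behavior, with hypothetical method names standing in for Warehouse.getFileStatusesForUnpartitionedTable and updateUnpartitionedTableStatsFast: guard the expensive listing behind the newDir check so it is skipped when the result would be discarded anyway:

```java
public class FastStatsGuardSketch {
    // Counts how many directory scans were actually performed; a stand-in
    // for the recursive FS/S3 listing done by the real Warehouse call.
    static int dirScans = 0;

    static long listTableDir() {
        dirScans++; // expensive on S3 for large directories
        return 0L;
    }

    // Sketch of the fix: when the table directory was just created (newDir),
    // there is nothing to scan, so return early instead of listing the
    // directory and then discarding the result.
    static void updateStatsFast(boolean newDir) {
        if (newDir) {
            return; // fresh dir: stats are trivially zero, no scan needed
        }
        listTableDir();
    }

    public static void main(String[] args) {
        updateStatsFast(true);  // newDir=true: scan skipped
        updateStatsFast(false); // existing dir: one scan
        System.out.println(dirScans); // prints 1
    }
}
```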



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11515) Still some possible race condition in DynamicPartitionPruner

2015-08-10 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-11515:
---
Component/s: Tez

 Still some possible race condition in DynamicPartitionPruner
 

 Key: HIVE-11515
 URL: https://issues.apache.org/jira/browse/HIVE-11515
 Project: Hive
  Issue Type: Bug
  Components: Query Processor, Tez
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-11515.1.patch.txt


 Even after HIVE-9976, I could see a race condition in DPP sometimes. It is 
 hard to reproduce, but it seems related to the fact that prune() is called 
 from a thread pool: with some delay in the queue, events from fast tasks 
 arrive before prune() is called.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11387) CBO: Calcite Operator To Hive Operator (Calcite Return Path) : fix reduce_deduplicate optimization

2015-08-10 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681052#comment-14681052
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-11387:
--

[~pxiong] I need to get this patch into branch-1 as well, but it does not 
apply cleanly there. Could you please upload a branch-1 patch?

Thanks
Hari

 CBO: Calcite Operator To Hive Operator (Calcite Return Path) : fix 
 reduce_deduplicate optimization
 --

 Key: HIVE-11387
 URL: https://issues.apache.org/jira/browse/HIVE-11387
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Fix For: 2.0.0

 Attachments: HIVE-11387.01.patch, HIVE-11387.02.patch, 
 HIVE-11387.03.patch, HIVE-11387.04.patch, HIVE-11387.05.patch, 
 HIVE-11387.06.patch, HIVE-11387.07.patch


 The main problem is that, due to the return path, we may now have 
 {{(RS1-GBY2)\-(RS3-GBY4)}} when map.aggr=false, i.e., no map-side 
 aggregation. However, in the non-return path it is treated as 
 {{(RS1)-(GBY2-RS3-GBY4)}}; the return path does not take this setting into 
 account.





[jira] [Updated] (HIVE-11504) Predicate pushing down doesn't work for float type for Parquet

2015-08-10 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-11504:

Attachment: HIVE-11504.2.patch

Hi [~spena], please help me review this patch. Thank you!

 Predicate pushing down doesn't work for float type for Parquet
 --

 Key: HIVE-11504
 URL: https://issues.apache.org/jira/browse/HIVE-11504
 Project: Hive
  Issue Type: Sub-task
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu
 Attachments: HIVE-11504.1.patch, HIVE-11504.2.patch, HIVE-11504.patch


 The predicate builder should use the PrimitiveTypeName from the Parquet side 
 to construct the predicate leaf, instead of the type provided by 
 PredicateLeaf.
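
 A hedged sketch of the idea only (the types and names below are illustrative 
 stand-ins, not the real parquet-mr or Hive API): the literal's Java type is 
 chosen from the file's physical type, because a Parquet FLOAT column must be 
 compared against a Float literal, while the SARG's PredicateLeaf reports the 
 value as a Double.

```java
// Illustrative sketch: pick the literal's Java type from the file schema's
// physical type rather than from the SARG, so float columns get Float
// literals and predicate pushdown is not silently disabled.
public class LeafTypeSketch {
    public enum PhysicalType { FLOAT, DOUBLE }

    public static Object literalFor(PhysicalType fileType, double value) {
        switch (fileType) {
            case FLOAT:
                return (float) value; // narrow for float columns
            case DOUBLE:
                return value;         // doubles pass through unchanged
            default:
                throw new IllegalArgumentException("unsupported: " + fileType);
        }
    }

    public static void main(String[] args) {
        // A FLOAT column gets a Float literal, not the SARG's Double.
        Object lit = literalFor(PhysicalType.FLOAT, 1.5);
        System.out.println(lit.getClass().getSimpleName());
    }
}
```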





[jira] [Resolved] (HIVE-10625) Handle Authorization for 'select expr' hive queries in SQL Standard Authorization

2015-08-10 Thread Nemon Lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou resolved HIVE-10625.
--
Resolution: Duplicate

The same work is going on in HIVE-11498, so closing this one.

 Handle Authorization for  'select expr' hive queries in  SQL Standard 
 Authorization
 -

 Key: HIVE-10625
 URL: https://issues.apache.org/jira/browse/HIVE-10625
 Project: Hive
  Issue Type: Bug
  Components: Authorization, SQLStandardAuthorization
Affects Versions: 1.1.0
Reporter: Nemon Lou

 Hive internally rewrites a 'select expression' query into 'select 
 expression from _dummy_database._dummy_table', where the dummy db and 
 table are temporary entities for the current query.
 SQL Standard Authorization needs to handle these special objects.
 Typing select reverse(123); in beeline will produce this error:
 {code}
 Error: Error while compiling statement: FAILED: HiveAuthzPluginException 
 Error getting object from metastore for Object [type=TABLE_OR_VIEW, 
 name=_dummy_database._dummy_table] (state=42000,code=4)
 {code}





[jira] [Commented] (HIVE-11480) CBO: Calcite Operator To Hive Operator (Calcite Return Path): char/varchar as input to GenericUDAF

2015-08-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14681098#comment-14681098
 ] 

Hive QA commented on HIVE-11480:




{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12749652/HIVE-11480.03.patch

{color:green}SUCCESS:{color} +1 9347 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4911/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4911/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4911/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12749652 - PreCommit-HIVE-TRUNK-Build

 CBO: Calcite Operator To Hive Operator (Calcite Return Path): char/varchar as 
 input to GenericUDAF 
 ---

 Key: HIVE-11480
 URL: https://issues.apache.org/jira/browse/HIVE-11480
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Attachments: HIVE-11480.01.patch, HIVE-11480.02.patch, 
 HIVE-11480.03.patch


 Some of the UDAFs cannot deal with char/varchar correctly when the return 
 path is on, for example udaf_number_format.q.





[jira] [Updated] (HIVE-8326) Using DbTxnManager with concurrency off results in run time error

2015-08-10 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-8326:
-
Fix Version/s: 1.0.2

Including this fix to branch-1.0

 Using DbTxnManager with concurrency off results in run time error
 -

 Key: HIVE-8326
 URL: https://issues.apache.org/jira/browse/HIVE-8326
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.14.0
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 1.1.0, 1.0.2

 Attachments: HIVE-8326.patch


 Setting
 {code}
 hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
 hive.support.concurrency=false
 {code}
 results in queries failing at runtime with an NPE in DbTxnManager.heartbeat.





[jira] [Updated] (HIVE-11506) Casting varchar/char type to string cannot be vectorized

2015-08-10 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-11506:
-
Attachment: HIVE-11506.2.patch.txt

Updated golden files

 Casting varchar/char type to string cannot be vectorized
 

 Key: HIVE-11506
 URL: https://issues.apache.org/jira/browse/HIVE-11506
 Project: Hive
  Issue Type: Improvement
  Components: Vectorization
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-11506.1.patch.txt, HIVE-11506.2.patch.txt


 It's not defined in the vectorization context.
 {code}
 explain 
 select cast(cast(cstring1 as varchar(10)) as string) x from alltypesorc order 
 by x;
 {code}
 The mapper is not vectorized, due to an exception:
 {noformat}
 2015-08-10 17:02:08,003 INFO  [main]: physical.Vectorizer 
 (Vectorizer.java:validateExprNodeDesc(1299)) - Failed to vectorize
 org.apache.hadoop.hive.ql.metadata.HiveException: Unhandled cast input type: 
 varchar(10)
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getCastToString(VectorizationContext.java:1543)
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getGenericUDFBridgeVectorExpression(VectorizationContext.java:1379)
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getGenericUdfVectorExpression(VectorizationContext.java:1177)
 at 
 org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getVectorExpression(VectorizationContext.java:440)
 at 
 org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateExprNodeDesc(Vectorizer.java:1293)
 at 
 org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateExprNodeDesc(Vectorizer.java:1284)
 at 
 org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateSelectOperator(Vectorizer.java:1116)
 at 
 org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateMapWorkOperator(Vectorizer.java:906)
 {noformat}





[jira] [Updated] (HIVE-10631) create_table_core method has invalid update for Fast Stats

2015-08-10 Thread Aaron Tokhy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Tokhy updated HIVE-10631:
---
Attachment: HIVE-10631.patch

 create_table_core method has invalid update for Fast Stats
 --

 Key: HIVE-10631
 URL: https://issues.apache.org/jira/browse/HIVE-10631
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 1.0.0
Reporter: Dongwook Kwon
Priority: Minor
 Attachments: HIVE-10631.patch


 The HiveMetaStore.create_table_core method calls 
 MetaStoreUtils.updateUnpartitionedTableStatsFast when hive.stats.autogather 
 is on; however, for a partitioned table this updateUnpartitionedTableStatsFast 
 call scans the warehouse directory and does not use the result.
 Fast Stats was implemented by HIVE-3959.
 https://github.com/apache/hive/blob/branch-1.0/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L1363
 From the create_table_core method:
 {code}
 if (HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVESTATSAUTOGATHER)
     && !MetaStoreUtils.isView(tbl)) {
   if (tbl.getPartitionKeysSize() == 0) { // Unpartitioned table
     MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, madeDir);
   } else { // Partitioned table with no partitions.
     MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true);
   }
 }
 {code}
 Particularly line 1363: // Partitioned table with no partitions.
 {code}
 MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true);
 {code}
 This call ends up calling Warehouse.getFileStatusesForUnpartitionedTable and 
 then does nothing in MetaStoreUtils.updateUnpartitionedTableStatsFast, 
 because the newDir flag is always true.
 The impact of this bug is minor with an HDFS warehouse location 
 (hive.metastore.warehouse.dir), but it can be large with an S3 warehouse 
 location, especially for large existing partitions.
 The impact is also heightened by HIVE-6727 when the warehouse location is 
 S3: it can recursively scan the wrong S3 directory and still do nothing with 
 the result. I will add more detail on these cases in comments.
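
 The wasted work can be sketched as follows (an illustrative simplification, 
 not Hive's actual code; the method names are stand-ins for 
 Warehouse.getFileStatusesForUnpartitionedTable and 
 MetaStoreUtils.updateUnpartitionedTableStatsFast): the directory scan runs 
 unconditionally, but with newDir hard-coded to true the result is discarded.

```java
// Illustrative sketch: the expensive listing happens first, then the
// hard-coded newDir=true flag makes the whole call a no-op.
public class FastStatsSketch {
    static int scans = 0;

    // Stand-in for the warehouse directory listing (expensive on S3).
    static long[] listWarehouseDir() {
        scans++;
        return new long[] {10, 20};
    }

    // Stand-in for the fast-stats update; returns true if stats were written.
    static boolean updateStatsFast(boolean newDir) {
        long[] files = listWarehouseDir(); // scan happens unconditionally
        if (newDir) {
            return false;                  // ...but stats are never written
        }
        long total = 0;
        for (long f : files) {
            total += f;
        }
        return total > 0;                  // stats recorded
    }

    public static void main(String[] args) {
        // Partitioned-table branch: newDir is always true, so the scan
        // runs (scans == 1) yet nothing is updated.
        boolean updated = updateStatsFast(true);
        System.out.println("updated=" + updated + " scans=" + scans);
    }
}
```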





[jira] [Updated] (HIVE-10631) create_table_core method has invalid update for Fast Stats

2015-08-10 Thread Aaron Tokhy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Tokhy updated HIVE-10631:
---
Attachment: (was: HIVE-10631.patch)

 create_table_core method has invalid update for Fast Stats
 --

 Key: HIVE-10631
 URL: https://issues.apache.org/jira/browse/HIVE-10631
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 1.0.0
Reporter: Dongwook Kwon
Priority: Minor
 Attachments: HIVE-10631.patch.1


 The HiveMetaStore.create_table_core method calls 
 MetaStoreUtils.updateUnpartitionedTableStatsFast when hive.stats.autogather 
 is on; however, for a partitioned table this updateUnpartitionedTableStatsFast 
 call scans the warehouse directory and does not use the result.
 Fast Stats was implemented by HIVE-3959.
 https://github.com/apache/hive/blob/branch-1.0/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L1363
 From the create_table_core method:
 {code}
 if (HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVESTATSAUTOGATHER)
     && !MetaStoreUtils.isView(tbl)) {
   if (tbl.getPartitionKeysSize() == 0) { // Unpartitioned table
     MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, madeDir);
   } else { // Partitioned table with no partitions.
     MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true);
   }
 }
 {code}
 Particularly line 1363: // Partitioned table with no partitions.
 {code}
 MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true);
 {code}
 This call ends up calling Warehouse.getFileStatusesForUnpartitionedTable and 
 then does nothing in MetaStoreUtils.updateUnpartitionedTableStatsFast, 
 because the newDir flag is always true.
 The impact of this bug is minor with an HDFS warehouse location 
 (hive.metastore.warehouse.dir), but it can be large with an S3 warehouse 
 location, especially for large existing partitions.
 The impact is also heightened by HIVE-6727 when the warehouse location is 
 S3: it can recursively scan the wrong S3 directory and still do nothing with 
 the result. I will add more detail on these cases in comments.





[jira] [Updated] (HIVE-10631) create_table_core method has invalid update for Fast Stats

2015-08-10 Thread Aaron Tokhy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Tokhy updated HIVE-10631:
---
Attachment: HIVE-10631.patch.1

 create_table_core method has invalid update for Fast Stats
 --

 Key: HIVE-10631
 URL: https://issues.apache.org/jira/browse/HIVE-10631
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 1.0.0
Reporter: Dongwook Kwon
Priority: Minor
 Attachments: HIVE-10631.patch.1


 The HiveMetaStore.create_table_core method calls 
 MetaStoreUtils.updateUnpartitionedTableStatsFast when hive.stats.autogather 
 is on; however, for a partitioned table this updateUnpartitionedTableStatsFast 
 call scans the warehouse directory and does not use the result.
 Fast Stats was implemented by HIVE-3959.
 https://github.com/apache/hive/blob/branch-1.0/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L1363
 From the create_table_core method:
 {code}
 if (HiveConf.getBoolVar(hiveConf, HiveConf.ConfVars.HIVESTATSAUTOGATHER)
     && !MetaStoreUtils.isView(tbl)) {
   if (tbl.getPartitionKeysSize() == 0) { // Unpartitioned table
     MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, madeDir);
   } else { // Partitioned table with no partitions.
     MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true);
   }
 }
 {code}
 Particularly line 1363: // Partitioned table with no partitions.
 {code}
 MetaStoreUtils.updateUnpartitionedTableStatsFast(db, tbl, wh, true);
 {code}
 This call ends up calling Warehouse.getFileStatusesForUnpartitionedTable and 
 then does nothing in MetaStoreUtils.updateUnpartitionedTableStatsFast, 
 because the newDir flag is always true.
 The impact of this bug is minor with an HDFS warehouse location 
 (hive.metastore.warehouse.dir), but it can be large with an S3 warehouse 
 location, especially for large existing partitions.
 The impact is also heightened by HIVE-6727 when the warehouse location is 
 S3: it can recursively scan the wrong S3 directory and still do nothing with 
 the result. I will add more detail on these cases in comments.




