[jira] [Updated] (HIVE-10159) HashTableSinkDesc and MapJoinDesc keyTblDesc can be replaced by JoinDesc.keyTableDesc

2015-03-31 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-10159:
---
Attachment: HIVE-10159.1.patch

patch #1

 HashTableSinkDesc and MapJoinDesc keyTblDesc can be replaced by 
 JoinDesc.keyTableDesc
 -

 Key: HIVE-10159
 URL: https://issues.apache.org/jira/browse/HIVE-10159
 Project: Hive
  Issue Type: Improvement
  Components: Query Planning
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
Priority: Minor
 Attachments: HIVE-10159.1.patch


 MapJoinDesc and HashTableSinkDesc are derived from JoinDesc
 HashTableSinkDesc and MapJoinDesc have keyTblDesc field.
 JoinDesc has keyTableDesc field.
 I think HashTableSinkDesc and MapJoinDesc can use superclass (JoinDesc) 
 keyTableDesc field instead of defining their own keyTblDesc.
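 A minimal sketch of the proposed cleanup, assuming JoinDesc's existing
 getKeyTableDesc()/setKeyTableDesc() accessors; this is a sketch, not the
 attached patch:
 {code}
 package org.apache.hadoop.hive.ql.plan;

 public class MapJoinDesc extends JoinDesc {
   // private TableDesc keyTblDesc;   // removed: duplicated the JoinDesc field

   public TableDesc getKeyTblDesc() {
     return getKeyTableDesc();        // delegate to the superclass field
   }

   public void setKeyTblDesc(TableDesc keyTblDesc) {
     setKeyTableDesc(keyTblDesc);
   }
 }
 {code}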



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9969) Avoid Utilities.getMapRedWork for spark [Spark Branch]

2015-03-31 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-9969:
-
Attachment: HIVE-9969.1-spark.patch

 Avoid Utilities.getMapRedWork for spark [Spark Branch]
 --

 Key: HIVE-9969
 URL: https://issues.apache.org/jira/browse/HIVE-9969
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Rui Li
Priority: Minor
 Attachments: HIVE-9969.1-spark.patch


 The method shouldn't be used in spark mode. Specifically, map work and 
 reduce work have different plan paths in spark. Calling this method will 
 leave lots of errors in the executor's log:
 {noformat}
 15/03/16 02:57:23 INFO Utilities: Open file to read in plan: 
 hdfs://node13-1:8020/tmp/hive/root/0b3f2ad9-af30-4674-9cfb-1f745a5df51d/hive_2015-03-16_02-57-17_752_4494804875441915487-1/-mr-10003/3897754a-0146-4616-a2f6-b316839a2ad0/reduce.xml
 15/03/16 02:57:23 INFO Utilities: File not found: File does not exist: 
 /tmp/hive/root/0b3f2ad9-af30-4674-9cfb-1f745a5df51d/hive_2015-03-16_02-57-17_752_4494804875441915487-1/-mr-10003/3897754a-0146-4616-a2f6-b316839a2ad0/reduce.xml
 at 
 org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
 at 
 org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1891)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1832)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1812)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1784)
 at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:542)
 at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:362)
 at 
 org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
 {noformat}
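 One way to avoid the spurious lookups is to branch on the execution engine
 before touching the combined plan. A sketch only, assuming
 Utilities.getMapWork()/getReduceWork() as the spark-side entry points (the
 attached patch may differ):
 {code}
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.hive.conf.HiveConf;
 import org.apache.hadoop.hive.ql.exec.Utilities;
 import org.apache.hadoop.hive.ql.plan.BaseWork;
 import org.apache.hadoop.hive.ql.plan.MapredWork;

 // spark serializes map and reduce plans at different paths, so only ask
 // for the piece that actually exists instead of the combined MapredWork
 static BaseWork loadPlan(Configuration conf, boolean mapSide) {
   String engine = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_EXECUTION_ENGINE);
   if ("spark".equalsIgnoreCase(engine)) {
     return mapSide ? Utilities.getMapWork(conf) : Utilities.getReduceWork(conf);
   }
   MapredWork mrWork = Utilities.getMapRedWork(conf);   // MR path, unchanged
   return mapSide ? mrWork.getMapWork() : mrWork.getReduceWork();
 }
 {code}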



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8818) Create unit test where we insert into an encrypted table and then read from it with hcatalog mapreduce

2015-03-31 Thread Dong Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Chen updated HIVE-8818:

Attachment: HIVE-8818.patch

 Create unit test where we insert into an encrypted table and then read from 
 it with hcatalog mapreduce
 --

 Key: HIVE-8818
 URL: https://issues.apache.org/jira/browse/HIVE-8818
 Project: Hive
  Issue Type: Sub-task
Reporter: Brock Noland
Assignee: Dong Chen
 Attachments: HIVE-8818.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10160) Give a warning when grouping or ordering by a constant column

2015-03-31 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14388087#comment-14388087
 ] 

Lefty Leverenz commented on HIVE-10160:
---

See the thread "ORDER BY clause in Hive" on u...@hive.apache.org:

* 
[http://mail-archives.apache.org/mod_mbox/hive-user/201503.mbox/%3c05d701d069b4$c71f6e60$555e4b20$@co.uk%3e]

 Give a warning when grouping or ordering by a constant column
 -

 Key: HIVE-10160
 URL: https://issues.apache.org/jira/browse/HIVE-10160
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Lefty Leverenz
Priority: Minor

 To avoid confusion, a warning should be issued when users specify column 
 positions instead of names in a GROUP BY or ORDER BY clause (unless 
 hive.groupby.orderby.position.alias is set to true in Hive 0.11.0 or later).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10053) Override new init API fom ReadSupport instead of the deprecated one

2015-03-31 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14388161#comment-14388161
 ] 

Hive QA commented on HIVE-10053:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12708309/HIVE-10053.2.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 8692 tests executed
*Failed tests:*
{noformat}
TestMinimrCliDriver-smb_mapjoin_8.q - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority2
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3214/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3214/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3214/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12708309 - PreCommit-HIVE-TRUNK-Build

 Override new init API fom ReadSupport instead of the deprecated one
 ---

 Key: HIVE-10053
 URL: https://issues.apache.org/jira/browse/HIVE-10053
 Project: Hive
  Issue Type: Sub-task
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu
 Attachments: HIVE-10053.1.patch, HIVE-10053.2.patch, HIVE-10053.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9693) Introduce a stats cache for aggregate stats in HBase metastore [hbase-metastore branch]

2015-03-31 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-9693:
---
Summary: Introduce a stats cache for aggregate stats in HBase metastore  
[hbase-metastore branch]  (was: Introduce a stats cache for HBase metastore  
[hbase-metastore branch])

 Introduce a stats cache for aggregate stats in HBase metastore  
 [hbase-metastore branch]
 

 Key: HIVE-9693
 URL: https://issues.apache.org/jira/browse/HIVE-9693
 Project: Hive
  Issue Type: Sub-task
  Components: Metastore
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta
 Attachments: HIVE-9693.1.patch, HIVE-9693.2.patch, HIVE-9693.3.patch


 NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9969) Avoid Utilities.getMapRedWork for spark [Spark Branch]

2015-03-31 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14388224#comment-14388224
 ] 

Hive QA commented on HIVE-9969:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12708350/HIVE-9969.1-spark.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 8710 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nonmr_fetch
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_22
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_6_subq
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/816/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/816/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-816/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12708350 - PreCommit-HIVE-SPARK-Build

 Avoid Utilities.getMapRedWork for spark [Spark Branch]
 --

 Key: HIVE-9969
 URL: https://issues.apache.org/jira/browse/HIVE-9969
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Rui Li
Priority: Minor
 Attachments: HIVE-9969.1-spark.patch


 The method shouldn't be used in spark mode. Specifically, map work and 
 reduce work have different plan paths in spark. Calling this method will 
 leave lots of errors in the executor's log:
 {noformat}
 15/03/16 02:57:23 INFO Utilities: Open file to read in plan: 
 hdfs://node13-1:8020/tmp/hive/root/0b3f2ad9-af30-4674-9cfb-1f745a5df51d/hive_2015-03-16_02-57-17_752_4494804875441915487-1/-mr-10003/3897754a-0146-4616-a2f6-b316839a2ad0/reduce.xml
 15/03/16 02:57:23 INFO Utilities: File not found: File does not exist: 
 /tmp/hive/root/0b3f2ad9-af30-4674-9cfb-1f745a5df51d/hive_2015-03-16_02-57-17_752_4494804875441915487-1/-mr-10003/3897754a-0146-4616-a2f6-b316839a2ad0/reduce.xml
 at 
 org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
 at 
 org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1891)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1832)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1812)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1784)
 at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:542)
 at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:362)
 at 
 org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10163) CommonMergeJoinOperator calls WritableComparator.get() in the inner loop

2015-03-31 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-10163:
---
Attachment: mergejoin-parallel-lock.png
mergejoin-parallel-bt.png

 CommonMergeJoinOperator calls WritableComparator.get() in the inner loop
 

 Key: HIVE-10163
 URL: https://issues.apache.org/jira/browse/HIVE-10163
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 1.2.0
Reporter: Gopal V
  Labels: JOIN, Performance
 Attachments: mergejoin-comparekeys.png, mergejoin-parallel-bt.png, 
 mergejoin-parallel-lock.png


 The CommonMergeJoinOperator wastes CPU looking up the correct comparator for 
 each WritableComparable in each row.
 {code}
 @SuppressWarnings("rawtypes")
 private int compareKeys(List<Object> k1, List<Object> k2) {
   int ret = 0;
   // ... per-column loop elided; key_1/key_2 are the current elements of k1/k2
   ret = WritableComparator.get(key_1.getClass()).compare(key_1, key_2);
   if (ret != 0) {
     return ret;
   }
 }
 {code}
 !mergejoin-comparekeys.png!
 The slow part of that get() is deep within {{ReflectionUtils.setConf}}, where 
 it tries to use reflection to set the Comparator config for each row being 
 compared.
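 A sketch of the obvious mitigation: look the comparator up once per key
 class instead of once per row (the cached field and helper below are
 illustrative, not the eventual fix):
 {code}
 import org.apache.hadoop.io.WritableComparable;
 import org.apache.hadoop.io.WritableComparator;

 // the key schema is fixed within the operator, so the reflective get()
 // only needs to run once, not once per compared row
 private transient WritableComparator keyComparator;

 @SuppressWarnings("rawtypes")
 private int compareKey(WritableComparable key_1, WritableComparable key_2) {
   if (keyComparator == null) {
     keyComparator = WritableComparator.get(key_1.getClass());
   }
   return keyComparator.compare(key_1, key_2);
 }
 {code}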



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10163) CommonMergeJoinOperator calls WritableComparator.get() in the inner loop

2015-03-31 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-10163:
---
Description: 
The CommonMergeJoinOperator wastes CPU looking up the correct comparator for 
each WritableComparable in each row.

{code}
@SuppressWarnings("rawtypes")
private int compareKeys(List<Object> k1, List<Object> k2) {
  int ret = 0;
  // ... per-column loop elided; key_1/key_2 are the current elements of k1/k2
  ret = WritableComparator.get(key_1.getClass()).compare(key_1, key_2);
  if (ret != 0) {
    return ret;
  }
}
{code}

!mergejoin-comparekeys.png!

The slow part of that get() is deep within {{ReflectionUtils.setConf}}, where 
it tries to use reflection to set the Comparator config for each row being 
compared.

  was:
The CommonMergeJoinOperator wastes CPU looking up the correct comparator for 
each WritableComparable in each row.

{code}
@SuppressWarnings("rawtypes")
private int compareKeys(List<Object> k1, List<Object> k2) {
  int ret = 0;
  // ... per-column loop elided; key_1/key_2 are the current elements of k1/k2
  ret = WritableComparator.get(key_1.getClass()).compare(key_1, key_2);
  if (ret != 0) {
    return ret;
  }
}
{code}

The slow part of that get() is deep within {{ReflectionUtils.setConf}}, where 
it tries to use reflection to set the Comparator config for each row being 
compared.


 CommonMergeJoinOperator calls WritableComparator.get() in the inner loop
 

 Key: HIVE-10163
 URL: https://issues.apache.org/jira/browse/HIVE-10163
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 1.2.0
Reporter: Gopal V
  Labels: JOIN, Performance
 Attachments: mergejoin-comparekeys.png


 The CommonMergeJoinOperator wastes CPU looking up the correct comparator for 
 each WritableComparable in each row.
 {code}
 @SuppressWarnings("rawtypes")
 private int compareKeys(List<Object> k1, List<Object> k2) {
   int ret = 0;
   // ... per-column loop elided; key_1/key_2 are the current elements of k1/k2
   ret = WritableComparator.get(key_1.getClass()).compare(key_1, key_2);
   if (ret != 0) {
     return ret;
   }
 }
 {code}
 !mergejoin-comparekeys.png!
 The slow part of that get() is deep within {{ReflectionUtils.setConf}}, where 
 it tries to use reflection to set the Comparator config for each row being 
 compared.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10163) CommonMergeJoinOperator calls WritableComparator.get() in the inner loop

2015-03-31 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-10163:
---
Attachment: mergejoin-comparekeys.png

 CommonMergeJoinOperator calls WritableComparator.get() in the inner loop
 

 Key: HIVE-10163
 URL: https://issues.apache.org/jira/browse/HIVE-10163
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 1.2.0
Reporter: Gopal V
  Labels: JOIN, Performance
 Attachments: mergejoin-comparekeys.png


 The CommonMergeJoinOperator wastes CPU looking up the correct comparator for 
 each WritableComparable in each row.
 {code}
 @SuppressWarnings("rawtypes")
 private int compareKeys(List<Object> k1, List<Object> k2) {
   int ret = 0;
   // ... per-column loop elided; key_1/key_2 are the current elements of k1/k2
   ret = WritableComparator.get(key_1.getClass()).compare(key_1, key_2);
   if (ret != 0) {
     return ret;
   }
 }
 {code}
 The slow part of that get() is deep within {{ReflectionUtils.setConf}}, where 
 it tries to use reflection to set the Comparator config for each row being 
 compared.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10163) CommonMergeJoinOperator calls WritableComparator.get() in the inner loop

2015-03-31 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14388254#comment-14388254
 ] 

Gopal V commented on HIVE-10163:


Nope, it still hits HADOOP-11771 in the processKey().

 CommonMergeJoinOperator calls WritableComparator.get() in the inner loop
 

 Key: HIVE-10163
 URL: https://issues.apache.org/jira/browse/HIVE-10163
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 1.2.0
Reporter: Gopal V
  Labels: JOIN, Performance
 Attachments: mergejoin-comparekeys.png, mergejoin-parallel-bt.png, 
 mergejoin-parallel-lock.png


 The CommonMergeJoinOperator wastes CPU looking up the correct comparator for 
 each WritableComparable in each row.
 {code}
 @SuppressWarnings("rawtypes")
 private int compareKeys(List<Object> k1, List<Object> k2) {
   int ret = 0;
   // ... per-column loop elided; key_1/key_2 are the current elements of k1/k2
   ret = WritableComparator.get(key_1.getClass()).compare(key_1, key_2);
   if (ret != 0) {
     return ret;
   }
 }
 {code}
 !mergejoin-parallel-lock.png!
 !mergejoin-comparekeys.png!
 The slow part of that get() is deep within {{ReflectionUtils.setConf}}, where 
 it tries to use reflection to set the Comparator config for each row being 
 compared.
 !mergejoin-parallel-bt.png!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9845) HCatSplit repeats information making input split data size huge

2015-03-31 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-9845:
---
Attachment: (was: HIVE-9845.2.patch)

 HCatSplit repeats information making input split data size huge
 ---

 Key: HIVE-9845
 URL: https://issues.apache.org/jira/browse/HIVE-9845
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Reporter: Rohini Palaniswamy
Assignee: Mithun Radhakrishnan
 Attachments: HIVE-9845.1.patch


 Pig on Tez jobs with larger tables hit PIG-4443. Running on HDFS data which 
 has even triple the number of splits (100K+ splits and tasks) does not hit 
 that issue.
 {code}
 HCatBaseInputFormat.java:
   // Call getSplits on the InputFormat, create an
   // HCatSplit for each underlying split
   // numSplits is 0 for our purposes
   org.apache.hadoop.mapred.InputSplit[] baseSplits =
       inputFormat.getSplits(jobConf, 0);
   for (org.apache.hadoop.mapred.InputSplit split : baseSplits) {
     splits.add(new HCatSplit(partitionInfo, split, allCols));
   }
 {code}
 Each HCatSplit duplicates the partition schema and the table schema.
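 A sketch of the direction such a fix can take: hoist the identical schemas
 out of the per-split payload. The configuration key and the two-argument
 HCatSplit constructor below are illustrative only:
 {code}
 // serialize the shared schema once into the job configuration instead of
 // embedding a copy of it in every HCatSplit
 jobConf.set("hcat.job.table.schema", HCatUtil.serialize(allCols));
 for (org.apache.hadoop.mapred.InputSplit split : baseSplits) {
   splits.add(new HCatSplit(partitionInfo, split));  // no per-split schema copy
 }
 {code}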



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9845) HCatSplit repeats information making input split data size huge

2015-03-31 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-9845:
---
Attachment: HIVE-9845.3.patch

Another take on the first patch, except with more logging and a correction to 
{{TestHCatOutputFormat}}.

 HCatSplit repeats information making input split data size huge
 ---

 Key: HIVE-9845
 URL: https://issues.apache.org/jira/browse/HIVE-9845
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Reporter: Rohini Palaniswamy
Assignee: Mithun Radhakrishnan
 Attachments: HIVE-9845.1.patch, HIVE-9845.3.patch


 Pig on Tez jobs with larger tables hit PIG-4443. Running on HDFS data which 
 has even triple the number of splits (100K+ splits and tasks) does not hit 
 that issue.
 {code}
 HCatBaseInputFormat.java:
   // Call getSplits on the InputFormat, create an
   // HCatSplit for each underlying split
   // numSplits is 0 for our purposes
   org.apache.hadoop.mapred.InputSplit[] baseSplits =
       inputFormat.getSplits(jobConf, 0);
   for (org.apache.hadoop.mapred.InputSplit split : baseSplits) {
     splits.add(new HCatSplit(partitionInfo, split, allCols));
   }
 {code}
 Each HCatSplit duplicates the partition schema and the table schema.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9693) Introduce a stats cache for HBase metastore [hbase-metastore branch]

2015-03-31 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-9693:
---
Attachment: HIVE-9693.3.patch

 Introduce a stats cache for HBase metastore  [hbase-metastore branch]
 -

 Key: HIVE-9693
 URL: https://issues.apache.org/jira/browse/HIVE-9693
 Project: Hive
  Issue Type: Sub-task
  Components: Metastore
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta
 Attachments: HIVE-9693.1.patch, HIVE-9693.2.patch, HIVE-9693.3.patch


 NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10162) LLAP: Avoid deserializing the plan > 1 times in a single thread

2015-03-31 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-10162:
---
Attachment: deserialize-plan-2.png
deserialize-plan-1.png

 LLAP: Avoid deserializing the plan > 1 times in a single thread
 ---

 Key: HIVE-10162
 URL: https://issues.apache.org/jira/browse/HIVE-10162
 Project: Hive
  Issue Type: Sub-task
Affects Versions: llap
Reporter: Gopal V
Assignee: Gunther Hagleitner
 Fix For: llap

 Attachments: deserialize-plan-1.png, deserialize-plan-2.png


 Kryo shows up in the critical hot path for LLAP when using a plan with a very 
 large filter condition, because the plan is deserialized more than once for 
 each task.
 !deserialize-plan-1.png!
 !deserialize-plan-2.png!
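 A sketch of one mitigation, memoizing the deserialized plan by its path so
 each task pays the Kryo cost at most once; the cache shape and
 deserializePlan() are stand-ins, not the actual change:
 {code}
 import java.util.Map;
 import java.util.concurrent.ConcurrentHashMap;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.hive.ql.plan.BaseWork;

 private static final Map<String, BaseWork> PLAN_CACHE =
     new ConcurrentHashMap<String, BaseWork>();

 static BaseWork getPlan(Configuration conf, String planPath) {
   BaseWork work = PLAN_CACHE.get(planPath);
   if (work == null) {
     work = deserializePlan(conf, planPath); // stand-in for the Kryo read
     PLAN_CACHE.put(planPath, work);         // benign race: at worst two loads
   }
   return work;
 }
 {code}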



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10163) CommonMergeJoinOperator calls WritableComparator.get() in the inner loop

2015-03-31 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14388188#comment-14388188
 ] 

Gopal V commented on HIVE-10163:


{{WritableComparator::get()}} -> {{ReflectionUtils.setConf()}} is pointless, 
but it misses the static synchronized block inside HADOOP-11771 because the 
default argument is NULL.

There is no difference between each iteration of the same row-keys, since the 
schema does not vary between JOIN keys in the same operator.

 CommonMergeJoinOperator calls WritableComparator.get() in the inner loop
 

 Key: HIVE-10163
 URL: https://issues.apache.org/jira/browse/HIVE-10163
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 1.2.0
Reporter: Gopal V
  Labels: JOIN, Performance

 The CommonMergeJoinOperator wastes CPU looking up the correct comparator for 
 each WritableComparable in each row.
 {code}
 @SuppressWarnings("rawtypes")
 private int compareKeys(List<Object> k1, List<Object> k2) {
   int ret = 0;
   // ... per-column loop elided; key_1/key_2 are the current elements of k1/k2
   ret = WritableComparator.get(key_1.getClass()).compare(key_1, key_2);
   if (ret != 0) {
     return ret;
   }
 }
 {code}
 The slow part of that get() is deep within {{ReflectionUtils.setConf}}, where 
 it tries to use reflection to set the Comparator config for each row being 
 compared.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10164) LLAP: ORC BIGINT SARGs regressed after Parquet PPD fixes (HIVE-8122)

2015-03-31 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-10164:
---
Attachment: orc-sarg-tostring.png

 LLAP: ORC BIGINT SARGs regressed after Parquet PPD fixes (HIVE-8122)
 

 Key: HIVE-10164
 URL: https://issues.apache.org/jira/browse/HIVE-10164
 Project: Hive
  Issue Type: Sub-task
Reporter: Gopal V
Assignee: Prasanth Jayachandran
 Attachments: orc-sarg-tostring.png


 HIVE-8122 seems to have introduced a toString() to the ORC PPD codepath for 
 BIGINT.
 https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgumentImpl.java#L162
 {code}
 private List<Object> getOrcLiteralList() {
   // no need to cast
   ...
   List<Object> result = new ArrayList<Object>();
   for (Object o : literalList) {
     result.add(Long.valueOf(o.toString()));
   }
   return result;
 }
 {code}
 !orc-sarg-tostring.png!
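 A sketch of the fix direction (the boxing is removed in HIVE-10172): skip
 the String round-trip when the literal is already numeric; the instanceof
 guard below is illustrative, not the committed patch:
 {code}
 List<Object> result = new ArrayList<Object>();
 for (Object o : literalList) {
   if (o instanceof Number) {
     result.add(Long.valueOf(((Number) o).longValue())); // no toString() per literal
   } else {
     result.add(Long.valueOf(o.toString()));             // fallback, as before
   }
 }
 return result;
 {code}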



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10001) SMB join in reduce side

2015-03-31 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14388382#comment-14388382
 ] 

Hive QA commented on HIVE-10001:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12708327/HIVE-10001.7.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 8692 tests executed
*Failed tests:*
{noformat}
TestMinimrCliDriver-smb_mapjoin_8.q - did not produce a TEST-*.xml file
org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.testSparkQuery
org.apache.hive.spark.client.TestSparkClient.testJobSubmission
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3217/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3217/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3217/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12708327 - PreCommit-HIVE-TRUNK-Build

 SMB join in reduce side
 ---

 Key: HIVE-10001
 URL: https://issues.apache.org/jira/browse/HIVE-10001
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Attachments: HIVE-10001.1.patch, HIVE-10001.2.patch, 
 HIVE-10001.3.patch, HIVE-10001.4.patch, HIVE-10001.5.patch, 
 HIVE-10001.6.patch, HIVE-10001.7.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-3378) UDF to obtain the numeric day of a year from date or timestamp in HIVE.

2015-03-31 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov reassigned HIVE-3378:
-

Assignee: Alexander Pivovarov

 UDF to obtain the numeric day of a year from date or timestamp in HIVE.
 

 Key: HIVE-3378
 URL: https://issues.apache.org/jira/browse/HIVE-3378
 Project: Hive
  Issue Type: New Feature
  Components: UDF
Affects Versions: 0.8.1, 0.9.0
Reporter: Deepti Antony
Assignee: Alexander Pivovarov
 Attachments: HIVE-3378.1.patch.txt


 Hive's current releases lack a function that returns the numeric day of a 
 year when given a date or timestamp. The function DAYOFYEAR(date) would 
 return the numeric day from a date/timestamp, which would be useful in 
 HiveQL. DAYOFYEAR can be used to compare data by the number of days up to 
 the given date, and is applicable in different domains.
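 A minimal sketch of the requested semantics via the reflection-based UDF
 bridge (not the attached patch); for example, dayofyear('2015-03-31') would
 return 90:
 {code}
 import java.sql.Date;
 import java.util.Calendar;
 import org.apache.hadoop.hive.ql.exec.UDF;
 import org.apache.hadoop.io.IntWritable;

 public class UDFDayOfYear extends UDF {
   private final IntWritable result = new IntWritable();

   public IntWritable evaluate(Date date) {
     if (date == null) {
       return null;
     }
     Calendar cal = Calendar.getInstance();
     cal.setTime(date);                           // 2015-03-31 -> day 90
     result.set(cal.get(Calendar.DAY_OF_YEAR));
     return result;
   }
 }
 {code}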



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-10044) Allow interval params for year/month/day/hour/minute/second functions

2015-03-31 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14389860#comment-14389860
 ] 

Thejas M Nair edited comment on HIVE-10044 at 4/1/15 1:52 AM:
--

[~jdere] Can you also please update the function descriptions (@ Description 
annotation) for these?

Describe function tests would also need to be updated for them after that 
change (assuming they exist).



was (Author: thejas):
[~jdere] Can you also please update the function descriptions (@ Description 
annotation) for these?


 Allow interval params for year/month/day/hour/minute/second functions
 -

 Key: HIVE-10044
 URL: https://issues.apache.org/jira/browse/HIVE-10044
 Project: Hive
  Issue Type: Sub-task
  Components: UDF
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-10044.1.patch


 Update the year/month/day/hour/minute/second functions to retrieve the 
 various fields of the year-month and day-time interval types.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10172) Fix performance regression caused by HIVE-8122 for ORC

2015-03-31 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14389900#comment-14389900
 ] 

Ferdinand Xu commented on HIVE-10172:
-

Thank you for your update. LGTM for the new patch.

 Fix performance regression caused by HIVE-8122 for ORC
 --

 Key: HIVE-10172
 URL: https://issues.apache.org/jira/browse/HIVE-10172
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
 Attachments: HIVE-10172.1.patch, HIVE-10172.2.patch


 See HIVE-10164 for description. We should fix this in trunk and move it to 
 branch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10175) PartitionPruning lacks a fast-path exit for large IN() queries

2015-03-31 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-10175:
---
Component/s: Query Planning

 PartitionPruning lacks a fast-path exit for large IN() queries
 --

 Key: HIVE-10175
 URL: https://issues.apache.org/jira/browse/HIVE-10175
 Project: Hive
  Issue Type: Bug
  Components: Query Planning, Tez
Affects Versions: 1.2.0
Reporter: Gopal V
Priority: Minor

 TezCompiler::runDynamicPartitionPruning() calls the graph walker even if all 
 tables provided to the optimizer are unpartitioned temporary tables.
 This makes it extremely slow, as it will walk & inspect a large/complex 
 FilterOperator later in the pipeline.
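 A sketch of the missing fast path, assuming the compiler can enumerate the
 scanned tables up front (the method and its argument are hypothetical):
 {code}
 import java.util.Collection;
 import org.apache.hadoop.hive.ql.metadata.Table;

 static boolean needsPartitionPruning(Collection<Table> scannedTables) {
   for (Table t : scannedTables) {
     if (t.isPartitioned() && !t.isTemporary()) {
       return true;   // at least one pruning candidate: run the graph walker
     }
   }
   return false;      // nothing partitioned: skip the walk entirely
 }
 {code}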



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10175) PartitionPruning lacks a fast-path exit for large IN() queries

2015-03-31 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-10175:
---
Summary: PartitionPruning lacks a fast-path exit for large IN() queries  
(was: Tez DynamicPartitionPruning lacks a fast-path exit for large IN() queries)

 PartitionPruning lacks a fast-path exit for large IN() queries
 --

 Key: HIVE-10175
 URL: https://issues.apache.org/jira/browse/HIVE-10175
 Project: Hive
  Issue Type: Bug
  Components: Query Planning, Tez
Affects Versions: 1.2.0
Reporter: Gopal V
Priority: Minor

 TezCompiler::runDynamicPartitionPruning() calls the graph walker even if all 
 tables provided to the optimizer are unpartitioned temporary tables.
 This makes it extremely slow, as it will walk & inspect a large/complex 
 FilterOperator later in the pipeline.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9664) Hive add jar command should be able to download and add jars from a repository

2015-03-31 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1439#comment-1439
 ] 

Lefty Leverenz commented on HIVE-9664:
--

Doc note:  The ADD FILE | JAR | ARCHIVE commands are documented in several 
places, so this information needs to be added to all of them or perhaps just to 
Hive Resources in the CLI doc with links from the others.

* [Commands | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Commands]
* [HiveServer2 Clients -- Beeline Hive Commands | 
https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-BeelineHiveCommands]
* [CLI -- Hive Interactive Shell Commands | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli#LanguageManualCli-HiveInteractiveShellCommands]
* [CLI -- Hive Resources | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli#LanguageManualCli-HiveResources]

 Hive add jar command should be able to download and add jars from a 
 repository
 

 Key: HIVE-9664
 URL: https://issues.apache.org/jira/browse/HIVE-9664
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.14.0
Reporter: Anant Nag
Assignee: Anant Nag
  Labels: TODOC1.2, hive, patch
 Fix For: 1.2.0

 Attachments: HIVE-9664.4.patch, HIVE-9664.5.patch, HIVE-9664.patch, 
 HIVE-9664.patch, HIVE-9664.patch


 Currently Hive's add jar command takes a local path to the dependency jar. 
 This clutters the local file-system, as users may forget to remove the jar 
 later.
 It would be nice if Hive supported a Gradle-like notation to download the jar 
 from a repository.
 Example:  add jar org:module:version
 
 It should also be backward compatible and take a jar from the local 
 file-system as well.
 RB:  https://reviews.apache.org/r/31628/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10174) LLAP: ORC MemoryManager is singleton synchronized

2015-03-31 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-10174:
---
Description: 
ORC MemoryManager::addedRow() checks are bad for LLAP multi-threaded 
performance.

!orc-memorymanager-1.png!
!orc-memorymanager-2.png!

  was:
ORC MemoryManager::addedRow() checks are bad for LLAP multi-threaded 
performance.

!orc-memory-manager-1.png!
!orc-memory-manager-2.png!


 LLAP: ORC MemoryManager is singleton synchronized
 -

 Key: HIVE-10174
 URL: https://issues.apache.org/jira/browse/HIVE-10174
 Project: Hive
  Issue Type: Sub-task
  Components: File Formats
Affects Versions: llap
Reporter: Gopal V
 Attachments: orc-memorymanager-1.png, orc-memorymanager-2.png


 ORC MemoryManager::addedRow() checks are bad for LLAP multi-threaded 
 performance.
 !orc-memorymanager-1.png!
 !orc-memorymanager-2.png!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10174) LLAP: ORC MemoryManager is singleton synchronized

2015-03-31 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-10174:
---
Attachment: orc-memorymanager-2.png
orc-memorymanager-1.png

 LLAP: ORC MemoryManager is singleton synchronized
 -

 Key: HIVE-10174
 URL: https://issues.apache.org/jira/browse/HIVE-10174
 Project: Hive
  Issue Type: Sub-task
  Components: File Formats
Affects Versions: llap
Reporter: Gopal V
 Attachments: orc-memorymanager-1.png, orc-memorymanager-2.png


 ORC MemoryManager::addedRow() checks are bad for LLAP multi-threaded 
 performance.
 !orc-memory-manager-1.png!
 !orc-memory-manager-2.png!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10174) LLAP: ORC MemoryManager is singleton synchronized

2015-03-31 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14389958#comment-14389958
 ] 

Gopal V commented on HIVE-10174:


The performance difference is somewhere along the lines of 34s with the 
MemoryManager + addRow synchronized blocks vs. 9s without the MemoryManager and 
the addRow synchronized(this) block.

To be looked at when we're writing ORC out of LLAP.

 LLAP: ORC MemoryManager is singleton synchronized
 -

 Key: HIVE-10174
 URL: https://issues.apache.org/jira/browse/HIVE-10174
 Project: Hive
  Issue Type: Sub-task
  Components: File Formats
Affects Versions: llap
Reporter: Gopal V
 Attachments: orc-memorymanager-1.png, orc-memorymanager-2.png


 ORC MemoryManager::addedRow() checks are bad for LLAP multi-threaded 
 performance.
 !orc-memorymanager-1.png!
 !orc-memorymanager-2.png!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9664) Hive add jar command should be able to download and add jars from a repository

2015-03-31 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-9664:
-
Labels: TODOC1.2 hive patch  (was: hive patch)

 Hive add jar command should be able to download and add jars from a 
 repository
 

 Key: HIVE-9664
 URL: https://issues.apache.org/jira/browse/HIVE-9664
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.14.0
Reporter: Anant Nag
Assignee: Anant Nag
  Labels: TODOC1.2, hive, patch
 Fix For: 1.2.0

 Attachments: HIVE-9664.4.patch, HIVE-9664.5.patch, HIVE-9664.patch, 
 HIVE-9664.patch, HIVE-9664.patch


 Currently Hive's add jar command takes a local path to the dependency jar. 
 This clutters the local file-system, as users may forget to remove the jar 
 later.
 It would be nice if Hive supported a Gradle-like notation to download the jar 
 from a repository.
 Example:  add jar org:module:version
 
 It should also be backward compatible and take a jar from the local 
 file-system as well.
 RB:  https://reviews.apache.org/r/31628/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-10164) LLAP: ORC BIGINT SARGs regressed after Parquet PPD fixes (HIVE-8122)

2015-03-31 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran resolved HIVE-10164.
--
Resolution: Invalid

Will be fixed in HIVE-10172

 LLAP: ORC BIGINT SARGs regressed after Parquet PPD fixes (HIVE-8122)
 

 Key: HIVE-10164
 URL: https://issues.apache.org/jira/browse/HIVE-10164
 Project: Hive
  Issue Type: Sub-task
Reporter: Gopal V
Assignee: Prasanth Jayachandran
 Attachments: orc-sarg-tostring.png


 HIVE-8122 seems to have introduced a toString() to the ORC PPD codepath for 
 BIGINT.
 https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgumentImpl.java#L162
 {code}
 private List<Object> getOrcLiteralList() {
   // no need to cast
   ...
   List<Object> result = new ArrayList<Object>();
   for (Object o : literalList) {
     result.add(Long.valueOf(o.toString()));
   }
   return result;
 }
 {code}
 !orc-sarg-tostring.png!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10175) PartitionPruning lacks a fast-path exit for large IN() queries

2015-03-31 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14389994#comment-14389994
 ] 

Gopal V commented on HIVE-10175:


{code}
METHOD DURATION(ms) 
parse  462
semanticAnalyze  9,312
TezBuildDag569
TezSubmitToRunningDag5
TotalPrepTime   11,343
{code}

Save 2 seconds by doing

{code}
set hive.tez.dynamic.partition.pruning=false;

METHOD DURATION(ms) 
parse  449
semanticAnalyze  7,254
TezBuildDag527
TezSubmitToRunningDag   16
TotalPrepTime9,190
{code}

Save 9 seconds off default planning with

{code}
set hive.optimize.ppd=false;
set hive.tez.dynamic.partition.pruning=false;

METHOD DURATION(ms) 
parse  446
semanticAnalyze  2,089
TezBuildDag578
TezSubmitToRunningDag4
TotalPrepTime4,249
{code}


 PartitionPruning lacks a fast-path exit for large IN() queries
 --

 Key: HIVE-10175
 URL: https://issues.apache.org/jira/browse/HIVE-10175
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer, Tez
Affects Versions: 1.2.0
Reporter: Gopal V
Assignee: Gunther Hagleitner
Priority: Minor

 TezCompiler::runDynamicPartitionPruning() & ppr.PartitionPruner() call the 
 graph walker even if all tables provided to the optimizer are unpartitioned 
 (or temporary) tables.
 This makes it extremely slow, as it will walk & inspect a large/complex 
 FilterOperator later in the pipeline.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10103) LLAP: Cancelling tasks fails to stop cache filling threads

2015-03-31 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14389832#comment-14389832
 ] 

Sergey Shelukhin commented on HIVE-10103:
-

The main problem is that Tez doesn't close the reader when operators fail... trying 
to see now how to change that. For now, added some workarounds for when nextCvb 
is interrupted.
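Roughly, the workaround direction is to make the fill loop cooperatively
cancellable so an interrupted task stops producing into a cache nobody will
drain. A sketch only; hasMoreData() and fillNextBlock() are stand-ins:
{code}
import java.io.InterruptedIOException;

// check the interrupt flag between blocks so a cancelled task exits
// instead of spinning on "Cannot evict blocks ... cache full?"
void fillCache() throws InterruptedIOException {
  while (hasMoreData()) {                          // stand-in: reader loop
    if (Thread.currentThread().isInterrupted()) {
      throw new InterruptedIOException("task cancelled; aborting cache fill");
    }
    fillNextBlock();                               // stand-in: the actual read
  }
}
{code}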


 LLAP: Cancelling tasks fails to stop cache filling threads
 --

 Key: HIVE-10103
 URL: https://issues.apache.org/jira/browse/HIVE-10103
 Project: Hive
  Issue Type: Sub-task
Reporter: Gopal V
Assignee: Sergey Shelukhin

 Running a bad query (~1TB scan on a 1GB cache) and killing the tasks via the 
 container launcher fails to free up the cache-filler threads.
 The cache-filler threads with no consumers get stuck in a loop:
 {code}
 2015-03-26 14:02:47,335 
 [pool-2-thread-2(container_1_1659_01_74_gopal_20150326135614_2bb61f02-3c2b-4512-a34e-81803cd13fb6:1_Map
  1_73_0)] WARN org.apache.hadoop.hive.llap.io.api.impl.LlapIoImpl: Cannot 
 evict blocks for 262144 calls; cache full?
 2015-03-26 14:02:48,018 
 [pool-2-thread-7(container_1_1659_01_76_gopal_20150326135614_2bb61f02-3c2b-4512-a34e-81803cd13fb6:1_Map
  1_75_0)] WARN org.apache.hadoop.hive.llap.io.api.impl.LlapIoImpl: Cannot 
 evict blocks for 262144 calls; cache full?
 2015-03-26 14:02:51,658 
 [pool-2-thread-1(container_1_1659_01_73_gopal_20150326135614_2bb61f02-3c2b-4512-a34e-81803cd13fb6:1_Map
  1_72_0)] WARN org.apache.hadoop.hive.llap.io.api.impl.LlapIoImpl: Cannot 
 evict blocks for 262144 calls; cache full?
 {code}
 A daemon has to be killed to get back to normal operations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10148) update of bucketing column should not be allowed

2015-03-31 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14389925#comment-14389925
 ] 

Hive QA commented on HIVE-10148:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12708493/HIVE-10148.patch

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 8692 tests executed
*Failed tests:*
{noformat}
TestMinimrCliDriver-smb_mapjoin_8.q - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_update_all_types
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_update_tmp_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_update_noupdatepriv
org.apache.hadoop.hive.ql.security.authorization.plugin.TestHiveAuthorizerCheckInvocation.testUpdateSomeColumnsUsed
org.apache.hadoop.hive.ql.security.authorization.plugin.TestHiveAuthorizerCheckInvocation.testUpdateSomeColumnsUsedExprInSet
org.apache.hadoop.hive.ql.txn.compactor.TestCompactor.dynamicPartitioningDelete
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3228/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3228/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3228/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12708493 - PreCommit-HIVE-TRUNK-Build

 update of bucketing column should not be allowed
 --

 Key: HIVE-10148
 URL: https://issues.apache.org/jira/browse/HIVE-10148
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 1.1.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Attachments: HIVE-10148.patch


 update tbl set a = 5;
 should raise an error if 'a' is a bucketing column.
 Such an operation is not supported, but it is currently not checked for.
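 A sketch of the missing check at semantic-analysis time; getBucketCols() is
 the existing Table accessor, while the surrounding method is hypothetical:
 {code}
 import java.util.List;
 import org.apache.hadoop.hive.ql.metadata.Table;
 import org.apache.hadoop.hive.ql.parse.SemanticException;

 static void checkUpdatedColumns(Table target, List<String> updatedCols)
     throws SemanticException {
   for (String col : updatedCols) {
     if (target.getBucketCols().contains(col)) {
       throw new SemanticException(
           "Updating values of bucketing columns is not supported: " + col);
     }
   }
 }
 {code}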



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10134) Fix test failures after HIVE-10130 [Spark Branch]

2015-03-31 Thread Chao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao updated HIVE-10134:

Attachment: HIVE-10134.2-spark.patch

Updated golden files for MR. Test failure on nonmr_fetch is strange. I couldn't 
reproduce the error on my local machine.

 Fix test failures after HIVE-10130 [Spark Branch]
 -

 Key: HIVE-10134
 URL: https://issues.apache.org/jira/browse/HIVE-10134
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Xuefu Zhang
Assignee: Chao
 Fix For: spark-branch

 Attachments: HIVE-10134.1-spark.patch, HIVE-10134.2-spark.patch


 Complete test run: 
 http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/812/#showFailuresLink
 *Failed tests:*
 {noformat}
 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nonmr_fetch
 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union31
 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_22
 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_6_subq
 org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10152) ErrorMsg.formatToErrorMsgMap has bad regex

2015-03-31 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14389981#comment-14389981
 ] 

Hive QA commented on HIVE-10152:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12708501/HIVE-10152.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 8671 tests executed
*Failed tests:*
{noformat}
TestHiveAuthorizationTaskFactory - did not produce a TEST-*.xml file
TestMinimrCliDriver-smb_mapjoin_8.q - did not produce a TEST-*.xml file
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3229/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3229/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3229/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12708501 - PreCommit-HIVE-TRUNK-Build

 ErrorMsg.formatToErrorMsgMap has bad regex
 --

 Key: HIVE-10152
 URL: https://issues.apache.org/jira/browse/HIVE-10152
 Project: Hive
  Issue Type: Bug
  Components: Logging
Affects Versions: 1.1.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Attachments: HIVE-10152.patch


 {noformat}
 String pattern = errorMsg.mesg.replaceAll("\\{.*\\}", ".*");
 {noformat}
 should be
 {noformat}
 String pattern = errorMsg.mesg.replaceAll("\\{[0-9]+\\}", ".*");
 {noformat}
 The current regex can match the whole msg (when there is more than one param).
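 A self-contained illustration of the difference, using a hypothetical
 two-parameter message template:
 {code}
 String mesg = "Line {0} of file {1}";
 // greedy .* spans from the first '{' to the last '}':
 System.out.println(mesg.replaceAll("\\{.*\\}", ".*"));      // Line .*
 // digit-only placeholders match each parameter separately:
 System.out.println(mesg.replaceAll("\\{[0-9]+\\}", ".*"));  // Line .* of file .*
 {code}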



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10175) PartitionPruning lacks a fast-path exit for large IN() queries

2015-03-31 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-10175:
---
Assignee: Gunther Hagleitner

 PartitionPruning lacks a fast-path exit for large IN() queries
 --

 Key: HIVE-10175
 URL: https://issues.apache.org/jira/browse/HIVE-10175
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer, Tez
Affects Versions: 1.2.0
Reporter: Gopal V
Assignee: Gunther Hagleitner
Priority: Minor

 TezCompiler::runDynamicPartitionPruning() & ppr.PartitionPruner() call the 
 graph walker even if all tables provided to the optimizer are unpartitioned 
 (or temporary) tables.
 This makes it extremely slow, as it will walk & inspect a large/complex 
 FilterOperator later in the pipeline.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9969) Avoid Utilities.getMapRedWork for spark [Spark Branch]

2015-03-31 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14389995#comment-14389995
 ] 

Rui Li commented on HIVE-9969:
--

Committed to spark. Thanks Xuefu.

 Avoid Utilities.getMapRedWork for spark [Spark Branch]
 --

 Key: HIVE-9969
 URL: https://issues.apache.org/jira/browse/HIVE-9969
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
Priority: Minor
 Fix For: spark-branch

 Attachments: HIVE-9969.1-spark.patch


 The method shouldn't be used in spark mode. Specifically, map work and 
 reduce work have different plan paths in spark. Calling this method will 
 leave lots of errors in the executor's log:
 {noformat}
 15/03/16 02:57:23 INFO Utilities: Open file to read in plan: 
 hdfs://node13-1:8020/tmp/hive/root/0b3f2ad9-af30-4674-9cfb-1f745a5df51d/hive_2015-03-16_02-57-17_752_4494804875441915487-1/-mr-10003/3897754a-0146-4616-a2f6-b316839a2ad0/reduce.xml
 15/03/16 02:57:23 INFO Utilities: File not found: File does not exist: 
 /tmp/hive/root/0b3f2ad9-af30-4674-9cfb-1f745a5df51d/hive_2015-03-16_02-57-17_752_4494804875441915487-1/-mr-10003/3897754a-0146-4616-a2f6-b316839a2ad0/reduce.xml
 at 
 org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
 at 
 org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1891)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1832)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1812)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1784)
 at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:542)
 at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:362)
 at 
 org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9969) Avoid Utilities.getMapRedWork for spark [Spark Branch]

2015-03-31 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-9969:
-
Release Note:   (was: Committed to spark. Thanks Xuefu.)

 Avoid Utilities.getMapRedWork for spark [Spark Branch]
 --

 Key: HIVE-9969
 URL: https://issues.apache.org/jira/browse/HIVE-9969
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
Priority: Minor
 Fix For: spark-branch

 Attachments: HIVE-9969.1-spark.patch


 The method shouldn't be used in spark mode. Specifically, map work and 
 reduce work have different plan paths in spark. Calling this method will 
 leave lots of errors in the executor's log:
 {noformat}
 15/03/16 02:57:23 INFO Utilities: Open file to read in plan: 
 hdfs://node13-1:8020/tmp/hive/root/0b3f2ad9-af30-4674-9cfb-1f745a5df51d/hive_2015-03-16_02-57-17_752_4494804875441915487-1/-mr-10003/3897754a-0146-4616-a2f6-b316839a2ad0/reduce.xml
 15/03/16 02:57:23 INFO Utilities: File not found: File does not exist: 
 /tmp/hive/root/0b3f2ad9-af30-4674-9cfb-1f745a5df51d/hive_2015-03-16_02-57-17_752_4494804875441915487-1/-mr-10003/3897754a-0146-4616-a2f6-b316839a2ad0/reduce.xml
 at 
 org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
 at 
 org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1891)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1832)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1812)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1784)
 at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:542)
 at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:362)
 at 
 org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-10096) Investigate the random failure of TestCliDriver.testCliDriver_udaf_percentile_approx_23

2015-03-31 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov resolved HIVE-10096.

Resolution: Duplicate

dup of HIVE-10059

 Investigate the random failure of 
 TestCliDriver.testCliDriver_udaf_percentile_approx_23
 ---

 Key: HIVE-10096
 URL: https://issues.apache.org/jira/browse/HIVE-10096
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.2.0
Reporter: Aihua Xu
Assignee: Aihua Xu
Priority: Minor

 The unit test sometimes seems to fail with the following problem:
 Running: diff -a 
 /home/hiveptest/54.158.232.92-hiveptest-2/apache-svn-trunk-source/itests/qtest/../../itests/qtest/target/qfile-results/clientpositive/udaf_percentile_approx_23.q.out
  
 /home/hiveptest/54.158.232.92-hiveptest-2/apache-svn-trunk-source/itests/qtest/../../ql/src/test/results/clientpositive/udaf_percentile_approx_23.q.out
 628c628
< 256.0
---
> 255.5



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-10081) LLAP: Make the low-level IO threadpool configurable

2015-03-31 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran resolved HIVE-10081.
--
Resolution: Fixed

Committed to llap branch

 LLAP: Make the low-level IO threadpool configurable
 ---

 Key: HIVE-10081
 URL: https://issues.apache.org/jira/browse/HIVE-10081
 Project: Hive
  Issue Type: Sub-task
Reporter: Gopal V
Assignee: Prasanth Jayachandran
 Attachments: HIVE-10081.1.patch


 The LLAP low-level reader thread-pool is hard-limited to 10 threads, which is 
 not sufficient to max out the network bandwidth on a 10GigE network.
 These threads are often seen in IOWAIT, since they are reading remote data.
 A dumb fix for my 12-core instance was to use a higher thread-pool count for 
 the IO read-ahead.
 {code}
 diff --git 
 a/llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java
  
 b/llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java
 index 3f9ddfb..b7cd177 100644
 --- 
 a/llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java
 +++ 
 b/llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java
 @@ -105,7 +105,7 @@ private LlapIoImpl(Configuration conf) throws IOException 
 {
cachePolicy.setEvictionListener(metadataCache);
  }
  // Arbitrary thread pool. Listening is used for unhandled errors for now 
 (TODO: remove?)
 -executor = 
 MoreExecutors.listeningDecorator(Executors.newFixedThreadPool(10));
 +executor = 
 MoreExecutors.listeningDecorator(Executors.newFixedThreadPool(24));
 {code}
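
 A hedged sketch of making the count configurable instead (the property name 
 below is invented for illustration and is not an actual HiveConf key):
 {code}
 // Hypothetical key "hive.llap.io.threadpool.size"; defaults to today's 10.
 int ioThreads = conf.getInt("hive.llap.io.threadpool.size", 10);
 executor = MoreExecutors.listeningDecorator(
     Executors.newFixedThreadPool(ioThreads));
 {code}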



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10172) Fix performance regression caused by HIVE-8122 for ORC

2015-03-31 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-10172:
-
Attachment: HIVE-10172.2.patch

Removed the explicit boxing. [~Ferd] It's a performance regression; there is no 
issue with functionality.
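
For context, a generic illustration of the pattern being removed (not the 
patch's actual code):
{code}
// Explicit boxing allocates a wrapper object on every call:
Integer boxed = new Integer(value);   // the removed pattern (illustrative)
// The primitive, or Integer.valueOf() with its small-value cache, avoids
// the per-call allocation on a hot path:
int primitive = value;
{code}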

 Fix performance regression caused by HIVE-8122 for ORC
 --

 Key: HIVE-10172
 URL: https://issues.apache.org/jira/browse/HIVE-10172
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
 Attachments: HIVE-10172.1.patch, HIVE-10172.2.patch


 See HIVE-10164 for description. We should fix this in trunk and move it to 
 branch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10168) make groupby3_map.q more stable

2015-03-31 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-10168:
---
Issue Type: Improvement  (was: Bug)

 make groupby3_map.q more stable
 ---

 Key: HIVE-10168
 URL: https://issues.apache.org/jira/browse/HIVE-10168
 Project: Hive
  Issue Type: Improvement
  Components: Tests
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
 Attachments: HIVE-10168.1.patch


 The test runs an aggregation query which produces several DOUBLE numbers.
 The assertion framework compares output containing DOUBLE numbers without any 
 delta.
 As a result the test is not stable,
 e.g. build 3219 failed with the following test result
 {code}
 groupby3_map.q.out
 139c139
 < 130091.0  260.182  256.10355987055016  98.0  0.0  142.92680950752379  143.06995106518903  20428.0728759  20469.010897795582
 ---
 > 130091.0  260.182  256.10355987055016  98.0  0.0  142.9268095075238  143.06995106518906  20428.072876  20469.01089779559
 {code}
 http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3219/testReport/junit/org.apache.hadoop.hive.cli/TestCliDriver/testCliDriver_groupby3_map/
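
 One common way to make such assertions robust, sketched here as an 
 assumption rather than the actual patch, is a tolerance-based comparison:
 {code}
 // Hedged sketch: a relative tolerance accepts harmless last-digit
 // differences such as 142.92680950752379 vs 142.9268095075238.
 static boolean nearlyEqual(double a, double b, double relTol) {
   return Math.abs(a - b) <= relTol * Math.max(Math.abs(a), Math.abs(b));
 }
 // nearlyEqual(142.92680950752379, 142.9268095075238, 1e-12) -> true
 {code}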



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10150) delete from acidTbl where a in(select a from nonAcidOrcTbl) fails

2015-03-31 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-10150:
--
Attachment: HIVE-10050.patch

 delete from acidTbl where a in(select a from nonAcidOrcTbl) fails
 -

 Key: HIVE-10150
 URL: https://issues.apache.org/jira/browse/HIVE-10150
 Project: Hive
  Issue Type: Bug
  Components: Query Planning, Transactions
Affects Versions: 1.1.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Attachments: HIVE-10050.patch


 this query raises error 10297:
 FAILED: SemanticException [Error 10297]: Attempt to do update or 
 delete on table nonAcidOrcTbl that does not use an AcidOutputFormat or is not 
 bucketed
 even though nonAcidOrcTbl is only being read, not written.
 "select b from " + Table.ACIDTBL + " where a in (select b from " + 
 Table.NONACIDORCTBL + ")"
 runs fine.
 There doesn't seem to be any logical reason why we should raise the error here.
 The same holds for the 'update' statement.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10044) Allow interval params for year/month/day/hour/minute/second functions

2015-03-31 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389860#comment-14389860
 ] 

Thejas M Nair commented on HIVE-10044:
--

[~jdere] Can you also please update the function descriptions (@ Description 
annotation) for these?


 Allow interval params for year/month/day/hour/minute/second functions
 -

 Key: HIVE-10044
 URL: https://issues.apache.org/jira/browse/HIVE-10044
 Project: Hive
  Issue Type: Sub-task
  Components: UDF
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-10044.1.patch


 Update the year/month/day/hour/minute/second functions to retrieve the 
 various fields of the year-month and day-time interval types.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-10161) LLAP: ORC file contains compression buffers larger than bufferSize (OR reader has a bug)

2015-03-31 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V reassigned HIVE-10161:
--

Assignee: Gopal V  (was: Prasanth Jayachandran)

 LLAP: ORC file contains compression buffers larger than bufferSize (OR reader 
 has a bug)
 

 Key: HIVE-10161
 URL: https://issues.apache.org/jira/browse/HIVE-10161
 Project: Hive
  Issue Type: Sub-task
Affects Versions: llap
Reporter: Gopal V
Assignee: Gopal V
 Fix For: llap


 The EncodedReaderImpl will die when reading from the cache, if the data was 
 written by the regular ORC writer:
 {code}
 Caused by: java.io.IOException: java.lang.IllegalArgumentException: Buffer 
 size too small. size = 262144 needed = 3919246
 at 
 org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.rethrowErrorIfAny(LlapInputFormat.java:249)
 at 
 org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.nextCvb(LlapInputFormat.java:201)
 at 
 org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:140)
 at 
 org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:96)
 at 
 org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
 ... 22 more
 Caused by: java.lang.IllegalArgumentException: Buffer size too small. size = 
 262144 needed = 3919246
 at 
 org.apache.hadoop.hive.ql.io.orc.InStream.addOneCompressionBuffer(InStream.java:780)
 at 
 org.apache.hadoop.hive.ql.io.orc.InStream.uncompressStream(InStream.java:628)
 at 
 org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:309)
 at 
 org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:278)
 at 
 org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:48)
 at 
 org.apache.hadoop.hive.common.CallableWithNdc.call(CallableWithNdc.java:37)
 ... 4 more
 ]], Vertex failed as one or more tasks failed. failedTasks:1, Vertex 
 vertex_1424502260528_1945_1_00 [Map 1] killed/failed due to:null]
 {code}
 Turning off hive.llap.io.enabled makes the error go away.
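
 If the reader is at fault, one defensive remedy (purely an assumption, not a 
 committed fix) is to size the target buffer from the chunk length actually 
 recorded in the stripe:
 {code}
 import java.nio.ByteBuffer;
 // Hedged sketch: never allocate less than the compressed chunk needs,
 // rather than trusting the declared compression block size alone.
 int declared = 262144;   // writer's declared buffer size (256KB default)
 int needed = 3919246;    // chunk length observed while reading
 ByteBuffer out = ByteBuffer.allocate(Math.max(declared, needed));
 {code}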



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10168) make groupby3_map.q more stable

2015-03-31 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-10168:
---
Attachment: HIVE-10168.1.patch

patch #1

 make groupby3_map.q more stable
 ---

 Key: HIVE-10168
 URL: https://issues.apache.org/jira/browse/HIVE-10168
 Project: Hive
  Issue Type: Bug
  Components: Tests
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
 Attachments: HIVE-10168.1.patch


 The test runs an aggregation query which produces several DOUBLE numbers.
 The assertion framework compares output containing DOUBLE numbers without any 
 delta.
 As a result the test is not stable,
 e.g. build 3219 failed with the following test result
 {code}
 groupby3_map.q.out
 139c139
 < 130091.0  260.182  256.10355987055016  98.0  0.0  142.92680950752379  143.06995106518903  20428.0728759  20469.010897795582
 ---
 > 130091.0  260.182  256.10355987055016  98.0  0.0  142.9268095075238  143.06995106518906  20428.072876  20469.01089779559
 {code}
 http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3219/testReport/junit/org.apache.hadoop.hive.cli/TestCliDriver/testCliDriver_groupby3_map/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10159) HashTableSinkDesc and MapJoinDesc keyTblDesc can be replaced by JoinDesc.keyTableDesc

2015-03-31 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388661#comment-14388661
 ] 

Hive QA commented on HIVE-10159:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12708344/HIVE-10159.1.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 8692 tests executed
*Failed tests:*
{noformat}
TestMinimrCliDriver-smb_mapjoin_8.q - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mapjoin_decimal
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3218/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3218/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3218/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12708344 - PreCommit-HIVE-TRUNK-Build

 HashTableSinkDesc and MapJoinDesc keyTblDesc can be replaced by 
 JoinDesc.keyTableDesc
 -

 Key: HIVE-10159
 URL: https://issues.apache.org/jira/browse/HIVE-10159
 Project: Hive
  Issue Type: Improvement
  Components: Query Planning
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
Priority: Minor
 Attachments: HIVE-10159.1.patch


 MapJoinDesc and HashTableSinkDesc are derived from JoinDesc
 HashTableSinkDesc and MapJoinDesc have keyTblDesc field.
 JoinDesc has keyTableDesc field.
 I think HashTableSinkDesc and MapJoinDesc can use superclass (JoinDesc) 
 keyTableDesc field instead of defining their own keyTblDesc.
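
 A minimal sketch of the proposed consolidation (shapes only; the real 
 classes carry many more fields):
 {code}
 // JoinDesc already owns the field and its accessors...
 public class JoinDesc {
   protected TableDesc keyTableDesc;
   public TableDesc getKeyTableDesc() { return keyTableDesc; }
   public void setKeyTableDesc(TableDesc d) { this.keyTableDesc = d; }
 }
 // ...so MapJoinDesc and HashTableSinkDesc drop their duplicate keyTblDesc
 // and inherit getKeyTableDesc()/setKeyTableDesc() from JoinDesc instead.
 public class MapJoinDesc extends JoinDesc { /* no keyTblDesc field */ }
 {code}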



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10165) Improve hive-hcatalog-streaming extensibility and support updates and deletes.

2015-03-31 Thread Elliot West (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliot West updated HIVE-10165:
---
Description: 
h3. Overview
I'd like to extend the 
[hive-hcatalog-streaming|https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest]
 API so that it also supports the writing of record updates and deletes in 
addition to the already supported inserts.

h3. Motivation
We have many Hadoop processes outside of Hive that merge changed facts into 
existing datasets. Traditionally we achieve this by reading in a ground-truth 
dataset and a modified dataset, grouping by a key, sorting by a sequence, and 
then applying a function to determine inserted, updated, and deleted rows. 
However, in our current scheme we must rewrite all partitions that may 
potentially contain changes. In practice the number of mutated records is very 
small when compared with the records contained in a partition. This approach 
results in a number of operational issues:
* Excessive amount of write activity required for small data changes.
* Downstream applications cannot robustly read these datasets while they are 
being updated.
* Due to the scale of the updates (hundreds of partitions) the scope for contention 
is high. 

I believe we can address this problem by instead writing only the changed 
records to a Hive transactional table. This should drastically reduce the 
amount of data that we need to write and also provide a means for managing 
concurrent access to the data. Our existing merge processes can read and retain 
each record's {{ROW_ID}}/{{RecordIdentifier}} and pass this through to an 
updated form of the hive-hcatalog-streaming API which will then have the 
required data to perform an update or insert in a transactional manner. 

h3. Benefits
* Enables the creation of large-scale dataset merge processes  
* Opens up Hive transactional functionality in an accessible manner to 
processes that operate outside of Hive.

h3. Implementation
We've patched the API to provide visibility to the underlying 
{{OrcRecordUpdater}} and allow extension of the {{AbstractRecordWriter}} by 
third-parties outside of the package. We've also updated the user facing 
interfaces to provide update and delete functionality. I've provided the 
modifications as three incremental patches. Generally speaking, each patch 
makes the API less backwards compatible but more consistent with respect to 
offering updates, deletes as well as writes (inserts). Ideally I hope that all 
three patches have merit, but only the first patch is absolutely necessary to 
enable the features we need on the API, and it does so in a backwards 
compatible way. I'll summarise the contents of each patch:

h4. [^HIVE-10165.0.patch] - Required
This patch contains what we consider to be the minimum amount of changes 
required to allow users to create {{RecordWriter}} subclasses that can insert, 
update, and  delete records. These changes also maintain backwards 
compatibility at the expense of confusing the API a little. Note that the row 
representation has been changed from {{byte[]}} to {{Object}}. Within our data 
processing jobs our records are often available in a strongly typed and decoded 
form such as a POJO or a Tuple object. It therefore seems to make sense that 
we should be able to pass this through to the {{OrcRecordUpdater}} without having to go 
through a {{byte[]}} encoding step. This of course still allows users to use 
{{byte[]}} if they wish.
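
For illustration, a hedged sketch of the kind of operation-aware writer this 
enables (the names here are invented, not the ones in the attached patches):
{code}
// Hypothetical interface: mutations carry the transaction id and, for
// updates and deletes, the org.apache.hadoop.hive.ql.io.RecordIdentifier
// retained from an earlier read.
public interface MutatingRecordWriter {
  void insert(long transactionId, Object row) throws IOException;
  void update(long transactionId, RecordIdentifier rowId, Object row)
      throws IOException;
  void delete(long transactionId, RecordIdentifier rowId) throws IOException;
}
{code}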

h4. [^HIVE-10165.1.patch] - Nice to have
This patch builds on the changes made in the *required* patch and aims to make 
the API cleaner and more consistent while accommodating updates and inserts. It 
also adds some logic to prevent the user from submitting multiple operation 
types to a single {{TransactionBatch}} as we found this creates data 
inconsistencies within the Hive table. This patch breaks backwards 
compatibility.

h4. [^HIVE-10165.2.patch] - Nomenclature
This final patch simply renames some of the existing types to more accurately 
convey their increased responsibilities. The API is no longer writing just new 
records, it is now also responsible for writing operations that are applied to 
existing records. This patch breaks backwards compatibility.

h3. Example
I've attached a simple, typical usage of the API. This is not a patch and is 
intended as an illustration only: [^ReflectiveOperationWriter.java]

h3. Known issues
I have not yet provided any unit tests for the extended functionality. I fully 
expect that tests are required and will work on them if my patches have merit.

  was:
h3. Overview
I'd like to extend the 
[hive-hcatalog-streaming|https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest]
 API so that it also supports the writing of record updates and deletes in 
addition to the already supported inserts.

h3. Motivation
We have many Hadoop processes outside of Hive that merge changed 

[jira] [Assigned] (HIVE-10134) Fix test failures after HIVE-10130 [Spark Branch]

2015-03-31 Thread Chao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao reassigned HIVE-10134:
---

Assignee: Chao

 Fix test failures after HIVE-10130 [Spark Branch]
 -

 Key: HIVE-10134
 URL: https://issues.apache.org/jira/browse/HIVE-10134
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Xuefu Zhang
Assignee: Chao

 Complete test run: 
 http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/812/#showFailuresLink
 *Failed tests:*
 {noformat}
 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nonmr_fetch
 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union31
 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_22
 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_6_subq
 org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10165) Improve hive-hcatalog-streaming extensibility and support updates and deletes.

2015-03-31 Thread Elliot West (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliot West updated HIVE-10165:
---
Description: 
h3. Overview
I'd like to extend the 
[hive-hcatalog-streaming|https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest]
 API so that it also supports the writing of record updates and deletes in 
addition to the already supported inserts.

h3. Motivation
We have many Hadoop processes outside of Hive that merge changed facts into 
existing datasets. Traditionally we achieve this by reading in a ground-truth 
dataset and a modified dataset, grouping by a key, sorting by a sequence, and 
then applying a function to determine inserted, updated, and deleted rows. 
However, in our current scheme we must rewrite all partitions that may 
potentially contain changes. In practice the number of mutated records is very 
small when compared with the records contained in a partition. This approach 
results in a number of operational issues:
* Excessive amount of write activity required for small data changes.
* Downstream applications cannot robustly read these datasets while they are 
being updated.
* Due to the scale of the updates (hundreds of partitions) the scope for contention 
is high. 

I believe we can address this problem by instead writing only the changed 
records to a Hive transactional table. This should drastically reduce the 
amount of data that we need to write and also provide a means for managing 
concurrent access to the data. Our existing merge processes can read and retain 
each record's {{ROW_ID}}/{{RecordIdentifier}} and pass this through to an 
updated form of the hive-hcatalog-streaming API which will then have the 
required data to perform an update or insert in a transactional manner. 

h3. Benefits
* Enables the creation of large-scale dataset merge processes  
* Opens up Hive transactional functionality in an accessible manner to 
processes that operate outside of Hive.

h3. Implementation
We've patched the API to provide visibility to the underlying 
{{OrcRecordUpdater}} and allow extension of the {{AbstractRecordWriter}} by 
third-parties outside of the package. We've also updated the user facing 
interfaces to provide update and delete functionality. I've provided the 
modifications as three incremental patches. Generally speaking, each patch 
makes the API less backwards compatible but more consistent with respect to 
offering updates, deletes as well as writes (inserts). Ideally I hope that all 
three patches have merit, but only the first patch is absolutely necessary to 
enable the features we need on the API, and it does so in a backwards 
compatible way. I'll summarise the contents of each patch:

h4. [^HIVE-10165.0.patch] - Required
This patch contains what we consider to be the minimum amount of changes 
required to allow users to create {{RecordWriter}} subclasses that can insert, 
update, and  delete records. These changes also maintain backwards 
compatibility at the expense of confusing the API a little. Note that the row 
representation has been changed from {{byte[]}} to {{Object}}. Within our data 
processing jobs our records are often available in a strongly typed and decoded 
form such as a POJO or a Tuple object. It therefore seems to make sense that 
we should be able to pass this through to the {{OrcRecordUpdater}} without having to go 
through a {{byte[]}} encoding step. This of course still allows users to use 
{{byte[]}} if they wish.

h4. [^HIVE-10165.1.patch] - Nice to have
This patch builds on the changes made in the *required* patch and aims to make 
the API cleaner and more consistent while accommodating updates and inserts. It 
also adds some logic to prevent the user from submitting multiple operation 
types to a single {{TransactionBatch}} as we found this creates data 
inconsistencies within the Hive table. This patch breaks backwards 
compatibility.

h4. [^HIVE-10165.2.patch] - Nomenclature
This final patch simply renames some of the existing types to more accurately 
convey their increased responsibilities. The API is no longer writing just new 
records, it is now also responsible for writing operations that are applied to 
existing records. This patch breaks backwards compatibility.

h3. Example
I've attached a simple, typical usage of the API. This is not a patch and is 
intended as an illustration only: [^ReflectiveOperationWriter.java]

h3. Known issues
I have not yet provided any unit tests for the extended functionality. I fully 
expect that tests are required and will work on them if these patches have 
merit.

  was:
h3. Overview
I'd like to extend the 
[hive-hcatalog-streaming|https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest]
 API so that it also supports the writing of record updates and deletes in 
addition to the already supported inserts.

h3. Motivation
We have many Hadoop processes outside of Hive that merge 

[jira] [Commented] (HIVE-10134) Fix test failures after HIVE-10130 [Spark Branch]

2015-03-31 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14390014#comment-14390014
 ] 

Hive QA commented on HIVE-10134:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12708601/HIVE-10134.2-spark.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 8710 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nonmr_fetch
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_percentile_approx_23
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/818/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/818/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-818/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12708601 - PreCommit-HIVE-SPARK-Build

 Fix test failures after HIVE-10130 [Spark Branch]
 -

 Key: HIVE-10134
 URL: https://issues.apache.org/jira/browse/HIVE-10134
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Xuefu Zhang
Assignee: Chao
 Fix For: spark-branch

 Attachments: HIVE-10134.1-spark.patch, HIVE-10134.2-spark.patch


 Complete test run: 
 http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/812/#showFailuresLink
 *Failed tests:*
 {noformat}
 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nonmr_fetch
 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union31
 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_22
 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_6_subq
 org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10050) Support overriding memory configuration for AM launched for TempletonControllerJob

2015-03-31 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-10050:
--
Labels: TODOC1.2  (was: )

 Support overriding memory configuration for AM launched for 
 TempletonControllerJob
 --

 Key: HIVE-10050
 URL: https://issues.apache.org/jira/browse/HIVE-10050
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Reporter: Hitesh Shah
Assignee: Hitesh Shah
  Labels: TODOC1.2
 Fix For: 1.2.0

 Attachments: HIVE-10050.1.patch, HIVE-10050.2.patch, 
 HIVE-10050.3.patch


 The MR AM launched for the TempletonControllerJob does not do any heavy 
 lifting and therefore can be configured to use a small memory footprint (as 
 compared to potentially using the default footprint for most MR jobs on a 
 cluster).
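
 For example, a hedged sketch of such an override in webhcat-site.xml (the 
 property names come from this patch's webhcat-default.xml additions; the 
 values are illustrative only):
 {code}
 <property>
   <name>templeton.mr.am.memory.mb</name>
   <value>512</value>
 </property>
 <property>
   <name>templeton.controller.mr.am.java.opts</name>
   <value>-Xmx410m</value>
 </property>
 {code}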



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10050) Support overriding memory configuration for AM launched for TempletonControllerJob

2015-03-31 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14390018#comment-14390018
 ] 

Lefty Leverenz commented on HIVE-10050:
---

Doc note:  This adds *templeton.controller.mr.am.java.opts* and 
*templeton.mr.am.memory.mb* to webhcat-default.xml, so they need to be 
documented (with version information) in the WebHCat Configuration wikidoc.

* [WebHCat Configuration -- Configuration Variables | 
https://cwiki.apache.org/confluence/display/Hive/WebHCat+Configure#WebHCatConfigure-ConfigurationVariables]

 Support overriding memory configuration for AM launched for 
 TempletonControllerJob
 --

 Key: HIVE-10050
 URL: https://issues.apache.org/jira/browse/HIVE-10050
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Reporter: Hitesh Shah
Assignee: Hitesh Shah
  Labels: TODOC1.2
 Fix For: 1.2.0

 Attachments: HIVE-10050.1.patch, HIVE-10050.2.patch, 
 HIVE-10050.3.patch


 The MR AM launched for the TempletonControllerJob does not do any heavy 
 lifting and therefore can be configured to use a small memory footprint (as 
 compared to potentially using the default footprint for most MR jobs on a 
 cluster).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9518) Implement MONTHS_BETWEEN aligned with Oracle one

2015-03-31 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14390023#comment-14390023
 ] 

Hive QA commented on HIVE-9518:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12708534/HIVE-9518.10.patch

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 8647 tests 
executed
*Failed tests:*
{noformat}
TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-leftsemijoin_mr.q-bucket5.q-root_dir_external_table.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-ql_rewrite_gbtoidx.q-parallel_orderby.q-reduce_deduplicate.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-schemeAuthority2.q-infer_bucket_sort_bucketed_table.q-bucket4.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-smb_mapjoin_8.q - did not produce a TEST-*.xml file
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3230/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3230/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3230/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12708534 - PreCommit-HIVE-TRUNK-Build

 Implement MONTHS_BETWEEN aligned with Oracle one
 

 Key: HIVE-9518
 URL: https://issues.apache.org/jira/browse/HIVE-9518
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Xiaobing Zhou
Assignee: Alexander Pivovarov
 Attachments: HIVE-9518.1.patch, HIVE-9518.10.patch, 
 HIVE-9518.2.patch, HIVE-9518.3.patch, HIVE-9518.4.patch, HIVE-9518.5.patch, 
 HIVE-9518.6.patch, HIVE-9518.7.patch, HIVE-9518.8.patch, HIVE-9518.9.patch


 This is used to track work to build an Oracle-like months_between. Here are 
 the semantics:
 MONTHS_BETWEEN returns the number of months between dates date1 and date2. If 
 date1 is later than date2, then the result is positive. If date1 is earlier 
 than date2, then the result is negative. If date1 and date2 are either the 
 same days of the month or both last days of months, then the result is always 
 an integer. Otherwise Oracle Database calculates the fractional portion of 
 the result based on a 31-day month and considers the difference in the time 
 components of date1 and date2.
 Should accept date, timestamp and string arguments in the format 'yyyy-MM-dd' 
 or 'yyyy-MM-dd HH:mm:ss'.
 The result should be rounded to 8 decimal places.
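
 A worked example of these semantics (an illustration, not output from the 
 patch):
 {code}
 // months_between('1995-02-02', '1995-01-01'):
 // one whole month, plus the (2 - 1) remaining days over a 31-day month.
 double result = 1 + (2 - 1) / 31.0;              // 1.0322580645...
 double rounded = Math.round(result * 1e8) / 1e8; // 1.03225806 (8 decimals)
 {code}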



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8915) Log file explosion due to non-existence of COMPACTION_QUEUE table

2015-03-31 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14390009#comment-14390009
 ] 

Owen O'Malley commented on HIVE-8915:
-

Yeah, I hit this too. What is the fix, [~alangates]?

 Log file explosion due to non-existence of COMPACTION_QUEUE table
 -

 Key: HIVE-8915
 URL: https://issues.apache.org/jira/browse/HIVE-8915
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 0.14.0, 0.15.0, 0.14.1
Reporter: Sushanth Sowmyan
Assignee: Alan Gates

 I hit an issue with a fresh set up of hive in a vm, where I did not have db 
 tables as specified by hive-txn-schema-0.14.0.mysql.sql created.
 On metastore startup, I got an endless loop of errors being populated to the 
 log file, which caused the log file to grow to 1.7GB in 5 minutes, with 950k 
 copies of the same error stack trace in it before I realized what was 
 happening and killed it. We should either have a delay of sorts to make sure 
 we don't endlessly respin on that error so quickly, or we should error out 
 and fail if we're not able to start.
 The stack trace in question is as follows:
 {noformat}
 2014-11-19 01:44:57,654 ERROR compactor.Cleaner
 (Cleaner.java:run(143)) - Caught an exception in the main loop of
 compactor cleaner, MetaException(message:Unable to connect to
 transaction database
 com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Table
 'hive.COMPACTION_QUEUE' doesn't exist
 at sun.reflect.GeneratedConstructorAccessor20.newInstance(Unknown Source)
 at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
 at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
 at com.mysql.jdbc.Util.getInstance(Util.java:386)
 at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1052)
 at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3597)
 at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3529)
 at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1990)
 at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2151)
 at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2619)
 at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2569)
 at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1524)
 at com.jolbox.bonecp.StatementHandle.executeQuery(StatementHandle.java:464)
 at 
 org.apache.hadoop.hive.metastore.txn.CompactionTxnHandler.findReadyToClean(CompactionTxnHandler.java:266)
 at org.apache.hadoop.hive.ql.txn.compactor.Cleaner.run(Cleaner.java:86)
 )
 at 
 org.apache.hadoop.hive.metastore.txn.CompactionTxnHandler.findReadyToClean(CompactionTxnHandler.java:291)
 at org.apache.hadoop.hive.ql.txn.compactor.Cleaner.run(Cleaner.java:86)
 {noformat}
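
 A hedged sketch of the delay idea (an assumption about its shape, not the 
 eventual fix; runOneCleanerCycle and LOG are placeholders):
 {code}
 static void cleanerLoop() throws InterruptedException {
   long backoffMs = 1000L;
   while (!Thread.currentThread().isInterrupted()) {
     try {
       runOneCleanerCycle();   // one pass of the compactor cleaner
       backoffMs = 1000L;      // reset after a successful pass
     } catch (Exception e) {
       LOG.error("Cleaner cycle failed; retrying in " + backoffMs + " ms", e);
       Thread.sleep(backoffMs);
       backoffMs = Math.min(backoffMs * 2, 300000L);  // cap at 5 minutes
     }
   }
 }
 {code}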



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10104) LLAP: Generate consistent splits and locations for the same split across jobs

2015-03-31 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14390028#comment-14390028
 ] 

Lefty Leverenz commented on HIVE-10104:
---

This adds *hive.tez.input.generate.consistent.splits* to HiveConf.java, so I'm 
linking it to HIVE-9850 (documentation for llap).

 LLAP: Generate consistent splits and locations for the same split across jobs
 -

 Key: HIVE-10104
 URL: https://issues.apache.org/jira/browse/HIVE-10104
 Project: Hive
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Siddharth Seth
 Fix For: llap

 Attachments: HIVE-10104.1.txt, HIVE-10104.2.txt


 Locations for splits are currently randomized. Also, the order of splits is 
 random, depending on how threads end up generating the splits.
 Add an option to sort the splits and generate repeatable locations, 
 assuming all other factors are the same.
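
 A hedged sketch of one way to do this (not the attached patch): sort splits 
 deterministically, then derive each split's location from a stable hash so 
 the same split maps to the same host across jobs. splits, hosts[] and 
 locations below are illustrative placeholders.
 {code}
 Collections.sort(splits, new Comparator<FileSplit>() {
   public int compare(FileSplit a, FileSplit b) {
     int c = a.getPath().toString().compareTo(b.getPath().toString());
     return c != 0 ? c : Long.compare(a.getStart(), b.getStart());
   }
 });
 for (FileSplit s : splits) {
   // Stable hash of (path, offset) picks the same host every run.
   int i = Math.abs((s.getPath() + ":" + s.getStart()).hashCode() % hosts.length);
   locations.put(s, hosts[i]);
 }
 {code}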



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8818) Create unit test where we insert into an encrypted table and then read from it with hcatalog mapreduce

2015-03-31 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388776#comment-14388776
 ] 

Hive QA commented on HIVE-8818:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12708348/HIVE-8818.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 8699 tests executed
*Failed tests:*
{noformat}
TestMinimrCliDriver-smb_mapjoin_8.q - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby3_map
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3219/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3219/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3219/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12708348 - PreCommit-HIVE-TRUNK-Build

 Create unit test where we insert into an encrypted table and then read from 
 it with hcatalog mapreduce
 --

 Key: HIVE-8818
 URL: https://issues.apache.org/jira/browse/HIVE-8818
 Project: Hive
  Issue Type: Sub-task
Reporter: Brock Noland
Assignee: Dong Chen
 Attachments: HIVE-8818.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10134) Fix test failures after HIVE-10130 [Spark Branch]

2015-03-31 Thread Chao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao updated HIVE-10134:

Attachment: HIVE-10134.1-spark.patch

Patch v1 to address the union test failures.

 Fix test failures after HIVE-10130 [Spark Branch]
 -

 Key: HIVE-10134
 URL: https://issues.apache.org/jira/browse/HIVE-10134
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Xuefu Zhang
Assignee: Chao
 Fix For: spark-branch

 Attachments: HIVE-10134.1-spark.patch


 Complete test run: 
 http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/812/#showFailuresLink
 *Failed tests:*
 {noformat}
 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nonmr_fetch
 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union31
 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_22
 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_6_subq
 org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10077) Use new ParquetInputSplit constructor API

2015-03-31 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389213#comment-14389213
 ] 

Hive QA commented on HIVE-10077:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12708434/HIVE-10077.1.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 8691 tests executed
*Failed tests:*
{noformat}
TestCustomAuthentication - did not produce a TEST-*.xml file
TestMinimrCliDriver-smb_mapjoin_8.q - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_parquet_join
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3221/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3221/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3221/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12708434 - PreCommit-HIVE-TRUNK-Build

 Use new ParquetInputSplit constructor API
 -

 Key: HIVE-10077
 URL: https://issues.apache.org/jira/browse/HIVE-10077
 Project: Hive
  Issue Type: Sub-task
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu
 Attachments: HIVE-10077.1.patch, HIVE-10077.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9969) Avoid Utilities.getMapRedWork for spark [Spark Branch]

2015-03-31 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389226#comment-14389226
 ] 

Xuefu Zhang commented on HIVE-9969:
---

+1

 Avoid Utilities.getMapRedWork for spark [Spark Branch]
 --

 Key: HIVE-9969
 URL: https://issues.apache.org/jira/browse/HIVE-9969
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Rui Li
Priority: Minor
 Attachments: HIVE-9969.1-spark.patch


 The method shouldn't be used in spark mode. Specifically, map work and 
 reduce work have different plan paths in spark. Calling this method will 
 leave lots of errors in the executor's log:
 {noformat}
 15/03/16 02:57:23 INFO Utilities: Open file to read in plan: 
 hdfs://node13-1:8020/tmp/hive/root/0b3f2ad9-af30-4674-9cfb-1f745a5df51d/hive_2015-03-16_02-57-17_752_4494804875441915487-1/-mr-10003/3897754a-0146-4616-a2f6-b316839a2ad0/reduce.xml
 15/03/16 02:57:23 INFO Utilities: File not found: File does not exist: 
 /tmp/hive/root/0b3f2ad9-af30-4674-9cfb-1f745a5df51d/hive_2015-03-16_02-57-17_752_4494804875441915487-1/-mr-10003/3897754a-0146-4616-a2f6-b316839a2ad0/reduce.xml
 at 
 org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
 at 
 org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1891)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1832)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1812)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1784)
 at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:542)
 at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:362)
 at 
 org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10021) Alter index rebuild statements submitted through HiveServer2 fail when Sentry is enabled

2015-03-31 Thread Chao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao updated HIVE-10021:

Description: 
When HiveServer2 is configured to authorize submitted queries and statements 
through Sentry, any attempt to issue an alter index rebuild statement fails 
with a SemanticException caused by a NullPointerException. This occurs 
regardless of whether the index is a compact or bitmap index. 

The root cause of the problem appears to be the fact that the static 
createRootTask function in org.apache.hadoop.hive.ql.optimizer.IndexUtils 
creates a new 
org.apache.hadoop.hive.ql.Driver object to compile the index builder query, and 
this new Driver object, unlike the one used by HiveServer2 to compile the 
submitted statement, is used without having its userName field initialized 
with the submitting user's username. Adding null checks to the Sentry code is 
insufficient to solve this problem, because Sentry needs the userName to 
determine whether or not the submitting user should be able to execute the 
index rebuild statement.
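
A hedged sketch of the kind of fix this points at (assuming the Driver can be 
given a user name at construction; the exact API is not confirmed here):
{code}
// In IndexUtils.createRootTask: carry the session user into the inner
// Driver so Sentry's postAnalyze hook sees a non-null userName.
String userName = SessionState.get().getUserName();  // submitting user
Driver driver = new Driver(conf, userName);          // hypothetical ctor
{code}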

Example stack trace from the HiveServer2 logs:

{noformat}
FAILED: NullPointerException null
java.lang.NullPointerException
at 
java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333)
at 
java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988)
at org.apache.hadoop.security.Groups.getGroups(Groups.java:161)
at 
org.apache.sentry.provider.common.HadoopGroupMappingService.getGroups(HadoopGroupMappingService.java:46)
at 
org.apache.sentry.binding.hive.authz.HiveAuthzBinding.getGroups(HiveAuthzBinding.java:370)
at 
org.apache.sentry.binding.hive.HiveAuthzBindingHook.postAnalyze(HiveAuthzBindingHook.java:314)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:440)
at 
org.apache.hadoop.hive.ql.optimizer.IndexUtils.createRootTask(IndexUtils.java:258)
at 
org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler.getIndexBuilderMapRedTask(CompactIndexHandler.java:149)
at 
org.apache.hadoop.hive.ql.index.TableBasedIndexHandler.generateIndexBuildTaskList(TableBasedIndexHandler.java:67)
at 
org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.getIndexBuilderMapRed(DDLSemanticAnalyzer.java:1171)
at 
org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeAlterIndexRebuild(DDLSemanticAnalyzer.java:1117)
at 
org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:410)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:204)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:437)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:335)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1026)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1019)
at 
org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:100)
at 
org.apache.hive.service.cli.operation.SQLOperation.run(SQLOperation.java:173)
at 
org.apache.hive.service.cli.session.HiveSessionImpl.runOperationWithLogCapture(HiveSessionImpl.java:715)
at 
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:370)
at 
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:357)
at 
org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:238)
at 
org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:393)
at 
org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1373)
at 
org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1358)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.thrift.server.TServlet.doPost(TServlet.java:83)
at 
org.apache.hive.service.cli.thrift.ThriftHttpServlet.doPost(ThriftHttpServlet.java:99)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at 
org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:565)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:479)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:225)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1031)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:406)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:186)
at 

[jira] [Updated] (HIVE-10148) update of bucketing column should not be allowed

2015-03-31 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-10148:
--
Attachment: HIVE-10148.patch

 update of bucketing column should not be allowed
 --

 Key: HIVE-10148
 URL: https://issues.apache.org/jira/browse/HIVE-10148
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 1.1.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Attachments: HIVE-10148.patch


 update tbl set a = 5;
 should raise an error if 'a' is a bucketing column.
 Such an operation is not supported but is currently not checked for.
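
 A hedged sketch of the missing validation (illustrative, not the attached 
 patch):
 {code}
 static void checkNoBucketColumnUpdated(List<String> setCols,
     List<String> bucketCols) throws SemanticException {
   for (String col : setCols) {
     if (bucketCols.contains(col)) {
       throw new SemanticException(
           "Updating values of bucketing columns is not supported: " + col);
     }
   }
 }
 {code}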



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-3166) The Hive JDBC driver should accept hive conf and hive variables via connection URL

2015-03-31 Thread Mark Grey (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388914#comment-14388914
 ] 

Mark Grey commented on HIVE-3166:
-

What's the status on this feature?  I could see this being very useful.  Is it 
possible to rebase the patch or is there another implementation now in place 
for custom hive properties via JDBC driver?

 The Hive JDBC driver should accept hive conf and hive variables via 
 connection URL
 --

 Key: HIVE-3166
 URL: https://issues.apache.org/jira/browse/HIVE-3166
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: 0.9.0
Reporter: Prasad Mujumdar
Assignee: Prasad Mujumdar
  Labels: api-addition
 Attachments: HIVE-3166-3.patch


 The JDBC driver supports running embedded hive. The Hive CLI can accept 
 configuration and hive settings on the command line that can be passed down, 
 but the JDBC driver currently doesn't support this.
 It's also required for SQLLine CLI support since that is a JDBC application. 
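
 For illustration, a hedged sketch of the URL form this would enable (the 
 separator syntax follows the later HiveServer2 convention and is an 
 assumption here):
 {code}
 // '?' section carries hive conf settings, '#' carries hive variables.
 String url = "jdbc:hive2://host:10000/default"
     + "?hive.exec.parallel=true;mapreduce.job.queuename=etl"
     + "#batch_date=20150331";
 Connection con = DriverManager.getConnection(url, "user", "");
 {code}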



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10128) BytesBytesMultiHashMap does not allow concurrent read-only access

2015-03-31 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-10128:

Attachment: HIVE-10128.03.patch

 BytesBytesMultiHashMap does not allow concurrent read-only access
 -

 Key: HIVE-10128
 URL: https://issues.apache.org/jira/browse/HIVE-10128
 Project: Hive
  Issue Type: Bug
Reporter: Gopal V
Assignee: Sergey Shelukhin
 Fix For: llap

 Attachments: HIVE-10128.01.patch, HIVE-10128.02.patch, 
 HIVE-10128.03.patch, HIVE-10128.patch, hashmap-after.png, 
 hashmap-sync-source.png, hashmap-sync.png


 The multi-threaded performance takes a serious hit when LLAP shares 
 hashtables between the probe threads running in parallel. 
 !hashmap-sync.png!
 This is an explicit synchronized block inside ReusableRowContainer which 
 triggers this particular pattern.
 !hashmap-sync-source.png!
 Looking deeper into the code, the synchronization seems to be caused by 
 the fact that WriteBuffers.setReadPoint modifies the otherwise read-only 
 hashtable.
 To generate this sort of result, run LLAP at a WARN log-level, to avoid all 
 the log synchronization that otherwise affects the thread sync.
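
 A hedged sketch of the usual remedy (not necessarily what the attached 
 patches do): keep the read cursor in a per-thread object so probes never 
 mutate the shared buffers and need no synchronized block.
 {code}
 // Per-reader cursor; each probe thread owns one instance.
 final class ReadPosition {
   int bufferIndex;  // which backing buffer the reader is in
   int offset;       // offset inside that buffer
 }
 // Reads advance the caller's cursor instead of shared state.
 static byte readByte(byte[][] buffers, ReadPosition pos) {
   byte b = buffers[pos.bufferIndex][pos.offset++];
   if (pos.offset == buffers[pos.bufferIndex].length) {
     pos.bufferIndex++;
     pos.offset = 0;
   }
   return b;
 }
 {code}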



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9073) NPE when using custom windowing UDAFs

2015-03-31 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-9073:
-
Attachment: HIVE-9073.3.patch

rebasing patch with trunk

 NPE when using custom windowing UDAFs
 -

 Key: HIVE-9073
 URL: https://issues.apache.org/jira/browse/HIVE-9073
 Project: Hive
  Issue Type: Bug
  Components: UDF
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-9073.1.patch, HIVE-9073.2.patch, HIVE-9073.2.patch, 
 HIVE-9073.3.patch


 From the hive-user email group:
 {noformat}
 While executing a simple select query using a custom windowing UDAF I created 
 I am constantly running into this error.
  
 Error: java.lang.RuntimeException: Error in configuring object
 at 
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
 at 
 org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
 at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
 at 
 org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:409)
 at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
 Caused by: java.lang.reflect.InvocationTargetException
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
 ... 9 more
 Caused by: java.lang.RuntimeException: Reduce operator initialization failed
 at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:173)
 ... 14 more
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.exec.FunctionRegistry.getFunctionInfo(FunctionRegistry.java:647)
 at 
 org.apache.hadoop.hive.ql.exec.FunctionRegistry.getWindowFunctionInfo(FunctionRegistry.java:1875)
 at 
 org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.streamingPossible(WindowingTableFunction.java:150)
 at 
 org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.setCanAcceptInputAsStream(WindowingTableFunction.java:221)
 at 
 org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.initializeStreaming(WindowingTableFunction.java:266)
 at 
 org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.initializeStreaming(PTFOperator.java:292)
 at 
 org.apache.hadoop.hive.ql.exec.PTFOperator.initializeOp(PTFOperator.java:86)
 at 
 org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
 at 
 org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:460)
 at 
 org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:416)
 at 
 org.apache.hadoop.hive.ql.exec.ExtractOperator.initializeOp(ExtractOperator.java:40)
 at 
 org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
 at 
 org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:166)
 ... 14 more
  
 Just wanted to check if any of you have faced this earlier. Also, when I try 
 to run the custom UDAF on another server it works fine. The only difference I 
 can see is that the Hive version I am using on my local machine, where it is 
 working, is 0.13.1, and on the other machine, where I see the above-mentioned 
 error, it is 0.13.0. I am not sure if this was a bug which was fixed in the 
 later release, but I just wanted to confirm.
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10161) LLAP: ORC file contains compression buffers larger than bufferSize (OR reader has a bug)

2015-03-31 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-10161:

Summary: LLAP: ORC file contains compression buffers larger than bufferSize 
(OR reader has a bug)  (was: LLAP: IO buffers seem to be hard-coded to 256kb )

 LLAP: ORC file contains compression buffers larger than bufferSize (OR reader 
 has a bug)
 

 Key: HIVE-10161
 URL: https://issues.apache.org/jira/browse/HIVE-10161
 Project: Hive
  Issue Type: Sub-task
Affects Versions: llap
Reporter: Gopal V
Assignee: Sergey Shelukhin
 Fix For: llap


 The EncodedReaderImpl will die when reading from the cache, if the data was 
 written by the regular ORC writer:
 {code}
 Caused by: java.io.IOException: java.lang.IllegalArgumentException: Buffer 
 size too small. size = 262144 needed = 3919246
 at 
 org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.rethrowErrorIfAny(LlapInputFormat.java:249)
 at 
 org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.nextCvb(LlapInputFormat.java:201)
 at 
 org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:140)
 at 
 org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:96)
 at 
 org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
 ... 22 more
 Caused by: java.lang.IllegalArgumentException: Buffer size too small. size = 
 262144 needed = 3919246
 at 
 org.apache.hadoop.hive.ql.io.orc.InStream.addOneCompressionBuffer(InStream.java:780)
 at 
 org.apache.hadoop.hive.ql.io.orc.InStream.uncompressStream(InStream.java:628)
 at 
 org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:309)
 at 
 org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:278)
 at 
 org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:48)
 at 
 org.apache.hadoop.hive.common.CallableWithNdc.call(CallableWithNdc.java:37)
 ... 4 more
 ]], Vertex failed as one or more tasks failed. failedTasks:1, Vertex 
 vertex_1424502260528_1945_1_00 [Map 1] killed/failed due to:null]
 {code}
 Turning off hive.llap.io.enabled makes the error go away.
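
 For context, a hedged reconstruction of the check that appears to fire here: 
 ORC stores each compressed chunk's length in a 3-byte header, and the reader's 
 reassembly buffer must be at least that large. This is a simplified sketch, 
 not the actual InStream code:
 {code}
 // Simplified sketch of the ORC compressed-chunk header check.
 // Header layout: 3 bytes, value = (chunkLength << 1) | isOriginal, little-endian.
 static int checkChunk(byte[] header, int bufferSize) {
   int b0 = header[0] & 0xff, b1 = header[1] & 0xff, b2 = header[2] & 0xff;
   int chunkLength = (b2 << 15) | (b1 << 7) | (b0 >> 1);
   if (chunkLength > bufferSize) {
     // the failure above: a 3919246-byte chunk vs. a 262144-byte (256KB) buffer
     throw new IllegalArgumentException(
         "Buffer size too small. size = " + bufferSize + " needed = " + chunkLength);
   }
   return chunkLength;
 }
 {code}
 If the reader assumes a different compression buffer size than the writer 
 used, this check can fire even on a well-formed file, which would fit either 
 theory in the summary.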



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10161) LLAP: ORC file contains compression buffers larger than bufferSize (OR reader has a bug)

2015-03-31 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-10161:

Assignee: Prasanth Jayachandran  (was: Sergey Shelukhin)

 LLAP: ORC file contains compression buffers larger than bufferSize (OR reader 
 has a bug)
 

 Key: HIVE-10161
 URL: https://issues.apache.org/jira/browse/HIVE-10161
 Project: Hive
  Issue Type: Sub-task
Affects Versions: llap
Reporter: Gopal V
Assignee: Prasanth Jayachandran
 Fix For: llap


 The EncodedReaderImpl will die when reading from the cache, if the data was 
 written by the regular ORC writer:
 {code}
 Caused by: java.io.IOException: java.lang.IllegalArgumentException: Buffer 
 size too small. size = 262144 needed = 3919246
 at 
 org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.rethrowErrorIfAny(LlapInputFormat.java:249)
 at 
 org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.nextCvb(LlapInputFormat.java:201)
 at 
 org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:140)
 at 
 org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:96)
 at 
 org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
 ... 22 more
 Caused by: java.lang.IllegalArgumentException: Buffer size too small. size = 
 262144 needed = 3919246
 at 
 org.apache.hadoop.hive.ql.io.orc.InStream.addOneCompressionBuffer(InStream.java:780)
 at 
 org.apache.hadoop.hive.ql.io.orc.InStream.uncompressStream(InStream.java:628)
 at 
 org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:309)
 at 
 org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:278)
 at 
 org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:48)
 at 
 org.apache.hadoop.hive.common.CallableWithNdc.call(CallableWithNdc.java:37)
 ... 4 more
 ]], Vertex failed as one or more tasks failed. failedTasks:1, Vertex 
 vertex_1424502260528_1945_1_00 [Map 1] killed/failed due to:null]
 {code}
 Turning off hive.llap.io.enabled makes the error go away.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10161) LLAP: ORC file contains compression buffers larger than bufferSize (OR reader has a bug)

2015-03-31 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389043#comment-14389043
 ] 

Gopal V commented on HIVE-10161:


This bug may be due to something else entirely: the error disappears if you 
restart LLAP and never load varchar columns.

Needs more investigation.

 LLAP: ORC file contains compression buffers larger than bufferSize (OR reader 
 has a bug)
 

 Key: HIVE-10161
 URL: https://issues.apache.org/jira/browse/HIVE-10161
 Project: Hive
  Issue Type: Sub-task
Affects Versions: llap
Reporter: Gopal V
Assignee: Prasanth Jayachandran
 Fix For: llap


 The EncodedReaderImpl will die when reading from the cache, if the data was 
 written by the regular ORC writer:
 {code}
 Caused by: java.io.IOException: java.lang.IllegalArgumentException: Buffer 
 size too small. size = 262144 needed = 3919246
 at 
 org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.rethrowErrorIfAny(LlapInputFormat.java:249)
 at 
 org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.nextCvb(LlapInputFormat.java:201)
 at 
 org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:140)
 at 
 org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:96)
 at 
 org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
 ... 22 more
 Caused by: java.lang.IllegalArgumentException: Buffer size too small. size = 
 262144 needed = 3919246
 at 
 org.apache.hadoop.hive.ql.io.orc.InStream.addOneCompressionBuffer(InStream.java:780)
 at 
 org.apache.hadoop.hive.ql.io.orc.InStream.uncompressStream(InStream.java:628)
 at 
 org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:309)
 at 
 org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:278)
 at 
 org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:48)
 at 
 org.apache.hadoop.hive.common.CallableWithNdc.call(CallableWithNdc.java:37)
 ... 4 more
 ]], Vertex failed as one or more tasks failed. failedTasks:1, Vertex 
 vertex_1424502260528_1945_1_00 [Map 1] killed/failed due to:null]
 {code}
 Turning off hive.llap.io.enabled makes the error go away.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10128) BytesBytesMultiHashMap does not allow concurrent read-only access

2015-03-31 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388920#comment-14388920
 ] 

Sergey Shelukhin commented on HIVE-10128:
-

Sorry, this fell through the cracks when I was making the trunk patch out of 
the llap patch. debugDumpMetrics is thread-safe.
Renamed.
Non-thread-safe methods may be more logical for some callers.

 BytesBytesMultiHashMap does not allow concurrent read-only access
 -

 Key: HIVE-10128
 URL: https://issues.apache.org/jira/browse/HIVE-10128
 Project: Hive
  Issue Type: Bug
Reporter: Gopal V
Assignee: Sergey Shelukhin
 Fix For: llap

 Attachments: HIVE-10128.01.patch, HIVE-10128.02.patch, 
 HIVE-10128.patch, hashmap-after.png, hashmap-sync-source.png, hashmap-sync.png


 The multi-threaded performance takes a serious hit when LLAP shares 
 hashtables between the probe threads running in parallel. 
 !hashmap-sync.png!
 This is an explicit synchronized block inside ReusableRowContainer which 
 triggers this particular pattern.
 !hashmap-sync-source.png!
 Looking deeper into the code, the synchronization seems to be caused by the 
 fact that WriteBuffers.setReadPoint modifies the otherwise read-only 
 hashtable.
 To generate this sort of result, run LLAP at the WARN log level, to avoid all 
 the log synchronization that otherwise affects the thread sync.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9727) GroupingID translation from Calcite

2015-03-31 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-9727:
---
Affects Version/s: 1.1.0
   0.14.0
   1.0.0

 GroupingID translation from Calcite
 ---

 Key: HIVE-9727
 URL: https://issues.apache.org/jira/browse/HIVE-9727
 Project: Hive
  Issue Type: Bug
  Components: Query Planning
Affects Versions: 0.14.0, 1.0.0, 1.1.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
 Fix For: 1.2.0

 Attachments: HIVE-9727.01.patch, HIVE-9727.02.patch, 
 HIVE-9727.03.patch, HIVE-9727.04.patch, HIVE-9727.patch


 The translation from Calcite back to Hive might produce wrong results while 
 interacting with other Calcite optimization rules.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10083) SMBJoin fails in case one table is uninitialized

2015-03-31 Thread Chao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388940#comment-14388940
 ] 

Chao commented on HIVE-10083:
-

+1. I think the test failure is unrelated.

 SMBJoin fails in case one table is uninitialized
 

 Key: HIVE-10083
 URL: https://issues.apache.org/jira/browse/HIVE-10083
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer
Affects Versions: 0.13.0
 Environment: MapR Hive 0.13
Reporter: Alain Schröder
Assignee: Na Yang
Priority: Minor
 Attachments: HIVE-10083.patch


 We experience an IndexOutOfBoundsException in an SMBJoin when one of the 
 tables used for the JOIN is uninitialized. Everything works if both are 
 uninitialized or both are initialized.
 {code}
 2015-03-24 09:12:58,967 ERROR [main]: ql.Driver 
 (SessionState.java:printError(545)) - FAILED: IndexOutOfBoundsException 
 Index: 0, Size: 0
 java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
 at java.util.ArrayList.rangeCheck(ArrayList.java:635)
 at java.util.ArrayList.get(ArrayList.java:411)
 at 
 org.apache.hadoop.hive.ql.optimizer.AbstractBucketJoinProc.fillMappingBigTableBucketFileNameToSmallTableBucketFileNames(AbstractBucketJoinProc.java:486)
 at 
 org.apache.hadoop.hive.ql.optimizer.AbstractBucketJoinProc.convertMapJoinToBucketMapJoin(AbstractBucketJoinProc.java:429)
 at 
 org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.convertJoinToBucketMapJoin(AbstractSMBJoinProc.java:540)
 at 
 org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.convertJoinToSMBJoin(AbstractSMBJoinProc.java:549)
 at 
 org.apache.hadoop.hive.ql.optimizer.SortedMergeJoinProc.process(SortedMergeJoinProc.java:51)
 [...]
 {code}
 Simplest way to reproduce:
 {code}
 SET hive.enforce.sorting=true;
 SET hive.enforce.bucketing=true;
 SET hive.exec.dynamic.partition=true;
 SET mapreduce.reduce.import.limit=-1;
 SET hive.optimize.bucketmapjoin=true;
 SET hive.optimize.bucketmapjoin.sortedmerge=true;
 SET hive.auto.convert.join=true;
 SET hive.auto.convert.sortmerge.join=true;
 SET hive.auto.convert.sortmerge.join.noconditionaltask=true;
 CREATE DATABASE IF NOT EXISTS tmp;
 USE tmp;
 CREATE  TABLE `test1` (
   `foo` bigint )
 CLUSTERED BY (
   foo)
 SORTED BY (
   foo ASC)
 INTO 384 BUCKETS
 stored as orc;
 CREATE  TABLE `test2`(
   `foo` bigint )
 CLUSTERED BY (
   foo)
 SORTED BY (
   foo ASC)
 INTO 384 BUCKETS
 STORED AS ORC;
 -- Initialize ONE table of the two tables with any data.
 INSERT INTO TABLE test1 SELECT foo FROM table_with_some_content LIMIT 100;
 SELECT t1.foo, t2.foo
 FROM test1 t1 INNER JOIN test2 t2 
 ON (t1.foo = t2.foo);
 {code}
 I took a look at the procedure 
 fillMappingBigTableBucketFileNameToSmallTableBucketFileNames in 
 AbstractBucketJoinProc.java and it does not seem to have changed from our 
 MapR Hive 0.13 to the current snapshot, so this should also be an error in 
 the current version.
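
 A minimal sketch of the failure mode and the obvious guard, with hypothetical 
 names (this is not the actual AbstractBucketJoinProc code):
 {code}
 import java.util.Collections;
 import java.util.List;

 public class BucketMappingSketch {
   // Mapping a big-table bucket to a small-table bucket file breaks when the
   // small table is uninitialized and its bucket-file list is empty.
   static String mapBucketFile(List<String> smallTableBucketFiles, int bigBucketIx) {
     if (smallTableBucketFiles.isEmpty()) {
       return null; // uninitialized table: skip the bucket-map-join conversion
     }
     return smallTableBucketFiles.get(bigBucketIx % smallTableBucketFiles.size());
   }

   public static void main(String[] args) {
     // Without the guard, get(0) on an empty list throws
     // IndexOutOfBoundsException: Index: 0, Size: 0 (the error above).
     System.out.println(mapBucketFile(Collections.emptyList(), 0));
   }
 }
 {code}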



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10083) SMBJoin fails in case one table is uninitialized

2015-03-31 Thread Na Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388973#comment-14388973
 ] 

Na Yang commented on HIVE-10083:


Thank you [~csun] for the code review. I ran the q test for smb_mapjoin_8.q on 
my local machine and it was successful.  

 SMBJoin fails in case one table is uninitialized
 

 Key: HIVE-10083
 URL: https://issues.apache.org/jira/browse/HIVE-10083
 Project: Hive
  Issue Type: Bug
  Components: Logical Optimizer
Affects Versions: 0.13.0
 Environment: MapR Hive 0.13
Reporter: Alain Schröder
Assignee: Na Yang
Priority: Minor
 Attachments: HIVE-10083.patch


 We experience an IndexOutOfBoundsException in an SMBJoin when one of the 
 tables used for the JOIN is uninitialized. Everything works if both are 
 uninitialized or both are initialized.
 {code}
 2015-03-24 09:12:58,967 ERROR [main]: ql.Driver 
 (SessionState.java:printError(545)) - FAILED: IndexOutOfBoundsException 
 Index: 0, Size: 0
 java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
 at java.util.ArrayList.rangeCheck(ArrayList.java:635)
 at java.util.ArrayList.get(ArrayList.java:411)
 at 
 org.apache.hadoop.hive.ql.optimizer.AbstractBucketJoinProc.fillMappingBigTableBucketFileNameToSmallTableBucketFileNames(AbstractBucketJoinProc.java:486)
 at 
 org.apache.hadoop.hive.ql.optimizer.AbstractBucketJoinProc.convertMapJoinToBucketMapJoin(AbstractBucketJoinProc.java:429)
 at 
 org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.convertJoinToBucketMapJoin(AbstractSMBJoinProc.java:540)
 at 
 org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.convertJoinToSMBJoin(AbstractSMBJoinProc.java:549)
 at 
 org.apache.hadoop.hive.ql.optimizer.SortedMergeJoinProc.process(SortedMergeJoinProc.java:51)
 [...]
 {code}
 Simplest way to reproduce:
 {code}
 SET hive.enforce.sorting=true;
 SET hive.enforce.bucketing=true;
 SET hive.exec.dynamic.partition=true;
 SET mapreduce.reduce.import.limit=-1;
 SET hive.optimize.bucketmapjoin=true;
 SET hive.optimize.bucketmapjoin.sortedmerge=true;
 SET hive.auto.convert.join=true;
 SET hive.auto.convert.sortmerge.join=true;
 SET hive.auto.convert.sortmerge.join.noconditionaltask=true;
 CREATE DATABASE IF NOT EXISTS tmp;
 USE tmp;
 CREATE  TABLE `test1` (
   `foo` bigint )
 CLUSTERED BY (
   foo)
 SORTED BY (
   foo ASC)
 INTO 384 BUCKETS
 stored as orc;
 CREATE  TABLE `test2`(
   `foo` bigint )
 CLUSTERED BY (
   foo)
 SORTED BY (
   foo ASC)
 INTO 384 BUCKETS
 STORED AS ORC;
 -- Initialize ONE table of the two tables with any data.
 INSERT INTO TABLE test1 SELECT foo FROM table_with_some_content LIMIT 100;
 SELECT t1.foo, t2.foo
 FROM test1 t1 INNER JOIN test2 t2 
 ON (t1.foo = t2.foo);
 {code}
 I took a look at the procedure 
 fillMappingBigTableBucketFileNameToSmallTableBucketFileNames in 
 AbstractBucketJoinProc.java and it does not seem to have changed from our 
 MapR Hive 0.13 to the current snapshot, so this should also be an error in 
 the current version.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10167) HS2 logs the server started only before the server is shut down

2015-03-31 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HIVE-10167:
---
Attachment: HIVE-10167.1.patch

 HS2 logs the server started only before the server is shut down
 ---

 Key: HIVE-10167
 URL: https://issues.apache.org/jira/browse/HIVE-10167
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Trivial
 Attachments: HIVE-10167.1.patch


 TThreadPoolServer#serve() blocks till the server is down. We should log 
 before that.
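
 A self-contained illustration of the ordering fix (the names are illustrative, 
 not the exact HiveServer2 code):
 {code}
 // Logging after a blocking serve() reports "started" only at shutdown.
 // The fix is simply to announce startup before entering the accept loop.
 public class LogBeforeServe {
   interface BlockingServer { void serve(); /* blocks until stopped */ }

   static void start(BlockingServer server) {
     System.out.println("Server starting...");  // log BEFORE the blocking call
     server.serve();                            // returns only on shutdown
     System.out.println("Server has stopped");
   }

   public static void main(String[] args) {
     start(() -> System.out.println("serving (blocking)..."));
   }
 }
 {code}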



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10159) HashTableSinkDesc and MapJoinDesc keyTblDesc can be replaced by JoinDesc.keyTableDesc

2015-03-31 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-10159:
---
Attachment: HIVE-10159.1.patch

TestCliDriver mapjoin_decimal.q and TestMinimrCliDriver smb_mapjoin_8.q work 
fine for me locally.
Let me reattach patch #1 to run the build again.


 HashTableSinkDesc and MapJoinDesc keyTblDesc can be replaced by 
 JoinDesc.keyTableDesc
 -

 Key: HIVE-10159
 URL: https://issues.apache.org/jira/browse/HIVE-10159
 Project: Hive
  Issue Type: Improvement
  Components: Query Planning
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
Priority: Minor
 Attachments: HIVE-10159.1.patch, HIVE-10159.1.patch


 MapJoinDesc and HashTableSinkDesc are derived from JoinDesc.
 HashTableSinkDesc and MapJoinDesc each have a keyTblDesc field.
 JoinDesc has a keyTableDesc field.
 I think HashTableSinkDesc and MapJoinDesc can use the superclass (JoinDesc) 
 keyTableDesc field instead of defining their own keyTblDesc.
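
 A compressed sketch of the proposed refactor (the class skeletons are 
 illustrative; only the field and accessor names come from this ticket):
 {code}
 class TableDesc { }  // stand-in for org.apache.hadoop.hive.ql.plan.TableDesc

 // The superclass already carries the key table descriptor.
 class JoinDesc {
   protected TableDesc keyTableDesc;
   public TableDesc getKeyTableDesc() { return keyTableDesc; }
   public void setKeyTableDesc(TableDesc keyTableDesc) { this.keyTableDesc = keyTableDesc; }
 }

 // Before: each subclass shadowed it with its own copy.
 class MapJoinDesc extends JoinDesc {
   // private TableDesc keyTblDesc;            // removed: duplicate of superclass
   // public TableDesc getKeyTblDesc() {...}   // callers move to getKeyTableDesc()
 }
 {code}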



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10151) insert into A select from B is broken when both A and B are Acid tables and bucketed the same way

2015-03-31 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-10151:
--
Summary: insert into A select from B is broken when both A and B are Acid 
tables and bucketed the same way  (was: Acid table insert as select)

 insert into A select from B is broken when both A and B are Acid tables and 
 bucketed the same way
 -

 Key: HIVE-10151
 URL: https://issues.apache.org/jira/browse/HIVE-10151
 Project: Hive
  Issue Type: Bug
  Components: Query Planning, Transactions
Affects Versions: 1.1.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman

 BucketingSortingReduceSinkOptimizer makes 
 insert into AcidTable select * from otherAcidTable
 use BucketizedHiveInputFormat, which bypasses the ORC merge logic on read and 
 tries to send bucket files (rather than the table directory) down to 
 OrcInputFormat. (This is true only if both AcidTable and otherAcidTable are 
 bucketed the same way.) Then ORC dies.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10128) BytesBytesMultiHashMap does not allow concurrent read-only access

2015-03-31 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388906#comment-14388906
 ] 

Sergey Shelukhin commented on HIVE-10128:
-

It's used in the updated patch. I'll look at the failures and feedback and 
update the patch.

 BytesBytesMultiHashMap does not allow concurrent read-only access
 -

 Key: HIVE-10128
 URL: https://issues.apache.org/jira/browse/HIVE-10128
 Project: Hive
  Issue Type: Bug
Reporter: Gopal V
Assignee: Sergey Shelukhin
 Fix For: llap

 Attachments: HIVE-10128.01.patch, HIVE-10128.02.patch, 
 HIVE-10128.patch, hashmap-after.png, hashmap-sync-source.png, hashmap-sync.png


 The multi-threaded performance takes a serious hit when LLAP shares 
 hashtables between the probe threads running in parallel. 
 !hashmap-sync.png!
 This is an explicit synchronized block inside ReusableRowContainer which 
 triggers this particular pattern.
 !hashmap-sync-source.png!
 Looking deeper into the code, the synchronization seems to be caused by the 
 fact that WriteBuffers.setReadPoint modifies the otherwise read-only 
 hashtable.
 To generate this sort of result, run LLAP at the WARN log level, to avoid all 
 the log synchronization that otherwise affects the thread sync.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10092) LLAP: improve how buffers are locked for split

2015-03-31 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389103#comment-14389103
 ] 

Sergey Shelukhin commented on HIVE-10092:
-

This is not a simple problem... 

 LLAP: improve how buffers are locked for split
 --

 Key: HIVE-10092
 URL: https://issues.apache.org/jira/browse/HIVE-10092
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin

 Right now, for simplicity, the entire split of decompressed buffers is locked 
 in cache, in case some buffers are shared between RGs. This avoids situations 
 where we uncompress some data, pass it on to the processor for RG N, the 
 processor processes and unlocks it, and before we can pass it on for RG N+1 
 it is evicted. 
 However, if the split is too big and the cache is small, or many splits are 
 processed at the same time, this can result in a deadlock as the entire cache 
 is locked. We need to make the locking more granular and probably also try 
 to avoid deadlocks in general (bypass the cache?)
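
 One possible direction, sketched under heavy assumptions (this is not LLAP 
 code): reference-count each buffer per row group instead of pinning the whole 
 split, so a shared buffer stays locked only while some RG still needs it:
 {code}
 import java.util.concurrent.atomic.AtomicInteger;

 // Hypothetical per-buffer refcount: evictable only when no RG holds it.
 class RefCountedBuffer {
   private final AtomicInteger refCount = new AtomicInteger(0);

   void lockForRg()   { refCount.incrementAndGet(); } // an RG starts using it
   void unlockForRg() { refCount.decrementAndGet(); } // that RG is done with it
   boolean isEvictable() { return refCount.get() == 0; }
 }
 {code}
 A buffer shared between RG N and RG N+1 would be handed over by incrementing 
 for N+1 before decrementing for N, which avoids the evict-between-RGs race 
 described above while locking far less than the whole split.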



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9845) HCatSplit repeats information making input split data size huge

2015-03-31 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388925#comment-14388925
 ] 

Mithun Radhakrishnan commented on HIVE-9845:


Bah, finally. Unrelated test-failures.

 HCatSplit repeats information making input split data size huge
 ---

 Key: HIVE-9845
 URL: https://issues.apache.org/jira/browse/HIVE-9845
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Reporter: Rohini Palaniswamy
Assignee: Mithun Radhakrishnan
 Attachments: HIVE-9845.1.patch, HIVE-9845.3.patch


 Pig on Tez jobs with larger tables hit PIG-4443. Running on HDFS data which 
 has even triple the number of splits (100K+ splits and tasks) does not hit 
 that issue.
 {code}
 HCatBaseInputFormat.java:
   // Call getSplits on the InputFormat, creating an
   // HCatSplit for each underlying split.
   // numSplits is 0 for our purposes.
   org.apache.hadoop.mapred.InputSplit[] baseSplits =
       inputFormat.getSplits(jobConf, 0);
   for (org.apache.hadoop.mapred.InputSplit split : baseSplits) {
     splits.add(new HCatSplit(partitionInfo, split, allCols));
   }
 {code}
 Each HCatSplit duplicates the partition schema and the table schema.
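
 A hedged sketch of the dedup direction this implies (names are hypothetical, 
 not the committed patch): serialize the table schema once per job and look it 
 up on the task side, rather than embedding it in every split:
 {code}
 import java.util.Map;

 // Hypothetical slim split: per-partition data stays in the split, the table
 // schema is fetched once from the job configuration.
 class SlimHCatSplit {
   static final String SCHEMA_KEY = "hypothetical.hcat.table.schema"; // assumption

   private final String partitionInfo;

   SlimHCatSplit(String partitionInfo) { this.partitionInfo = partitionInfo; }

   String getTableSchema(Map<String, String> jobConf) {
     return jobConf.get(SCHEMA_KEY); // shared, not duplicated per split
   }
 }
 {code}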



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10108) Index#getIndexTableName() returns db.index_table_name

2015-03-31 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HIVE-10108:
---
Attachment: HIVE-10108.1.patch

These test failures are not related. Renamed the patch for trunk.

 Index#getIndexTableName() returns db.index_table_name
 -

 Key: HIVE-10108
 URL: https://issues.apache.org/jira/browse/HIVE-10108
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: HIVE-10108.1-spark.patch, HIVE-10108.1.patch


 Index#getIndexTableName() used to return just the index table name. Now it 
 returns a qualified table name. This change was introduced in HIVE-3781.
 As a result:
 IMetaStoreClient#getTable(index.getDbName(), index.getIndexTableName())
 throws ObjectNotFoundException.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10128) BytesBytesMultiHashMap does not allow concurrent read-only access

2015-03-31 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389026#comment-14389026
 ] 

Sergey Shelukhin commented on HIVE-10128:
-

Tests failed with a weird issue 
{noformat}
java.lang.NoSuchMethodError: 
org.apache.hadoop.hive.serde2.WriteBuffers.getReadPosition()Lorg/apache/hadoop/hive/serde2/WriteBuffers$Position;
at 
org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.getValueRefs(BytesBytesMultiHashMap.java:268)
at 
org.apache.hadoop.hive.ql.exec.persistence.TestBytesBytesMultiHashMap.testGetNonExistent(TestBytesBytesMultiHashMap.java:84)
{noformat}
which seems to be a build problem. They pass locally.

 BytesBytesMultiHashMap does not allow concurrent read-only access
 -

 Key: HIVE-10128
 URL: https://issues.apache.org/jira/browse/HIVE-10128
 Project: Hive
  Issue Type: Bug
Reporter: Gopal V
Assignee: Sergey Shelukhin
 Fix For: llap

 Attachments: HIVE-10128.01.patch, HIVE-10128.02.patch, 
 HIVE-10128.03.patch, HIVE-10128.patch, hashmap-after.png, 
 hashmap-sync-source.png, hashmap-sync.png


 The multi-threaded performance takes a serious hit when LLAP shares 
 hashtables between the probe threads running in parallel. 
 !hashmap-sync.png!
 This is an explicit synchronized block inside ReusableRowContainer which 
 triggers this particular pattern.
 !hashmap-sync-source.png!
 Looking deeper into the code, the synchronization seems to be caused by the 
 fact that WriteBuffers.setReadPoint modifies the otherwise read-only 
 hashtable.
 To generate this sort of result, run LLAP at the WARN log level, to avoid all 
 the log synchronization that otherwise affects the thread sync.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9845) HCatSplit repeats information making input split data size huge

2015-03-31 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-9845:
---
Attachment: (was: HIVE-9845.3.patch)

 HCatSplit repeats information making input split data size huge
 ---

 Key: HIVE-9845
 URL: https://issues.apache.org/jira/browse/HIVE-9845
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Reporter: Rohini Palaniswamy
Assignee: Mithun Radhakrishnan
 Attachments: HIVE-9845.1.patch, HIVE-9845.3.patch


 Pig on Tez jobs with larger tables hit PIG-4443. Running on HDFS data which 
 has even triple the number of splits (100K+ splits and tasks) does not hit 
 that issue.
 {code}
 HCatBaseInputFormat.java:
   // Call getSplits on the InputFormat, creating an
   // HCatSplit for each underlying split.
   // numSplits is 0 for our purposes.
   org.apache.hadoop.mapred.InputSplit[] baseSplits =
       inputFormat.getSplits(jobConf, 0);
   for (org.apache.hadoop.mapred.InputSplit split : baseSplits) {
     splits.add(new HCatSplit(partitionInfo, split, allCols));
   }
 {code}
 Each HCatSplit duplicates the partition schema and the table schema.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9845) HCatSplit repeats information making input split data size huge

2015-03-31 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-9845:
---
Attachment: HIVE-9845.3.patch

 HCatSplit repeats information making input split data size huge
 ---

 Key: HIVE-9845
 URL: https://issues.apache.org/jira/browse/HIVE-9845
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Reporter: Rohini Palaniswamy
Assignee: Mithun Radhakrishnan
 Attachments: HIVE-9845.1.patch, HIVE-9845.3.patch


 Pig on Tez jobs with larger tables hit PIG-4443. Running on HDFS data which 
 has even triple the number of splits (100K+ splits and tasks) does not hit 
 that issue.
 {code}
 HCatBaseInputFormat.java:
   // Call getSplits on the InputFormat, creating an
   // HCatSplit for each underlying split.
   // numSplits is 0 for our purposes.
   org.apache.hadoop.mapred.InputSplit[] baseSplits =
       inputFormat.getSplits(jobConf, 0);
   for (org.apache.hadoop.mapred.InputSplit split : baseSplits) {
     splits.add(new HCatSplit(partitionInfo, split, allCols));
   }
 {code}
 Each HCatSplit duplicates the partition schema and the table schema.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10148) update of bucketing column should not be allowed

2015-03-31 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-10148:
--
Description: 
update tbl set a = 5;
should raise an error if 'a' is a bucketing column.
Such an operation is not supported but is currently not checked for.

  was:
update tbl set a = 5;
should raise an error if 'a' is a bucketing column.
Such operation is not supported but currently not checked.


 update of bucketing column should not be allowed
 --

 Key: HIVE-10148
 URL: https://issues.apache.org/jira/browse/HIVE-10148
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 1.1.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman

 update tbl set a = 5;
 should raise an error if 'a' is a bucketing column.
 Such an operation is not supported but is currently not checked for.
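
 A minimal sketch of the missing check, with hypothetical names (not the 
 actual semantic-analyzer code):
 {code}
 import java.util.List;

 public class UpdateSetCheck {
   // Reject UPDATE ... SET clauses that assign to a bucketing column.
   static void validate(List<String> setColumns, List<String> bucketColumns) {
     for (String col : setColumns) {
       if (bucketColumns.contains(col)) {
         throw new IllegalArgumentException(
             "Updating values of bucketing columns is not supported: " + col);
       }
     }
   }

   public static void main(String[] args) {
     validate(List.of("a"), List.of("a")); // throws, as 'update tbl set a = 5' should
   }
 }
 {code}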



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9518) Implement MONTHS_BETWEEN aligned with Oracle one

2015-03-31 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-9518:
--
Attachment: HIVE-9518.9.patch

The function should support both the short date and the full timestamp string 
formats, and it should not skip the time part.
String length cannot be used to determine the format, because the year might be 
less than 4 chars and the day and month can be just 1 char.

This is why I decided to use both the Timestamp and Date converters to convert 
the input value to a java Date.
I also removed the fix I made earlier to GenericUDF which considered string 
length (str.length==10).

I added tests for dates without a day, dates with partial time (no seconds), 
and dates with a short year, month and day.

Now string date parsing behaviour should be consistent with other UDFs (e.g. 
datediff).
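
A small sketch of the two-converter fallback described above, assuming the full 
timestamp pattern is tried first so the time part is kept (illustrative only, 
not the GenericUDF converter code):
{code}
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;

public class DateParseSketch {
  static Date parseDateString(String s) {
    // Try the timestamp pattern first; fall back to the short date pattern.
    for (String pattern : new String[] {"yyyy-MM-dd HH:mm:ss", "yyyy-MM-dd"}) {
      SimpleDateFormat fmt = new SimpleDateFormat(pattern);
      fmt.setLenient(true); // tolerates 1-char day/month and short years
      Date d = fmt.parse(s, new ParsePosition(0));
      if (d != null) {
        return d;
      }
    }
    return null; // unparseable
  }

  public static void main(String[] args) {
    System.out.println(parseDateString("2003-4-5"));            // short fields parse
    System.out.println(parseDateString("2003-04-05 06:07:08")); // time part kept
  }
}
{code}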

 Implement MONTHS_BETWEEN aligned with Oracle one
 

 Key: HIVE-9518
 URL: https://issues.apache.org/jira/browse/HIVE-9518
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Xiaobing Zhou
Assignee: Alexander Pivovarov
 Attachments: HIVE-9518.1.patch, HIVE-9518.2.patch, HIVE-9518.3.patch, 
 HIVE-9518.4.patch, HIVE-9518.5.patch, HIVE-9518.6.patch, HIVE-9518.7.patch, 
 HIVE-9518.8.patch, HIVE-9518.9.patch


 This is used to track work to build an Oracle-like months_between. Here are 
 the semantics:
 MONTHS_BETWEEN returns the number of months between dates date1 and date2. If 
 date1 is later than date2, then the result is positive. If date1 is earlier 
 than date2, then the result is negative. If date1 and date2 are either the 
 same days of the month or both last days of months, then the result is always 
 an integer. Otherwise Oracle Database calculates the fractional portion of 
 the result based on a 31-day month and considers the difference in the time 
 components of date1 and date2.
 It should accept date, timestamp and string arguments in the format 
 'yyyy-MM-dd' or 'yyyy-MM-dd HH:mm:ss'.
 The result should be rounded to 8 decimal places.
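
 The semantics above map to a fairly direct computation. A hedged sketch (not 
 the Hive implementation; edge cases simplified):
 {code}
 import java.time.LocalDateTime;

 public class MonthsBetweenSketch {
   static double monthsBetween(LocalDateTime d1, LocalDateTime d2) {
     int monthDiff = (d1.getYear() - d2.getYear()) * 12
         + (d1.getMonthValue() - d2.getMonthValue());
     boolean sameDayOfMonth = d1.getDayOfMonth() == d2.getDayOfMonth();
     boolean bothLastDays =
         d1.getDayOfMonth() == d1.toLocalDate().lengthOfMonth()
             && d2.getDayOfMonth() == d2.toLocalDate().lengthOfMonth();
     if (sameDayOfMonth || bothLastDays) {
       return monthDiff; // always an integer in these two cases
     }
     // Fractional part based on a 31-day month, time components included.
     double sec1 = d1.getDayOfMonth() * 86400.0 + d1.toLocalTime().toSecondOfDay();
     double sec2 = d2.getDayOfMonth() * 86400.0 + d2.toLocalTime().toSecondOfDay();
     double months = monthDiff + (sec1 - sec2) / (31.0 * 86400.0);
     return Math.round(months * 1e8) / 1e8; // round to 8 decimal places
   }

   public static void main(String[] args) {
     // One month plus 1/31 of a month: prints 1.03225806
     System.out.println(monthsBetween(
         LocalDateTime.of(1995, 2, 2, 0, 0), LocalDateTime.of(1995, 1, 1, 0, 0)));
   }
 }
 {code}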



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10167) HS2 logs the server started only before the server is shut down

2015-03-31 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389339#comment-14389339
 ] 

Hive QA commented on HIVE-10167:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12708478/HIVE-10167.1.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 8677 tests executed
*Failed tests:*
{noformat}
TestCliDriver-join36.q-udf_bitwise_or.q-add_part_exist.q-and-12-more - did not 
produce a TEST-*.xml file
TestMinimrCliDriver-smb_mapjoin_8.q - did not produce a TEST-*.xml file
org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.testSparkQuery
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3222/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3222/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3222/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12708478 - PreCommit-HIVE-TRUNK-Build

 HS2 logs the server started only before the server is shut down
 ---

 Key: HIVE-10167
 URL: https://issues.apache.org/jira/browse/HIVE-10167
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Trivial
 Fix For: 1.2.0

 Attachments: HIVE-10167.1.patch


 TThreadPoolServer#serve() blocks till the server is down. We should log 
 before that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10134) Fix test failures after HIVE-10130 [Spark Branch]

2015-03-31 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389352#comment-14389352
 ] 

Hive QA commented on HIVE-10134:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12708494/HIVE-10134.1-spark.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 8710 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nonmr_fetch
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_22
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_6_subq
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/817/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/817/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-817/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12708494 - PreCommit-HIVE-SPARK-Build

 Fix test failures after HIVE-10130 [Spark Branch]
 -

 Key: HIVE-10134
 URL: https://issues.apache.org/jira/browse/HIVE-10134
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Xuefu Zhang
Assignee: Chao
 Fix For: spark-branch

 Attachments: HIVE-10134.1-spark.patch


 Complete test run: 
 http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/812/#showFailuresLink
 *Failed tests:*
 {noformat}
 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nonmr_fetch
 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union31
 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_22
 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_6_subq
 org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10114) Split strategies for ORC

2015-03-31 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-10114:
-
Attachment: HIVE-10114.3.patch

Some cleanup, test case fixes, and hive conf description changes based on 
Gopal's comment.

 Split strategies for ORC
 

 Key: HIVE-10114
 URL: https://issues.apache.org/jira/browse/HIVE-10114
 Project: Hive
  Issue Type: Improvement
Affects Versions: 1.2.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
 Attachments: HIVE-10114.1.patch, HIVE-10114.2.patch, 
 HIVE-10114.3.patch


 ORC split generation does not have clearly defined strategies for different 
 scenarios (many small ORC files, few small ORC files, many large files, etc.). 
 A few strategies, like storing the file footer in the ORC split or making the 
 entire file a single ORC split, already exist. This JIRA is to make split 
 generation simpler, to support different strategies for various use cases 
 (BI, ETL, ACID, etc.), and to lay the foundation for HIVE-7428.
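
 As an illustration of what named strategies could look like (the strategy 
 names mirror the use cases listed above; everything else is an assumption, 
 not the committed code):
 {code}
 // Hypothetical strategy selection keyed off the file population.
 enum SplitStrategyKind { BI, ETL, ACID }

 class SplitStrategyChooser {
   static SplitStrategyKind choose(int fileCount, long avgFileSize, boolean acid) {
     if (acid) {
       return SplitStrategyKind.ACID; // delta-aware split generation
     }
     if (fileCount > 1000 && avgFileSize < (32L << 20)) {
       // many small files: one split per file, skip footer reads for latency
       return SplitStrategyKind.BI;
     }
     // few large files: read footers and split by stripe boundaries
     return SplitStrategyKind.ETL;
   }
 }
 {code}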



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10165) Improve hive-hcatalog-streaming extensibility and support updates and deletes.

2015-03-31 Thread Elliot West (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliot West updated HIVE-10165:
---
Description: 
h3. Overview
I'd like to extend the 
[hive-hcatalog-streaming|https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest]
 API so that it also supports the writing of record updates and deletes in 
addition to the already supported inserts.

h3. Motivation
We have many Hadoop processes outside of Hive that merge changed facts into 
existing datasets. Traditionally we achieve this by: reading in a ground-truth 
dataset and a modified dataset, grouping by a key, sorting by a sequence and 
then applying a function to determine inserted, updated, and deleted rows. 
However, in our current scheme we must rewrite all partitions that may 
potentially contain changes. In practice the number of mutated records is very 
small when compared with the records contained in a partition. This approach 
results in a number of operational issues:
* Excessive amount of write activity required for small data changes.
* Downstream applications cannot robustly read these datasets while they are 
being updated.
* Due to the scale of the updates (hundreds of partitions), the scope for 
contention is high. 

I believe we can address this problem by instead writing only the changed 
records to a Hive transactional table. This should drastically reduce the 
amount of data that we need to write and also provide a means for managing 
concurrent access to the data. Our existing merge processes can read and retain 
each record's {{ROW_ID}}/{{RecordIdentifier}} and pass this through to an 
updated form of the hive-hcatalog-streaming API which will then have the 
required data to perform an update or insert in a transactional manner. 

h3. Benefits
* Enables the creation of large-scale dataset merge processes  
* Opens up Hive transactional functionality in an accessible manner to 
processes that operate outside of Hive.

h3. Implementation
We've patched the API to provide visibility to the underlying 
{{OrcRecordUpdater}} and allow extension of the {{AbstractRecordWriter}} by 
third-parties outside of the package. We've also updated the user facing 
interfaces to provide update and delete functionality. I've provided the 
modifications as three incremental patches. Generally speaking, each patch 
makes the API less backwards compatible but more consistent with respect to 
offering updates, deletes as well as writes (inserts). Ideally I hope that all 
three patches have merit, but only the first patch is absolutely necessary to 
enable the features we need on the API, and it does so in a backwards 
compatible way. I'll summarise the contents of each patch:

h4. [^HIVE-10165.0.patch] - Required
This patch contains what we consider to be the minimum amount of changes 
required to allow users to create {{RecordWriter}} subclasses that can insert, 
update, and delete records. These changes also maintain backwards 
compatibility at the expense of confusing the API a little. Note that the row 
representation has been changed from {{byte[]}} to {{Object}}. Within our data 
processing jobs our records are often available in a strongly typed and decoded 
form such as a POJO or a Tuple object. Therefore it seems to make sense that we 
are able to pass this through to the {{OrcRecordUpdater}} without having to go 
through a {{byte[]}} encoding step. This of course still allows users to use 
{{byte[]}} if they wish.
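
To make the shape of the change concrete, here is a sketch of what such a 
writer could look like (the method names and signatures are illustrative 
assumptions, not the patch itself):
{code}
// Hypothetical extended writer: rows arrive as Object (e.g. a POJO or Tuple),
// and update/delete carry the row identifier captured by the merge process
// (conceptually the ROW_ID/RecordIdentifier mentioned above).
public interface MutatingRecordWriter {
  void insert(long transactionId, Object row) throws Exception;
  void update(long transactionId, Object rowId, Object row) throws Exception;
  void delete(long transactionId, Object rowId) throws Exception;
  void flush() throws Exception;
}
{code}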

h4. [^HIVE-10165.1.patch] - Nice to have
This patch builds on the changes made in the *required* patch and aims to make 
the API cleaner and more consistent while accommodating updates and inserts. It 
also adds some logic to prevent the user from submitting multiple operation 
types to a single {{TransactionBatch}} as we found this creates data 
inconsistencies within the Hive table. This patch breaks backwards 
compatibility.

h4. [^HIVE-10165.2.patch] - Nomenclature
This final patch simply renames some of the existing types to more accurately 
convey their increased responsibilities. The API is no longer writing just new 
records, it is now also responsible for writing operations that are applied to 
existing records. This patch breaks backwards compatibility.

h3. Example
I've attached a simple, typical usage of the API. This is not a patch and is 
intended as an illustration only: [^ReflectiveOperationWriter.java]

h3. Known issues
I have not yet provided any unit tests for the extended functionality. I fully 
expect that they will be required and will work on them if these patches have 
merit.

*Note: Attachments to follow.*

  was:
h3. Overview
I'd like to extend the 
[hive-hcatalog-streaming|https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest]
 API so that it also supports the writing of record updates and deletes in 
addition to the already supported inserts.

h3. Motivation
We have many Hadoop processes 

[jira] [Commented] (HIVE-10116) CBO (Calcite Return Path): RelMdSize throws an Exception when Join is actually a Semijoin [CBO branch]

2015-03-31 Thread Mostafa Mokhtar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389422#comment-14389422
 ] 

Mostafa Mokhtar commented on HIVE-10116:


[~jcamachorodriguez]
Is this the same issue?
{code}
explain  
select  ca_zip, ca_county, sum(ws_sales_price)
 from
web_sales
JOIN customer ON web_sales.ws_bill_customer_sk = customer.c_customer_sk
JOIN customer_address ON customer.c_current_addr_sk = 
customer_address.ca_address_sk 
JOIN date_dim ON web_sales.ws_sold_date_sk = date_dim.d_date_sk
JOIN item ON web_sales.ws_item_sk = item.i_item_sk 
 where
( item.i_item_id in (select i_item_id
 from item i2
 where i2.i_item_sk in (2, 3, 5, 7, 11, 13, 17, 19, 
23, 29)
 )
)
and d_qoy = 2 and d_year = 2000
 group by ca_zip, ca_county
 order by ca_zip, ca_county
 limit 100
15/03/27 12:16:48 [main]: ERROR parse.CalcitePlanner: CBO failed, skipping CBO. 
java.lang.ArrayIndexOutOfBoundsException: 2
at 
org.apache.calcite.rel.metadata.RelMdSize.averageColumnSizes(RelMdSize.java:193)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.calcite.rel.metadata.ReflectiveRelMetadataProvider$1$1.invoke(ReflectiveRelMetadataProvider.java:182)
at com.sun.proxy.$Proxy51.averageColumnSizes(Unknown Source)
at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.calcite.rel.metadata.ChainedRelMetadataProvider$ChainedInvocationHandler.invoke(ChainedRelMetadataProvider.java:109)
at com.sun.proxy.$Proxy51.averageColumnSizes(Unknown Source)
at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.calcite.rel.metadata.ChainedRelMetadataProvider$ChainedInvocationHandler.invoke(ChainedRelMetadataProvider.java:109)
at com.sun.proxy.$Proxy51.averageColumnSizes(Unknown Source)
at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.calcite.rel.metadata.CachingRelMetadataProvider$CachingInvocationHandler.invoke(CachingRelMetadataProvider.java:131)
at com.sun.proxy.$Proxy51.averageColumnSizes(Unknown Source)
at 
org.apache.calcite.rel.metadata.RelMetadataQuery.getAverageColumnSizes(RelMetadataQuery.java:360)
at 
org.apache.calcite.rel.metadata.RelMdSize.averageRowSize(RelMdSize.java:82)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.calcite.rel.metadata.ReflectiveRelMetadataProvider$1$1.invoke(ReflectiveRelMetadataProvider.java:182)
at com.sun.proxy.$Proxy51.averageRowSize(Unknown Source)
at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.calcite.rel.metadata.ChainedRelMetadataProvider$ChainedInvocationHandler.invoke(ChainedRelMetadataProvider.java:109)
at com.sun.proxy.$Proxy51.averageRowSize(Unknown Source)
at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.calcite.rel.metadata.ChainedRelMetadataProvider$ChainedInvocationHandler.invoke(ChainedRelMetadataProvider.java:109)
at com.sun.proxy.$Proxy51.averageRowSize(Unknown Source)
at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 

[jira] [Updated] (HIVE-10108) Index#getIndexTableName() returns db.index_table_name

2015-03-31 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HIVE-10108:
---
Attachment: HIVE-10108.2.patch

Attached patch v2, rebased to the latest trunk.

 Index#getIndexTableName() returns db.index_table_name
 -

 Key: HIVE-10108
 URL: https://issues.apache.org/jira/browse/HIVE-10108
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: HIVE-10108.1-spark.patch, HIVE-10108.1.patch, 
 HIVE-10108.2.patch


 Index#getIndexTableName() used to return just the index table name. Now it 
 returns a qualified table name. This change was introduced in HIVE-3781.
 As a result:
 IMetaStoreClient#getTable(index.getDbName(), index.getIndexTableName())
 throws ObjectNotFoundException.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10167) HS2 logs the server started only before the server is shut down

2015-03-31 Thread Chao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389663#comment-14389663
 ] 

Chao commented on HIVE-10167:
-

+1

 HS2 logs the server started only before the server is shut down
 ---

 Key: HIVE-10167
 URL: https://issues.apache.org/jira/browse/HIVE-10167
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Trivial
 Fix For: 1.2.0

 Attachments: HIVE-10167.1.patch


 TThreadPoolServer#serve() blocks till the server is down. We should log 
 before that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10159) HashTableSinkDesc and MapJoinDesc keyTblDesc can be replaced by JoinDesc.keyTableDesc

2015-03-31 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389478#comment-14389478
 ] 

Hive QA commented on HIVE-10159:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12708483/HIVE-10159.1.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 8692 tests executed
*Failed tests:*
{noformat}
TestMinimrCliDriver-smb_mapjoin_8.q - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mapjoin_decimal
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3224/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3224/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3224/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12708483 - PreCommit-HIVE-TRUNK-Build

 HashTableSinkDesc and MapJoinDesc keyTblDesc can be replaced by 
 JoinDesc.keyTableDesc
 -

 Key: HIVE-10159
 URL: https://issues.apache.org/jira/browse/HIVE-10159
 Project: Hive
  Issue Type: Improvement
  Components: Query Planning
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
Priority: Minor
 Attachments: HIVE-10159.1.patch, HIVE-10159.1.patch


 MapJoinDesc and HashTableSinkDesc are derived from JoinDesc.
 HashTableSinkDesc and MapJoinDesc each have a keyTblDesc field.
 JoinDesc has a keyTableDesc field.
 I think HashTableSinkDesc and MapJoinDesc can use the superclass (JoinDesc) 
 keyTableDesc field instead of defining their own keyTblDesc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9518) Implement MONTHS_BETWEEN aligned with Oracle one

2015-03-31 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-9518:
--
Attachment: HIVE-9518.10.patch

patch #10 - added SEC_IN_31_DAYS constant for clarity

 Implement MONTHS_BETWEEN aligned with Oracle one
 

 Key: HIVE-9518
 URL: https://issues.apache.org/jira/browse/HIVE-9518
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Xiaobing Zhou
Assignee: Alexander Pivovarov
 Attachments: HIVE-9518.1.patch, HIVE-9518.10.patch, 
 HIVE-9518.2.patch, HIVE-9518.3.patch, HIVE-9518.4.patch, HIVE-9518.5.patch, 
 HIVE-9518.6.patch, HIVE-9518.7.patch, HIVE-9518.8.patch, HIVE-9518.9.patch


 This is used to track work to build an Oracle-like months_between. Here are 
 the semantics:
 MONTHS_BETWEEN returns the number of months between dates date1 and date2. If 
 date1 is later than date2, then the result is positive. If date1 is earlier 
 than date2, then the result is negative. If date1 and date2 are either the 
 same days of the month or both last days of months, then the result is always 
 an integer. Otherwise Oracle Database calculates the fractional portion of 
 the result based on a 31-day month and considers the difference in the time 
 components of date1 and date2.
 It should accept date, timestamp and string arguments in the format 
 'yyyy-MM-dd' or 'yyyy-MM-dd HH:mm:ss'.
 The result should be rounded to 8 decimal places.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-10103) LLAP: Cancelling tasks fails to stop cache filling threads

2015-03-31 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-10103:
---

Assignee: Sergey Shelukhin

 LLAP: Cancelling tasks fails to stop cache filling threads
 --

 Key: HIVE-10103
 URL: https://issues.apache.org/jira/browse/HIVE-10103
 Project: Hive
  Issue Type: Sub-task
Reporter: Gopal V
Assignee: Sergey Shelukhin

 Running a bad query (~1Tb scan on a 1Gb cache) and killing the tasks via the 
 container launcher fails to free up the cache filler threads.
 The cache filler threads with no consumers get stuck in a loop 
 {code}
 2015-03-26 14:02:47,335 
 [pool-2-thread-2(container_1_1659_01_74_gopal_20150326135614_2bb61f02-3c2b-4512-a34e-81803cd13fb6:1_Map
  1_73_0)] WARN org.apache.hadoop.hive.llap.io.api.impl.LlapIoImpl: Cannot 
 evict blocks for 262144 calls; cache full?
 2015-03-26 14:02:48,018 
 [pool-2-thread-7(container_1_1659_01_76_gopal_20150326135614_2bb61f02-3c2b-4512-a34e-81803cd13fb6:1_Map
  1_75_0)] WARN org.apache.hadoop.hive.llap.io.api.impl.LlapIoImpl: Cannot 
 evict blocks for 262144 calls; cache full?
 2015-03-26 14:02:51,658 
 [pool-2-thread-1(container_1_1659_01_73_gopal_20150326135614_2bb61f02-3c2b-4512-a34e-81803cd13fb6:1_Map
  1_72_0)] WARN org.apache.hadoop.hive.llap.io.api.impl.LlapIoImpl: Cannot 
 evict blocks for 262144 calls; cache full?
 {code}
 A daemon has to be killed to get back to normal operation.
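
 A hedged sketch of the missing cancellation path (hypothetical structure, not 
 the LLAP reader code): the filler loop needs to observe interruption or 
 consumer shutdown instead of spinning against a full cache:
 {code}
 // Hypothetical cache-filler loop with an explicit cancellation check.
 class CacheFillerSketch implements Runnable {
   private volatile boolean consumerDone; // set when the task is killed

   public void run() {
     while (hasMoreBlocks()) {
       if (consumerDone || Thread.currentThread().isInterrupted()) {
         releaseLockedBuffers(); // stop filling and free what we pinned
         return;
       }
       if (!tryEvictAndFill()) {
         return; // cache full and nothing evictable: back off, do not spin
       }
     }
   }

   void cancel() { consumerDone = true; }

   private boolean hasMoreBlocks() { return true; }    // stand-ins for the
   private boolean tryEvictAndFill() { return false; } // real reader plumbing
   private void releaseLockedBuffers() { }
 }
 {code}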



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10108) Index#getIndexTableName() returns db.index_table_name

2015-03-31 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389343#comment-14389343
 ] 

Hive QA commented on HIVE-10108:




{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12708479/HIVE-10108.1.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3223/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3223/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3223/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-3223/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
Reverted 
'service/src/java/org/apache/hive/service/cli/thrift/ThriftBinaryCLIService.java'
++ awk '{print $2}'
++ egrep -v '^X|^Performing status on external'
++ svn status --no-ignore
+ rm -rf target datanucleus.log ant/target shims/target shims/0.20S/target 
shims/0.23/target shims/aggregator/target shims/common/target 
shims/scheduler/target packaging/target hbase-handler/target testutils/target 
jdbc/target metastore/target itests/target itests/thirdparty 
itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target 
itests/hive-unit-hadoop2/target itests/hive-minikdc/target 
itests/hive-jmh/target itests/hive-unit/target itests/custom-serde/target 
itests/util/target itests/qtest-spark/target hcatalog/target 
hcatalog/core/target hcatalog/streaming/target 
hcatalog/server-extensions/target hcatalog/hcatalog-pig-adapter/target 
hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target 
accumulo-handler/target hwi/target common/target common/src/gen 
spark-client/target contrib/target service/target serde/target beeline/target 
odbc/target cli/target ql/dependency-reduced-pom.xml ql/target
+ svn update

Fetching external item into 'hcatalog/src/test/e2e/harness'
External at revision 1670470.

At revision 1670470.
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12708479 - PreCommit-HIVE-TRUNK-Build

 Index#getIndexTableName() returns db.index_table_name
 -

 Key: HIVE-10108
 URL: https://issues.apache.org/jira/browse/HIVE-10108
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: HIVE-10108.1-spark.patch, HIVE-10108.1.patch


 Index#getIndexTableName() used to return just the index table name. Now it 
 returns a db-qualified table name. This change was introduced in HIVE-3781.
 As a result:
 IMetaStoreClient#getTable(index.getDbName(), index.getIndexTableName())
 throws ObjectNotFoundException.
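
 A hypothetical illustration of the breakage and a defensive workaround (not 
 code from the attached patches):
{code}
public class IndexNameWorkaround {
  // After HIVE-3781, getIndexTableName() can return "db.index_table", so a
  // db-qualified metastore lookup with it throws ObjectNotFoundException.
  // A caller can strip the db prefix before calling getTable().
  static String unqualify(String indexTableName, String dbName) {
    String prefix = dbName + ".";
    return indexTableName.startsWith(prefix)
        ? indexTableName.substring(prefix.length())
        : indexTableName;
  }
  // e.g. metaStoreClient.getTable(index.getDbName(),
  //          unqualify(index.getIndexTableName(), index.getDbName()));
}
{code}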



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9518) Implement MONTHS_BETWEEN aligned with Oracle one

2015-03-31 Thread Mohit Sabharwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14389381#comment-14389381
 ] 

Mohit Sabharwal commented on HIVE-9518:
---

lgtm, +1 (non-binding)

 Implement MONTHS_BETWEEN aligned with Oracle one
 

 Key: HIVE-9518
 URL: https://issues.apache.org/jira/browse/HIVE-9518
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Xiaobing Zhou
Assignee: Alexander Pivovarov
 Attachments: HIVE-9518.1.patch, HIVE-9518.2.patch, HIVE-9518.3.patch, 
 HIVE-9518.4.patch, HIVE-9518.5.patch, HIVE-9518.6.patch, HIVE-9518.7.patch, 
 HIVE-9518.8.patch, HIVE-9518.9.patch


 This is used to track work to build an Oracle-like months_between. Here are 
 the semantics:
 MONTHS_BETWEEN returns the number of months between dates date1 and date2. If 
 date1 is later than date2, then the result is positive. If date1 is earlier 
 than date2, then the result is negative. If date1 and date2 are either the 
 same days of the month or both last days of months, then the result is always 
 an integer. Otherwise, Oracle Database calculates the fractional portion of 
 the result based on a 31-day month and considers the difference in the time 
 components of date1 and date2.
 Should accept date, timestamp and string arguments in the format 'yyyy-MM-dd' 
 or 'yyyy-MM-dd HH:mm:ss'.
 The result should be rounded to 8 decimal places.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10092) LLAP: improve how buffers are locked for split

2015-03-31 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-10092:

Attachment: HIVE-10092.patch

Committed to branch. Attaching the patch here for reference, since bugs are 
possible.

 LLAP: improve how buffers are locked for split
 --

 Key: HIVE-10092
 URL: https://issues.apache.org/jira/browse/HIVE-10092
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-10092.patch


 Right now, for simplicity, the entire split of decompressed buffers is locked 
 in the cache, in case some buffers are shared between RGs. This avoids the 
 situation where we uncompress some data, pass it on to the processor for RG 
 N, the processor processes and unlocks it, and the data is evicted before we 
 can pass it on for RG N+1.
 However, if the split is too big and the cache is small, or many splits are 
 processed at the same time, this can result in a deadlock because the entire 
 cache is locked. We need to make the locking more granular and probably also 
 try to avoid deadlocks in general (bypass cache?)
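
 A hypothetical sketch of more granular locking (not the attached patch): pin 
 individual buffers with a reference count per consuming RG instead of 
 locking the whole split, so buffers no longer needed can be evicted.
{code}
import java.util.concurrent.atomic.AtomicInteger;

class PinnedBuffer {
  // 0 = unpinned, > 0 = pinned by that many readers, -1 = evicted.
  private final AtomicInteger pins = new AtomicInteger();

  boolean tryPin() {
    int c;
    do {
      c = pins.get();
      if (c < 0) return false;  // already evicted; caller must re-read the data
    } while (!pins.compareAndSet(c, c + 1));
    return true;
  }

  void unpin() {
    pins.decrementAndGet();  // becomes evictable again once it reaches 0
  }

  boolean tryEvict() {
    return pins.compareAndSet(0, -1);  // only unpinned buffers can be evicted
  }
}
{code}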



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-10092) LLAP: improve how buffers are locked for split

2015-03-31 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin resolved HIVE-10092.
-
   Resolution: Fixed
Fix Version/s: llap

[~gopalv] fyi

 LLAP: improve how buffers are locked for split
 --

 Key: HIVE-10092
 URL: https://issues.apache.org/jira/browse/HIVE-10092
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: llap

 Attachments: HIVE-10092.patch


 Right now, for simplicity, the entire split of decompressed buffers is locked 
 in the cache, in case some buffers are shared between RGs. This avoids the 
 situation where we uncompress some data, pass it on to the processor for RG 
 N, the processor processes and unlocks it, and the data is evicted before we 
 can pass it on for RG N+1.
 However, if the split is too big and the cache is small, or many splits are 
 processed at the same time, this can result in a deadlock because the entire 
 cache is locked. We need to make the locking more granular and probably also 
 try to avoid deadlocks in general (bypass cache?)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10092) LLAP: improve how buffers are locked for split

2015-03-31 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14389534#comment-14389534
 ] 

Sergey Shelukhin commented on HIVE-10092:
-

The separate JIRA for cache bypassing is HIVE-10170.

 LLAP: improve how buffers are locked for split
 --

 Key: HIVE-10092
 URL: https://issues.apache.org/jira/browse/HIVE-10092
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: llap

 Attachments: HIVE-10092.patch


 Right now, for simplicity, the entire split of decompressed buffers is locked 
 in the cache, in case some buffers are shared between RGs. This avoids the 
 situation where we uncompress some data, pass it on to the processor for RG 
 N, the processor processes and unlocks it, and the data is evicted before we 
 can pass it on for RG N+1.
 However, if the split is too big and the cache is small, or many splits are 
 processed at the same time, this can result in a deadlock because the entire 
 cache is locked. We need to make the locking more granular and probably also 
 try to avoid deadlocks in general (bypass cache?)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10128) BytesBytesMultiHashMap does not allow concurrent read-only access

2015-03-31 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14389617#comment-14389617
 ] 

Hive QA commented on HIVE-10128:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12708481/HIVE-10128.03.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 8691 tests executed
*Failed tests:*
{noformat}
TestDummy - did not produce a TEST-*.xml file
TestMinimrCliDriver-smb_mapjoin_8.q - did not produce a TEST-*.xml file
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3225/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3225/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3225/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12708481 - PreCommit-HIVE-TRUNK-Build

 BytesBytesMultiHashMap does not allow concurrent read-only access
 -

 Key: HIVE-10128
 URL: https://issues.apache.org/jira/browse/HIVE-10128
 Project: Hive
  Issue Type: Bug
Reporter: Gopal V
Assignee: Sergey Shelukhin
 Fix For: llap

 Attachments: HIVE-10128.01.patch, HIVE-10128.02.patch, 
 HIVE-10128.03.patch, HIVE-10128.patch, hashmap-after.png, 
 hashmap-sync-source.png, hashmap-sync.png


 Multi-threaded performance takes a serious hit when LLAP shares hashtables 
 between the probe threads running in parallel. 
 !hashmap-sync.png!
 An explicit synchronized block inside ReusableRowContainer triggers this 
 particular pattern.
 !hashmap-sync-source.png!
 Looking deeper into the code, the synchronization seems to be caused by 
 WriteBuffers.setReadPoint modifying the otherwise read-only hashtable.
 To reproduce this result, run LLAP at the WARN log level, to avoid the log 
 synchronization that otherwise interferes with the thread synchronization 
 being measured.
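
 A hypothetical sketch of a per-caller alternative (not the actual fix): keep 
 the read position in an object owned by each probe thread instead of 
 mutating shared buffer state, so parallel reads need no synchronization.
{code}
class ReadPosition {
  long offset;  // each probe thread owns its own instance
}

class SharedBuffersSketch {
  private final byte[] data;  // written once, then treated as read-only

  SharedBuffersSketch(byte[] data) {
    this.data = data;
  }

  // Reading advances the caller's own position, never shared state.
  byte readByte(ReadPosition pos) {
    return data[(int) pos.offset++];
  }
}
{code}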



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-10111) LLAP: query 7 produces corrupted result with IO enabled

2015-03-31 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-10111:
---

Assignee: Sergey Shelukhin

 LLAP: query 7 produces corrupted result with IO enabled
 ---

 Key: HIVE-10111
 URL: https://issues.apache.org/jira/browse/HIVE-10111
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin

 Bogus rows appear at the beginning of the result:
 {noformat}
 NULL  97.098.01999664306640.0 84.2991552734
 AAAֺK�G6GDHE�ڗ��G7GDHEAA  9.0 40.619998931884766  
 0.0 34.119998931884766
 AAAEK��d  NULLNULLNULLNULL
 KAEGn@j��d
   KAEGAA  97.06.09904632568   
 313.4899902343755.67076293945
 AAA|EBCA��8��EBCAAA   97.093.66999816894531   
 0.0 2.80942779541
 AAA�g�IOLIy{�KOLIAA   72.0
 51.220001220703125  0.0 1.529713897705
 AAA�+�D�%�GKFC��k��%�IKFCAA   17.0
 121.915258789   0.0 110.91999816894531
 AAA�P��F�KIGE�B��F�KIGEAA15.0
 81.2008447266   124.2300033569336   24.36610351562
 AAAޙ ��i�OIJG�a(��i�OIJGAA33.0
 78.34999847412110.0 68.9400024
 {noformat}
 The correct results then follow, shifted by the corresponding number of rows.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

