[jira] [Updated] (HIVE-10159) HashTableSinkDesc and MapJoinDesc keyTblDesc can be replaced by JoinDesc.keyTableDesc
[ https://issues.apache.org/jira/browse/HIVE-10159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-10159: --- Attachment: HIVE-10159.1.patch patch #1 HashTableSinkDesc and MapJoinDesc keyTblDesc can be replaced by JoinDesc.keyTableDesc - Key: HIVE-10159 URL: https://issues.apache.org/jira/browse/HIVE-10159 Project: Hive Issue Type: Improvement Components: Query Planning Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Priority: Minor Attachments: HIVE-10159.1.patch MapJoinDesc and HashTableSinkDesc are derived from JoinDesc HashTableSinkDesc and MapJoinDesc have keyTblDesc field. JoinDesc has keyTableDesc field. I think HashTableSinkDesc and MapJoinDesc can use superclass (JoinDesc) keyTableDesc field instead of defining their own keyTblDesc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
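The refactoring proposed above can be sketched with stand-in classes (these are not the actual Hive sources; the class names mirror Hive's JoinDesc/MapJoinDesc but all Hive internals are omitted): the subclass drops its shadowing field and delegates to the superclass state.

```java
// Stand-in classes illustrating the HIVE-10159 proposal; not actual Hive code.
class TableDesc {
    final String serializationLib;
    TableDesc(String serializationLib) { this.serializationLib = serializationLib; }
}

class JoinDesc {
    // Single copy of the key table descriptor, shared with all subclasses.
    protected TableDesc keyTableDesc;
    public TableDesc getKeyTableDesc() { return keyTableDesc; }
    public void setKeyTableDesc(TableDesc d) { this.keyTableDesc = d; }
}

// Before the change, MapJoinDesc (and HashTableSinkDesc) declared their own
// keyTblDesc field shadowing the state above. After the change they delegate:
class MapJoinDesc extends JoinDesc {
    public TableDesc getKeyTblDesc() { return getKeyTableDesc(); } // kept for API compatibility
    public void setKeyTblDesc(TableDesc d) { setKeyTableDesc(d); }
}

public class KeyDescDemo {
    public static void main(String[] args) {
        MapJoinDesc desc = new MapJoinDesc();
        desc.setKeyTblDesc(new TableDesc("SomeSerDe"));
        // Both accessors now see the same object -- no duplicated state.
        System.out.println(desc.getKeyTblDesc() == desc.getKeyTableDesc());
    }
}
```

Keeping the old accessor names while routing them to the superclass field is one way to avoid touching every caller in the same patch.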
[jira] [Updated] (HIVE-9969) Avoid Utilities.getMapRedWork for spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-9969: - Attachment: HIVE-9969.1-spark.patch Avoid Utilities.getMapRedWork for spark [Spark Branch] -- Key: HIVE-9969 URL: https://issues.apache.org/jira/browse/HIVE-9969 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Priority: Minor Attachments: HIVE-9969.1-spark.patch The method shouldn't be used for spark mode. Specifically, map work and reduce work have different plan paths in spark. Calling this method will leave lots of errors in executor's log: {noformat} 15/03/16 02:57:23 INFO Utilities: Open file to read in plan: hdfs://node13-1:8020/tmp/hive/root/0b3f2ad9-af30-4674-9cfb-1f745a5df51d/hive_2015-03-16_02-57-17_752_4494804875441915487-1/-mr-10003/3897754a-0146-4616-a2f6-b316839a2ad0/reduce.xml 15/03/16 02:57:23 INFO Utilities: File not found: File does not exist: /tmp/hive/root/0b3f2ad9-af30-4674-9cfb-1f745a5df51d/hive_2015-03-16_02-57-17_752_4494804875441915487-1/-mr-10003/3897754a-0146-4616-a2f6-b316839a2ad0/reduce.xml at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66) at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1891) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1832) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1812) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1784) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:542) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:362) at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
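One way to avoid the spurious reduce.xml probes described above is to branch on the configured execution engine before ever entering the MapReduce plan-lookup path. This is an illustrative stand-in, not the actual HIVE-9969 patch; the config key `hive.execution.engine` is real, but the method and return values here are invented for the sketch.

```java
// Illustrative sketch (not the HIVE-9969 patch): the Spark path never probes
// for MapReduce-style reduce.xml plan files, so the File-not-found log spam
// shown in the JIRA description cannot occur.
import java.util.Map;

public class PlanLookupDemo {
    static String getWork(Map<String, String> conf) {
        String engine = conf.getOrDefault("hive.execution.engine", "mr");
        if ("spark".equals(engine)) {
            // Spark keeps map work and reduce work on separate plan paths;
            // do not fall through to a Utilities.getMapRedWork-style lookup.
            return "spark-plan";
        }
        return "mapred-plan"; // an MR engine would read map.xml/reduce.xml here
    }

    public static void main(String[] args) {
        System.out.println(getWork(Map.of("hive.execution.engine", "spark")));
    }
}
```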
[jira] [Updated] (HIVE-8818) Create unit test where we insert into an encrypted table and then read from it with hcatalog mapreduce
[ https://issues.apache.org/jira/browse/HIVE-8818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Chen updated HIVE-8818: Attachment: HIVE-8818.patch Create unit test where we insert into an encrypted table and then read from it with hcatalog mapreduce -- Key: HIVE-8818 URL: https://issues.apache.org/jira/browse/HIVE-8818 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Dong Chen Attachments: HIVE-8818.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10160) Give a warning when grouping or ordering by a constant column
[ https://issues.apache.org/jira/browse/HIVE-10160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388087#comment-14388087 ] Lefty Leverenz commented on HIVE-10160: --- See the thread ORDER BY clause in Hive on u...@hive.apache.org: * [http://mail-archives.apache.org/mod_mbox/hive-user/201503.mbox/%3c05d701d069b4$c71f6e60$555e4b20$@co.uk%3e] Give a warning when grouping or ordering by a constant column - Key: HIVE-10160 URL: https://issues.apache.org/jira/browse/HIVE-10160 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Lefty Leverenz Priority: Minor To avoid confusion, a warning should be issued when users specify column positions instead of names in a GROUP BY or ORDER BY clause (unless hive.groupby.orderby.position.alias is set to true in Hive 0.11.0 or later). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
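The requested warning could amount to a simple scan of the clause expressions for bare integer literals, guarded by the `hive.groupby.orderby.position.alias` setting. A hypothetical sketch (method names and warning text are invented; only the config key comes from the JIRA):

```java
// Hypothetical check for HIVE-10160: flag bare integer literals in a
// GROUP BY / ORDER BY expression list, since users often intend them as
// column positions, which are only honored when position aliasing is on.
import java.util.List;

public class PositionWarningDemo {
    static String checkClause(String clause, List<String> expressions, boolean positionAlias) {
        if (positionAlias) return null; // positions are interpreted, no warning needed
        for (String expr : expressions) {
            if (expr.matches("\\d+")) {
                return "WARNING: constant " + expr + " in " + clause
                     + " is not treated as a column position; set"
                     + " hive.groupby.orderby.position.alias=true or use the column name";
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(checkClause("ORDER BY", List.of("2"), false));
    }
}
```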
[jira] [Commented] (HIVE-10053) Override new init API fom ReadSupport instead of the deprecated one
[ https://issues.apache.org/jira/browse/HIVE-10053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388161#comment-14388161 ] Hive QA commented on HIVE-10053: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12708309/HIVE-10053.2.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 8692 tests executed *Failed tests:* {noformat} TestMinimrCliDriver-smb_mapjoin_8.q - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority2 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3214/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3214/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3214/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12708309 - PreCommit-HIVE-TRUNK-Build Override new init API fom ReadSupport instead of the deprecated one --- Key: HIVE-10053 URL: https://issues.apache.org/jira/browse/HIVE-10053 Project: Hive Issue Type: Sub-task Reporter: Ferdinand Xu Assignee: Ferdinand Xu Attachments: HIVE-10053.1.patch, HIVE-10053.2.patch, HIVE-10053.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9693) Introduce a stats cache for aggregate stats in HBase metastore [hbase-metastore branch]
[ https://issues.apache.org/jira/browse/HIVE-9693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-9693: --- Summary: Introduce a stats cache for aggregate stats in HBase metastore [hbase-metastore branch] (was: Introduce a stats cache for HBase metastore [hbase-metastore branch]) Introduce a stats cache for aggregate stats in HBase metastore [hbase-metastore branch] Key: HIVE-9693 URL: https://issues.apache.org/jira/browse/HIVE-9693 Project: Hive Issue Type: Sub-task Components: Metastore Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Attachments: HIVE-9693.1.patch, HIVE-9693.2.patch, HIVE-9693.3.patch NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9969) Avoid Utilities.getMapRedWork for spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388224#comment-14388224 ] Hive QA commented on HIVE-9969: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12708350/HIVE-9969.1-spark.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 8710 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nonmr_fetch org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_22 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_6_subq {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/816/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/816/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-816/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12708350 - PreCommit-HIVE-SPARK-Build Avoid Utilities.getMapRedWork for spark [Spark Branch] -- Key: HIVE-9969 URL: https://issues.apache.org/jira/browse/HIVE-9969 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Priority: Minor Attachments: HIVE-9969.1-spark.patch The method shouldn't be used for spark mode. Specifically, map work and reduce work have different plan paths in spark. 
Calling this method will leave lots of errors in executor's log: {noformat} 15/03/16 02:57:23 INFO Utilities: Open file to read in plan: hdfs://node13-1:8020/tmp/hive/root/0b3f2ad9-af30-4674-9cfb-1f745a5df51d/hive_2015-03-16_02-57-17_752_4494804875441915487-1/-mr-10003/3897754a-0146-4616-a2f6-b316839a2ad0/reduce.xml 15/03/16 02:57:23 INFO Utilities: File not found: File does not exist: /tmp/hive/root/0b3f2ad9-af30-4674-9cfb-1f745a5df51d/hive_2015-03-16_02-57-17_752_4494804875441915487-1/-mr-10003/3897754a-0146-4616-a2f6-b316839a2ad0/reduce.xml at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66) at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1891) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1832) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1812) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1784) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:542) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:362) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10163) CommonMergeJoinOperator calls WritableComparator.get() in the inner loop
[ https://issues.apache.org/jira/browse/HIVE-10163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-10163: --- Attachment: mergejoin-parallel-lock.png mergejoin-parallel-bt.png CommonMergeJoinOperator calls WritableComparator.get() in the inner loop Key: HIVE-10163 URL: https://issues.apache.org/jira/browse/HIVE-10163 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 1.2.0 Reporter: Gopal V Labels: JOIN, Performance Attachments: mergejoin-comparekeys.png, mergejoin-parallel-bt.png, mergejoin-parallel-lock.png The CommonMergeJoinOperator wastes CPU looking up the correct comparator for each WritableComparable in each row. {code} @SuppressWarnings("rawtypes") private int compareKeys(List<Object> k1, List<Object> k2) { int ret = 0; ret = WritableComparator.get(key_1.getClass()).compare(key_1, key_2); if (ret != 0) { return ret; } } {code} !mergejoin-comparekeys.png! The slow part of that get() is deep within {{ReflectionUtils.setConf}}, where it tries to use reflection to set the Comparator config for each row being compared. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10163) CommonMergeJoinOperator calls WritableComparator.get() in the inner loop
[ https://issues.apache.org/jira/browse/HIVE-10163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-10163: --- Description: The CommonMergeJoinOperator wastes CPU looking up the correct comparator for each WritableComparable in each row. {code} @SuppressWarnings("rawtypes") private int compareKeys(List<Object> k1, List<Object> k2) { int ret = 0; ret = WritableComparator.get(key_1.getClass()).compare(key_1, key_2); if (ret != 0) { return ret; } } {code} !mergejoin-comparekeys.png! The slow part of that get() is deep within {{ReflectionUtils.setConf}}, where it tries to use reflection to set the Comparator config for each row being compared. was: The CommonMergeJoinOperator wastes CPU looking up the correct comparator for each WritableComparable in each row. {code} @SuppressWarnings("rawtypes") private int compareKeys(List<Object> k1, List<Object> k2) { int ret = 0; ret = WritableComparator.get(key_1.getClass()).compare(key_1, key_2); if (ret != 0) { return ret; } } {code} The slow part of that get() is deep within {{ReflectionUtils.setConf}}, where it tries to use reflection to set the Comparator config for each row being compared. CommonMergeJoinOperator calls WritableComparator.get() in the inner loop Key: HIVE-10163 URL: https://issues.apache.org/jira/browse/HIVE-10163 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 1.2.0 Reporter: Gopal V Labels: JOIN, Performance Attachments: mergejoin-comparekeys.png The CommonMergeJoinOperator wastes CPU looking up the correct comparator for each WritableComparable in each row. {code} @SuppressWarnings("rawtypes") private int compareKeys(List<Object> k1, List<Object> k2) { int ret = 0; ret = WritableComparator.get(key_1.getClass()).compare(key_1, key_2); if (ret != 0) { return ret; } } {code} !mergejoin-comparekeys.png! 
The slow part of that get() is deep within {{ReflectionUtils.setConf}}, where it tries to use reflection to set the Comparator config for each row being compared. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
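The fix direction implied by the report is to hoist the comparator lookup out of the per-row path and resolve it once per key schema. A minimal self-contained sketch (the `Comparator` stands in for `WritableComparator.get(cls)`; Hive's actual classes are not used, and `key_1`/`key_2` from the truncated snippet become loop elements here):

```java
// Sketch of the hoisting fix for HIVE-10163: resolve the comparator once
// per key schema instead of once per row.
import java.util.Comparator;
import java.util.List;

public class CompareKeysDemo {
    // Cached once; the schema does not vary between JOIN keys of the same
    // operator, as the JIRA comment notes.
    private final Comparator<Object> keyComparator;

    CompareKeysDemo(Comparator<Object> keyComparator) {
        this.keyComparator = keyComparator;
    }

    int compareKeys(List<Object> k1, List<Object> k2) {
        for (int i = 0; i < k1.size(); i++) {
            // No per-row WritableComparator.get()/ReflectionUtils.setConf() here.
            int ret = keyComparator.compare(k1.get(i), k2.get(i));
            if (ret != 0) {
                return ret;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        CompareKeysDemo demo = new CompareKeysDemo(Comparator.comparing(o -> (Long) o));
        System.out.println(demo.compareKeys(List.of(1L, 2L), List.of(1L, 3L)));
    }
}
```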
[jira] [Updated] (HIVE-10163) CommonMergeJoinOperator calls WritableComparator.get() in the inner loop
[ https://issues.apache.org/jira/browse/HIVE-10163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-10163: --- Attachment: mergejoin-comparekeys.png CommonMergeJoinOperator calls WritableComparator.get() in the inner loop Key: HIVE-10163 URL: https://issues.apache.org/jira/browse/HIVE-10163 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 1.2.0 Reporter: Gopal V Labels: JOIN, Performance Attachments: mergejoin-comparekeys.png The CommonMergeJoinOperator wastes CPU looking up the correct comparator for each WritableComparable in each row. {code} @SuppressWarnings("rawtypes") private int compareKeys(List<Object> k1, List<Object> k2) { int ret = 0; ret = WritableComparator.get(key_1.getClass()).compare(key_1, key_2); if (ret != 0) { return ret; } } {code} The slow part of that get() is deep within {{ReflectionUtils.setConf}}, where it tries to use reflection to set the Comparator config for each row being compared. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10163) CommonMergeJoinOperator calls WritableComparator.get() in the inner loop
[ https://issues.apache.org/jira/browse/HIVE-10163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14388254#comment-14388254 ] Gopal V commented on HIVE-10163: Nope, it still hits HADOOP-11771 in the processKey(). CommonMergeJoinOperator calls WritableComparator.get() in the inner loop Key: HIVE-10163 URL: https://issues.apache.org/jira/browse/HIVE-10163 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 1.2.0 Reporter: Gopal V Labels: JOIN, Performance Attachments: mergejoin-comparekeys.png, mergejoin-parallel-bt.png, mergejoin-parallel-lock.png The CommonMergeJoinOperator wastes CPU looking up the correct comparator for each WritableComparable in each row. {code} @SuppressWarnings("rawtypes") private int compareKeys(List<Object> k1, List<Object> k2) { int ret = 0; ret = WritableComparator.get(key_1.getClass()).compare(key_1, key_2); if (ret != 0) { return ret; } } {code} !mergejoin-parallel-lock.png! !mergejoin-comparekeys.png! The slow part of that get() is deep within {{ReflectionUtils.setConf}}, where it tries to use reflection to set the Comparator config for each row being compared. !mergejoin-parallel-bt.png! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9845) HCatSplit repeats information making input split data size huge
[ https://issues.apache.org/jira/browse/HIVE-9845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-9845: --- Attachment: (was: HIVE-9845.2.patch) HCatSplit repeats information making input split data size huge --- Key: HIVE-9845 URL: https://issues.apache.org/jira/browse/HIVE-9845 Project: Hive Issue Type: Bug Components: HCatalog Reporter: Rohini Palaniswamy Assignee: Mithun Radhakrishnan Attachments: HIVE-9845.1.patch Pig on Tez jobs with larger tables hit PIG-4443. Running on HDFS data which has even triple the number of splits(100K+ splits and tasks) does not hit that issue. {code} HCatBaseInputFormat.java: //Call getSplit on the InputFormat, create an //HCatSplit for each underlying split //NumSplits is 0 for our purposes org.apache.hadoop.mapred.InputSplit[] baseSplits = inputFormat.getSplits(jobConf, 0); for(org.apache.hadoop.mapred.InputSplit split : baseSplits) { splits.add(new HCatSplit( partitionInfo, split,allCols)); } {code} Each hcatSplit duplicates partition schema and table schema. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9845) HCatSplit repeats information making input split data size huge
[ https://issues.apache.org/jira/browse/HIVE-9845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-9845: --- Attachment: HIVE-9845.3.patch Another take on the first patch. Except, with more logging, and a correction to {{TestHCatOutputFormat}}. HCatSplit repeats information making input split data size huge --- Key: HIVE-9845 URL: https://issues.apache.org/jira/browse/HIVE-9845 Project: Hive Issue Type: Bug Components: HCatalog Reporter: Rohini Palaniswamy Assignee: Mithun Radhakrishnan Attachments: HIVE-9845.1.patch, HIVE-9845.3.patch Pig on Tez jobs with larger tables hit PIG-4443. Running on HDFS data which has even triple the number of splits(100K+ splits and tasks) does not hit that issue. {code} HCatBaseInputFormat.java: //Call getSplit on the InputFormat, create an //HCatSplit for each underlying split //NumSplits is 0 for our purposes org.apache.hadoop.mapred.InputSplit[] baseSplits = inputFormat.getSplits(jobConf, 0); for(org.apache.hadoop.mapred.InputSplit split : baseSplits) { splits.add(new HCatSplit( partitionInfo, split,allCols)); } {code} Each hcatSplit duplicates partition schema and table schema. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
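A back-of-the-envelope illustration of why repeating the partition and table schema in every HCatSplit blows up the serialized input-split payload (all sizes here are made-up placeholders; this is not HCatalog code):

```java
// Rough cost model for HIVE-9845: schema bytes repeated per split vs. shared.
public class SplitSizeDemo {
    static long serializedBytes(int numSplits, int schemaBytes, int perSplitBytes, boolean schemaPerSplit) {
        // When the schema rides along in every split, its cost scales with numSplits.
        long schemaCost = schemaPerSplit ? (long) numSplits * schemaBytes : schemaBytes;
        return schemaCost + (long) numSplits * perSplitBytes;
    }

    public static void main(String[] args) {
        int splits = 100_000, schema = 4_096, perSplit = 256; // hypothetical sizes
        System.out.println(serializedBytes(splits, schema, perSplit, true));  // schema duplicated per split
        System.out.println(serializedBytes(splits, schema, perSplit, false)); // schema stored once
    }
}
```

With these hypothetical numbers the duplicated form is hundreds of megabytes while the shared form stays in the tens, which is the kind of gap that trips limits like the one in PIG-4443 at 100K+ splits.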
[jira] [Updated] (HIVE-9693) Introduce a stats cache for HBase metastore [hbase-metastore branch]
[ https://issues.apache.org/jira/browse/HIVE-9693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-9693: --- Attachment: HIVE-9693.3.patch Introduce a stats cache for HBase metastore [hbase-metastore branch] - Key: HIVE-9693 URL: https://issues.apache.org/jira/browse/HIVE-9693 Project: Hive Issue Type: Sub-task Components: Metastore Reporter: Vaibhav Gumashta Assignee: Vaibhav Gumashta Attachments: HIVE-9693.1.patch, HIVE-9693.2.patch, HIVE-9693.3.patch NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10162) LLAP: Avoid deserializing the plan > 1 times in a single thread
[ https://issues.apache.org/jira/browse/HIVE-10162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-10162: --- Attachment: deserialize-plan-2.png deserialize-plan-1.png LLAP: Avoid deserializing the plan > 1 times in a single thread --- Key: HIVE-10162 URL: https://issues.apache.org/jira/browse/HIVE-10162 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Gopal V Assignee: Gunther Hagleitner Fix For: llap Attachments: deserialize-plan-1.png, deserialize-plan-2.png Kryo shows up in the critical hot-path for LLAP when using a plan with a very large filter condition, due to the fact that the plan is deserialized more than once for each task. !deserialize-plan-1.png! !deserialize-plan-2.png! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10163) CommonMergeJoinOperator calls WritableComparator.get() in the inner loop
[ https://issues.apache.org/jira/browse/HIVE-10163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14388188#comment-14388188 ] Gopal V commented on HIVE-10163: {{WritableComparator::get()}} -- {{ReflectionUtils.setConf()}} is pointless, but it misses the static synchronized block inside HADOOP-11771 because the default argument is NULL. There is no difference between each iteration of the same row-keys, since the schema does not vary between JOIN keys in the same operator. CommonMergeJoinOperator calls WritableComparator.get() in the inner loop Key: HIVE-10163 URL: https://issues.apache.org/jira/browse/HIVE-10163 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 1.2.0 Reporter: Gopal V Labels: JOIN, Performance The CommonMergeJoinOperator wastes CPU looking up the correct comparator for each WritableComparable in each row. {code} @SuppressWarnings("rawtypes") private int compareKeys(List<Object> k1, List<Object> k2) { int ret = 0; ret = WritableComparator.get(key_1.getClass()).compare(key_1, key_2); if (ret != 0) { return ret; } } {code} The slow part of that get() is deep within {{ReflectionUtils.setConf}}, where it tries to use reflection to set the Comparator config for each row being compared. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10164) LLAP: ORC BIGINT SARGs regressed after Parquet PPD fixes (HIVE-8122)
[ https://issues.apache.org/jira/browse/HIVE-10164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-10164: --- Attachment: orc-sarg-tostring.png LLAP: ORC BIGINT SARGs regressed after Parquet PPD fixes (HIVE-8122) Key: HIVE-10164 URL: https://issues.apache.org/jira/browse/HIVE-10164 Project: Hive Issue Type: Sub-task Reporter: Gopal V Assignee: Prasanth Jayachandran Attachments: orc-sarg-tostring.png HIVE-8122 seems to have introduced a toString() to the ORC PPD codepath for BIGINT. https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgumentImpl.java#L162 {code} private List<Object> getOrcLiteralList() { // no need to cast ... List<Object> result = new ArrayList<Object>(); for (Object o : literalList) { result.add(Long.valueOf(o.toString())); } return result; } {code} !orc-sarg-tostring.png! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
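The regression is the Long -> String -> parse round trip on every literal. A sketch of the cheaper direction (this is an illustration, not the actual fix that landed in HIVE-10172): cast through `Number` when the literal is already numeric.

```java
// HIVE-10164 illustration: toString()+parse per literal vs. a plain cast.
import java.util.ArrayList;
import java.util.List;

public class OrcLiteralDemo {
    static List<Object> viaToString(List<Object> literalList) {
        List<Object> result = new ArrayList<Object>();
        for (Object o : literalList) {
            result.add(Long.valueOf(o.toString())); // allocation + parse per literal
        }
        return result;
    }

    static List<Object> viaCast(List<Object> literalList) {
        List<Object> result = new ArrayList<Object>();
        for (Object o : literalList) {
            result.add(((Number) o).longValue()); // no string round trip
        }
        return result;
    }

    public static void main(String[] args) {
        List<Object> in = List.of(1L, 2L, 3L);
        System.out.println(viaCast(in).equals(viaToString(in))); // same values either way
    }
}
```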
[jira] [Commented] (HIVE-10001) SMB join in reduce side
[ https://issues.apache.org/jira/browse/HIVE-10001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388382#comment-14388382 ] Hive QA commented on HIVE-10001: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12708327/HIVE-10001.7.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 8692 tests executed *Failed tests:* {noformat} TestMinimrCliDriver-smb_mapjoin_8.q - did not produce a TEST-*.xml file org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.testSparkQuery org.apache.hive.spark.client.TestSparkClient.testJobSubmission {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3217/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3217/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3217/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12708327 - PreCommit-HIVE-TRUNK-Build SMB join in reduce side --- Key: HIVE-10001 URL: https://issues.apache.org/jira/browse/HIVE-10001 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-10001.1.patch, HIVE-10001.2.patch, HIVE-10001.3.patch, HIVE-10001.4.patch, HIVE-10001.5.patch, HIVE-10001.6.patch, HIVE-10001.7.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-3378) UDF to obtain the numeric day of an year from date or timestamp in HIVE.
[ https://issues.apache.org/jira/browse/HIVE-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov reassigned HIVE-3378: - Assignee: Alexander Pivovarov UDF to obtain the numeric day of an year from date or timestamp in HIVE. Key: HIVE-3378 URL: https://issues.apache.org/jira/browse/HIVE-3378 Project: Hive Issue Type: New Feature Components: UDF Affects Versions: 0.8.1, 0.9.0 Reporter: Deepti Antony Assignee: Alexander Pivovarov Attachments: HIVE-3378.1.patch.txt Current Hive releases lack a function that returns the numeric day of the year for a given date or timestamp. The function DAYOFYEAR(date) would return the numeric day from a date/timestamp, which would be useful in HiveQL. DAYOFYEAR can be used to compare data by the number of days up to the given date, and can be used in different domains. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-10044) Allow interval params for year/month/day/hour/minute/second functions
[ https://issues.apache.org/jira/browse/HIVE-10044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14389860#comment-14389860 ] Thejas M Nair edited comment on HIVE-10044 at 4/1/15 1:52 AM: -- [~jdere] Can you also please update the function descriptions (@Description annotation) for these? Describe function tests would also need to be updated for them after that change (assuming they exist). was (Author: thejas): [~jdere] Can you also please update the function descriptions (@Description annotation) for these? Allow interval params for year/month/day/hour/minute/second functions - Key: HIVE-10044 URL: https://issues.apache.org/jira/browse/HIVE-10044 Project: Hive Issue Type: Sub-task Components: UDF Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-10044.1.patch Update the year/month/day/hour/minute/second functions to retrieve the various fields of the year-month and day-time interval types. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
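The @Description annotation mentioned above is what DESCRIBE FUNCTION output is generated from, which is why it needs updating alongside the code. A self-contained sketch using a stand-in annotation modeled on Hive's (the real one lives in org.apache.hadoop.hive.ql.exec; the UDF class and description text below are invented for illustration):

```java
// Stand-in modeled on Hive's @Description annotation; not the Hive class.
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

@Retention(RetentionPolicy.RUNTIME)
@interface Description {
    String name();
    String value();
}

@Description(name = "year",
             value = "_FUNC_(param) - returns the year component of a date/timestamp/interval")
class GenericUDFYearStandIn { }

public class DescribeFunctionDemo {
    // Mimics how a DESCRIBE FUNCTION-style command would render the annotation.
    static String describe(Class<?> udf) {
        Description d = udf.getAnnotation(Description.class);
        return d == null ? "no description" : d.value().replace("_FUNC_", d.name());
    }

    public static void main(String[] args) {
        System.out.println(describe(GenericUDFYearStandIn.class));
    }
}
```

If the annotation text is not updated when interval support is added, the rendered description silently stays stale, which is the gap the comment is pointing at.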
[jira] [Commented] (HIVE-10172) Fix performance regression caused by HIVE-8122 for ORC
[ https://issues.apache.org/jira/browse/HIVE-10172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389900#comment-14389900 ] Ferdinand Xu commented on HIVE-10172: - Thank you for your update. LGTM for the new patch. Fix performance regression caused by HIVE-8122 for ORC -- Key: HIVE-10172 URL: https://issues.apache.org/jira/browse/HIVE-10172 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-10172.1.patch, HIVE-10172.2.patch See HIVE-10164 for description. We should fix this in trunk and move it to branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10175) PartitionPruning lacks a fast-path exit for large IN() queries
[ https://issues.apache.org/jira/browse/HIVE-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-10175: --- Component/s: Query Planning PartitionPruning lacks a fast-path exit for large IN() queries -- Key: HIVE-10175 URL: https://issues.apache.org/jira/browse/HIVE-10175 Project: Hive Issue Type: Bug Components: Query Planning, Tez Affects Versions: 1.2.0 Reporter: Gopal V Priority: Minor TezCompiler::runDynamicPartitionPruning() calls the graph walker even if all tables provided to the optimizer are unpartitioned temporary tables. This makes it extremely slow, as it will walk and inspect a large/complex FilterOperator later in the pipeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10175) PartitionPruning lacks a fast-path exit for large IN() queries
[ https://issues.apache.org/jira/browse/HIVE-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-10175: --- Summary: PartitionPruning lacks a fast-path exit for large IN() queries (was: Tez DynamicPartitionPruning lacks a fast-path exit for large IN() queries) PartitionPruning lacks a fast-path exit for large IN() queries -- Key: HIVE-10175 URL: https://issues.apache.org/jira/browse/HIVE-10175 Project: Hive Issue Type: Bug Components: Query Planning, Tez Affects Versions: 1.2.0 Reporter: Gopal V Priority: Minor TezCompiler::runDynamicPartitionPruning() calls the graph walker even if all tables provided to the optimizer are unpartitioned temporary tables. This makes it extremely slow, as it will walk and inspect a large/complex FilterOperator later in the pipeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
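The fast-path exit the issue asks for can be sketched as a guard before the graph walk (this is illustrative, not the actual TezCompiler code; the `Table` record and method shape are invented):

```java
// Illustrative fast-path guard for HIVE-10175: skip the dynamic-partition-
// pruning graph walk outright when no input table is partitioned.
import java.util.List;

public class PruningFastPathDemo {
    record Table(String name, boolean partitioned) {}

    static boolean runDynamicPartitionPruning(List<Table> tables) {
        // Fast path: nothing can be pruned, so never walk the (possibly
        // huge) FilterOperator graph produced by a large IN() clause.
        if (tables.stream().noneMatch(Table::partitioned)) {
            return false;
        }
        // ... the expensive graph walk would go here ...
        return true;
    }

    public static void main(String[] args) {
        List<Table> tmp = List.of(new Table("tmp_a", false), new Table("tmp_b", false));
        System.out.println(runDynamicPartitionPruning(tmp)); // false: walker skipped
    }
}
```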
[jira] [Commented] (HIVE-9664) Hive add jar command should be able to download and add jars from a repository
[ https://issues.apache.org/jira/browse/HIVE-9664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1439#comment-1439 ] Lefty Leverenz commented on HIVE-9664: -- Doc note: The ADD FILE | JAR | ARCHIVE commands are documented in several places, so this information needs to be added to all of them, or perhaps just to Hive Resources in the CLI doc with links from the others. * [Commands | https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Commands] * [HiveServer2 Clients -- Beeline Hive Commands | https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-BeelineHiveCommands] * [CLI -- Hive Interactive Shell Commands | https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli#LanguageManualCli-HiveInteractiveShellCommands] * [CLI -- Hive Resources | https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli#LanguageManualCli-HiveResources] Hive add jar command should be able to download and add jars from a repository Key: HIVE-9664 URL: https://issues.apache.org/jira/browse/HIVE-9664 Project: Hive Issue Type: Improvement Affects Versions: 0.14.0 Reporter: Anant Nag Assignee: Anant Nag Labels: TODOC1.2, hive, patch Fix For: 1.2.0 Attachments: HIVE-9664.4.patch, HIVE-9664.5.patch, HIVE-9664.patch, HIVE-9664.patch, HIVE-9664.patch Currently Hive's add jar command takes a local path to the dependency jar. This clutters the local file-system, as users may forget to remove the jar later. It would be nice if Hive supported a Gradle-like notation to download the jar from a repository. Example: add jar org:module:version It should also be backward compatible and should take a jar from the local file-system as well. RB: https://reviews.apache.org/r/31628/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10174) LLAP: ORC MemoryManager is singleton synchronized
[ https://issues.apache.org/jira/browse/HIVE-10174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-10174: --- Description: ORC MemoryManager::addedRow() checks are bad for LLAP multi-threaded performance. !orc-memorymanager-1.png! !orc-memorymanager-2.png! was: ORC MemoryManager::addedRow() checks are bad for LLAP multi-threaded performance. !orc-memory-manager-1.png! !orc-memory-manager-2.png! LLAP: ORC MemoryManager is singleton synchronized - Key: HIVE-10174 URL: https://issues.apache.org/jira/browse/HIVE-10174 Project: Hive Issue Type: Sub-task Components: File Formats Affects Versions: llap Reporter: Gopal V Attachments: orc-memorymanager-1.png, orc-memorymanager-2.png ORC MemoryManager::addedRow() checks are bad for LLAP multi-threaded performance. !orc-memorymanager-1.png! !orc-memorymanager-2.png! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10174) LLAP: ORC MemoryManager is singleton synchronized
[ https://issues.apache.org/jira/browse/HIVE-10174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-10174: --- Attachment: orc-memorymanager-2.png orc-memorymanager-1.png LLAP: ORC MemoryManager is singleton synchronized - Key: HIVE-10174 URL: https://issues.apache.org/jira/browse/HIVE-10174 Project: Hive Issue Type: Sub-task Components: File Formats Affects Versions: llap Reporter: Gopal V Attachments: orc-memorymanager-1.png, orc-memorymanager-2.png ORC MemoryManager::addedRow() checks are bad for LLAP multi-threaded performance. !orc-memory-manager-1.png! !orc-memory-manager-2.png! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10174) LLAP: ORC MemoryManager is singleton synchronized
[ https://issues.apache.org/jira/browse/HIVE-10174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389958#comment-14389958 ] Gopal V commented on HIVE-10174: Performance difference is somewhere along the lines of 34s with the MemoryManager + addRow synchronized blocks vs 9s without the MemoryManager and the addRow synchronized(this). To be looked at when we're writing ORC out of LLAP. LLAP: ORC MemoryManager is singleton synchronized - Key: HIVE-10174 URL: https://issues.apache.org/jira/browse/HIVE-10174 Project: Hive Issue Type: Sub-task Components: File Formats Affects Versions: llap Reporter: Gopal V Attachments: orc-memorymanager-1.png, orc-memorymanager-2.png ORC MemoryManager::addedRow() checks are bad for LLAP multi-threaded performance. !orc-memorymanager-1.png! !orc-memorymanager-2.png! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9664) Hive add jar command should be able to download and add jars from a repository
[ https://issues.apache.org/jira/browse/HIVE-9664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-9664: - Labels: TODOC1.2 hive patch (was: hive patch) Hive add jar command should be able to download and add jars from a repository Key: HIVE-9664 URL: https://issues.apache.org/jira/browse/HIVE-9664 Project: Hive Issue Type: Improvement Affects Versions: 0.14.0 Reporter: Anant Nag Assignee: Anant Nag Labels: TODOC1.2, hive, patch Fix For: 1.2.0 Attachments: HIVE-9664.4.patch, HIVE-9664.5.patch, HIVE-9664.patch, HIVE-9664.patch, HIVE-9664.patch Currently Hive's add jar command takes a local path to the dependency jar. This clutters the local file-system as users may forget to remove this jar later. It would be nice if Hive supported a Gradle-like notation to download the jar from a repository. Example: add jar org:module:version It should also be backward compatible and should take a jar from the local file-system as well. RB: https://reviews.apache.org/r/31628/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-10164) LLAP: ORC BIGINT SARGs regressed after Parquet PPD fixes (HIVE-8122)
[ https://issues.apache.org/jira/browse/HIVE-10164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran resolved HIVE-10164. -- Resolution: Invalid Will be fixed in HIVE-10172 LLAP: ORC BIGINT SARGs regressed after Parquet PPD fixes (HIVE-8122) Key: HIVE-10164 URL: https://issues.apache.org/jira/browse/HIVE-10164 Project: Hive Issue Type: Sub-task Reporter: Gopal V Assignee: Prasanth Jayachandran Attachments: orc-sarg-tostring.png HIVE-8122 seems to have introduced a toString() to the ORC PPD codepath for BIGINT. https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgumentImpl.java#L162 {code} private List<Object> getOrcLiteralList() { // no need to cast ... List<Object> result = new ArrayList<Object>(); for (Object o : literalList) { result.add(Long.valueOf(o.toString())); } return result; } {code} !orc-sarg-tostring.png! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
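For context on the regression, here is a self-contained sketch contrasting the toString() round-trip in the snippet above with a plain Number cast that avoids the per-value string formatting and parsing. Class and method names are illustrative; the actual fix is tracked in HIVE-10172.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the regression and a cheaper alternative. The slow path
// formats each literal to a String and parses it back; the fast path
// reads the numeric value directly. Illustrative names only.
public class SargLiteralDemo {
    // Slow path resembling the snippet in the issue description.
    public static List<Object> viaToString(List<Object> literalList) {
        List<Object> result = new ArrayList<Object>();
        for (Object o : literalList) {
            result.add(Long.valueOf(o.toString())); // string round-trip
        }
        return result;
    }

    // Cheaper path: no string round-trip for values that are numbers.
    public static List<Object> viaCast(List<Object> literalList) {
        List<Object> result = new ArrayList<Object>();
        for (Object o : literalList) {
            result.add(((Number) o).longValue()); // direct primitive read
        }
        return result;
    }
}
```

Both produce the same Long values; the difference is purely the per-literal cost, which is what the attached profile shows.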
[jira] [Commented] (HIVE-10175) PartitionPruning lacks a fast-path exit for large IN() queries
[ https://issues.apache.org/jira/browse/HIVE-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389994#comment-14389994 ] Gopal V commented on HIVE-10175: {code}
METHOD                 DURATION(ms)
parse                  462
semanticAnalyze        9,312
TezBuildDag            569
TezSubmitToRunningDag  5
TotalPrepTime          11,343
{code} Save 2 seconds by doing {code}
set hive.tez.dynamic.partition.pruning=false;

METHOD                 DURATION(ms)
parse                  449
semanticAnalyze        7,254
TezBuildDag            527
TezSubmitToRunningDag  16
TotalPrepTime          9,190
{code} save 9 seconds off default planning with {code}
set hive.optimize.ppd=false;
set hive.tez.dynamic.partition.pruning=false;

METHOD                 DURATION(ms)
parse                  446
semanticAnalyze        2,089
TezBuildDag            578
TezSubmitToRunningDag  4
TotalPrepTime          4,249
{code} PartitionPruning lacks a fast-path exit for large IN() queries -- Key: HIVE-10175 URL: https://issues.apache.org/jira/browse/HIVE-10175 Project: Hive Issue Type: Bug Components: Physical Optimizer, Tez Affects Versions: 1.2.0 Reporter: Gopal V Assignee: Gunther Hagleitner Priority: Minor TezCompiler::runDynamicPartitionPruning() ppr.PartitionPruner() calls the graph walker even if all tables provided to the optimizer are unpartitioned (or temporary) tables. This makes it extremely slow as it will walk and inspect a large/complex FilterOperator later in the pipeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
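The fast-path exit requested here amounts to a cheap pre-check before invoking the graph walker. A hypothetical sketch of such a guard (names are illustrative, not Hive's actual API):

```java
import java.util.List;

// Hypothetical fast-path guard for runDynamicPartitionPruning(): if no
// source table is partitioned, skip the graph walk entirely instead of
// inspecting every large/complex FilterOperator.
public class PruningFastPath {
    public static boolean shouldRunPruning(List<Boolean> tableIsPartitioned) {
        for (boolean partitioned : tableIsPartitioned) {
            if (partitioned) {
                return true; // at least one partitioned table: walk the graph
            }
        }
        return false; // all unpartitioned/temporary: nothing to prune
    }
}
```

The check is O(number of tables), so for the large IN() queries profiled above it would cut the multi-second walk down to a trivial scan.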
[jira] [Commented] (HIVE-10103) LLAP: Cancelling tasks fails to stop cache filling threads
[ https://issues.apache.org/jira/browse/HIVE-10103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389832#comment-14389832 ] Sergey Shelukhin commented on HIVE-10103: - Main problem is that tez doesn't close the reader when operators fail... trying to see now how to change that. Added some workarounds for when nextCvb is interrupted for now. LLAP: Cancelling tasks fails to stop cache filling threads -- Key: HIVE-10103 URL: https://issues.apache.org/jira/browse/HIVE-10103 Project: Hive Issue Type: Sub-task Reporter: Gopal V Assignee: Sergey Shelukhin Running a bad query (~1Tb scan on a 1Gb cache) and killing the tasks via the container launcher fails to free up the cache filler threads. The cache filler threads with no consumers get stuck in a loop {code} 2015-03-26 14:02:47,335 [pool-2-thread-2(container_1_1659_01_74_gopal_20150326135614_2bb61f02-3c2b-4512-a34e-81803cd13fb6:1_Map 1_73_0)] WARN org.apache.hadoop.hive.llap.io.api.impl.LlapIoImpl: Cannot evict blocks for 262144 calls; cache full? 2015-03-26 14:02:48,018 [pool-2-thread-7(container_1_1659_01_76_gopal_20150326135614_2bb61f02-3c2b-4512-a34e-81803cd13fb6:1_Map 1_75_0)] WARN org.apache.hadoop.hive.llap.io.api.impl.LlapIoImpl: Cannot evict blocks for 262144 calls; cache full? 2015-03-26 14:02:51,658 [pool-2-thread-1(container_1_1659_01_73_gopal_20150326135614_2bb61f02-3c2b-4512-a34e-81803cd13fb6:1_Map 1_72_0)] WARN org.apache.hadoop.hive.llap.io.api.impl.LlapIoImpl: Cannot evict blocks for 262144 calls; cache full? {code} Needs to kill a daemon to get back to normal operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
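The missing cancellation behavior boils down to the fill loop observing its thread's interrupt flag. A hypothetical, simplified sketch of such a loop (not the actual LlapIoImpl code; names are illustrative):

```java
import java.util.Iterator;

// Sketch: a cache-fill loop that stops when its thread is interrupted
// (i.e. the consumer/task was killed) instead of spinning and logging
// "Cannot evict blocks" warnings against a full cache.
public class CancellableFiller {
    /** Returns the number of blocks read, or -1 if cancelled mid-fill. */
    public static int fill(Iterator<byte[]> blocks) {
        int filled = 0;
        while (blocks.hasNext()) {
            if (Thread.currentThread().isInterrupted()) {
                return -1; // consumer is gone: stop reading into the cache
            }
            blocks.next(); // stand-in for reading a block into the cache
            filled++;
        }
        return filled;
    }
}
```

The caller that kills the task would interrupt the filler thread; the loop then exits on the next iteration instead of needing a daemon restart.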
[jira] [Commented] (HIVE-10148) update of bucketing column should not be allowed
[ https://issues.apache.org/jira/browse/HIVE-10148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389925#comment-14389925 ] Hive QA commented on HIVE-10148: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12708493/HIVE-10148.patch {color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 8692 tests executed *Failed tests:* {noformat} TestMinimrCliDriver-smb_mapjoin_8.q - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_update_all_types org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_update_tmp_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_update_noupdatepriv org.apache.hadoop.hive.ql.security.authorization.plugin.TestHiveAuthorizerCheckInvocation.testUpdateSomeColumnsUsed org.apache.hadoop.hive.ql.security.authorization.plugin.TestHiveAuthorizerCheckInvocation.testUpdateSomeColumnsUsedExprInSet org.apache.hadoop.hive.ql.txn.compactor.TestCompactor.dynamicPartitioningDelete {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3228/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3228/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3228/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 7 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12708493 - PreCommit-HIVE-TRUNK-Build update of bucketing column should not be allowed -- Key: HIVE-10148 URL: https://issues.apache.org/jira/browse/HIVE-10148 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 1.1.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-10148.patch update tbl set a = 5; should raise an error if 'a' is a bucketing column. Such an operation is not supported but is currently not checked for. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10134) Fix test failures after HIVE-10130 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao updated HIVE-10134: Attachment: HIVE-10134.2-spark.patch Updated golden files for MR. Test failure on nonmr_fetch is strange. I couldn't reproduce the error on my local machine. Fix test failures after HIVE-10130 [Spark Branch] - Key: HIVE-10134 URL: https://issues.apache.org/jira/browse/HIVE-10134 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Xuefu Zhang Assignee: Chao Fix For: spark-branch Attachments: HIVE-10134.1-spark.patch, HIVE-10134.2-spark.patch Complete test run: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/812/#showFailuresLink *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nonmr_fetch org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union31 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_22 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_6_subq org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10152) ErrorMsg.formatToErrorMsgMap has bad regex
[ https://issues.apache.org/jira/browse/HIVE-10152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389981#comment-14389981 ] Hive QA commented on HIVE-10152: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12708501/HIVE-10152.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 8671 tests executed *Failed tests:* {noformat} TestHiveAuthorizationTaskFactory - did not produce a TEST-*.xml file TestMinimrCliDriver-smb_mapjoin_8.q - did not produce a TEST-*.xml file {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3229/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3229/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3229/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12708501 - PreCommit-HIVE-TRUNK-Build ErrorMsg.formatToErrorMsgMap has bad regex -- Key: HIVE-10152 URL: https://issues.apache.org/jira/browse/HIVE-10152 Project: Hive Issue Type: Bug Components: Logging Affects Versions: 1.1.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-10152.patch {noformat} String pattern = errorMsg.mesg.replaceAll("\\{.*\\}", ".*"); {noformat} should be {noformat} String pattern = errorMsg.mesg.replaceAll("\\{[0-9]+\\}", ".*"); {noformat} The current regex can match the whole message (when it has more than one parameter). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
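A small demonstration of why the greedy pattern is wrong: with two placeholders, "\\{.*\\}" matches from the first '{' to the last '}', collapsing the fixed text between parameters into a single wildcard. The message text below is illustrative, not an actual Hive error message.

```java
// Sketch: building a match pattern from a parameterized error message.
// The greedy regex swallows everything between the outermost braces,
// while the digit-restricted one replaces each placeholder separately.
public class ErrorMsgRegexDemo {
    public static String toPattern(String mesg, String placeholderRegex) {
        return mesg.replaceAll(placeholderRegex, ".*");
    }
}
```

With "Alter table {0} failed: {1}", the greedy regex yields "Alter table .*" (which would match many unrelated messages), while "\\{[0-9]+\\}" yields "Alter table .* failed: .*", preserving the fixed text between the parameters.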
[jira] [Updated] (HIVE-10175) PartitionPruning lacks a fast-path exit for large IN() queries
[ https://issues.apache.org/jira/browse/HIVE-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-10175: --- Assignee: Gunther Hagleitner PartitionPruning lacks a fast-path exit for large IN() queries -- Key: HIVE-10175 URL: https://issues.apache.org/jira/browse/HIVE-10175 Project: Hive Issue Type: Bug Components: Physical Optimizer, Tez Affects Versions: 1.2.0 Reporter: Gopal V Assignee: Gunther Hagleitner Priority: Minor TezCompiler::runDynamicPartitionPruning() ppr.PartitionPruner() calls the graph walker even if all tables provided to the optimizer are unpartitioned (or temporary) tables. This makes it extremely slow as it will walk and inspect a large/complex FilterOperator later in the pipeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9969) Avoid Utilities.getMapRedWork for spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389995#comment-14389995 ] Rui Li commented on HIVE-9969: -- Committed to spark. Thanks Xuefu. Avoid Utilities.getMapRedWork for spark [Spark Branch] -- Key: HIVE-9969 URL: https://issues.apache.org/jira/browse/HIVE-9969 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Priority: Minor Fix For: spark-branch Attachments: HIVE-9969.1-spark.patch The method shouldn't be used for spark mode. Specifically, map work and reduce work have different plan paths in spark. Calling this method will leave lots of errors in executor's log: {noformat} 15/03/16 02:57:23 INFO Utilities: Open file to read in plan: hdfs://node13-1:8020/tmp/hive/root/0b3f2ad9-af30-4674-9cfb-1f745a5df51d/hive_2015-03-16_02-57-17_752_4494804875441915487-1/-mr-10003/3897754a-0146-4616-a2f6-b316839a2ad0/reduce.xml 15/03/16 02:57:23 INFO Utilities: File not found: File does not exist: /tmp/hive/root/0b3f2ad9-af30-4674-9cfb-1f745a5df51d/hive_2015-03-16_02-57-17_752_4494804875441915487-1/-mr-10003/3897754a-0146-4616-a2f6-b316839a2ad0/reduce.xml at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66) at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1891) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1832) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1812) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1784) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:542) at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:362) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9969) Avoid Utilities.getMapRedWork for spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-9969: - Release Note: (was: Committed to spark. Thanks Xuefu.) Avoid Utilities.getMapRedWork for spark [Spark Branch] -- Key: HIVE-9969 URL: https://issues.apache.org/jira/browse/HIVE-9969 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Priority: Minor Fix For: spark-branch Attachments: HIVE-9969.1-spark.patch The method shouldn't be used for spark mode. Specifically, map work and reduce work have different plan paths in spark. Calling this method will leave lots of errors in executor's log: {noformat} 15/03/16 02:57:23 INFO Utilities: Open file to read in plan: hdfs://node13-1:8020/tmp/hive/root/0b3f2ad9-af30-4674-9cfb-1f745a5df51d/hive_2015-03-16_02-57-17_752_4494804875441915487-1/-mr-10003/3897754a-0146-4616-a2f6-b316839a2ad0/reduce.xml 15/03/16 02:57:23 INFO Utilities: File not found: File does not exist: /tmp/hive/root/0b3f2ad9-af30-4674-9cfb-1f745a5df51d/hive_2015-03-16_02-57-17_752_4494804875441915487-1/-mr-10003/3897754a-0146-4616-a2f6-b316839a2ad0/reduce.xml at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66) at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1891) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1832) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1812) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1784) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:542) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:362) at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-10096) Investigate the random failure of TestCliDriver.testCliDriver_udaf_percentile_approx_23
[ https://issues.apache.org/jira/browse/HIVE-10096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov resolved HIVE-10096. Resolution: Duplicate dup of HIVE-10059 Investigate the random failure of TestCliDriver.testCliDriver_udaf_percentile_approx_23 --- Key: HIVE-10096 URL: https://issues.apache.org/jira/browse/HIVE-10096 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 1.2.0 Reporter: Aihua Xu Assignee: Aihua Xu Priority: Minor The unit test sometimes seems to fail with the following problem: Running: diff -a /home/hiveptest/54.158.232.92-hiveptest-2/apache-svn-trunk-source/itests/qtest/../../itests/qtest/target/qfile-results/clientpositive/udaf_percentile_approx_23.q.out /home/hiveptest/54.158.232.92-hiveptest-2/apache-svn-trunk-source/itests/qtest/../../ql/src/test/results/clientpositive/udaf_percentile_approx_23.q.out 628c628 256.0 --- 255.5 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-10081) LLAP: Make the low-level IO threadpool configurable
[ https://issues.apache.org/jira/browse/HIVE-10081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran resolved HIVE-10081. -- Resolution: Fixed Committed to llap branch LLAP: Make the low-level IO threadpool configurable --- Key: HIVE-10081 URL: https://issues.apache.org/jira/browse/HIVE-10081 Project: Hive Issue Type: Sub-task Reporter: Gopal V Assignee: Prasanth Jayachandran Attachments: HIVE-10081.1.patch The LLAP low level reader thread-pool is hard-limited to 10-threads, which is not sufficient to max out the network bandwidth on a 10GigE network. These threads are often seen in IOWAIT, since they are reading remote data. A dumb fix for my 12-core instance was to use a higher thread-pool count for the IO read-ahead. {code} diff --git a/llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java b/llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java index 3f9ddfb..b7cd177 100644 --- a/llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java +++ b/llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java @@ -105,7 +105,7 @@ private LlapIoImpl(Configuration conf) throws IOException { cachePolicy.setEvictionListener(metadataCache); } // Arbitrary thread pool. Listening is used for unhandled errors for now (TODO: remove?) -executor = MoreExecutors.listeningDecorator(Executors.newFixedThreadPool(10)); +executor = MoreExecutors.listeningDecorator(Executors.newFixedThreadPool(24)); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
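A sketch of the configurable variant this ticket asks for, replacing the hard-coded pool size of 10. The property key and default below are assumptions for illustration; the actual patch wires the value through HiveConf rather than raw Properties.

```java
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch: read the IO thread-pool size from configuration instead of
// hard-coding it, so IO-bound deployments can raise it (e.g. to 24).
public class IoPoolFactory {
    // Assumed property name, for illustration only.
    static final String KEY = "hive.llap.io.threadpool.size";

    public static int poolSize(Properties conf) {
        return Integer.parseInt(conf.getProperty(KEY, "10"));
    }

    public static ExecutorService newIoPool(Properties conf) {
        return Executors.newFixedThreadPool(poolSize(conf));
    }
}
```

The default keeps today's behavior; an operator on a 10GigE box would simply set the property to a larger value instead of patching the source.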
[jira] [Updated] (HIVE-10172) Fix performance regression caused by HIVE-8122 for ORC
[ https://issues.apache.org/jira/browse/HIVE-10172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-10172: - Attachment: HIVE-10172.2.patch Removed the explicit boxing. [~Ferd] Its a performance regression. There is no issue with functionality. Fix performance regression caused by HIVE-8122 for ORC -- Key: HIVE-10172 URL: https://issues.apache.org/jira/browse/HIVE-10172 Project: Hive Issue Type: Bug Affects Versions: 1.2.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-10172.1.patch, HIVE-10172.2.patch See HIVE-10164 for description. We should fix this in trunk and move it to branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10168) make groupby3_map.q more stable
[ https://issues.apache.org/jira/browse/HIVE-10168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-10168: --- Issue Type: Improvement (was: Bug) make groupby3_map.q more stable --- Key: HIVE-10168 URL: https://issues.apache.org/jira/browse/HIVE-10168 Project: Hive Issue Type: Improvement Components: Tests Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Attachments: HIVE-10168.1.patch The test runs an aggregation query which produces several DOUBLE numbers. The assertion framework compares output containing DOUBLE numbers without any delta. As a result the test is not stable, e.g. build 3219 failed with the following test result {code} groupby3_map.q.out 139c139 130091.0 260.182 256.10355987055016 98.0 0.0 142.92680950752379 143.06995106518903 20428.0728759 20469.010897795582 --- 130091.0 260.182 256.10355987055016 98.0 0.0 142.9268095075238 143.06995106518906 20428.072876 20469.01089779559 {code} http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3219/testReport/junit/org.apache.hadoop.hive.cli/TestCliDriver/testCliDriver_groupby3_map/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
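One way to stabilize such tests is to compare DOUBLE outputs with a relative tolerance rather than exact text equality; the diff above differs only in the last few digits. A minimal sketch (the tolerance value is an illustrative choice, not what the patch uses):

```java
// Sketch: tolerance-based comparison for floating-point query output.
// Values that differ only by rounding in the last ulps compare equal,
// while genuinely different results still fail.
public class DeltaCompare {
    public static boolean nearlyEqual(double expected, double actual, double relTol) {
        double scale = Math.max(Math.abs(expected), Math.abs(actual));
        // Scale the tolerance by the magnitude of the values, with a
        // floor of 1.0 so numbers near zero are compared absolutely.
        return Math.abs(expected - actual) <= relTol * Math.max(scale, 1.0);
    }
}
```

Under this check, 143.06995106518903 and 143.06995106518906 from the diff above would compare equal, so the test outcome no longer depends on platform-specific floating-point rounding.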
[jira] [Updated] (HIVE-10150) delete from acidTbl where a in(select a from nonAcidOrcTbl) fails
[ https://issues.apache.org/jira/browse/HIVE-10150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-10150: -- Attachment: HIVE-10050.patch delete from acidTbl where a in(select a from nonAcidOrcTbl) fails - Key: HIVE-10150 URL: https://issues.apache.org/jira/browse/HIVE-10150 Project: Hive Issue Type: Bug Components: Query Planning, Transactions Affects Versions: 1.1.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-10050.patch This query raises an error 10297, FAILED: SemanticException [Error 10297]: Attempt to do update or delete on table nonAcidOrcTbl that does not use an AcidOutputFormat or is not bucketed even though nonAcidOrcTbl is only being read, not written. select b from acidTbl where a in (select b from nonAcidOrcTbl) runs fine. There doesn't seem to be any logical reason why we should raise the error here. Same for 'update' statement. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10044) Allow interval params for year/month/day/hour/minute/second functions
[ https://issues.apache.org/jira/browse/HIVE-10044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389860#comment-14389860 ] Thejas M Nair commented on HIVE-10044: -- [~jdere] Can you also please update the function descriptions (@ Description annotation) for these? Allow interval params for year/month/day/hour/minute/second functions - Key: HIVE-10044 URL: https://issues.apache.org/jira/browse/HIVE-10044 Project: Hive Issue Type: Sub-task Components: UDF Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-10044.1.patch Update the year/month/day/hour/minute/second functions to retrieve the various fields of the year-month and day-time interval types. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-10161) LLAP: ORC file contains compression buffers larger than bufferSize (OR reader has a bug)
[ https://issues.apache.org/jira/browse/HIVE-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V reassigned HIVE-10161: -- Assignee: Gopal V (was: Prasanth Jayachandran) LLAP: ORC file contains compression buffers larger than bufferSize (OR reader has a bug) Key: HIVE-10161 URL: https://issues.apache.org/jira/browse/HIVE-10161 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Gopal V Assignee: Gopal V Fix For: llap The EncodedReaderImpl will die when reading from the cache, when reading data written by the regular ORC writer {code} Caused by: java.io.IOException: java.lang.IllegalArgumentException: Buffer size too small. size = 262144 needed = 3919246 at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.rethrowErrorIfAny(LlapInputFormat.java:249) at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.nextCvb(LlapInputFormat.java:201) at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:140) at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:96) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350) ... 22 more Caused by: java.lang.IllegalArgumentException: Buffer size too small. size = 262144 needed = 3919246 at org.apache.hadoop.hive.ql.io.orc.InStream.addOneCompressionBuffer(InStream.java:780) at org.apache.hadoop.hive.ql.io.orc.InStream.uncompressStream(InStream.java:628) at org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:309) at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:278) at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:48) at org.apache.hadoop.hive.common.CallableWithNdc.call(CallableWithNdc.java:37) ... 4 more ]], Vertex failed as one or more tasks failed. 
failedTasks:1, Vertex vertex_1424502260528_1945_1_00 [Map 1] killed/failed due to:null] {code} Turning off hive.llap.io.enabled makes the error go away. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10168) make groupby3_map.q more stable
[ https://issues.apache.org/jira/browse/HIVE-10168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-10168: --- Attachment: HIVE-10168.1.patch patch #1 make groupby3_map.q more stable --- Key: HIVE-10168 URL: https://issues.apache.org/jira/browse/HIVE-10168 Project: Hive Issue Type: Bug Components: Tests Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Attachments: HIVE-10168.1.patch The test runs an aggregation query which produces several DOUBLE numbers. The assertion framework compares output containing DOUBLE numbers without any delta. As a result the test is not stable, e.g. build 3219 failed with the following test result {code} groupby3_map.q.out 139c139 130091.0 260.182 256.10355987055016 98.0 0.0 142.92680950752379 143.06995106518903 20428.0728759 20469.010897795582 --- 130091.0 260.182 256.10355987055016 98.0 0.0 142.9268095075238 143.06995106518906 20428.072876 20469.01089779559 {code} http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3219/testReport/junit/org.apache.hadoop.hive.cli/TestCliDriver/testCliDriver_groupby3_map/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10159) HashTableSinkDesc and MapJoinDesc keyTblDesc can be replaced by JoinDesc.keyTableDesc
[ https://issues.apache.org/jira/browse/HIVE-10159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388661#comment-14388661 ] Hive QA commented on HIVE-10159: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12708344/HIVE-10159.1.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 8692 tests executed *Failed tests:* {noformat} TestMinimrCliDriver-smb_mapjoin_8.q - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mapjoin_decimal {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3218/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3218/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3218/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12708344 - PreCommit-HIVE-TRUNK-Build HashTableSinkDesc and MapJoinDesc keyTblDesc can be replaced by JoinDesc.keyTableDesc - Key: HIVE-10159 URL: https://issues.apache.org/jira/browse/HIVE-10159 Project: Hive Issue Type: Improvement Components: Query Planning Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Priority: Minor Attachments: HIVE-10159.1.patch MapJoinDesc and HashTableSinkDesc are derived from JoinDesc HashTableSinkDesc and MapJoinDesc have keyTblDesc field. JoinDesc has keyTableDesc field. I think HashTableSinkDesc and MapJoinDesc can use superclass (JoinDesc) keyTableDesc field instead of defining their own keyTblDesc. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10165) Improve hive-hcatalog-streaming extensibility and support updates and deletes.
[ https://issues.apache.org/jira/browse/HIVE-10165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliot West updated HIVE-10165: --- Description: h3. Overview I'd like to extend the [hive-hcatalog-streaming|https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest] API so that it also supports the writing of record updates and deletes in addition to the already supported inserts. h3. Motivation We have many Hadoop processes outside of Hive that merge changed facts into existing datasets. Traditionally we achieve this by reading in a ground-truth dataset and a modified dataset, grouping by a key, sorting by a sequence, and then applying a function to determine inserted, updated, and deleted rows. However, in our current scheme we must rewrite all partitions that may potentially contain changes. In practice the number of mutated records is very small when compared with the records contained in a partition. This approach results in a number of operational issues: * Excessive write activity is required for small data changes. * Downstream applications cannot robustly read these datasets while they are being updated. * Due to the scale of the updates (hundreds of partitions) the scope for contention is high. I believe we can address this problem by instead writing only the changed records to a Hive transactional table. This should drastically reduce the amount of data that we need to write and also provide a means for managing concurrent access to the data. Our existing merge processes can read and retain each record's {{ROW_ID}}/{{RecordIdentifier}} and pass this through to an updated form of the hive-hcatalog-streaming API, which will then have the required data to perform an update or insert in a transactional manner. h3. Benefits * Enables the creation of large-scale dataset merge processes * Opens up Hive transactional functionality in an accessible manner to processes that operate outside of Hive. h3. 
Implementation We've patched the API to provide visibility to the underlying {{OrcRecordUpdater}} and allow extension of the {{AbstractRecordWriter}} by third parties outside of the package. We've also updated the user-facing interfaces to provide update and delete functionality. I've provided the modifications as three incremental patches. Generally speaking, each patch makes the API less backwards compatible but more consistent with respect to offering updates and deletes as well as writes (inserts). Ideally I hope that all three patches have merit, but only the first patch is absolutely necessary to enable the features we need on the API, and it does so in a backwards compatible way. I'll summarise the contents of each patch: h4. [^HIVE-10165.0.patch] - Required This patch contains what we consider to be the minimum amount of changes required to allow users to create {{RecordWriter}} subclasses that can insert, update, and delete records. These changes also maintain backwards compatibility at the expense of confusing the API a little. Note that the row representation has been changed from {{byte[]}} to {{Object}}. Within our data processing jobs our records are often available in a strongly typed and decoded form such as a POJO or a Tuple object. Therefore it seems to make sense that we are able to pass this through to the {{OrcRecordUpdater}} without having to go through a {{byte[]}} encoding step. This of course still allows users to use {{byte[]}} if they wish. h4. [^HIVE-10165.1.patch] - Nice to have This patch builds on the changes made in the *required* patch and aims to make the API cleaner and more consistent while accommodating updates and inserts. It also adds some logic to prevent the user from submitting multiple operation types to a single {{TransactionBatch}}, as we found this creates data inconsistencies within the Hive table. This patch breaks backwards compatibility. h4. 
[^HIVE-10165.2.patch] - Nomenclature This final patch simply renames some of the existing types to more accurately convey their increased responsibilities. The API is no longer writing just new records; it is now also responsible for writing operations that are applied to existing records. This patch breaks backwards compatibility. h3. Example I've attached a simple, typical usage of the API. This is not a patch and is intended as an illustration only: [^ReflectiveOperationWriter.java] h3. Known issues I have not yet provided any unit tests for the extended functionality. I fully expect that tests are required and will work on them if my patches have merit. was: h3. Overview I'd like to extend the [hive-hcatalog-streaming|https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest] API so that it also supports the writing of record updates and deletes in addition to the already supported inserts. h3. Motivation We have many Hadoop processes outside of Hive that merge changed
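The diff-based merge process described in the HIVE-10165 motivation (read the ground-truth and modified datasets, group by key, then determine which keys were inserted, updated, or deleted) can be sketched as follows. This is an illustrative sketch only; {{MergeClassifier}} and its method names are hypothetical and not part of the hive-hcatalog-streaming API:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Classify each key as INSERT/UPDATE/DELETE/UNCHANGED by comparing a
// ground-truth snapshot against a modified snapshot. Hypothetical names;
// real merge jobs would operate on record streams, not in-memory maps.
public class MergeClassifier {
    public enum Op { INSERT, UPDATE, DELETE, UNCHANGED }

    public static Map<String, Op> classify(Map<String, String> groundTruth,
                                           Map<String, String> modified) {
        Map<String, Op> ops = new TreeMap<>();
        for (Map.Entry<String, String> e : modified.entrySet()) {
            String old = groundTruth.get(e.getKey());
            if (old == null) {
                ops.put(e.getKey(), Op.INSERT);       // key only in modified set
            } else if (!old.equals(e.getValue())) {
                ops.put(e.getKey(), Op.UPDATE);       // key in both, value changed
            } else {
                ops.put(e.getKey(), Op.UNCHANGED);
            }
        }
        for (String key : groundTruth.keySet()) {
            if (!modified.containsKey(key)) {
                ops.put(key, Op.DELETE);              // key vanished from modified set
            }
        }
        return ops;
    }

    public static void main(String[] args) {
        Map<String, String> truth = new HashMap<>();
        truth.put("k1", "a");
        truth.put("k2", "b");
        Map<String, String> mod = new HashMap<>();
        mod.put("k2", "b2");
        mod.put("k3", "c");
        System.out.println(classify(truth, mod)); // {k1=DELETE, k2=UPDATE, k3=INSERT}
    }
}
```

Only the INSERT/UPDATE/DELETE entries would then be written, together with the retained {{RecordIdentifier}}, to the transactional table.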
[jira] [Assigned] (HIVE-10134) Fix test failures after HIVE-10130 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao reassigned HIVE-10134: --- Assignee: Chao Fix test failures after HIVE-10130 [Spark Branch] - Key: HIVE-10134 URL: https://issues.apache.org/jira/browse/HIVE-10134 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Xuefu Zhang Assignee: Chao Complete test run: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/812/#showFailuresLink *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nonmr_fetch org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union31 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_22 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_6_subq org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10165) Improve hive-hcatalog-streaming extensibility and support updates and deletes.
[ https://issues.apache.org/jira/browse/HIVE-10165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliot West updated HIVE-10165: --- Description: h3. Overview I'd like to extend the [hive-hcatalog-streaming|https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest] API so that it also supports the writing of record updates and deletes in addition to the already supported inserts. h3. Motivation We have many Hadoop processes outside of Hive that merge changed facts into existing datasets. Traditionally we achieve this by reading in a ground-truth dataset and a modified dataset, grouping by a key, sorting by a sequence, and then applying a function to determine inserted, updated, and deleted rows. However, in our current scheme we must rewrite all partitions that may potentially contain changes. In practice the number of mutated records is very small when compared with the records contained in a partition. This approach results in a number of operational issues: * Excessive write activity is required for small data changes. * Downstream applications cannot robustly read these datasets while they are being updated. * Due to the scale of the updates (hundreds of partitions) the scope for contention is high. I believe we can address this problem by instead writing only the changed records to a Hive transactional table. This should drastically reduce the amount of data that we need to write and also provide a means for managing concurrent access to the data. Our existing merge processes can read and retain each record's {{ROW_ID}}/{{RecordIdentifier}} and pass this through to an updated form of the hive-hcatalog-streaming API, which will then have the required data to perform an update or insert in a transactional manner. h3. Benefits * Enables the creation of large-scale dataset merge processes * Opens up Hive transactional functionality in an accessible manner to processes that operate outside of Hive. h3. 
Implementation We've patched the API to provide visibility to the underlying {{OrcRecordUpdater}} and allow extension of the {{AbstractRecordWriter}} by third parties outside of the package. We've also updated the user-facing interfaces to provide update and delete functionality. I've provided the modifications as three incremental patches. Generally speaking, each patch makes the API less backwards compatible but more consistent with respect to offering updates and deletes as well as writes (inserts). Ideally I hope that all three patches have merit, but only the first patch is absolutely necessary to enable the features we need on the API, and it does so in a backwards compatible way. I'll summarise the contents of each patch: h4. [^HIVE-10165.0.patch] - Required This patch contains what we consider to be the minimum amount of changes required to allow users to create {{RecordWriter}} subclasses that can insert, update, and delete records. These changes also maintain backwards compatibility at the expense of confusing the API a little. Note that the row representation has been changed from {{byte[]}} to {{Object}}. Within our data processing jobs our records are often available in a strongly typed and decoded form such as a POJO or a Tuple object. Therefore it seems to make sense that we are able to pass this through to the {{OrcRecordUpdater}} without having to go through a {{byte[]}} encoding step. This of course still allows users to use {{byte[]}} if they wish. h4. [^HIVE-10165.1.patch] - Nice to have This patch builds on the changes made in the *required* patch and aims to make the API cleaner and more consistent while accommodating updates and inserts. It also adds some logic to prevent the user from submitting multiple operation types to a single {{TransactionBatch}}, as we found this creates data inconsistencies within the Hive table. This patch breaks backwards compatibility. h4. 
[^HIVE-10165.2.patch] - Nomenclature This final patch simply renames some of the existing types to more accurately convey their increased responsibilities. The API is no longer writing just new records; it is now also responsible for writing operations that are applied to existing records. This patch breaks backwards compatibility. h3. Example I've attached a simple, typical usage of the API. This is not a patch and is intended as an illustration only: [^ReflectiveOperationWriter.java] h3. Known issues I have not yet provided any unit tests for the extended functionality. I fully expect that tests are required and will work on them if these patches have merit. was: h3. Overview I'd like to extend the [hive-hcatalog-streaming|https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest] API so that it also supports the writing of record updates and deletes in addition to the already supported inserts. h3. Motivation We have many Hadoop processes outside of Hive that merge
[jira] [Commented] (HIVE-10134) Fix test failures after HIVE-10130 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14390014#comment-14390014 ] Hive QA commented on HIVE-10134: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12708601/HIVE-10134.2-spark.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 8710 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nonmr_fetch org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_percentile_approx_23 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/818/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/818/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-818/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12708601 - PreCommit-HIVE-SPARK-Build Fix test failures after HIVE-10130 [Spark Branch] - Key: HIVE-10134 URL: https://issues.apache.org/jira/browse/HIVE-10134 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Xuefu Zhang Assignee: Chao Fix For: spark-branch Attachments: HIVE-10134.1-spark.patch, HIVE-10134.2-spark.patch Complete test run: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/812/#showFailuresLink *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nonmr_fetch org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union31 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_22 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_6_subq org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10050) Support overriding memory configuration for AM launched for TempletonControllerJob
[ https://issues.apache.org/jira/browse/HIVE-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-10050: -- Labels: TODOC1.2 (was: ) Support overriding memory configuration for AM launched for TempletonControllerJob -- Key: HIVE-10050 URL: https://issues.apache.org/jira/browse/HIVE-10050 Project: Hive Issue Type: Bug Components: WebHCat Reporter: Hitesh Shah Assignee: Hitesh Shah Labels: TODOC1.2 Fix For: 1.2.0 Attachments: HIVE-10050.1.patch, HIVE-10050.2.patch, HIVE-10050.3.patch The MR AM launched for the TempletonControllerJob does not do any heavy lifting and therefore can be configured to use a small memory footprint ( as compared to potentially using the default footprint for most MR jobs on a cluster ). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10050) Support overriding memory configuration for AM launched for TempletonControllerJob
[ https://issues.apache.org/jira/browse/HIVE-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14390018#comment-14390018 ] Lefty Leverenz commented on HIVE-10050: --- Doc note: This adds *templeton.controller.mr.am.java.opts* and *templeton.mr.am.memory.mb* to webhcat-default.xml, so they need to be documented (with version information) in the WebHCat Configuration wikidoc. * [WebHCat Configuration -- Configuration Variables | https://cwiki.apache.org/confluence/display/Hive/WebHCat+Configure#WebHCatConfigure-ConfigurationVariables] Support overriding memory configuration for AM launched for TempletonControllerJob -- Key: HIVE-10050 URL: https://issues.apache.org/jira/browse/HIVE-10050 Project: Hive Issue Type: Bug Components: WebHCat Reporter: Hitesh Shah Assignee: Hitesh Shah Labels: TODOC1.2 Fix For: 1.2.0 Attachments: HIVE-10050.1.patch, HIVE-10050.2.patch, HIVE-10050.3.patch The MR AM launched for the TempletonControllerJob does not do any heavy lifting and therefore can be configured to use a small memory footprint ( as compared to potentially using the default footprint for most MR jobs on a cluster ). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9518) Implement MONTHS_BETWEEN aligned with Oracle one
[ https://issues.apache.org/jira/browse/HIVE-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14390023#comment-14390023 ] Hive QA commented on HIVE-9518: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12708534/HIVE-9518.10.patch {color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 8647 tests executed *Failed tests:* {noformat} TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-leftsemijoin_mr.q-bucket5.q-root_dir_external_table.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-ql_rewrite_gbtoidx.q-parallel_orderby.q-reduce_deduplicate.q-and-1-more - did not produce a TEST-*.xml file 
TestMinimrCliDriver-schemeAuthority2.q-infer_bucket_sort_bucketed_table.q-bucket4.q-and-1-more - did not produce a TEST-*.xml file TestMinimrCliDriver-smb_mapjoin_8.q - did not produce a TEST-*.xml file {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3230/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3230/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3230/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 13 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12708534 - PreCommit-HIVE-TRUNK-Build Implement MONTHS_BETWEEN aligned with Oracle one Key: HIVE-9518 URL: https://issues.apache.org/jira/browse/HIVE-9518 Project: Hive Issue Type: Improvement Components: UDF Reporter: Xiaobing Zhou Assignee: Alexander Pivovarov Attachments: HIVE-9518.1.patch, HIVE-9518.10.patch, HIVE-9518.2.patch, HIVE-9518.3.patch, HIVE-9518.4.patch, HIVE-9518.5.patch, HIVE-9518.6.patch, HIVE-9518.7.patch, HIVE-9518.8.patch, HIVE-9518.9.patch This is used to track work to build an Oracle-like months_between. Here are the semantics: MONTHS_BETWEEN returns the number of months between dates date1 and date2. If date1 is later than date2, then the result is positive. If date1 is earlier than date2, then the result is negative. If date1 and date2 are either the same days of the month or both last days of months, then the result is always an integer. Otherwise Oracle Database calculates the fractional portion of the result based on a 31-day month and considers the difference in the time components of date1 and date2. Should accept date, timestamp and string arguments in the format '-MM-dd' or '-MM-dd HH:mm:ss'. 
The result should be rounded to 8 decimal places. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
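As an illustration of the semantics above, here is a hedged sketch (not the Hive UDF implementation): whole months plus a 31-day-month fraction that also accounts for the time-of-day difference, with the same-day-of-month and both-month-end cases returning an integer, rounded to 8 decimal places:

```java
import java.time.LocalDateTime;

// Illustrative sketch of the Oracle-style MONTHS_BETWEEN rule described above;
// class and method names are hypothetical, not the Hive GenericUDF source.
public class MonthsBetweenSketch {
    public static double monthsBetween(LocalDateTime d1, LocalDateTime d2) {
        int wholeMonths = (d1.getYear() - d2.getYear()) * 12
                        + (d1.getMonthValue() - d2.getMonthValue());
        boolean sameDayOfMonth = d1.getDayOfMonth() == d2.getDayOfMonth();
        boolean bothLastDays =
                d1.getDayOfMonth() == d1.toLocalDate().lengthOfMonth()
             && d2.getDayOfMonth() == d2.toLocalDate().lengthOfMonth();
        if (sameDayOfMonth || bothLastDays) {
            return wholeMonths; // always an integer in these cases
        }
        // Fractional part based on a 31-day month, including time components.
        double secs1 = d1.getDayOfMonth() * 86400.0 + d1.toLocalTime().toSecondOfDay();
        double secs2 = d2.getDayOfMonth() * 86400.0 + d2.toLocalTime().toSecondOfDay();
        double result = wholeMonths + (secs1 - secs2) / (31.0 * 86400.0);
        return Math.round(result * 1e8) / 1e8; // round to 8 decimal places
    }

    public static void main(String[] args) {
        // months between 1997-02-28 10:30:00 and 1996-10-30 00:00:00
        System.out.println(monthsBetween(
                LocalDateTime.of(1997, 2, 28, 10, 30),
                LocalDateTime.of(1996, 10, 30, 0, 0))); // prints 3.94959677
    }
}
```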
[jira] [Commented] (HIVE-8915) Log file explosion due to non-existence of COMPACTION_QUEUE table
[ https://issues.apache.org/jira/browse/HIVE-8915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14390009#comment-14390009 ] Owen O'Malley commented on HIVE-8915: - Yeah, I hit this too. What is the fix, [~alangates]? Log file explosion due to non-existence of COMPACTION_QUEUE table - Key: HIVE-8915 URL: https://issues.apache.org/jira/browse/HIVE-8915 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 0.14.0, 0.15.0, 0.14.1 Reporter: Sushanth Sowmyan Assignee: Alan Gates I hit an issue with a fresh setup of Hive in a VM, where I did not have the db tables specified by hive-txn-schema-0.14.0.mysql.sql created. On metastore startup, I got an endless loop of errors being populated to the log file, which caused the log file to grow to 1.7GB in 5 minutes, with 950k copies of the same error stack trace in it before I realized what was happening and killed it. We should either have a delay of sorts to make sure we don't endlessly respin on that error so quickly, or we should error out and fail if we're not able to start. 
The stack trace in question is as follows: {noformat} 2014-11-19 01:44:57,654 ERROR compactor.Cleaner (Cleaner.java:run(143)) - Caught an exception in the main loop of compactor cleaner, MetaException(message:Unable to connect to transaction database com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Table 'hive.COMPACTION_QUEUE' doesn't exist at sun.reflect.GeneratedConstructorAccessor20.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at com.mysql.jdbc.Util.handleNewInstance(Util.java:411) at com.mysql.jdbc.Util.getInstance(Util.java:386) at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1052) at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3597) at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3529) at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1990) at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2151) at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2619) at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2569) at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1524) at com.jolbox.bonecp.StatementHandle.executeQuery(StatementHandle.java:464) at org.apache.hadoop.hive.metastore.txn.CompactionTxnHandler.findReadyToClean(CompactionTxnHandler.java:266) at org.apache.hadoop.hive.ql.txn.compactor.Cleaner.run(Cleaner.java:86) ) at org.apache.hadoop.hive.metastore.txn.CompactionTxnHandler.findReadyToClean(CompactionTxnHandler.java:291) at org.apache.hadoop.hive.ql.txn.compactor.Cleaner.run(Cleaner.java:86) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
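The "delay of sorts" suggested above could be as simple as exponential backoff between retries, so a persistent error such as the missing COMPACTION_QUEUE table produces one log entry per backoff interval instead of a tight respin loop. A minimal sketch, with illustrative names and constants that are not taken from the Hive compactor code:

```java
// Exponential backoff schedule for retrying a failing operation: the delay
// doubles per attempt up to a ceiling, bounding both retry rate and log growth.
public class BackoffSketch {
    // Delay before retry number `attempt` (0-based), doubling from baseMs up to maxMs.
    public static long nextDelayMs(int attempt, long baseMs, long maxMs) {
        long delay = baseMs << Math.min(attempt, 20); // cap the shift to avoid overflow
        return Math.min(delay, maxMs);
    }

    public static void main(String[] args) {
        // A retry loop would sleep for nextDelayMs(attempt, ...) after each
        // failure before trying (and logging) again.
        for (int attempt = 0; attempt < 8; attempt++) {
            System.out.println("retry " + attempt + " after "
                    + nextDelayMs(attempt, 1000, 60_000) + " ms");
        }
    }
}
```

With a 1-second base and 60-second ceiling, the schedule settles at one attempt per minute, which would have capped the log at a few stack traces per minute instead of 950k in five.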
[jira] [Commented] (HIVE-10104) LLAP: Generate consistent splits and locations for the same split across jobs
[ https://issues.apache.org/jira/browse/HIVE-10104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14390028#comment-14390028 ] Lefty Leverenz commented on HIVE-10104: --- This adds *hive.tez.input.generate.consistent.splits* to HiveConf.java, so I'm linking it to HIVE-9850 (documentation for llap). LLAP: Generate consistent splits and locations for the same split across jobs - Key: HIVE-10104 URL: https://issues.apache.org/jira/browse/HIVE-10104 Project: Hive Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Siddharth Seth Fix For: llap Attachments: HIVE-10104.1.txt, HIVE-10104.2.txt Locations for splits are currently randomized. Also, the order of splits is random - depending on how threads end up generating the splits. Add an option to sort the splits, and generate repeatable locations - assuming all other factors are the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8818) Create unit test where we insert into an encrypted table and then read from it with hcatalog mapreduce
[ https://issues.apache.org/jira/browse/HIVE-8818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388776#comment-14388776 ] Hive QA commented on HIVE-8818: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12708348/HIVE-8818.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 8699 tests executed *Failed tests:* {noformat} TestMinimrCliDriver-smb_mapjoin_8.q - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby3_map {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3219/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3219/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3219/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12708348 - PreCommit-HIVE-TRUNK-Build Create unit test where we insert into an encrypted table and then read from it with hcatalog mapreduce -- Key: HIVE-8818 URL: https://issues.apache.org/jira/browse/HIVE-8818 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Dong Chen Attachments: HIVE-8818.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10134) Fix test failures after HIVE-10130 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao updated HIVE-10134: Attachment: HIVE-10134.1-spark.patch Patch v1 to address the union test failures. Fix test failures after HIVE-10130 [Spark Branch] - Key: HIVE-10134 URL: https://issues.apache.org/jira/browse/HIVE-10134 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Xuefu Zhang Assignee: Chao Fix For: spark-branch Attachments: HIVE-10134.1-spark.patch Complete test run: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/812/#showFailuresLink *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nonmr_fetch org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union31 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_22 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_6_subq org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10077) Use new ParquetInputSplit constructor API
[ https://issues.apache.org/jira/browse/HIVE-10077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389213#comment-14389213 ] Hive QA commented on HIVE-10077: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12708434/HIVE-10077.1.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 8691 tests executed *Failed tests:* {noformat} TestCustomAuthentication - did not produce a TEST-*.xml file TestMinimrCliDriver-smb_mapjoin_8.q - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_parquet_join {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3221/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3221/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3221/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12708434 - PreCommit-HIVE-TRUNK-Build Use new ParquetInputSplit constructor API - Key: HIVE-10077 URL: https://issues.apache.org/jira/browse/HIVE-10077 Project: Hive Issue Type: Sub-task Reporter: Ferdinand Xu Assignee: Ferdinand Xu Attachments: HIVE-10077.1.patch, HIVE-10077.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9969) Avoid Utilities.getMapRedWork for spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389226#comment-14389226 ] Xuefu Zhang commented on HIVE-9969: --- +1 Avoid Utilities.getMapRedWork for spark [Spark Branch] -- Key: HIVE-9969 URL: https://issues.apache.org/jira/browse/HIVE-9969 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Priority: Minor Attachments: HIVE-9969.1-spark.patch The method shouldn't be used for spark mode. Specifically, map work and reduce work have different plan paths in spark. Calling this method will leave lots of errors in executor's log: {noformat} 15/03/16 02:57:23 INFO Utilities: Open file to read in plan: hdfs://node13-1:8020/tmp/hive/root/0b3f2ad9-af30-4674-9cfb-1f745a5df51d/hive_2015-03-16_02-57-17_752_4494804875441915487-1/-mr-10003/3897754a-0146-4616-a2f6-b316839a2ad0/reduce.xml 15/03/16 02:57:23 INFO Utilities: File not found: File does not exist: /tmp/hive/root/0b3f2ad9-af30-4674-9cfb-1f745a5df51d/hive_2015-03-16_02-57-17_752_4494804875441915487-1/-mr-10003/3897754a-0146-4616-a2f6-b316839a2ad0/reduce.xml at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66) at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1891) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1832) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1812) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1784) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:542) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:362) at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10021) Alter index rebuild statements submitted through HiveServer2 fail when Sentry is enabled
[ https://issues.apache.org/jira/browse/HIVE-10021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao updated HIVE-10021: Description: When HiveServer2 is configured to authorize submitted queries and statements through Sentry, any attempt to issue an alter index rebuild statement fails with a SemanticException caused by a NullPointerException. This occurs regardless of whether the index is a compact or bitmap index. The root cause of the problem appears to be the fact that the static createRootTask function in org.apache.hadoop.hive.ql.optimizer.IndexUtils creates a new org.apache.hadoop.hive.ql.Driver object to compile the index builder query, and this new Driver object, unlike the one used by HiveServer2 to compile the submitted statement, is used without having its userName field initialized with the submitting user's username. Adding null checks to the Sentry code is insufficient to solve this problem, because Sentry needs the userName to determine whether or not the submitting user should be able to execute the index rebuild statement. 
Example stack trace from the HiveServer2 logs: {noformat} FAILED: NullPointerException null java.lang.NullPointerException at java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333) at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988) at org.apache.hadoop.security.Groups.getGroups(Groups.java:161) at org.apache.sentry.provider.common.HadoopGroupMappingService.getGroups(HadoopGroupMappingService.java:46) at org.apache.sentry.binding.hive.authz.HiveAuthzBinding.getGroups(HiveAuthzBinding.java:370) at org.apache.sentry.binding.hive.HiveAuthzBindingHook.postAnalyze(HiveAuthzBindingHook.java:314) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:440) at org.apache.hadoop.hive.ql.optimizer.IndexUtils.createRootTask(IndexUtils.java:258) at org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler.getIndexBuilderMapRedTask(CompactIndexHandler.java:149) at org.apache.hadoop.hive.ql.index.TableBasedIndexHandler.generateIndexBuildTaskList(TableBasedIndexHandler.java:67) at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.getIndexBuilderMapRed(DDLSemanticAnalyzer.java:1171) at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeAlterIndexRebuild(DDLSemanticAnalyzer.java:1117) at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:410) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:204) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:437) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:335) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1026) at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1019) at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:100) at org.apache.hive.service.cli.operation.SQLOperation.run(SQLOperation.java:173) at org.apache.hive.service.cli.session.HiveSessionImpl.runOperationWithLogCapture(HiveSessionImpl.java:715) at 
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:370) at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:357) at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:238) at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:393) at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1373) at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1358) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at org.apache.thrift.server.TServlet.doPost(TServlet.java:83) at org.apache.hive.service.cli.thrift.ThriftHttpServlet.doPost(ThriftHttpServlet.java:99) at javax.servlet.http.HttpServlet.service(HttpServlet.java:727) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:565) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:479) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:225) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1031) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:406) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:186) at
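The root cause described above — a nested Driver compiled without the submitting user's name — suggests propagating the session user into the Driver that IndexUtils creates. A minimal sketch, with simplified class shapes and a hypothetical constructor (not Hive's actual API):

```java
// Simplified sketch: in the real code the Driver's userName comes from the session,
// but the nested Driver created for the index-rebuild query is left with a null user,
// which Sentry then dereferences.
public class DriverUserPropagation {
    static class Driver {
        final String userName;  // null here is what triggers the NPE described above
        Driver(String userName) { this.userName = userName; }
    }

    // Hypothetical fixed createRootTask: hand the outer driver's user to the nested one.
    static Driver createRootTask(String sessionUserName) {
        return new Driver(sessionUserName);
    }
}
```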
[jira] [Updated] (HIVE-10148) update of bucketing column should not be allowed
[ https://issues.apache.org/jira/browse/HIVE-10148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-10148: -- Attachment: HIVE-10148.patch update of bucketing column should not be allowed -- Key: HIVE-10148 URL: https://issues.apache.org/jira/browse/HIVE-10148 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 1.1.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-10148.patch update tbl set a = 5; should raise an error if 'a' is a bucketing column. Such an operation is not supported, but it is currently not checked for. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
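The missing check described above can be sketched as a simple intersection test; the class and method names below are illustrative, not Hive's actual semantic-analyzer API:

```java
import java.util.List;
import java.util.Set;

// Minimal sketch of the missing validation: an UPDATE should be rejected when
// any column in its SET clause is also a bucketing column of the target table.
public class BucketingUpdateCheck {
    public static boolean isAllowed(Set<String> setColumns, List<String> bucketColumns) {
        for (String c : bucketColumns) {
            if (setColumns.contains(c)) {
                return false; // updating a bucketing column would break bucket placement
            }
        }
        return true;
    }
}
```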
[jira] [Commented] (HIVE-3166) The Hive JDBC driver should accept hive conf and hive variables via connection URL
[ https://issues.apache.org/jira/browse/HIVE-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388914#comment-14388914 ] Mark Grey commented on HIVE-3166: - What's the status on this feature? I could see this being very useful. Is it possible to rebase the patch, or is there another implementation now in place for custom hive properties via the JDBC driver? The Hive JDBC driver should accept hive conf and hive variables via connection URL -- Key: HIVE-3166 URL: https://issues.apache.org/jira/browse/HIVE-3166 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.9.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Labels: api-addition Attachments: HIVE-3166-3.patch The JDBC driver supports running embedded hive. The Hive CLI can accept configuration and hive settings on the command line that can be passed down. But the JDBC driver currently doesn't support this. It's also required for SQLLine CLI support since that is a JDBC application. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
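One way such a feature could look is session properties appended to the connection URL. The `;key=value` syntax below is purely an assumption for illustration — it is not the syntax the attached patch actually implements:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: parse ";key=value" segments after the base JDBC URL
// into a map of session-level hive conf/variable overrides.
public class UrlConfParser {
    public static Map<String, String> parseSessionConf(String url) {
        Map<String, String> conf = new HashMap<>();
        String[] parts = url.split(";");
        for (int i = 1; i < parts.length; i++) {   // parts[0] is the base URL
            int eq = parts[i].indexOf('=');
            if (eq > 0) {
                conf.put(parts[i].substring(0, eq), parts[i].substring(eq + 1));
            }
        }
        return conf;
    }
}
```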
[jira] [Updated] (HIVE-10128) BytesBytesMultiHashMap does not allow concurrent read-only access
[ https://issues.apache.org/jira/browse/HIVE-10128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-10128: Attachment: HIVE-10128.03.patch BytesBytesMultiHashMap does not allow concurrent read-only access - Key: HIVE-10128 URL: https://issues.apache.org/jira/browse/HIVE-10128 Project: Hive Issue Type: Bug Reporter: Gopal V Assignee: Sergey Shelukhin Fix For: llap Attachments: HIVE-10128.01.patch, HIVE-10128.02.patch, HIVE-10128.03.patch, HIVE-10128.patch, hashmap-after.png, hashmap-sync-source.png, hashmap-sync.png The multi-threaded performance takes a serious hit when LLAP shares hashtables between the probe threads running in parallel. !hashmap-sync.png! This is an explicit synchronized block inside ReusableRowContainer which triggers this particular pattern. !hashmap-sync-source.png! Looking deeper into the code, the synchronization seems to be caused by the fact that WriteBuffers.setReadPoint modifies the otherwise read-only hashtable. To generate this sort of result, run LLAP at a WARN log-level, to avoid all the log synchronization that otherwise affects the thread sync. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
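The contention pattern described above — a read path that mutates shared state and therefore needs a lock — can be sketched as below. These are simplified stand-in classes, not Hive's BytesBytesMultiHashMap or WriteBuffers:

```java
// Simplified sketch: storing the read position on the shared buffer (as
// setReadPoint does) forces synchronization between readers, while passing a
// caller-owned position object keeps concurrent reads lock-free.
public class SharedBuffer {
    private final byte[] data;
    public SharedBuffer(byte[] data) { this.data = data; }

    // Caller-owned cursor: each reader thread keeps its own, so no shared mutation.
    public static final class Position { public int offset; }

    public byte read(Position pos) {
        return data[pos.offset++];  // only the caller's cursor changes
    }
}
```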
[jira] [Updated] (HIVE-9073) NPE when using custom windowing UDAFs
[ https://issues.apache.org/jira/browse/HIVE-9073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-9073: - Attachment: HIVE-9073.3.patch rebasing patch with trunk NPE when using custom windowing UDAFs - Key: HIVE-9073 URL: https://issues.apache.org/jira/browse/HIVE-9073 Project: Hive Issue Type: Bug Components: UDF Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-9073.1.patch, HIVE-9073.2.patch, HIVE-9073.2.patch, HIVE-9073.3.patch From the hive-user email group: {noformat} While executing a simple select query using a custom windowing UDAF I created I am constantly running into this error. Error: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:409) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) ... 
9 more Caused by: java.lang.RuntimeException: Reduce operator initialization failed at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:173) ... 14 more Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getFunctionInfo(FunctionRegistry.java:647) at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getWindowFunctionInfo(FunctionRegistry.java:1875) at org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.streamingPossible(WindowingTableFunction.java:150) at org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.setCanAcceptInputAsStream(WindowingTableFunction.java:221) at org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.initializeStreaming(WindowingTableFunction.java:266) at org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.initializeStreaming(PTFOperator.java:292) at org.apache.hadoop.hive.ql.exec.PTFOperator.initializeOp(PTFOperator.java:86) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:460) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:416) at org.apache.hadoop.hive.ql.exec.ExtractOperator.initializeOp(ExtractOperator.java:40) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376) at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:166) ... 14 more Just wanted to check if any of you have faced this earlier. Also when I try to run the Custom UDAF on another server it works fine. The only difference I can see it that the hive version I am using on my local machine is 0.13.1 where it is working and on the other machine it is 0.13.0 where I see the above mentioned error. I am not sure if this was a bug which was fixed in the later release but I just wanted to confirm the same. {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10161) LLAP: ORC file contains compression buffers larger than bufferSize (OR reader has a bug)
[ https://issues.apache.org/jira/browse/HIVE-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-10161: Summary: LLAP: ORC file contains compression buffers larger than bufferSize (OR reader has a bug) (was: LLAP: IO buffers seem to be hard-coded to 256kb ) LLAP: ORC file contains compression buffers larger than bufferSize (OR reader has a bug) Key: HIVE-10161 URL: https://issues.apache.org/jira/browse/HIVE-10161 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Gopal V Assignee: Sergey Shelukhin Fix For: llap The EncodedReaderImpl will die when reading from the cache, when reading data written by the regular ORC writer {code} Caused by: java.io.IOException: java.lang.IllegalArgumentException: Buffer size too small. size = 262144 needed = 3919246 at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.rethrowErrorIfAny(LlapInputFormat.java:249) at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.nextCvb(LlapInputFormat.java:201) at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:140) at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:96) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350) ... 22 more Caused by: java.lang.IllegalArgumentException: Buffer size too small. 
size = 262144 needed = 3919246 at org.apache.hadoop.hive.ql.io.orc.InStream.addOneCompressionBuffer(InStream.java:780) at org.apache.hadoop.hive.ql.io.orc.InStream.uncompressStream(InStream.java:628) at org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:309) at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:278) at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:48) at org.apache.hadoop.hive.common.CallableWithNdc.call(CallableWithNdc.java:37) ... 4 more ]], Vertex failed as one or more tasks failed. failedTasks:1, Vertex vertex_1424502260528_1945_1_00 [Map 1] killed/failed due to:null] {code} Turning off hive.llap.io.enabled makes the error go away. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
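The failing precondition quoted above boils down to a fixed allocation being smaller than an incoming compression chunk. A minimal sketch of that check (simplified; not the actual InStream code):

```java
// Sketch of the precondition behind the stack trace above: the reader allocates
// fixed-size buffers and rejects any compression buffer larger than that allocation.
public class CompressionBufferCheck {
    public static void checkFits(int bufferSize, int needed) {
        if (needed > bufferSize) {
            throw new IllegalArgumentException(
                "Buffer size too small. size = " + bufferSize + " needed = " + needed);
        }
    }
}
```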
[jira] [Updated] (HIVE-10161) LLAP: ORC file contains compression buffers larger than bufferSize (OR reader has a bug)
[ https://issues.apache.org/jira/browse/HIVE-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-10161: Assignee: Prasanth Jayachandran (was: Sergey Shelukhin) LLAP: ORC file contains compression buffers larger than bufferSize (OR reader has a bug) Key: HIVE-10161 URL: https://issues.apache.org/jira/browse/HIVE-10161 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Gopal V Assignee: Prasanth Jayachandran Fix For: llap The EncodedReaderImpl will die when reading from the cache, when reading data written by the regular ORC writer {code} Caused by: java.io.IOException: java.lang.IllegalArgumentException: Buffer size too small. size = 262144 needed = 3919246 at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.rethrowErrorIfAny(LlapInputFormat.java:249) at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.nextCvb(LlapInputFormat.java:201) at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:140) at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:96) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350) ... 22 more Caused by: java.lang.IllegalArgumentException: Buffer size too small. size = 262144 needed = 3919246 at org.apache.hadoop.hive.ql.io.orc.InStream.addOneCompressionBuffer(InStream.java:780) at org.apache.hadoop.hive.ql.io.orc.InStream.uncompressStream(InStream.java:628) at org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:309) at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:278) at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:48) at org.apache.hadoop.hive.common.CallableWithNdc.call(CallableWithNdc.java:37) ... 
4 more ]], Vertex failed as one or more tasks failed. failedTasks:1, Vertex vertex_1424502260528_1945_1_00 [Map 1] killed/failed due to:null] {code} Turning off hive.llap.io.enabled makes the error go away. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10161) LLAP: ORC file contains compression buffers larger than bufferSize (OR reader has a bug)
[ https://issues.apache.org/jira/browse/HIVE-10161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389043#comment-14389043 ] Gopal V commented on HIVE-10161: This bug may be due to something else entirely - the error disappears when you restart LLAP and never load varchar columns in. Needs more investigation. LLAP: ORC file contains compression buffers larger than bufferSize (OR reader has a bug) Key: HIVE-10161 URL: https://issues.apache.org/jira/browse/HIVE-10161 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Gopal V Assignee: Prasanth Jayachandran Fix For: llap The EncodedReaderImpl will die when reading from the cache, when reading data written by the regular ORC writer {code} Caused by: java.io.IOException: java.lang.IllegalArgumentException: Buffer size too small. size = 262144 needed = 3919246 at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.rethrowErrorIfAny(LlapInputFormat.java:249) at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.nextCvb(LlapInputFormat.java:201) at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:140) at org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:96) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350) ... 22 more Caused by: java.lang.IllegalArgumentException: Buffer size too small. 
size = 262144 needed = 3919246 at org.apache.hadoop.hive.ql.io.orc.InStream.addOneCompressionBuffer(InStream.java:780) at org.apache.hadoop.hive.ql.io.orc.InStream.uncompressStream(InStream.java:628) at org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:309) at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:278) at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:48) at org.apache.hadoop.hive.common.CallableWithNdc.call(CallableWithNdc.java:37) ... 4 more ]], Vertex failed as one or more tasks failed. failedTasks:1, Vertex vertex_1424502260528_1945_1_00 [Map 1] killed/failed due to:null] {code} Turning off hive.llap.io.enabled makes the error go away. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10128) BytesBytesMultiHashMap does not allow concurrent read-only access
[ https://issues.apache.org/jira/browse/HIVE-10128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388920#comment-14388920 ] Sergey Shelukhin commented on HIVE-10128: - Sorry, its use fell through the cracks when I was making the trunk patch out of the llap patch. debugDumpMetrics is thread-safe. Renamed. Non-thread-safe methods may be more logical for some callers. BytesBytesMultiHashMap does not allow concurrent read-only access - Key: HIVE-10128 URL: https://issues.apache.org/jira/browse/HIVE-10128 Project: Hive Issue Type: Bug Reporter: Gopal V Assignee: Sergey Shelukhin Fix For: llap Attachments: HIVE-10128.01.patch, HIVE-10128.02.patch, HIVE-10128.patch, hashmap-after.png, hashmap-sync-source.png, hashmap-sync.png The multi-threaded performance takes a serious hit when LLAP shares hashtables between the probe threads running in parallel. !hashmap-sync.png! This is an explicit synchronized block inside ReusableRowContainer which triggers this particular pattern. !hashmap-sync-source.png! Looking deeper into the code, the synchronization seems to be caused by the fact that WriteBuffers.setReadPoint modifies the otherwise read-only hashtable. To generate this sort of result, run LLAP at a WARN log-level, to avoid all the log synchronization that otherwise affects the thread sync. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9727) GroupingID translation from Calcite
[ https://issues.apache.org/jira/browse/HIVE-9727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-9727: --- Affects Version/s: 1.1.0 0.14.0 1.0.0 GroupingID translation from Calcite --- Key: HIVE-9727 URL: https://issues.apache.org/jira/browse/HIVE-9727 Project: Hive Issue Type: Bug Components: Query Planning Affects Versions: 0.14.0, 1.0.0, 1.1.0 Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Fix For: 1.2.0 Attachments: HIVE-9727.01.patch, HIVE-9727.02.patch, HIVE-9727.03.patch, HIVE-9727.04.patch, HIVE-9727.patch The translation from Calcite back to Hive might produce wrong results while interacting with other Calcite optimization rules. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10083) SMBJoin fails in case one table is uninitialized
[ https://issues.apache.org/jira/browse/HIVE-10083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388940#comment-14388940 ] Chao commented on HIVE-10083: - +1. I think the test failure is unrelated. SMBJoin fails in case one table is uninitialized Key: HIVE-10083 URL: https://issues.apache.org/jira/browse/HIVE-10083 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 0.13.0 Environment: MapR Hive 0.13 Reporter: Alain Schröder Assignee: Na Yang Priority: Minor Attachments: HIVE-10083.patch We experience an IndexOutOfBoundsException in an SMBJoin in the case where one of the tables used for the JOIN is uninitialized. Everything works if both are uninitialized or initialized. {code} 2015-03-24 09:12:58,967 ERROR [main]: ql.Driver (SessionState.java:printError(545)) - FAILED: IndexOutOfBoundsException Index: 0, Size: 0 java.lang.IndexOutOfBoundsException: Index: 0, Size: 0 at java.util.ArrayList.rangeCheck(ArrayList.java:635) at java.util.ArrayList.get(ArrayList.java:411) at org.apache.hadoop.hive.ql.optimizer.AbstractBucketJoinProc.fillMappingBigTableBucketFileNameToSmallTableBucketFileNames(AbstractBucketJoinProc.java:486) at org.apache.hadoop.hive.ql.optimizer.AbstractBucketJoinProc.convertMapJoinToBucketMapJoin(AbstractBucketJoinProc.java:429) at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.convertJoinToBucketMapJoin(AbstractSMBJoinProc.java:540) at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.convertJoinToSMBJoin(AbstractSMBJoinProc.java:549) at org.apache.hadoop.hive.ql.optimizer.SortedMergeJoinProc.process(SortedMergeJoinProc.java:51) [...] 
{code} Simplest way to reproduce: {code} SET hive.enforce.sorting=true; SET hive.enforce.bucketing=true; SET hive.exec.dynamic.partition=true; SET mapreduce.reduce.import.limit=-1; SET hive.optimize.bucketmapjoin=true; SET hive.optimize.bucketmapjoin.sortedmerge=true; SET hive.auto.convert.join=true; SET hive.auto.convert.sortmerge.join=true; SET hive.auto.convert.sortmerge.join.noconditionaltask=true; CREATE DATABASE IF NOT EXISTS tmp; USE tmp; CREATE TABLE `test1` ( `foo` bigint ) CLUSTERED BY ( foo) SORTED BY ( foo ASC) INTO 384 BUCKETS stored as orc; CREATE TABLE `test2`( `foo` bigint ) CLUSTERED BY ( foo) SORTED BY ( foo ASC) INTO 384 BUCKETS STORED AS ORC; -- Initialize ONE table of the two tables with any data. INSERT INTO TABLE test1 SELECT foo FROM table_with_some_content LIMIT 100; SELECT t1.foo, t2.foo FROM test1 t1 INNER JOIN test2 t2 ON (t1.foo = t2.foo); {code} I took a look at the procedure fillMappingBigTableBucketFileNameToSmallTableBucketFileNames in AbstractBucketJoinProc.java and it does not seem to have changed from our MapR Hive 0.13 to the current snapshot, so this should also be an error in the current version. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
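The stack trace above points at indexing into an empty bucket-file list for the uninitialized table. A defensive sketch of the implied fix (names are illustrative, not Hive's actual method):

```java
import java.util.List;

// Sketch of the guard the trace above suggests: before mapping big-table bucket
// files to small-table bucket files, verify the small table actually has bucket
// files, instead of indexing into an empty list.
public class BucketMapping {
    public static String mapBucket(List<String> smallTableBuckets, int bigTableBucketIndex) {
        if (smallTableBuckets.isEmpty()) {
            return null; // uninitialized table: no bucket files yet, skip the conversion
        }
        return smallTableBuckets.get(bigTableBucketIndex % smallTableBuckets.size());
    }
}
```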
[jira] [Commented] (HIVE-10083) SMBJoin fails in case one table is uninitialized
[ https://issues.apache.org/jira/browse/HIVE-10083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388973#comment-14388973 ] Na Yang commented on HIVE-10083: Thank you [~csun] for the code review. I ran the q test for smb_mapjoin_8.q on my local machine and it was successful. SMBJoin fails in case one table is uninitialized Key: HIVE-10083 URL: https://issues.apache.org/jira/browse/HIVE-10083 Project: Hive Issue Type: Bug Components: Logical Optimizer Affects Versions: 0.13.0 Environment: MapR Hive 0.13 Reporter: Alain Schröder Assignee: Na Yang Priority: Minor Attachments: HIVE-10083.patch We experience an IndexOutOfBoundsException in an SMBJoin in the case where one of the tables used for the JOIN is uninitialized. Everything works if both are uninitialized or initialized. {code} 2015-03-24 09:12:58,967 ERROR [main]: ql.Driver (SessionState.java:printError(545)) - FAILED: IndexOutOfBoundsException Index: 0, Size: 0 java.lang.IndexOutOfBoundsException: Index: 0, Size: 0 at java.util.ArrayList.rangeCheck(ArrayList.java:635) at java.util.ArrayList.get(ArrayList.java:411) at org.apache.hadoop.hive.ql.optimizer.AbstractBucketJoinProc.fillMappingBigTableBucketFileNameToSmallTableBucketFileNames(AbstractBucketJoinProc.java:486) at org.apache.hadoop.hive.ql.optimizer.AbstractBucketJoinProc.convertMapJoinToBucketMapJoin(AbstractBucketJoinProc.java:429) at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.convertJoinToBucketMapJoin(AbstractSMBJoinProc.java:540) at org.apache.hadoop.hive.ql.optimizer.AbstractSMBJoinProc.convertJoinToSMBJoin(AbstractSMBJoinProc.java:549) at org.apache.hadoop.hive.ql.optimizer.SortedMergeJoinProc.process(SortedMergeJoinProc.java:51) [...] 
{code} Simplest way to reproduce: {code} SET hive.enforce.sorting=true; SET hive.enforce.bucketing=true; SET hive.exec.dynamic.partition=true; SET mapreduce.reduce.import.limit=-1; SET hive.optimize.bucketmapjoin=true; SET hive.optimize.bucketmapjoin.sortedmerge=true; SET hive.auto.convert.join=true; SET hive.auto.convert.sortmerge.join=true; SET hive.auto.convert.sortmerge.join.noconditionaltask=true; CREATE DATABASE IF NOT EXISTS tmp; USE tmp; CREATE TABLE `test1` ( `foo` bigint ) CLUSTERED BY ( foo) SORTED BY ( foo ASC) INTO 384 BUCKETS stored as orc; CREATE TABLE `test2`( `foo` bigint ) CLUSTERED BY ( foo) SORTED BY ( foo ASC) INTO 384 BUCKETS STORED AS ORC; -- Initialize ONE table of the two tables with any data. INSERT INTO TABLE test1 SELECT foo FROM table_with_some_content LIMIT 100; SELECT t1.foo, t2.foo FROM test1 t1 INNER JOIN test2 t2 ON (t1.foo = t2.foo); {code} I took a look at the procedure fillMappingBigTableBucketFileNameToSmallTableBucketFileNames in AbstractBucketJoinProc.java and it does not seem to have changed from our MapR Hive 0.13 to the current snapshot, so this should also be an error in the current version. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10167) HS2 logs the server started only before the server is shut down
[ https://issues.apache.org/jira/browse/HIVE-10167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HIVE-10167: --- Attachment: HIVE-10167.1.patch HS2 logs the server started only before the server is shut down --- Key: HIVE-10167 URL: https://issues.apache.org/jira/browse/HIVE-10167 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Trivial Attachments: HIVE-10167.1.patch TThreadPoolServer#serve() blocks till the server is down. We should log before that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
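The ordering fix described above is simple: since serve() blocks until shutdown, the startup message must be logged before the call. A minimal sketch with placeholder logger/server interfaces (not the actual HiveServer2 classes):

```java
// Sketch of the fix: TThreadPoolServer#serve() blocks until the server is down,
// so the "started" message must be logged before calling it, not after.
public class ServerStartLogging {
    public interface Log { void info(String msg); }
    public interface BlockingServer { void serve(); }  // blocks until the server stops

    public static void start(Log log, BlockingServer server) {
        log.info("HiveServer2 thrift service started");  // log first...
        server.serve();                                   // ...then block
    }
}
```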
[jira] [Updated] (HIVE-10159) HashTableSinkDesc and MapJoinDesc keyTblDesc can be replaced by JoinDesc.keyTableDesc
[ https://issues.apache.org/jira/browse/HIVE-10159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-10159: --- Attachment: HIVE-10159.1.patch TestCliDriver mapjoin_decimal.q and TestMinimrCliDriver smb_mapjoin_8.q work fine for me locally. Let me reattach patch #1 to run the build again. HashTableSinkDesc and MapJoinDesc keyTblDesc can be replaced by JoinDesc.keyTableDesc - Key: HIVE-10159 URL: https://issues.apache.org/jira/browse/HIVE-10159 Project: Hive Issue Type: Improvement Components: Query Planning Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Priority: Minor Attachments: HIVE-10159.1.patch, HIVE-10159.1.patch MapJoinDesc and HashTableSinkDesc are derived from JoinDesc. HashTableSinkDesc and MapJoinDesc have a keyTblDesc field; JoinDesc has a keyTableDesc field. I think HashTableSinkDesc and MapJoinDesc can use the superclass (JoinDesc) keyTableDesc field instead of defining their own keyTblDesc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
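The refactor proposed above can be sketched with simplified class shapes (these are stand-ins, not the real descriptor classes with their full state):

```java
// Sketch of the proposed dedup: the subclasses stop declaring their own copy of
// the key table descriptor and simply inherit the superclass field and accessors.
public class FieldDedup {
    static class JoinDesc {
        private String keyTableDesc;
        public String getKeyTableDesc() { return keyTableDesc; }
        public void setKeyTableDesc(String d) { keyTableDesc = d; }
    }
    // Before the patch these each declared a duplicate keyTblDesc field;
    // after it they reuse JoinDesc.keyTableDesc via the inherited accessors.
    static class MapJoinDesc extends JoinDesc { }
    static class HashTableSinkDesc extends JoinDesc { }
}
```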
[jira] [Updated] (HIVE-10151) insert into A select from B is broken when both A and B are Acid tables and bucketed the same way
[ https://issues.apache.org/jira/browse/HIVE-10151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-10151: -- Summary: insert into A select from B is broken when both A and B are Acid tables and bucketed the same way (was: Acid table insert as select) insert into A select from B is broken when both A and B are Acid tables and bucketed the same way - Key: HIVE-10151 URL: https://issues.apache.org/jira/browse/HIVE-10151 Project: Hive Issue Type: Bug Components: Query Planning, Transactions Affects Versions: 1.1.0 Reporter: Eugene Koifman Assignee: Eugene Koifman BucketingSortingReduceSinkOptimizer makes insert into AcidTable select * from otherAcidTable use BucketizedHiveInputFormat, which bypasses the ORC merge logic on read and tries to send bucket files (rather than the table dir) down to OrcInputFormat. (This is true only if both AcidTable and otherAcidTable are bucketed the same way.) Then ORC dies. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10128) BytesBytesMultiHashMap does not allow concurrent read-only access
[ https://issues.apache.org/jira/browse/HIVE-10128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388906#comment-14388906 ] Sergey Shelukhin commented on HIVE-10128: - It's used in the updated patch. I'll look at the failures and feedback and update the patch BytesBytesMultiHashMap does not allow concurrent read-only access - Key: HIVE-10128 URL: https://issues.apache.org/jira/browse/HIVE-10128 Project: Hive Issue Type: Bug Reporter: Gopal V Assignee: Sergey Shelukhin Fix For: llap Attachments: HIVE-10128.01.patch, HIVE-10128.02.patch, HIVE-10128.patch, hashmap-after.png, hashmap-sync-source.png, hashmap-sync.png The multi-threaded performance takes a serious hit when LLAP shares hashtables between the probe threads running in parallel. !hashmap-sync.png! This is an explicit synchronized block inside ReusableRowContainer which triggers this particular pattern. !hashmap-sync-source.png! Looking deeper into the code, the synchronization seems to be caused due to the fact that WriteBuffers.setReadPoint modifies the otherwise read-only hashtable. To generate this sort of result, run LLAP at a WARN log-level, to avoid all the log synchronization that otherwise affects the thread sync. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10092) LLAP: improve how buffers are locked for split
[ https://issues.apache.org/jira/browse/HIVE-10092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389103#comment-14389103 ] Sergey Shelukhin commented on HIVE-10092: - This is not a simple problem... LLAP: improve how buffers are locked for split -- Key: HIVE-10092 URL: https://issues.apache.org/jira/browse/HIVE-10092 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Right now, for simplicity, entire split of decompressed buffers is locked in cache, in case some buffers are shared between RGs, to avoid dealing with situations where we uncompress some data, pass it on to processor for RG N, then processor processes and unlocks it, and before we can pass it on for RG N+1 it's evicted. However, if split is too big, and cache is small, or many splits are processed at the same time, this can result in a deadlock as entire cache is locked. We need to improve locking to be more granular and probably also try to avoid deadlocks in general (bypass cache?) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9845) HCatSplit repeats information making input split data size huge
[ https://issues.apache.org/jira/browse/HIVE-9845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388925#comment-14388925 ] Mithun Radhakrishnan commented on HIVE-9845: Bah, finally. Unrelated test-failures. HCatSplit repeats information making input split data size huge --- Key: HIVE-9845 URL: https://issues.apache.org/jira/browse/HIVE-9845 Project: Hive Issue Type: Bug Components: HCatalog Reporter: Rohini Palaniswamy Assignee: Mithun Radhakrishnan Attachments: HIVE-9845.1.patch, HIVE-9845.3.patch Pig on Tez jobs with larger tables hit PIG-4443. Running on HDFS data which has even triple the number of splits(100K+ splits and tasks) does not hit that issue. {code} HCatBaseInputFormat.java: //Call getSplit on the InputFormat, create an //HCatSplit for each underlying split //NumSplits is 0 for our purposes org.apache.hadoop.mapred.InputSplit[] baseSplits = inputFormat.getSplits(jobConf, 0); for(org.apache.hadoop.mapred.InputSplit split : baseSplits) { splits.add(new HCatSplit( partitionInfo, split,allCols)); } {code} Each hcatSplit duplicates partition schema and table schema. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
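The duplication described above — every HCatSplit carrying its own copy of the partition and table schemas — multiplies serialized input-split size by the split count. A simplified sketch of the sharing idea (stand-in classes, not HCatalog's actual API):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: attaching a full schema copy to every split blows up the
// serialized size; sharing one schema reference keeps the total roughly constant.
public class SplitSchemaSharing {
    static final class Split {
        final Object schema;      // shared reference, not a per-split copy
        Split(Object schema) { this.schema = schema; }
    }

    public static List<Split> buildSplits(Object tableSchema, int numSplits) {
        List<Split> splits = new ArrayList<>();
        for (int i = 0; i < numSplits; i++) {
            splits.add(new Split(tableSchema));  // every split points at the same schema
        }
        return splits;
    }
}
```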
[jira] [Updated] (HIVE-10108) Index#getIndexTableName() returns db.index_table_name
[ https://issues.apache.org/jira/browse/HIVE-10108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HIVE-10108: --- Attachment: HIVE-10108.1.patch These test failures are not related. Renamed the patch for trunk. Index#getIndexTableName() returns db.index_table_name - Key: HIVE-10108 URL: https://issues.apache.org/jira/browse/HIVE-10108 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Attachments: HIVE-10108.1-spark.patch, HIVE-10108.1.patch Index#getIndexTableName() used to return just the index table name. Now it returns a qualified table name. This change was introduced in HIVE-3781. As a result, IMetaStoreClient#getTable(index.getDbName(), index.getIndexTableName()) throws ObjectNotFoundException. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
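Until the regression is fixed, callers can defend against the qualified name. The helper below is hypothetical (not part of the attached patches); it simply strips a leading "db." prefix before the name is handed to IMetaStoreClient#getTable(dbName, tableName):

```java
// Sketch of a caller-side workaround: Index#getIndexTableName() may now return
// "db.index_table_name", but getTable(dbName, tableName) expects a bare name.
public class IndexNameUtil {
    public static String unqualify(String tableName) {
        int dot = tableName.indexOf('.');
        return dot < 0 ? tableName : tableName.substring(dot + 1);
    }

    public static void main(String[] args) {
        // illustrative index table name, not from the issue
        System.out.println(unqualify("default.default__t_idx__"));
    }
}
```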
[jira] [Commented] (HIVE-10128) BytesBytesMultiHashMap does not allow concurrent read-only access
[ https://issues.apache.org/jira/browse/HIVE-10128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389026#comment-14389026 ] Sergey Shelukhin commented on HIVE-10128: - Tests failed with a weird issue
{noformat}
java.lang.NoSuchMethodError: org.apache.hadoop.hive.serde2.WriteBuffers.getReadPosition()Lorg/apache/hadoop/hive/serde2/WriteBuffers$Position;
  at org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.getValueRefs(BytesBytesMultiHashMap.java:268)
  at org.apache.hadoop.hive.ql.exec.persistence.TestBytesBytesMultiHashMap.testGetNonExistent(TestBytesBytesMultiHashMap.java:84)
{noformat}
which seems to be a build problem. They pass locally. BytesBytesMultiHashMap does not allow concurrent read-only access - Key: HIVE-10128 URL: https://issues.apache.org/jira/browse/HIVE-10128 Project: Hive Issue Type: Bug Reporter: Gopal V Assignee: Sergey Shelukhin Fix For: llap Attachments: HIVE-10128.01.patch, HIVE-10128.02.patch, HIVE-10128.03.patch, HIVE-10128.patch, hashmap-after.png, hashmap-sync-source.png, hashmap-sync.png The multi-threaded performance takes a serious hit when LLAP shares hashtables between the probe threads running in parallel. !hashmap-sync.png! This is an explicit synchronized block inside ReusableRowContainer which triggers this particular pattern. !hashmap-sync-source.png! Looking deeper into the code, the synchronization seems to be caused by the fact that WriteBuffers.setReadPoint modifies the otherwise read-only hashtable. To generate this sort of result, run LLAP at a WARN log-level, to avoid all the log synchronization that otherwise affects the thread sync. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
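The method signature in the stack trace, WriteBuffers.getReadPosition() returning a WriteBuffers$Position, hints that the fix moves the read offset out of the shared structure and into a caller-owned object. A simplified sketch of that pattern (my illustration, not Hive's actual classes):

```java
// Sketch: the byte data stays strictly read-only; each reader thread owns a
// Position, so concurrent reads never mutate shared state and need no locks.
public class ReadOnlyBuffers {
    public static final class Position {
        private int offset;
    }

    private final byte[] data;

    public ReadOnlyBuffers(byte[] data) {
        this.data = data;
    }

    /** Reads one byte at the caller's position; only the caller's Position changes. */
    public int readByte(Position pos) {
        return data[pos.offset++] & 0xFF;
    }

    public static void main(String[] args) {
        ReadOnlyBuffers buf = new ReadOnlyBuffers(new byte[] {10, 20, 30});
        Position a = new Position();
        Position b = new Position();
        System.out.println(buf.readByte(a)); // 10
        System.out.println(buf.readByte(a)); // 20
        System.out.println(buf.readByte(b)); // 10: b tracks its own offset
    }
}
```

Because nothing shared is written during a read, the synchronized block in ReusableRowContainer becomes unnecessary for lookups.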
[jira] [Updated] (HIVE-9845) HCatSplit repeats information making input split data size huge
[ https://issues.apache.org/jira/browse/HIVE-9845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-9845: --- Attachment: (was: HIVE-9845.3.patch) HCatSplit repeats information making input split data size huge --- Key: HIVE-9845 URL: https://issues.apache.org/jira/browse/HIVE-9845 Project: Hive Issue Type: Bug Components: HCatalog Reporter: Rohini Palaniswamy Assignee: Mithun Radhakrishnan Attachments: HIVE-9845.1.patch, HIVE-9845.3.patch Pig on Tez jobs with larger tables hit PIG-4443. Running on HDFS data which has even triple the number of splits (100K+ splits and tasks) does not hit that issue.
{code}
HCatBaseInputFormat.java:

// Call getSplits on the InputFormat, create an
// HCatSplit for each underlying split.
// numSplits is 0 for our purposes.
org.apache.hadoop.mapred.InputSplit[] baseSplits = inputFormat.getSplits(jobConf, 0);
for (org.apache.hadoop.mapred.InputSplit split : baseSplits) {
  splits.add(new HCatSplit(partitionInfo, split, allCols));
}
{code}
Each HCatSplit duplicates the partition schema and the table schema. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9845) HCatSplit repeats information making input split data size huge
[ https://issues.apache.org/jira/browse/HIVE-9845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-9845: --- Attachment: HIVE-9845.3.patch HCatSplit repeats information making input split data size huge --- Key: HIVE-9845 URL: https://issues.apache.org/jira/browse/HIVE-9845 Project: Hive Issue Type: Bug Components: HCatalog Reporter: Rohini Palaniswamy Assignee: Mithun Radhakrishnan Attachments: HIVE-9845.1.patch, HIVE-9845.3.patch Pig on Tez jobs with larger tables hit PIG-4443. Running on HDFS data which has even triple the number of splits (100K+ splits and tasks) does not hit that issue.
{code}
HCatBaseInputFormat.java:

// Call getSplits on the InputFormat, create an
// HCatSplit for each underlying split.
// numSplits is 0 for our purposes.
org.apache.hadoop.mapred.InputSplit[] baseSplits = inputFormat.getSplits(jobConf, 0);
for (org.apache.hadoop.mapred.InputSplit split : baseSplits) {
  splits.add(new HCatSplit(partitionInfo, split, allCols));
}
{code}
Each HCatSplit duplicates the partition schema and the table schema. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10148) update of bucketing column should not be allowed
[ https://issues.apache.org/jira/browse/HIVE-10148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-10148: -- Description: update tbl set a = 5; should raise an error if 'a' is a bucketing column. Such an operation is not supported but is currently not checked for. was: update tbl set a = 5; should raise an error if 'a' is a bucketing column. Such operation is not supported but currently not checked. update of bucketing column should not be allowed -- Key: HIVE-10148 URL: https://issues.apache.org/jira/browse/HIVE-10148 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 1.1.0 Reporter: Eugene Koifman Assignee: Eugene Koifman update tbl set a = 5; should raise an error if 'a' is a bucketing column. Such an operation is not supported but is currently not checked for. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
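A minimal sketch of the missing check (illustrative only; the shape of the actual patch may differ): at analysis time, reject any UPDATE whose SET clause names one of the target table's bucketing columns.

```java
import java.util.List;
import java.util.Set;

// Sketch: semantic check that fails an UPDATE touching a bucketing column.
// Bucketing column names are assumed to be pre-normalized to lower case.
public class UpdateGuard {
    public static void checkSetClause(Set<String> bucketCols, List<String> setCols) {
        for (String col : setCols) {
            if (bucketCols.contains(col.toLowerCase())) {
                throw new IllegalArgumentException(
                        "Updating values of bucketing columns is not supported: " + col);
            }
        }
    }

    public static void main(String[] args) {
        // 'a' is a bucketing column, so "update tbl set a = 5" must be rejected
        try {
            checkSetClause(Set.of("a"), List.of("a"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
        checkSetClause(Set.of("a"), List.of("b")); // fine: 'b' is not bucketed
    }
}
```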
[jira] [Updated] (HIVE-9518) Implement MONTHS_BETWEEN aligned with Oracle one
[ https://issues.apache.org/jira/browse/HIVE-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-9518: -- Attachment: HIVE-9518.9.patch The function should support both short date and full timestamp string formats, and it should not skip the time part. String length cannot be used to determine the format, because the year might be less than 4 chars and the day and month can be just 1 char. This is why I decided to use both Timestamp and Date converters to convert the input value to a java Date. I also removed the fix I made before in GenericUDF which considered string length (str.length==10). I added tests for dates without day, dates with partial time (no seconds) and dates with short year, month and day. Now string date parsing behaviour should be consistent with other UDFs (e.g. datediff). Implement MONTHS_BETWEEN aligned with Oracle one Key: HIVE-9518 URL: https://issues.apache.org/jira/browse/HIVE-9518 Project: Hive Issue Type: Improvement Components: UDF Reporter: Xiaobing Zhou Assignee: Alexander Pivovarov Attachments: HIVE-9518.1.patch, HIVE-9518.2.patch, HIVE-9518.3.patch, HIVE-9518.4.patch, HIVE-9518.5.patch, HIVE-9518.6.patch, HIVE-9518.7.patch, HIVE-9518.8.patch, HIVE-9518.9.patch This is used to track work to build an Oracle-like months_between. Here are the semantics: MONTHS_BETWEEN returns the number of months between dates date1 and date2. If date1 is later than date2, then the result is positive. If date1 is earlier than date2, then the result is negative. If date1 and date2 are either the same days of the month or both last days of months, then the result is always an integer. Otherwise Oracle Database calculates the fractional portion of the result based on a 31-day month and considers the difference in the time components of date1 and date2. Should accept date, timestamp and string arguments in the format 'yyyy-MM-dd' or 'yyyy-MM-dd HH:mm:ss'. The result should be rounded to 8 decimal places. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
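The Oracle semantics described above can be sketched with java.time as follows. This is my reading of those rules for illustration, not the HIVE-9518 patch itself (which also deals with Hive's Timestamp/Date converters and string parsing):

```java
import java.time.LocalDateTime;
import java.time.YearMonth;

// Sketch of Oracle-style MONTHS_BETWEEN: integer result when both dates fall on
// the same day of month or both on month ends; otherwise the fraction is based
// on a 31-day month including the time-of-day difference, rounded to 8 places.
public class MonthsBetween {
    public static double monthsBetween(LocalDateTime d1, LocalDateTime d2) {
        int months = (d1.getYear() - d2.getYear()) * 12
                + (d1.getMonthValue() - d2.getMonthValue());
        boolean sameDay = d1.getDayOfMonth() == d2.getDayOfMonth();
        boolean bothLastDay = d1.getDayOfMonth() == YearMonth.from(d1).lengthOfMonth()
                && d2.getDayOfMonth() == YearMonth.from(d2).lengthOfMonth();
        if (sameDay || bothLastDay) {
            return months;
        }
        // fractional part: day and time-of-day difference over a 31-day month
        long sec1 = d1.getDayOfMonth() * 86400L + d1.toLocalTime().toSecondOfDay();
        long sec2 = d2.getDayOfMonth() * 86400L + d2.toLocalTime().toSecondOfDay();
        double result = months + (sec1 - sec2) / (31.0 * 86400);
        return Math.round(result * 1e8) / 1e8;
    }

    public static void main(String[] args) {
        // one month plus one day: 1 + 1/31, rounded to 8 decimal places
        System.out.println(monthsBetween(
                LocalDateTime.of(1995, 2, 2, 0, 0),
                LocalDateTime.of(1995, 1, 1, 0, 0)));
    }
}
```

Note the result is signed, so swapping the arguments negates it, matching the "later/earlier" rule quoted in the issue.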
[jira] [Commented] (HIVE-10167) HS2 logs the server started only before the server is shut down
[ https://issues.apache.org/jira/browse/HIVE-10167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389339#comment-14389339 ] Hive QA commented on HIVE-10167: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12708478/HIVE-10167.1.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 8677 tests executed *Failed tests:*
{noformat}
TestCliDriver-join36.q-udf_bitwise_or.q-add_part_exist.q-and-12-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-smb_mapjoin_8.q - did not produce a TEST-*.xml file
org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.testSparkQuery
{noformat}
Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3222/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3222/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3222/ Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}
This message is automatically generated. ATTACHMENT ID: 12708478 - PreCommit-HIVE-TRUNK-Build HS2 logs the server started only before the server is shut down --- Key: HIVE-10167 URL: https://issues.apache.org/jira/browse/HIVE-10167 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Trivial Fix For: 1.2.0 Attachments: HIVE-10167.1.patch TThreadPoolServer#serve() blocks till the server is down. We should log before that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10134) Fix test failures after HIVE-10130 [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389352#comment-14389352 ] Hive QA commented on HIVE-10134: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12708494/HIVE-10134.1-spark.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 8710 tests executed *Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nonmr_fetch
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_22
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_6_subq
{noformat}
Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/817/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/817/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-817/ Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}
This message is automatically generated.
ATTACHMENT ID: 12708494 - PreCommit-HIVE-SPARK-Build Fix test failures after HIVE-10130 [Spark Branch] - Key: HIVE-10134 URL: https://issues.apache.org/jira/browse/HIVE-10134 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Xuefu Zhang Assignee: Chao Fix For: spark-branch Attachments: HIVE-10134.1-spark.patch Complete test run: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/812/#showFailuresLink *Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nonmr_fetch
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union31
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_22
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_6_subq
org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10114) Split strategies for ORC
[ https://issues.apache.org/jira/browse/HIVE-10114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-10114: - Attachment: HIVE-10114.3.patch Some cleanup, test case fixes, and Hive conf description changes based on Gopal's comment. Split strategies for ORC Key: HIVE-10114 URL: https://issues.apache.org/jira/browse/HIVE-10114 Project: Hive Issue Type: Improvement Affects Versions: 1.2.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-10114.1.patch, HIVE-10114.2.patch, HIVE-10114.3.patch ORC split generation does not have clearly defined strategies for different scenarios (many small ORC files, few small ORC files, many large files, etc.). A few strategies, like storing the file footer in the ORC split or making the entire file an ORC split, already exist. This JIRA is to make split generation simpler, to support different strategies for various use cases (BI, ETL, ACID, etc.), and to lay the foundation for HIVE-7428. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10165) Improve hive-hcatalog-streaming extensibility and support updates and deletes.
[ https://issues.apache.org/jira/browse/HIVE-10165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliot West updated HIVE-10165: --- Description: h3. Overview I'd like to extend the [hive-hcatalog-streaming|https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest] API so that it also supports the writing of record updates and deletes in addition to the already supported inserts. h3. Motivation We have many Hadoop processes outside of Hive that merge changed facts into existing datasets. Traditionally we achieve this by: reading in a ground-truth dataset and a modified dataset, grouping by a key, sorting by a sequence and then applying a function to determine inserted, updated, and deleted rows. However, in our current scheme we must rewrite all partitions that may potentially contain changes. In practice the number of mutated records is very small when compared with the records contained in a partition. This approach results in a number of operational issues: * Excessive amount of write activity required for small data changes. * Downstream applications cannot robustly read these datasets while they are being updated. * Due to the scale of the updates (hundreds of partitions) the scope for contention is high. I believe we can address this problem by instead writing only the changed records to a Hive transactional table. This should drastically reduce the amount of data that we need to write and also provide a means for managing concurrent access to the data. Our existing merge processes can read and retain each record's {{ROW_ID}}/{{RecordIdentifier}} and pass this through to an updated form of the hive-hcatalog-streaming API which will then have the required data to perform an update or insert in a transactional manner. h3. Benefits * Enables the creation of large-scale dataset merge processes * Opens up Hive transactional functionality in an accessible manner to processes that operate outside of Hive. h3. 
Implementation We've patched the API to provide visibility to the underlying {{OrcRecordUpdater}} and allow extension of the {{AbstractRecordWriter}} by third parties outside of the package. We've also updated the user-facing interfaces to provide update and delete functionality. I've provided the modifications as three incremental patches. Generally speaking, each patch makes the API less backwards compatible but more consistent with respect to offering updates, deletes as well as writes (inserts). Ideally I hope that all three patches have merit, but only the first patch is absolutely necessary to enable the features we need on the API, and it does so in a backwards compatible way. I'll summarise the contents of each patch: h4. [^HIVE-10165.0.patch] - Required This patch contains what we consider to be the minimum amount of changes required to allow users to create {{RecordWriter}} subclasses that can insert, update, and delete records. These changes also maintain backwards compatibility at the expense of confusing the API a little. Note that the row representation has been changed from {{byte[]}} to {{Object}}. Within our data processing jobs our records are often available in a strongly typed and decoded form such as a POJO or a Tuple object. Therefore it seems to make sense that we are able to pass this through to the {{OrcRecordUpdater}} without having to go through a {{byte[]}} encoding step. This of course still allows users to use {{byte[]}} if they wish. h4. [^HIVE-10165.1.patch] - Nice to have This patch builds on the changes made in the *required* patch and aims to make the API cleaner and more consistent while accommodating updates and inserts. It also adds some logic to prevent the user from submitting multiple operation types to a single {{TransactionBatch}} as we found this creates data inconsistencies within the Hive table. This patch breaks backwards compatibility. h4. 
[^HIVE-10165.2.patch] - Nomenclature This final patch simply renames some of the existing types to more accurately convey their increased responsibilities. The API is no longer writing just new records; it is now also responsible for writing operations that are applied to existing records. This patch breaks backwards compatibility. h3. Example I've attached a simple, typical usage of the API. This is not a patch and is intended as an illustration only: [^ReflectiveOperationWriter.java] h3. Known issues I have not yet provided any unit tests for the extended functionality. I fully expect that these are required and will work on these if these patches have merit. *Note: Attachments to follow.* was: h3. Overview I'd like to extend the [hive-hcatalog-streaming|https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest] API so that it also supports the writing of record updates and deletes in addition to the already supported inserts. h3. Motivation We have many Hadoop processes
[jira] [Commented] (HIVE-10116) CBO (Calcite Return Path): RelMdSize throws an Exception when Join is actually a Semijoin [CBO branch]
[ https://issues.apache.org/jira/browse/HIVE-10116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389422#comment-14389422 ] Mostafa Mokhtar commented on HIVE-10116: [~jcamachorodriguez] Is this the same issue
{code}
explain
select ca_zip, ca_county, sum(ws_sales_price)
from web_sales
JOIN customer ON web_sales.ws_bill_customer_sk = customer.c_customer_sk
JOIN customer_address ON customer.c_current_addr_sk = customer_address.ca_address_sk
JOIN date_dim ON web_sales.ws_sold_date_sk = date_dim.d_date_sk
JOIN item ON web_sales.ws_item_sk = item.i_item_sk
where ( item.i_item_id in (select i_item_id from item i2 where i2.i_item_sk in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29) ) )
and d_qoy = 2 and d_year = 2000
group by ca_zip, ca_county
order by ca_zip, ca_county
limit 100

15/03/27 12:16:48 [main]: ERROR parse.CalcitePlanner: CBO failed, skipping CBO.
java.lang.ArrayIndexOutOfBoundsException: 2
  at org.apache.calcite.rel.metadata.RelMdSize.averageColumnSizes(RelMdSize.java:193)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.calcite.rel.metadata.ReflectiveRelMetadataProvider$1$1.invoke(ReflectiveRelMetadataProvider.java:182)
  at com.sun.proxy.$Proxy51.averageColumnSizes(Unknown Source)
  at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.calcite.rel.metadata.ChainedRelMetadataProvider$ChainedInvocationHandler.invoke(ChainedRelMetadataProvider.java:109)
  at com.sun.proxy.$Proxy51.averageColumnSizes(Unknown Source)
  at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.calcite.rel.metadata.ChainedRelMetadataProvider$ChainedInvocationHandler.invoke(ChainedRelMetadataProvider.java:109)
  at com.sun.proxy.$Proxy51.averageColumnSizes(Unknown Source)
  at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.calcite.rel.metadata.CachingRelMetadataProvider$CachingInvocationHandler.invoke(CachingRelMetadataProvider.java:131)
  at com.sun.proxy.$Proxy51.averageColumnSizes(Unknown Source)
  at org.apache.calcite.rel.metadata.RelMetadataQuery.getAverageColumnSizes(RelMetadataQuery.java:360)
  at org.apache.calcite.rel.metadata.RelMdSize.averageRowSize(RelMdSize.java:82)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.calcite.rel.metadata.ReflectiveRelMetadataProvider$1$1.invoke(ReflectiveRelMetadataProvider.java:182)
  at com.sun.proxy.$Proxy51.averageRowSize(Unknown Source)
  at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.calcite.rel.metadata.ChainedRelMetadataProvider$ChainedInvocationHandler.invoke(ChainedRelMetadataProvider.java:109)
  at com.sun.proxy.$Proxy51.averageRowSize(Unknown Source)
  at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.calcite.rel.metadata.ChainedRelMetadataProvider$ChainedInvocationHandler.invoke(ChainedRelMetadataProvider.java:109)
  at com.sun.proxy.$Proxy51.averageRowSize(Unknown Source)
  at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at
[jira] [Updated] (HIVE-10108) Index#getIndexTableName() returns db.index_table_name
[ https://issues.apache.org/jira/browse/HIVE-10108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HIVE-10108: --- Attachment: HIVE-10108.2.patch Attached patch v2 that's rebased to the latest trunk. Index#getIndexTableName() returns db.index_table_name - Key: HIVE-10108 URL: https://issues.apache.org/jira/browse/HIVE-10108 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Attachments: HIVE-10108.1-spark.patch, HIVE-10108.1.patch, HIVE-10108.2.patch Index#getIndexTableName() used to return just the index table name. Now it returns a qualified table name. This change was introduced in HIVE-3781. As a result, IMetaStoreClient#getTable(index.getDbName(), index.getIndexTableName()) throws ObjectNotFoundException. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10167) HS2 logs the server started only before the server is shut down
[ https://issues.apache.org/jira/browse/HIVE-10167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389663#comment-14389663 ] Chao commented on HIVE-10167: - +1 HS2 logs the server started only before the server is shut down --- Key: HIVE-10167 URL: https://issues.apache.org/jira/browse/HIVE-10167 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Trivial Fix For: 1.2.0 Attachments: HIVE-10167.1.patch TThreadPoolServer#serve() blocks till the server is down. We should log before that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10159) HashTableSinkDesc and MapJoinDesc keyTblDesc can be replaced by JoinDesc.keyTableDesc
[ https://issues.apache.org/jira/browse/HIVE-10159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389478#comment-14389478 ] Hive QA commented on HIVE-10159: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12708483/HIVE-10159.1.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 8692 tests executed *Failed tests:*
{noformat}
TestMinimrCliDriver-smb_mapjoin_8.q - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_mapjoin_decimal
{noformat}
Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3224/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3224/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3224/ Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}
This message is automatically generated. ATTACHMENT ID: 12708483 - PreCommit-HIVE-TRUNK-Build HashTableSinkDesc and MapJoinDesc keyTblDesc can be replaced by JoinDesc.keyTableDesc - Key: HIVE-10159 URL: https://issues.apache.org/jira/browse/HIVE-10159 Project: Hive Issue Type: Improvement Components: Query Planning Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Priority: Minor Attachments: HIVE-10159.1.patch, HIVE-10159.1.patch MapJoinDesc and HashTableSinkDesc are derived from JoinDesc. HashTableSinkDesc and MapJoinDesc have a keyTblDesc field; JoinDesc has a keyTableDesc field. I think HashTableSinkDesc and MapJoinDesc can use the superclass (JoinDesc) keyTableDesc field instead of defining their own keyTblDesc. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9518) Implement MONTHS_BETWEEN aligned with Oracle one
[ https://issues.apache.org/jira/browse/HIVE-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-9518: -- Attachment: HIVE-9518.10.patch patch #10 - added SEC_IN_31_DAYS constant for clarity Implement MONTHS_BETWEEN aligned with Oracle one Key: HIVE-9518 URL: https://issues.apache.org/jira/browse/HIVE-9518 Project: Hive Issue Type: Improvement Components: UDF Reporter: Xiaobing Zhou Assignee: Alexander Pivovarov Attachments: HIVE-9518.1.patch, HIVE-9518.10.patch, HIVE-9518.2.patch, HIVE-9518.3.patch, HIVE-9518.4.patch, HIVE-9518.5.patch, HIVE-9518.6.patch, HIVE-9518.7.patch, HIVE-9518.8.patch, HIVE-9518.9.patch This is used to track work to build an Oracle-like months_between. Here are the semantics: MONTHS_BETWEEN returns the number of months between dates date1 and date2. If date1 is later than date2, then the result is positive. If date1 is earlier than date2, then the result is negative. If date1 and date2 are either the same days of the month or both last days of months, then the result is always an integer. Otherwise Oracle Database calculates the fractional portion of the result based on a 31-day month and considers the difference in the time components of date1 and date2. Should accept date, timestamp and string arguments in the format 'yyyy-MM-dd' or 'yyyy-MM-dd HH:mm:ss'. The result should be rounded to 8 decimal places. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-10103) LLAP: Cancelling tasks fails to stop cache filling threads
[ https://issues.apache.org/jira/browse/HIVE-10103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin reassigned HIVE-10103: --- Assignee: Sergey Shelukhin LLAP: Cancelling tasks fails to stop cache filling threads -- Key: HIVE-10103 URL: https://issues.apache.org/jira/browse/HIVE-10103 Project: Hive Issue Type: Sub-task Reporter: Gopal V Assignee: Sergey Shelukhin Running a bad query (~1Tb scan on a 1Gb cache) and killing the tasks via the container launcher fails to free up the cache filler threads. The cache filler threads with no consumers get stuck in a loop:
{code}
2015-03-26 14:02:47,335 [pool-2-thread-2(container_1_1659_01_74_gopal_20150326135614_2bb61f02-3c2b-4512-a34e-81803cd13fb6:1_Map 1_73_0)] WARN org.apache.hadoop.hive.llap.io.api.impl.LlapIoImpl: Cannot evict blocks for 262144 calls; cache full?
2015-03-26 14:02:48,018 [pool-2-thread-7(container_1_1659_01_76_gopal_20150326135614_2bb61f02-3c2b-4512-a34e-81803cd13fb6:1_Map 1_75_0)] WARN org.apache.hadoop.hive.llap.io.api.impl.LlapIoImpl: Cannot evict blocks for 262144 calls; cache full?
2015-03-26 14:02:51,658 [pool-2-thread-1(container_1_1659_01_73_gopal_20150326135614_2bb61f02-3c2b-4512-a34e-81803cd13fb6:1_Map 1_72_0)] WARN org.apache.hadoop.hive.llap.io.api.impl.LlapIoImpl: Cannot evict blocks for 262144 calls; cache full?
{code}
One needs to kill the daemon to get back to normal operation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10108) Index#getIndexTableName() returns db.index_table_name
[ https://issues.apache.org/jira/browse/HIVE-10108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389343#comment-14389343 ] Hive QA commented on HIVE-10108: {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12708479/HIVE-10108.1.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3223/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3223/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3223/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.7.0_45-cloudera/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-3223/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d 
apache-svn-trunk-source ]] + [[ ! -d apache-svn-trunk-source/.svn ]] + [[ ! -d apache-svn-trunk-source ]] + cd apache-svn-trunk-source + svn revert -R . Reverted 'service/src/java/org/apache/hive/service/cli/thrift/ThriftBinaryCLIService.java' ++ awk '{print $2}' ++ egrep -v '^X|^Performing status on external' ++ svn status --no-ignore + rm -rf target datanucleus.log ant/target shims/target shims/0.20S/target shims/0.23/target shims/aggregator/target shims/common/target shims/scheduler/target packaging/target hbase-handler/target testutils/target jdbc/target metastore/target itests/target itests/thirdparty itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target itests/hive-unit-hadoop2/target itests/hive-minikdc/target itests/hive-jmh/target itests/hive-unit/target itests/custom-serde/target itests/util/target itests/qtest-spark/target hcatalog/target hcatalog/core/target hcatalog/streaming/target hcatalog/server-extensions/target hcatalog/hcatalog-pig-adapter/target hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target accumulo-handler/target hwi/target common/target common/src/gen spark-client/target contrib/target service/target serde/target beeline/target odbc/target cli/target ql/dependency-reduced-pom.xml ql/target + svn update Fetching external item into 'hcatalog/src/test/e2e/harness' External at revision 1670470. At revision 1670470. + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. 
ATTACHMENT ID: 12708479 - PreCommit-HIVE-TRUNK-Build Index#getIndexTableName() returns db.index_table_name - Key: HIVE-10108 URL: https://issues.apache.org/jira/browse/HIVE-10108 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Attachments: HIVE-10108.1-spark.patch, HIVE-10108.1.patch Index#getIndexTableName() used to return just the index table name; now it returns a qualified table name. This change was introduced in HIVE-3781. As a result, IMetaStoreClient#getTable(index.getDbName(), index.getIndexTableName()) throws ObjectNotFoundException. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
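Until the behavior change is reverted or callers are updated, the caller-side workaround is to strip the database prefix before the metastore lookup. A minimal sketch, in Python for illustration (Hive itself is Java, and the helper name here is hypothetical):

```python
def unqualified_table_name(name: str) -> str:
    """Hypothetical workaround: if getIndexTableName() returns a qualified
    name like "db.index_table_name" (post-HIVE-3781), strip the database
    prefix before passing the name to IMetaStoreClient#getTable(db, table)."""
    return name.split(".", 1)[1] if "." in name else name
```

This keeps unqualified names intact, so it is safe to apply whether or not the caller's metastore returns the qualified form.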
[jira] [Commented] (HIVE-9518) Implement MONTHS_BETWEEN aligned with Oracle one
[ https://issues.apache.org/jira/browse/HIVE-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389381#comment-14389381 ] Mohit Sabharwal commented on HIVE-9518: --- lgtm, +1 (non-binding) Implement MONTHS_BETWEEN aligned with Oracle one Key: HIVE-9518 URL: https://issues.apache.org/jira/browse/HIVE-9518 Project: Hive Issue Type: Improvement Components: UDF Reporter: Xiaobing Zhou Assignee: Alexander Pivovarov Attachments: HIVE-9518.1.patch, HIVE-9518.2.patch, HIVE-9518.3.patch, HIVE-9518.4.patch, HIVE-9518.5.patch, HIVE-9518.6.patch, HIVE-9518.7.patch, HIVE-9518.8.patch, HIVE-9518.9.patch This is used to track work to build an Oracle-like months_between. Here are the semantics: MONTHS_BETWEEN returns the number of months between dates date1 and date2. If date1 is later than date2, the result is positive. If date1 is earlier than date2, the result is negative. If date1 and date2 are either the same days of the month or both last days of months, the result is always an integer. Otherwise Oracle Database calculates the fractional portion of the result based on a 31-day month and considers the difference in the time components of date1 and date2. Should accept date, timestamp and string arguments in the format 'yyyy-MM-dd' or 'yyyy-MM-dd HH:mm:ss'. The result should be rounded to 8 decimal places. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
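The semantics described above can be sketched directly. This is a minimal Python illustration of the stated Oracle rules (integer result for same or both-last days of month, otherwise a fractional part based on a 31-day month including time of day), not Hive's actual UDF code:

```python
from calendar import monthrange
from datetime import datetime

def months_between(d1: datetime, d2: datetime) -> float:
    """Sketch of Oracle-style MONTHS_BETWEEN for datetime inputs."""
    months = (d1.year - d2.year) * 12 + (d1.month - d2.month)

    def is_last_day(d: datetime) -> bool:
        return d.day == monthrange(d.year, d.month)[1]

    # Same day of month, or both last days of their months: integer result.
    if d1.day == d2.day or (is_last_day(d1) and is_last_day(d2)):
        return float(months)

    # Fractional part based on a 31-day month, including the time components.
    sec1 = ((d1.day * 24 + d1.hour) * 60 + d1.minute) * 60 + d1.second
    sec2 = ((d2.day * 24 + d2.hour) * 60 + d2.minute) * 60 + d2.second
    return round(months + (sec1 - sec2) / (31 * 86400), 8)
```

For example, `months_between(datetime(1995, 2, 2), datetime(1995, 1, 1))` yields 1 month plus 1/31 of a month, rounded to 8 decimal places.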
[jira] [Updated] (HIVE-10092) LLAP: improve how buffers are locked for split
[ https://issues.apache.org/jira/browse/HIVE-10092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-10092: Attachment: HIVE-10092.patch Committed to branch... attaching patch here for reference since bugs are possible LLAP: improve how buffers are locked for split -- Key: HIVE-10092 URL: https://issues.apache.org/jira/browse/HIVE-10092 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-10092.patch Right now, for simplicity, the entire split of decompressed buffers is locked in the cache, in case some buffers are shared between RGs. This avoids the situation where we uncompress some data, pass it on to the processor for RG N, the processor processes and unlocks it, and the data is evicted before we can pass it on for RG N+1. However, if the split is too big and the cache is small, or many splits are processed at the same time, this can result in a deadlock because the entire cache is locked. We need to make the locking more granular and probably also try to avoid deadlocks in general (bypass the cache?) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
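The more granular locking proposed here could look something like per-buffer pin counts, so eviction is blocked only for buffers a reader still needs rather than for the whole split. A hypothetical sketch (Python for illustration; the names are not from the Hive code):

```python
import threading

class CacheBuffer:
    """Hypothetical cache buffer with a pin count. A buffer shared between
    RG N and RG N+1 is pinned once per row group up front and unpinned as
    each row group completes, so unshared buffers become evictable early
    instead of the whole split staying locked."""
    def __init__(self, data: bytes):
        self.data = data
        self._pins = 0
        self._lock = threading.Lock()

    def pin(self) -> None:
        with self._lock:
            self._pins += 1

    def unpin(self) -> None:
        with self._lock:
            assert self._pins > 0, "unbalanced unpin"
            self._pins -= 1

    def evictable(self) -> bool:
        # The eviction policy may only reclaim buffers with no pins.
        with self._lock:
            return self._pins == 0
```

With this scheme the cache can still fill up, but a single large split no longer pins every buffer for its whole lifetime, which is the deadlock condition the description identifies.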
[jira] [Resolved] (HIVE-10092) LLAP: improve how buffers are locked for split
[ https://issues.apache.org/jira/browse/HIVE-10092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin resolved HIVE-10092. - Resolution: Fixed Fix Version/s: llap [~gopalv] fyi LLAP: improve how buffers are locked for split -- Key: HIVE-10092 URL: https://issues.apache.org/jira/browse/HIVE-10092 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: llap Attachments: HIVE-10092.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10092) LLAP: improve how buffers are locked for split
[ https://issues.apache.org/jira/browse/HIVE-10092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389534#comment-14389534 ] Sergey Shelukhin commented on HIVE-10092: - Separate jira for cache bypassing is HIVE-10170 LLAP: improve how buffers are locked for split -- Key: HIVE-10092 URL: https://issues.apache.org/jira/browse/HIVE-10092 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: llap Attachments: HIVE-10092.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10128) BytesBytesMultiHashMap does not allow concurrent read-only access
[ https://issues.apache.org/jira/browse/HIVE-10128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389617#comment-14389617 ] Hive QA commented on HIVE-10128: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12708481/HIVE-10128.03.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 8691 tests executed *Failed tests:* {noformat} TestDummy - did not produce a TEST-*.xml file TestMinimrCliDriver-smb_mapjoin_8.q - did not produce a TEST-*.xml file {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3225/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3225/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3225/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12708481 - PreCommit-HIVE-TRUNK-Build BytesBytesMultiHashMap does not allow concurrent read-only access - Key: HIVE-10128 URL: https://issues.apache.org/jira/browse/HIVE-10128 Project: Hive Issue Type: Bug Reporter: Gopal V Assignee: Sergey Shelukhin Fix For: llap Attachments: HIVE-10128.01.patch, HIVE-10128.02.patch, HIVE-10128.03.patch, HIVE-10128.patch, hashmap-after.png, hashmap-sync-source.png, hashmap-sync.png The multi-threaded performance takes a serious hit when LLAP shares hashtables between the probe threads running in parallel. !hashmap-sync.png! This is an explicit synchronized block inside ReusableRowContainer which triggers this particular pattern. !hashmap-sync-source.png! 
Looking deeper into the code, the synchronization appears to stem from the fact that WriteBuffers.setReadPoint modifies the otherwise read-only hashtable. To reproduce this result, run LLAP at the WARN log level to avoid the log synchronization that would otherwise affect the thread sync. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
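The direction implied by the analysis above is to move the read point out of the shared structure into per-reader state, so reads never mutate the hashtable. A hypothetical sketch in Python (WriteBuffers and ReadPosition here are illustrative stand-ins, not Hive's classes):

```python
class WriteBuffers:
    """Stand-in for a shared, read-only byte store (illustrative only)."""
    def __init__(self, data: bytes):
        self.data = data

class ReadPosition:
    """Per-reader cursor: each thread advances its own offset instead of
    calling something like setReadPoint on the shared object, so concurrent
    reads need no synchronization."""
    def __init__(self):
        self.offset = 0

def read_bytes(buffers: WriteBuffers, pos: ReadPosition, n: int) -> bytes:
    # Only per-reader state is mutated; the shared data is never touched.
    out = buffers.data[pos.offset:pos.offset + n]
    pos.offset += n
    return out
```

Two readers with independent ReadPosition objects can then scan the same buffers concurrently without contending on a shared read pointer.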
[jira] [Assigned] (HIVE-10111) LLAP: query 7 produces corrupted result with IO enabled
[ https://issues.apache.org/jira/browse/HIVE-10111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin reassigned HIVE-10111: --- Assignee: Sergey Shelukhin LLAP: query 7 produces corrupted result with IO enabled --- Key: HIVE-10111 URL: https://issues.apache.org/jira/browse/HIVE-10111 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Bogus rows appear at the beginning of the result: {noformat} NULL 97.098.01999664306640.0 84.2991552734 AAAֺK�G6GDHE�ڗ��G7GDHEAA 9.0 40.619998931884766 0.0 34.119998931884766 AAAEK��d NULLNULLNULLNULL KAEGn@j��d KAEGAA 97.06.09904632568 313.4899902343755.67076293945 AAA|EBCA��8��EBCAAA 97.093.66999816894531 0.0 2.80942779541 AAA�g�IOLIy{�KOLIAA 72.0 51.220001220703125 0.0 1.529713897705 AAA�+�D�%�GKFC��k��%�IKFCAA 17.0 121.915258789 0.0 110.91999816894531 AAA�P��F�KIGE�B��F�KIGEAA15.0 81.2008447266 124.2300033569336 24.36610351562 AAAޙ ��i�OIJG�a(��i�OIJGAA33.0 78.34999847412110.0 68.9400024 {noformat} The correct results then follow, shifted by the corresponding number of rows. -- This message was sent by Atlassian JIRA (v6.3.4#6332)