[jira] [Commented] (HIVE-17018) Small table is converted to map join even when the total size of small tables exceeds the threshold (hive.auto.convert.join.noconditionaltask.size)
[ https://issues.apache.org/jira/browse/HIVE-17018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081694#comment-16081694 ] liyunzhang_intel commented on HIVE-17018: - [~csun]: {quote} Are you trying to explain that HoS is overly aggressive in turning JOINs to MAPJOINs when there're chained JOIN operators? {quote} I cannot say for certain. From the [code|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#L364], I guess this is what the author intended, but given the definition of {{hive.auto.convert.join.noconditionaltask.size}} I find it confusing. {noformat} hive.auto.convert.join.noconditionaltask.size means: if the sum of sizes for n-1 of the tables/partitions in an n-way join is smaller than this value, the join is converted to a map join. {noformat} The code was committed by {noformat} HIVE-8943: Fix memory limit check for combine nested mapjoins [Spark Branch] (Szehon via Xuefu) git-svn-id: https://svn.apache.org/repos/asf/hive/branches/spark@1643058 13f79535-47bb-0310-9956-ffa450edef68 {noformat} > Small table is converted to map join even when the total size of small tables > exceeds the threshold (hive.auto.convert.join.noconditionaltask.size) > - > > Key: HIVE-17018 > URL: https://issues.apache.org/jira/browse/HIVE-17018 > Project: Hive > Issue Type: Bug >Reporter: liyunzhang_intel >Assignee: liyunzhang_intel > Attachments: HIVE-17018_data_init.q, HIVE-17018.q, t3.txt > > > We use "hive.auto.convert.join.noconditionaltask.size" as the threshold: if > the sum of sizes for n-1 of the tables/partitions in an n-way join is smaller > than it, the join is converted to a map join. For example, take A join B > join C join D join E, where the big table is A (100M) and the small tables are > B (10M), C (10M), D (10M), E (10M), with > hive.auto.convert.join.noconditionaltask.size=20M. In the current code, E, D and B > are converted to map joins but C is not.
In my > understanding, because hive.auto.convert.join.noconditionaltask.size can only > accommodate E and D, neither C nor B should be converted to a map join. > Let's explain in more detail why E can be converted to a map join. > In the current code, > [SparkMapJoinOptimizer#getConnectedMapJoinSize|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#L364] > sums all the map joins on the parent path and the child path. The search > stops when encountering a [UnionOperator or > ReduceSinkOperator|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#L381]. > C is not converted to a map join because {{(connectedMapJoinSize + > totalSize) > maxSize}} [see > code|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#L330], so the > RS before the join of C remains. When calculating whether B will be > converted to a map join, {{getConnectedMapJoinSize}} returns 0 on encountering that > [RS|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#L409], > and so {{(connectedMapJoinSize + totalSize) < maxSize}} holds. > [~xuefuz] or [~jxiang]: can you help determine whether this is a bug, as you > are more familiar with SparkMapJoinOptimizer? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
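The accumulate-and-reset behavior described in this comment can be sketched with a toy model (an illustrative simplification, not Hive's actual SparkMapJoinOptimizer code: table sizes are plain longs and the operator tree is reduced to a list of small tables in conversion order):

```java
import java.util.ArrayList;
import java.util.List;

public class MapJoinThresholdSketch {
    // Returns the indices of the small tables that this simplified model
    // converts to map joins, given a per-table size and the threshold.
    static List<Integer> converted(long[] smallSizes, long maxSize) {
        List<Integer> out = new ArrayList<>();
        long connected = 0; // running size of map joins connected to the current one
        for (int i = 0; i < smallSizes.length; i++) {
            if (connected + smallSizes[i] <= maxSize) {
                out.add(i);
                connected += smallSizes[i];
            } else {
                // Conversion rejected: the ReduceSink before this join remains,
                // so the traversal stops at it and the running total resets to 0.
                connected = 0;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // E, D, C, B each 10 (MB), threshold 20 (MB): E and D convert, C is
        // rejected, then B converts again because the surviving RS before C's
        // join hides the earlier accumulated size.
        System.out.println(converted(new long[]{10, 10, 10, 10}, 20)); // [0, 1, 3]
    }
}
```

This reproduces the reported symptom: the n-1 small tables together exceed the threshold, yet three of the four are still converted.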
[jira] [Updated] (HIVE-16996) Add HLL as an alternative to FM sketch to compute stats
[ https://issues.apache.org/jira/browse/HIVE-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-16996: --- Status: Open (was: Patch Available) > Add HLL as an alternative to FM sketch to compute stats > --- > > Key: HIVE-16996 > URL: https://issues.apache.org/jira/browse/HIVE-16996 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: Accuracy and performance comparison between HyperLogLog > and FM Sketch.docx, HIVE-16966.01.patch, HIVE-16966.02.patch, > HIVE-16966.03.patch, HIVE-16966.04.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16996) Add HLL as an alternative to FM sketch to compute stats
[ https://issues.apache.org/jira/browse/HIVE-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-16996: --- Status: Patch Available (was: Open) > Add HLL as an alternative to FM sketch to compute stats > --- > > Key: HIVE-16996 > URL: https://issues.apache.org/jira/browse/HIVE-16996 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: Accuracy and performance comparison between HyperLogLog > and FM Sketch.docx, HIVE-16966.01.patch, HIVE-16966.02.patch, > HIVE-16966.03.patch, HIVE-16966.04.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16996) Add HLL as an alternative to FM sketch to compute stats
[ https://issues.apache.org/jira/browse/HIVE-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-16996: --- Attachment: HIVE-16966.04.patch > Add HLL as an alternative to FM sketch to compute stats > --- > > Key: HIVE-16996 > URL: https://issues.apache.org/jira/browse/HIVE-16996 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: Accuracy and performance comparison between HyperLogLog > and FM Sketch.docx, HIVE-16966.01.patch, HIVE-16966.02.patch, > HIVE-16966.03.patch, HIVE-16966.04.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17018) Small table is converted to map join even when the total size of small tables exceeds the threshold (hive.auto.convert.join.noconditionaltask.size)
[ https://issues.apache.org/jira/browse/HIVE-17018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081648#comment-16081648 ] Chao Sun commented on HIVE-17018: - Thanks for the examples [~kellyzly]. Are you trying to explain that HoS is overly aggressive in turning JOINs to MAPJOINs when there're chained JOIN operators? E.g., the above {{JOIN 8}} cannot be converted. If so, I'm thinking this may be OK since the two MAPJOINs are in different works (one in Map 1 and another in Reducer 2). > Small table is converted to map join even when the total size of small tables > exceeds the threshold (hive.auto.convert.join.noconditionaltask.size) > - > > Key: HIVE-17018 > URL: https://issues.apache.org/jira/browse/HIVE-17018 > Project: Hive > Issue Type: Bug >Reporter: liyunzhang_intel >Assignee: liyunzhang_intel > Attachments: HIVE-17018_data_init.q, HIVE-17018.q, t3.txt > > > We use "hive.auto.convert.join.noconditionaltask.size" as the threshold: if > the sum of sizes for n-1 of the tables/partitions in an n-way join is smaller > than it, the join is converted to a map join. For example, take A join B > join C join D join E, where the big table is A (100M) and the small tables are > B (10M), C (10M), D (10M), E (10M), with > hive.auto.convert.join.noconditionaltask.size=20M. In the current code, E, D and B > are converted to map joins but C is not. In my > understanding, because hive.auto.convert.join.noconditionaltask.size can only > accommodate E and D, neither C nor B should be converted to a map join. > Let's explain in more detail why E can be converted to a map join. > In the current code, > [SparkMapJoinOptimizer#getConnectedMapJoinSize|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#L364] > sums all the map joins on the parent path and the child path.
The search > stops when encountering a [UnionOperator or > ReduceSinkOperator|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#L381]. > C is not converted to a map join because {{(connectedMapJoinSize + > totalSize) > maxSize}} [see > code|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#L330], so the > RS before the join of C remains. When calculating whether B will be > converted to a map join, {{getConnectedMapJoinSize}} returns 0 on encountering that > [RS|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#L409], > and so {{(connectedMapJoinSize + totalSize) < maxSize}} holds. > [~xuefuz] or [~jxiang]: can you help determine whether this is a bug, as you > are more familiar with SparkMapJoinOptimizer? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16996) Add HLL as an alternative to FM sketch to compute stats
[ https://issues.apache.org/jira/browse/HIVE-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-16996: --- Status: Patch Available (was: Open) > Add HLL as an alternative to FM sketch to compute stats > --- > > Key: HIVE-16996 > URL: https://issues.apache.org/jira/browse/HIVE-16996 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: Accuracy and performance comparison between HyperLogLog > and FM Sketch.docx, HIVE-16966.01.patch, HIVE-16966.02.patch, > HIVE-16966.03.patch, HIVE-16966.04.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16996) Add HLL as an alternative to FM sketch to compute stats
[ https://issues.apache.org/jira/browse/HIVE-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-16996: --- Status: Open (was: Patch Available) > Add HLL as an alternative to FM sketch to compute stats > --- > > Key: HIVE-16996 > URL: https://issues.apache.org/jira/browse/HIVE-16996 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: Accuracy and performance comparison between HyperLogLog > and FM Sketch.docx, HIVE-16966.01.patch, HIVE-16966.02.patch, > HIVE-16966.03.patch, HIVE-16966.04.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17066) Query78 filter wrong estimation is generating bad plan
[ https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081661#comment-16081661 ] Hive QA commented on HIVE-17066: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876541/HIVE-17066.2.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10834 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=99) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5949/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5949/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5949/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12876541 - PreCommit-HIVE-Build > Query78 filter wrong estimatation is generating bad plan > > > Key: HIVE-17066 > URL: https://issues.apache.org/jira/browse/HIVE-17066 > Project: Hive > Issue Type: Bug >Reporter: Vineet Garg >Assignee: Vineet Garg > Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch > > > Filter operator is estimating 1 row following a left outer join causing bad > estimates > {noformat} > Reducer 12 > Execution mode: vectorized, llap > Reduce Operator Tree: > Map Join Operator > condition map: > Left Outer Join0 to 1 > keys: > 0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > 1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, > _col8 > input vertices: > 1 Map 14 > Statistics: Num rows: 71676270660 Data size: 3727166074320 > Basic stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: _col8 is null (type: boolean) > Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE > Column stats: COMPLETE > Select Operator > expressions: _col0 (type: bigint), _col1 (type: bigint), > _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: > bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, > _col6 > Statistics: Num rows: 1 Data size: 52 Basic stats: > COMPLETE Column stats: COMPLETE > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16177) non Acid to acid conversion doesn't handle _copy_N files
[ https://issues.apache.org/jira/browse/HIVE-16177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081658#comment-16081658 ] Eugene Koifman commented on HIVE-16177: --- No related failures. Committed patch 18 to master (3.0). Thanks Sergey and Owen for the review. > non Acid to acid conversion doesn't handle _copy_N files > > > Key: HIVE-16177 > URL: https://issues.apache.org/jira/browse/HIVE-16177 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 0.14.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Blocker > Attachments: HIVE-16177.01.patch, HIVE-16177.02.patch, > HIVE-16177.04.patch, HIVE-16177.07.patch, HIVE-16177.08.patch, > HIVE-16177.09.patch, HIVE-16177.10.patch, HIVE-16177.11.patch, > HIVE-16177.14.patch, HIVE-16177.15.patch, HIVE-16177.16.patch, > HIVE-16177.17.patch, HIVE-16177.18.patch > > > {noformat} > create table T(a int, b int) clustered by (a) into 2 buckets stored as orc > TBLPROPERTIES('transactional'='false') > insert into T(a,b) values(1,2) > insert into T(a,b) values(1,3) > alter table T SET TBLPROPERTIES ('transactional'='true') > {noformat} > // we should now have bucket files 01_0 and 01_0_copy_1, > but OrcRawRecordMerger.OriginalReaderPair.next() doesn't know that there can > be copy_N files and numbers rows in each bucket from 0, thus generating > duplicate IDs > {noformat} > select ROW__ID, INPUT__FILE__NAME, a, b from T > {noformat} > produces > {noformat} > {"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0,1,2 > {"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0_copy_1,1,3 > {noformat} > [~owen.omalley], do you have any thoughts on a good way to handle this?
> The attached patch has a few changes to make Acid even recognize copy_N, but this > is just a prerequisite. The new UT demonstrates the issue. > Furthermore, > {noformat} > alter table T compact 'major' > select ROW__ID, INPUT__FILE__NAME, a, b from T order by b > {noformat} > produces > {noformat} > {"transactionid":0,"bucketid":1,"rowid":0} > file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommandswarehouse/nonacidorctbl/base_-9223372036854775808/bucket_1 > 1 2 > {noformat} > HIVE-16177.04.patch has TestTxnCommands.testNonAcidToAcidConversion0() > demonstrating this. > This is because the compactor doesn't handle copy_N files either (it skips them) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
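The duplicate-ID mechanics described in the issue can be modeled with a small sketch (illustrative only, not the OrcRawRecordMerger code; file names and row counts follow the example in the description): numbering rows from 0 in every original file repeats the same (transactionid, bucketid, rowid) triple across _copy_N files, while one way to keep IDs unique would be to number rows continuously across all of a bucket's files:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CopyNRowIdSketch {
    // Flawed scheme described in the issue: rowid restarts at 0 in every file,
    // so two files in the same bucket can emit identical ROW__IDs.
    static List<String> flawedIds(Map<String, Integer> fileRowCounts, int bucket) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, Integer> e : fileRowCounts.entrySet())
            for (int row = 0; row < e.getValue(); row++)
                ids.add("txn=0/bucket=" + bucket + "/rowid=" + row);
        return ids;
    }

    // One possible remedy: number rows continuously across all copy_N files
    // of the bucket, so every row gets a distinct rowid.
    static List<String> continuousIds(Map<String, Integer> fileRowCounts, int bucket) {
        List<String> ids = new ArrayList<>();
        long next = 0;
        for (Map.Entry<String, Integer> e : fileRowCounts.entrySet())
            for (int row = 0; row < e.getValue(); row++)
                ids.add("txn=0/bucket=" + bucket + "/rowid=" + next++);
        return ids;
    }

    public static void main(String[] args) {
        Map<String, Integer> files = new LinkedHashMap<>();
        files.put("000001_0", 1);        // one row: (1,2)
        files.put("000001_0_copy_1", 1); // one row: (1,3)
        System.out.println(flawedIds(files, 1));     // both rows get rowid 0
        System.out.println(continuousIds(files, 1)); // rowids 0 and 1
    }
}
```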
[jira] [Updated] (HIVE-15898) add Type2 SCD merge tests
[ https://issues.apache.org/jira/browse/HIVE-15898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-15898: -- Attachment: HIVE-15898.06.patch > add Type2 SCD merge tests > - > > Key: HIVE-15898 > URL: https://issues.apache.org/jira/browse/HIVE-15898 > Project: Hive > Issue Type: Sub-task > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-15898.01.patch, HIVE-15898.02.patch, > HIVE-15898.03.patch, HIVE-15898.04.patch, HIVE-15898.05.patch, > HIVE-15898.06.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16177) non Acid to acid conversion doesn't handle _copy_N files
[ https://issues.apache.org/jira/browse/HIVE-16177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081621#comment-16081621 ] Hive QA commented on HIVE-16177: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876537/HIVE-16177.18.patch {color:green}SUCCESS:{color} +1 due to 9 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10838 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed] (batchId=237) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_wise_fileformat6] (batchId=7) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=145) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=99) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=232) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5948/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5948/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5948/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 9 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12876537 - PreCommit-HIVE-Build > non Acid to acid conversion doesn't handle _copy_N files > > > Key: HIVE-16177 > URL: https://issues.apache.org/jira/browse/HIVE-16177 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 0.14.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Blocker > Attachments: HIVE-16177.01.patch, HIVE-16177.02.patch, > HIVE-16177.04.patch, HIVE-16177.07.patch, HIVE-16177.08.patch, > HIVE-16177.09.patch, HIVE-16177.10.patch, HIVE-16177.11.patch, > HIVE-16177.14.patch, HIVE-16177.15.patch, HIVE-16177.16.patch, > HIVE-16177.17.patch, HIVE-16177.18.patch > > > {noformat} > create table T(a int, b int) clustered by (a) into 2 buckets stored as orc > TBLPROPERTIES('transactional'='false') > insert into T(a,b) values(1,2) > insert into T(a,b) values(1,3) > alter table T SET TBLPROPERTIES ('transactional'='true') > {noformat} > //we should now have bucket files 01_0 and 01_0_copy_1 > but OrcRawRecordMerger.OriginalReaderPair.next() doesn't know that there can > be copy_N files and numbers rows in each bucket from 0 thus generating > duplicate IDs > {noformat} > select ROW__ID, INPUT__FILE__NAME, a, b from T > {noformat} > produces > {noformat} > {"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0,1,2 > {"transactionid\":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0_copy_1,1,3 > {noformat} > [~owen.omalley], do you have any thoughts on a good way to handle this? > attached patch has a few changes to make Acid even recognize copy_N but this > is just a pre-requisite. The new UT demonstrates the issue. 
> Futhermore, > {noformat} > alter table T compact 'major' > select ROW__ID, INPUT__FILE__NAME, a, b from T order by b > {noformat} > produces > {noformat} > {"transactionid":0,"bucketid":1,"rowid":0} > file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommandswarehouse/nonacidorctbl/base_-9223372036854775808/bucket_1 > 1 2 > {noformat} > HIVE-16177.04.patch has TestTxnCommands.testNonAcidToAcidConversion0() > demonstrating this > This is because compactor doesn't handle copy_N files either (skips them) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16732) Transactional tables should block LOAD DATA
[ https://issues.apache.org/jira/browse/HIVE-16732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081614#comment-16081614 ] Eugene Koifman commented on HIVE-16732: --- failures with "error code = 10266" are expected since HIVE-16177 is not committed yet org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_orig_table] (batchId=59) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table] (batchId=54) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=60) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_orig_table] (batchId=156) > Transactional tables should block LOAD DATA > > > Key: HIVE-16732 > URL: https://issues.apache.org/jira/browse/HIVE-16732 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-16732.01.patch, HIVE-16732.02.patch > > > This has always been the design. > see LoadSemanticAnalyzer.analyzeInternal() > StrictChecks.checkBucketing(conf); > Some examples (this is exposed by HIVE-16177) > insert_values_orig_table.q > insert_orig_table.q > insert_values_orig_table_use_metadata.q -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17010) Fix the overflow problem of Long type in SetSparkReducerParallelism
[ https://issues.apache.org/jira/browse/HIVE-17010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-17010: Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) Committed to the upstream. Thanks Liyun for the patch and thanks Chao and Rui for the reviews. > Fix the overflow problem of Long type in SetSparkReducerParallelism > --- > > Key: HIVE-17010 > URL: https://issues.apache.org/jira/browse/HIVE-17010 > Project: Hive > Issue Type: Bug >Reporter: liyunzhang_intel >Assignee: liyunzhang_intel > Fix For: 3.0.0 > > Attachments: HIVE-17010.1.patch, HIVE-17010.2.patch, > HIVE-17010.3.patch > > > We use > [numberOfBytes|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L129] > to collect the numberOfBytes of the siblings of a specified RS. We use the Long type, > and it overflows when the data is too big. When this happens, the parallelism is decided by > [sparkMemoryAndCores.getSecond()|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java#L184]. > If spark.dynamicAllocation.enabled is true, sparkMemoryAndCores.getSecond > is a dynamic value decided by the Spark runtime. For example, the value > of sparkMemoryAndCores.getSecond may be 5 or 15 randomly; there is a possibility > that the value is 1. The main problem here is the overflow of the addition of > the Long type. You can reproduce the overflow problem with the following code > {code} > public static void main(String[] args) { > long a1= 9223372036854775807L; > long a2=1022672; > long res = a1+a2; > System.out.println(res); //-9223372036853753137 > BigInteger b1= BigInteger.valueOf(a1); > BigInteger b2 = BigInteger.valueOf(a2); > BigInteger bigRes = b1.add(b2); > System.out.println(bigRes); //9223372036855798479 > } > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
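The snippet in the description shows the wrap-around. As an aside, besides switching to {{BigInteger}}, a lighter-weight guard for such size accumulation (shown here as a hypothetical sketch, not the committed fix) is {{Math.addExact}}, which throws on overflow and so lets the caller clamp to {{Long.MAX_VALUE}} instead of silently going negative:

```java
public class OverflowGuardSketch {
    // Saturating accumulator: adds two sizes, clamping at Long.MAX_VALUE
    // instead of wrapping around to a negative value.
    static long addSaturating(long a, long b) {
        try {
            return Math.addExact(a, b); // throws ArithmeticException on overflow
        } catch (ArithmeticException overflow) {
            return Long.MAX_VALUE;
        }
    }

    public static void main(String[] args) {
        long a1 = 9223372036854775807L; // Long.MAX_VALUE, as in the issue
        long a2 = 1022672L;
        System.out.println(a1 + a2);                // -9223372036853753137 (wraps)
        System.out.println(addSaturating(a1, a2));  // 9223372036854775807 (clamped)
    }
}
```

A clamped total still exceeds any realistic threshold, so downstream comparisons keep behaving sensibly.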
[jira] [Commented] (HIVE-17070) remove .orig files from src
[ https://issues.apache.org/jira/browse/HIVE-17070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081607#comment-16081607 ] Eugene Koifman commented on HIVE-17070: --- No related failures. [~jdere], could you review please? > remove .orig files from src > --- > > Key: HIVE-17070 > URL: https://issues.apache.org/jira/browse/HIVE-17070 > Project: Hive > Issue Type: Bug >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Trivial > Attachments: HIVE-17070.patch > > > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java.orig > ql/src/test/results/clientpositive/llap/vector_join30.q.out.orig -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.
[ https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081599#comment-16081599 ] Bing Li commented on HIVE-4577: --- Hi [~vgumashta], yes, sure. I will rebase the patch on the latest master. > hive CLI can't handle hadoop dfs command with space and quotes. > > > Key: HIVE-4577 > URL: https://issues.apache.org/jira/browse/HIVE-4577 > Project: Hive > Issue Type: Bug > Components: CLI >Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0 >Reporter: Bing Li >Assignee: Bing Li > Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, > HIVE-4577.3.patch.txt, HIVE-4577.4.patch > > > As designed, hive supports hadoop dfs commands in the hive shell, like > hive> dfs -mkdir /user/biadmin/mydir; > but it behaves differently from hadoop if the path contains spaces and quotes: > hive> dfs -mkdir "hello"; > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:40 > /user/biadmin/"hello" > hive> dfs -mkdir 'world'; > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:43 > /user/biadmin/'world' > hive> dfs -mkdir "bei jing"; > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:44 > /user/biadmin/"bei > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:44 > /user/biadmin/jing" -- This message was sent by Atlassian JIRA (v6.4.14#64029)
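The symptom above is the CLI passing the raw argument text, quote characters included, to the filesystem command instead of tokenizing it the way an OS shell would. For illustration, a minimal quote-aware tokenizer of the kind such a fix would need (a hypothetical sketch, not the actual HIVE-4577 patch) splits on whitespace while honoring single and double quotes, so {{dfs -mkdir "bei jing"}} yields the single path argument {{bei jing}}:

```java
import java.util.ArrayList;
import java.util.List;

public class ShellTokenizerSketch {
    // Splits a command line on whitespace while honoring single and double
    // quotes, stripping the quote characters as an OS shell would.
    static List<String> tokenize(String line) {
        List<String> toks = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        char quote = 0;        // current quote character, or 0 if unquoted
        boolean inTok = false; // whether we are inside a token
        for (char c : line.toCharArray()) {
            if (quote != 0) {
                if (c == quote) quote = 0;   // closing quote ends quoted run
                else cur.append(c);          // spaces inside quotes are kept
            } else if (c == '"' || c == '\'') {
                quote = c; inTok = true;     // opening quote starts quoted run
            } else if (Character.isWhitespace(c)) {
                if (inTok) { toks.add(cur.toString()); cur.setLength(0); inTok = false; }
            } else {
                cur.append(c); inTok = true;
            }
        }
        if (inTok) toks.add(cur.toString());
        return toks;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("-mkdir \"bei jing\"")); // [-mkdir, bei jing]
        System.out.println(tokenize("-mkdir 'world'"));      // [-mkdir, world]
    }
}
```

With tokens like these, the CLI would create the directories {{bei jing}} and {{world}} rather than {{"bei}}, {{jing"}}, and {{'world'}}.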
[jira] [Commented] (HIVE-17071) Make hive 2.3 depend on storage-api-2.3
[ https://issues.apache.org/jira/browse/HIVE-17071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081568#comment-16081568 ] Hive QA commented on HIVE-17071: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876536/HIVE-17071-branch-2.3.patch {color:red}ERROR:{color} -1 due to build exiting with an error Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5947/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5947/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5947/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N' 2017-07-11 03:02:17.732 + [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]] + export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'MAVEN_OPTS=-Xmx1g ' + MAVEN_OPTS='-Xmx1g ' + cd /data/hiveptest/working/ + tee /data/hiveptest/logs/PreCommit-HIVE-Build-5947/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ git = \s\v\n ]] + [[ git = \g\i\t ]] + [[ -z branch-2.3 ]] + [[ -d apache-github-branch-2.3-source ]] + [[ ! -d apache-github-branch-2.3-source/.git ]] + [[ ! 
-d apache-github-branch-2.3-source ]] + date '+%Y-%m-%d %T.%3N' 2017-07-11 03:02:17.761 + cd apache-github-branch-2.3-source + git fetch origin >From https://github.com/apache/hive 32fd02b..31cee7e branch-2.3 -> origin/branch-2.3 5431fad..6a63742 branch-2 -> origin/branch-2 + 1efb4da...61867c7 branch-2.2 -> origin/branch-2.2 (forced update) 52e0f8f..81853c1 hive-14535 -> origin/hive-14535 a18e772..7580de9 master -> origin/master e2ecc92..fea9142 storage-branch-2.3 -> origin/storage-branch-2.3 * [new tag] rel/storage-release-2.3.1 -> rel/storage-release-2.3.1 + git reset --hard HEAD HEAD is now at 32fd02b Revert "HIVE-12767: Implement table property to address Parquet int96 timestamp bug (Barna Zsombor Klara and Sergio Pena, reviewed by Ryan Blue)" + git clean -f -d Removing ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedParquetRecordReader.java.orig + git checkout branch-2.3 Already on 'branch-2.3' Your branch is behind 'origin/branch-2.3' by 8 commits, and can be fast-forwarded. (use "git pull" to update your local branch) + git reset --hard origin/branch-2.3 HEAD is now at 31cee7e HIVE-15144: JSON.org license is now CatX (Owen O'Malley, reviewed by Alan Gates) + git merge --ff-only origin/branch-2.3 Already up-to-date. 
+ date '+%Y-%m-%d %T.%3N' 2017-07-11 03:02:26.431 + patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hiveptest/working/scratch/build.patch + [[ -f /data/hiveptest/working/scratch/build.patch ]] + chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh + /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch Going to apply patch with: patch -p1 patching file pom.xml + [[ maven == \m\a\v\e\n ]] + rm -rf /data/hiveptest/working/maven/org/apache/hive + mvn -B clean install -DskipTests -T 4 -q -Dmaven.repo.local=/data/hiveptest/working/maven ANTLR Parser Generator Version 3.5.2 Output file /data/hiveptest/working/apache-github-branch-2.3-source/metastore/target/generated-sources/antlr3/org/apache/hadoop/hive/metastore/parser/FilterParser.java does not exist: must build /data/hiveptest/working/apache-github-branch-2.3-source/metastore/src/java/org/apache/hadoop/hive/metastore/parser/Filter.g org/apache/hadoop/hive/metastore/parser/Filter.g DataNucleus Enhancer (version 4.1.17) for API "JDO" DataNucleus Enhancer : Classpath >> /usr/share/maven/boot/plexus-classworlds-2.x.jar ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MDatabase ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MFieldSchema ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MType ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MTable ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MConstraint ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MSerDeInfo ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MOrder ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MColumnDescriptor ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MStringList ENHANCED (Persistable) :
[jira] [Commented] (HIVE-16732) Transactional tables should block LOAD DATA
[ https://issues.apache.org/jira/browse/HIVE-16732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081565#comment-16081565 ] Hive QA commented on HIVE-16732: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876526/HIVE-16732.02.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 10835 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_orig_table] (batchId=59) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table] (batchId=54) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=60) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_orig_table] (batchId=156) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=145) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5946/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5946/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5946/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 10 tests failed {noformat} This 
message is automatically generated. ATTACHMENT ID: 12876526 - PreCommit-HIVE-Build > Transactional tables should block LOAD DATA > > > Key: HIVE-16732 > URL: https://issues.apache.org/jira/browse/HIVE-16732 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-16732.01.patch, HIVE-16732.02.patch > > > This has always been the design. > see LoadSemanticAnalyzer.analyzeInternal() > StrictChecks.checkBucketing(conf); > Some examples (this is exposed by HIVE-16177) > insert_values_orig_table.q > insert_orig_table.q > insert_values_orig_table_use_metadata.q -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17070) remove .orig files from src
[ https://issues.apache.org/jira/browse/HIVE-17070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081508#comment-16081508 ] Hive QA commented on HIVE-17070: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876523/HIVE-17070.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10834 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=232) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) org.apache.hive.hcatalog.pig.TestTextFileHCatStorer.testWriteDecimal (batchId=180) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5945/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5945/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5945/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 7 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12876523 - PreCommit-HIVE-Build > remove .orig files from src > --- > > Key: HIVE-17070 > URL: https://issues.apache.org/jira/browse/HIVE-17070 > Project: Hive > Issue Type: Bug >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Trivial > Attachments: HIVE-17070.patch > > > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java.orig > ql/src/test/results/clientpositive/llap/vector_join30.q.out.orig -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17067) LLAP: Add http endpoint to provide system level configurations
[ https://issues.apache.org/jira/browse/HIVE-17067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-17067: - Attachment: HIVE-17067.2.patch Avoided running sysctl on every invocation. Requires /system?refresh=true to re-run sysctl command. Minor fixes for Mac sysctl output. > LLAP: Add http endpoint to provide system level configurations > -- > > Key: HIVE-17067 > URL: https://issues.apache.org/jira/browse/HIVE-17067 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Fix For: 3.0.0 > > Attachments: HIVE-17067.1.patch, HIVE-17067.2.patch > > > Add an endpoint to get kernel and network configs via sysctl. Also memory > related configs like transparent huge pages config can be added. "ulimit -a" > can be added to llap startup script as it needs a shell. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
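The refresh behavior described in the patch note above (run `sysctl` once, cache its output, and re-run only when `/system?refresh=true` is requested) could be sketched roughly as follows. This is an illustrative cache, not the actual HIVE-17067 patch code; the class and field names are assumptions:

```java
import java.util.function.Supplier;

public class SysctlCache {
    private final Supplier<String> runner;  // stands in for executing "sysctl -a"
    private volatile String cached;
    int invocations = 0;                    // exposed only for the demo below

    SysctlCache(Supplier<String> runner) {
        this.runner = runner;
    }

    // Returns the cached output; re-runs the command only when refresh is
    // requested (mirroring "/system?refresh=true") or nothing is cached yet.
    synchronized String get(boolean refresh) {
        if (refresh || cached == null) {
            invocations++;
            cached = runner.get();
        }
        return cached;
    }

    public static void main(String[] args) {
        SysctlCache cache = new SysctlCache(() -> "kernel.ostype = Linux");
        cache.get(false);  // first call runs the command
        cache.get(false);  // served from cache, no re-run
        cache.get(true);   // refresh forces a re-run
        System.out.println(cache.invocations);  // prints 2
    }
}
```

The point of the design is that an http endpoint serving mostly static system configuration should not fork a shell on every request; a refresh flag gives callers an explicit escape hatch.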
[jira] [Commented] (HIVE-15898) add Type2 SCD merge tests
[ https://issues.apache.org/jira/browse/HIVE-15898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081457#comment-16081457 ] Hive QA commented on HIVE-15898: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876527/HIVE-15898.05.patch {color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 10835 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sqlmerge] (batchId=159) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sqlmerge_type2_scd] (batchId=144) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=145) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=99) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hadoop.hive.llap.security.TestLlapSignerImpl.testSigning (batchId=289) org.apache.hadoop.hive.thrift.TestHadoopAuthBridge23.testDelegationTokenSharedStore (batchId=229) org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgressParallel (batchId=220) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5944/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5944/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5944/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing 
org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 12 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12876527 - PreCommit-HIVE-Build > add Type2 SCD merge tests > - > > Key: HIVE-15898 > URL: https://issues.apache.org/jira/browse/HIVE-15898 > Project: Hive > Issue Type: Sub-task > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-15898.01.patch, HIVE-15898.02.patch, > HIVE-15898.03.patch, HIVE-15898.04.patch, HIVE-15898.05.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17066) Query78 filter wrong estimatation is generating bad plan
[ https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081444#comment-16081444 ] Vineet Garg commented on HIVE-17066: [~ashutoshc] Uploaded patch with updated golden files. Review board link is [RB LINK | https://reviews.apache.org/r/60757/] > Query78 filter wrong estimatation is generating bad plan > > > Key: HIVE-17066 > URL: https://issues.apache.org/jira/browse/HIVE-17066 > Project: Hive > Issue Type: Bug >Reporter: Vineet Garg >Assignee: Vineet Garg > Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch > > > Filter operator is estimating 1 row following a left outer join causing bad > estimates > {noformat} > Reducer 12 > Execution mode: vectorized, llap > Reduce Operator Tree: > Map Join Operator > condition map: > Left Outer Join0 to 1 > keys: > 0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > 1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, > _col8 > input vertices: > 1 Map 14 > Statistics: Num rows: 71676270660 Data size: 3727166074320 > Basic stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: _col8 is null (type: boolean) > Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE > Column stats: COMPLETE > Select Operator > expressions: _col0 (type: bigint), _col1 (type: bigint), > _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: > bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, > _col6 > Statistics: Num rows: 1 Data size: 52 Basic stats: > COMPLETE Column stats: COMPLETE > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-13989) Extended ACLs are not handled according to specification
[ https://issues.apache.org/jira/browse/HIVE-13989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081441#comment-16081441 ] Vaibhav Gumashta commented on HIVE-13989: - Thanks a lot [~cdrome] > Extended ACLs are not handled according to specification > > > Key: HIVE-13989 > URL: https://issues.apache.org/jira/browse/HIVE-13989 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 1.2.1, 2.0.0 >Reporter: Chris Drome >Assignee: Chris Drome > Attachments: HIVE-13989.1-branch-1.patch, HIVE-13989.1.patch, > HIVE-13989-branch-1.patch > > > Hive takes two approaches to working with extended ACLs depending on whether > data is being produced via a Hive query or HCatalog APIs. A Hive query will > run an FsShell command to recursively set the extended ACLs for a directory > sub-tree. HCatalog APIs will attempt to build up the directory sub-tree > programmatically and runs some code to set the ACLs to match the parent > directory. > Some incorrect assumptions were made when implementing the extended ACLs > support. Refer to https://issues.apache.org/jira/browse/HDFS-4685 for the > design documents of extended ACLs in HDFS. These documents model the > implementation after the POSIX implementation on Linux, which can be found at > http://www.vanemery.com/Linux/ACL/POSIX_ACL_on_Linux.html. 
> The code for setting extended ACLs via HCatalog APIs is found in > HdfsUtils.java: > {code} > if (aclEnabled) { > aclStatus = sourceStatus.getAclStatus(); > if (aclStatus != null) { > LOG.trace(aclStatus.toString()); > aclEntries = aclStatus.getEntries(); > removeBaseAclEntries(aclEntries); > //the ACL api's also expect the tradition user/group/other permission > in the form of ACL > aclEntries.add(newAclEntry(AclEntryScope.ACCESS, AclEntryType.USER, > sourcePerm.getUserAction())); > aclEntries.add(newAclEntry(AclEntryScope.ACCESS, AclEntryType.GROUP, > sourcePerm.getGroupAction())); > aclEntries.add(newAclEntry(AclEntryScope.ACCESS, AclEntryType.OTHER, > sourcePerm.getOtherAction())); > } > } > {code} > We found that DEFAULT extended ACL rules were not being inherited properly by > the directory sub-tree, so the above code is incomplete because it > effectively drops the DEFAULT rules. The second problem is with the call to > {{sourcePerm.getGroupAction()}}, which is incorrect in the case of extended > ACLs. When extended ACLs are used the GROUP permission is replaced with the > extended ACL mask. So the above code will apply the wrong permissions to the > GROUP. Instead the correct GROUP permissions now need to be pulled from the > AclEntry as returned by {{getAclStatus().getEntries()}}. See the > implementation of the new method {{getDefaultAclEntries}} for details. > Similar issues exist with the HCatalog API. None of the API accounts for > setting extended ACLs on the directory sub-tree. The changes to the HCatalog > API allow the extended ACLs to be passed into the required methods similar to > how basic permissions are passed in. When building the directory sub-tree the > extended ACLs of the table directory are inherited by all sub-directories, > including the DEFAULT rules. 
> Replicating the problem: > Create a table to write data into (I will use acl_test as the destination and > words_text as the source) and set the ACLs as follows: > {noformat} > $ hdfs dfs -setfacl -m > default:user::rwx,default:group::r-x,default:mask::rwx,default:user:hdfs:rwx,group::r-x,user:hdfs:rwx > /user/cdrome/hive/acl_test > $ hdfs dfs -ls -d /user/cdrome/hive/acl_test > drwxrwx---+ - cdrome hdfs 0 2016-07-13 20:36 > /user/cdrome/hive/acl_test > $ hdfs dfs -getfacl -R /user/cdrome/hive/acl_test > # file: /user/cdrome/hive/acl_test > # owner: cdrome > # group: hdfs > user::rwx > user:hdfs:rwx > group::r-x > mask::rwx > other::--- > default:user::rwx > default:user:hdfs:rwx > default:group::r-x > default:mask::rwx > default:other::--- > {noformat} > Note that the basic GROUP permission is set to {{rwx}} after setting the > ACLs. The ACLs explicitly set the DEFAULT rules and a rule specifically for > the {{hdfs}} user. > Run the following query to populate the table: > {noformat} > insert into acl_test partition (dt='a', ds='b') select a, b from words_text > where dt = 'c'; > {noformat} > Note that words_text only has a single partition key. > Now examine the ACLs for the resulting directories: > {noformat} > $ hdfs dfs -getfacl -R /user/cdrome/hive/acl_test > # file: /user/cdrome/hive/acl_test > # owner: cdrome > # group: hdfs > user::rwx > user:hdfs:rwx > group::r-x > mask::rwx > other::--- > default:user::rwx > default:user:hdfs:rwx > default:group::r-x >
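The fix the HIVE-13989 description calls for can be sketched in simplified form: keep the DEFAULT-scope entries so sub-directories inherit them, and take the unnamed GROUP permission from the ACL entries themselves rather than from the FsPermission group action, which holds the extended-ACL mask. The `Scope`/`Type`/`Entry` types below are self-contained stand-ins for Hadoop's `org.apache.hadoop.fs.permission` classes, and `inheritableEntries` is a hypothetical helper, not the patch's actual `getDefaultAclEntries` implementation:

```java
import java.util.ArrayList;
import java.util.List;

public class AclInheritance {
    enum Scope { ACCESS, DEFAULT }
    enum Type { USER, GROUP, MASK, OTHER }

    static class Entry {
        final Scope scope; final Type type; final String name; final String perm;
        Entry(Scope scope, Type type, String name, String perm) {
            this.scope = scope; this.type = type; this.name = name; this.perm = perm;
        }
    }

    // Keep every DEFAULT-scope entry (so children inherit it) and the unnamed
    // ACCESS-scope GROUP entry, which carries the real group permission; the
    // basic FsPermission group bits hold the mask once extended ACLs are set.
    static List<Entry> inheritableEntries(List<Entry> parent) {
        List<Entry> kept = new ArrayList<>();
        for (Entry e : parent) {
            boolean unnamedGroup = e.scope == Scope.ACCESS
                    && e.type == Type.GROUP && e.name == null;
            if (e.scope == Scope.DEFAULT || unnamedGroup) {
                kept.add(e);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Entry> parent = new ArrayList<>();
        parent.add(new Entry(Scope.ACCESS, Type.USER, "hdfs", "rwx"));
        parent.add(new Entry(Scope.ACCESS, Type.GROUP, null, "r-x"));
        parent.add(new Entry(Scope.ACCESS, Type.MASK, null, "rwx"));
        parent.add(new Entry(Scope.DEFAULT, Type.GROUP, null, "r-x"));
        // Keeps the unnamed access group entry and the default group entry.
        System.out.println(inheritableEntries(parent).size());  // prints 2
    }
}
```

Contrast this with the quoted HdfsUtils code, which rebuilds the GROUP entry from `sourcePerm.getGroupAction()` and never copies the DEFAULT entries at all.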
[jira] [Updated] (HIVE-17066) Query78 filter wrong estimatation is generating bad plan
[ https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vineet Garg updated HIVE-17066: --- Status: Patch Available (was: Open) > Query78 filter wrong estimatation is generating bad plan > > > Key: HIVE-17066 > URL: https://issues.apache.org/jira/browse/HIVE-17066 > Project: Hive > Issue Type: Bug >Reporter: Vineet Garg >Assignee: Vineet Garg > Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch > > > Filter operator is estimating 1 row following a left outer join causing bad > estimates > {noformat} > Reducer 12 > Execution mode: vectorized, llap > Reduce Operator Tree: > Map Join Operator > condition map: > Left Outer Join0 to 1 > keys: > 0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > 1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, > _col8 > input vertices: > 1 Map 14 > Statistics: Num rows: 71676270660 Data size: 3727166074320 > Basic stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: _col8 is null (type: boolean) > Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE > Column stats: COMPLETE > Select Operator > expressions: _col0 (type: bigint), _col1 (type: bigint), > _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: > bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, > _col6 > Statistics: Num rows: 1 Data size: 52 Basic stats: > COMPLETE Column stats: COMPLETE > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17066) Query78 filter wrong estimatation is generating bad plan
[ https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vineet Garg updated HIVE-17066: --- Status: Open (was: Patch Available) > Query78 filter wrong estimatation is generating bad plan > > > Key: HIVE-17066 > URL: https://issues.apache.org/jira/browse/HIVE-17066 > Project: Hive > Issue Type: Bug >Reporter: Vineet Garg >Assignee: Vineet Garg > Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch > > > Filter operator is estimating 1 row following a left outer join causing bad > estimates > {noformat} > Reducer 12 > Execution mode: vectorized, llap > Reduce Operator Tree: > Map Join Operator > condition map: > Left Outer Join0 to 1 > keys: > 0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > 1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, > _col8 > input vertices: > 1 Map 14 > Statistics: Num rows: 71676270660 Data size: 3727166074320 > Basic stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: _col8 is null (type: boolean) > Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE > Column stats: COMPLETE > Select Operator > expressions: _col0 (type: bigint), _col1 (type: bigint), > _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: > bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, > _col6 > Statistics: Num rows: 1 Data size: 52 Basic stats: > COMPLETE Column stats: COMPLETE > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17066) Query78 filter wrong estimatation is generating bad plan
[ https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vineet Garg updated HIVE-17066: --- Attachment: HIVE-17066.2.patch > Query78 filter wrong estimatation is generating bad plan > > > Key: HIVE-17066 > URL: https://issues.apache.org/jira/browse/HIVE-17066 > Project: Hive > Issue Type: Bug >Reporter: Vineet Garg >Assignee: Vineet Garg > Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch > > > Filter operator is estimating 1 row following a left outer join causing bad > estimates > {noformat} > Reducer 12 > Execution mode: vectorized, llap > Reduce Operator Tree: > Map Join Operator > condition map: > Left Outer Join0 to 1 > keys: > 0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > 1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, > _col8 > input vertices: > 1 Map 14 > Statistics: Num rows: 71676270660 Data size: 3727166074320 > Basic stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: _col8 is null (type: boolean) > Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE > Column stats: COMPLETE > Select Operator > expressions: _col0 (type: bigint), _col1 (type: bigint), > _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: > bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, > _col6 > Statistics: Num rows: 1 Data size: 52 Basic stats: > COMPLETE Column stats: COMPLETE > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-13989) Extended ACLs are not handled according to specification
[ https://issues.apache.org/jira/browse/HIVE-13989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081414#comment-16081414 ] Chris Drome commented on HIVE-13989: [~vgumashta], yes, I will come back to this and verify whether there are still issues in trunk (this patch was originally written against 1.2). > Extended ACLs are not handled according to specification > > > Key: HIVE-13989 > URL: https://issues.apache.org/jira/browse/HIVE-13989 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 1.2.1, 2.0.0 >Reporter: Chris Drome >Assignee: Chris Drome > Attachments: HIVE-13989.1-branch-1.patch, HIVE-13989.1.patch, > HIVE-13989-branch-1.patch > > > Hive takes two approaches to working with extended ACLs depending on whether > data is being produced via a Hive query or HCatalog APIs. A Hive query will > run an FsShell command to recursively set the extended ACLs for a directory > sub-tree. HCatalog APIs will attempt to build up the directory sub-tree > programmatically and runs some code to set the ACLs to match the parent > directory. > Some incorrect assumptions were made when implementing the extended ACLs > support. Refer to https://issues.apache.org/jira/browse/HDFS-4685 for the > design documents of extended ACLs in HDFS. These documents model the > implementation after the POSIX implementation on Linux, which can be found at > http://www.vanemery.com/Linux/ACL/POSIX_ACL_on_Linux.html. 
> The code for setting extended ACLs via HCatalog APIs is found in > HdfsUtils.java: > {code} > if (aclEnabled) { > aclStatus = sourceStatus.getAclStatus(); > if (aclStatus != null) { > LOG.trace(aclStatus.toString()); > aclEntries = aclStatus.getEntries(); > removeBaseAclEntries(aclEntries); > //the ACL api's also expect the tradition user/group/other permission > in the form of ACL > aclEntries.add(newAclEntry(AclEntryScope.ACCESS, AclEntryType.USER, > sourcePerm.getUserAction())); > aclEntries.add(newAclEntry(AclEntryScope.ACCESS, AclEntryType.GROUP, > sourcePerm.getGroupAction())); > aclEntries.add(newAclEntry(AclEntryScope.ACCESS, AclEntryType.OTHER, > sourcePerm.getOtherAction())); > } > } > {code} > We found that DEFAULT extended ACL rules were not being inherited properly by > the directory sub-tree, so the above code is incomplete because it > effectively drops the DEFAULT rules. The second problem is with the call to > {{sourcePerm.getGroupAction()}}, which is incorrect in the case of extended > ACLs. When extended ACLs are used the GROUP permission is replaced with the > extended ACL mask. So the above code will apply the wrong permissions to the > GROUP. Instead the correct GROUP permissions now need to be pulled from the > AclEntry as returned by {{getAclStatus().getEntries()}}. See the > implementation of the new method {{getDefaultAclEntries}} for details. > Similar issues exist with the HCatalog API. None of the API accounts for > setting extended ACLs on the directory sub-tree. The changes to the HCatalog > API allow the extended ACLs to be passed into the required methods similar to > how basic permissions are passed in. When building the directory sub-tree the > extended ACLs of the table directory are inherited by all sub-directories, > including the DEFAULT rules. 
> Replicating the problem: > Create a table to write data into (I will use acl_test as the destination and > words_text as the source) and set the ACLs as follows: > {noformat} > $ hdfs dfs -setfacl -m > default:user::rwx,default:group::r-x,default:mask::rwx,default:user:hdfs:rwx,group::r-x,user:hdfs:rwx > /user/cdrome/hive/acl_test > $ hdfs dfs -ls -d /user/cdrome/hive/acl_test > drwxrwx---+ - cdrome hdfs 0 2016-07-13 20:36 > /user/cdrome/hive/acl_test > $ hdfs dfs -getfacl -R /user/cdrome/hive/acl_test > # file: /user/cdrome/hive/acl_test > # owner: cdrome > # group: hdfs > user::rwx > user:hdfs:rwx > group::r-x > mask::rwx > other::--- > default:user::rwx > default:user:hdfs:rwx > default:group::r-x > default:mask::rwx > default:other::--- > {noformat} > Note that the basic GROUP permission is set to {{rwx}} after setting the > ACLs. The ACLs explicitly set the DEFAULT rules and a rule specifically for > the {{hdfs}} user. > Run the following query to populate the table: > {noformat} > insert into acl_test partition (dt='a', ds='b') select a, b from words_text > where dt = 'c'; > {noformat} > Note that words_text only has a single partition key. > Now examine the ACLs for the resulting directories: > {noformat} > $ hdfs dfs -getfacl -R /user/cdrome/hive/acl_test > # file: /user/cdrome/hive/acl_test > # owner: cdrome > # group: hdfs > user::rwx > user:hdfs:rwx >
[jira] [Updated] (HIVE-16973) Fetching of Delegation tokens (Kerberos) for AccumuloStorageHandler fails in HS2
[ https://issues.apache.org/jira/browse/HIVE-16973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-16973: --- Target Version/s: 3.0.0 (was: 2.3.0) > Fetching of Delegation tokens (Kerberos) for AccumuloStorageHandler fails in > HS2 > > > Key: HIVE-16973 > URL: https://issues.apache.org/jira/browse/HIVE-16973 > Project: Hive > Issue Type: Bug > Components: Accumulo Storage Handler >Reporter: Josh Elser >Assignee: Josh Elser > Attachments: HIVE-16973.001.patch, HIVE-16973.002.branch-2.patch, > HIVE-16973.003.branch-2.patch, HIVE-16973.004-branch-2.patch, > HIVE-16973.004.branch-2.patch > > > Had a report from a user that Kerberos+AccumuloStorageHandler+HS2 was broken. > Looking into it, it seems like the bit-rot got pretty bad. You'll see > something like the following: > {noformat} > Caused by: java.io.IOException: Failed to unwrap AuthenticationToken > at > org.apache.hadoop.hive.accumulo.HiveAccumuloHelper.unwrapAuthenticationToken(HiveAccumuloHelper.java:312) > > at > org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableInputFormat.getSplits(HiveAccumuloTableInputFormat.java:122) > > {noformat} > It appears that some of the code-paths changed since when I first did my > testing (or I just did poor testing) and the delegation token was never being > fetched/serialized. There also are some issues with fetching the delegation > token from Accumulo properly which were addressed in ACCUMULO-4665 > I believe it would also be best to just update the dependency to use Accumulo > 1.7 (drop 1.6 support) as it's lacking in this regard. These changes would > otherwise get much more complicated with reflection -- Accumulo has moved on > past 1.6, so let's do the same in Hive. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16973) Fetching of Delegation tokens (Kerberos) for AccumuloStorageHandler fails in HS2
[ https://issues.apache.org/jira/browse/HIVE-16973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081405#comment-16081405 ] Pengcheng Xiong commented on HIVE-16973: Hello, I am deferring this to Hive 3.0 as we are going to cut the next RC and it is not marked as blocker. Please feel free to commit to the branch if this can be resolved before the release.Thanks! > Fetching of Delegation tokens (Kerberos) for AccumuloStorageHandler fails in > HS2 > > > Key: HIVE-16973 > URL: https://issues.apache.org/jira/browse/HIVE-16973 > Project: Hive > Issue Type: Bug > Components: Accumulo Storage Handler >Reporter: Josh Elser >Assignee: Josh Elser > Attachments: HIVE-16973.001.patch, HIVE-16973.002.branch-2.patch, > HIVE-16973.003.branch-2.patch, HIVE-16973.004-branch-2.patch, > HIVE-16973.004.branch-2.patch > > > Had a report from a user that Kerberos+AccumuloStorageHandler+HS2 was broken. > Looking into it, it seems like the bit-rot got pretty bad. You'll see > something like the following: > {noformat} > Caused by: java.io.IOException: Failed to unwrap AuthenticationToken > at > org.apache.hadoop.hive.accumulo.HiveAccumuloHelper.unwrapAuthenticationToken(HiveAccumuloHelper.java:312) > > at > org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableInputFormat.getSplits(HiveAccumuloTableInputFormat.java:122) > > {noformat} > It appears that some of the code-paths changed since when I first did my > testing (or I just did poor testing) and the delegation token was never being > fetched/serialized. There also are some issues with fetching the delegation > token from Accumulo properly which were addressed in ACCUMULO-4665 > I believe it would also be best to just update the dependency to use Accumulo > 1.7 (drop 1.6 support) as it's lacking in this regard. These changes would > otherwise get much more complicated with reflection -- Accumulo has moved on > past 1.6, so let's do the same in Hive. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16177) non Acid to acid conversion doesn't handle _copy_N files
[ https://issues.apache.org/jira/browse/HIVE-16177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081400#comment-16081400 ] Hive QA commented on HIVE-16177: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876521/HIVE-16177.17.patch {color:green}SUCCESS:{color} +1 due to 9 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 10838 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=237) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=145) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hadoop.hive.ql.TestTxnCommands.testNonAcidToAcidConversion01 (batchId=282) org.apache.hadoop.hive.ql.TestTxnCommands2.testNonAcidToAcidConversion02 (batchId=269) org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdate.testNonAcidToAcidConversion02 (batchId=280) org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testNonAcidToAcidConversion02 (batchId=277) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5943/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5943/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5943/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing 
org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 11 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12876521 - PreCommit-HIVE-Build > non Acid to acid conversion doesn't handle _copy_N files > > > Key: HIVE-16177 > URL: https://issues.apache.org/jira/browse/HIVE-16177 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 0.14.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Blocker > Attachments: HIVE-16177.01.patch, HIVE-16177.02.patch, > HIVE-16177.04.patch, HIVE-16177.07.patch, HIVE-16177.08.patch, > HIVE-16177.09.patch, HIVE-16177.10.patch, HIVE-16177.11.patch, > HIVE-16177.14.patch, HIVE-16177.15.patch, HIVE-16177.16.patch, > HIVE-16177.17.patch, HIVE-16177.18.patch > > > {noformat} > create table T(a int, b int) clustered by (a) into 2 buckets stored as orc > TBLPROPERTIES('transactional'='false') > insert into T(a,b) values(1,2) > insert into T(a,b) values(1,3) > alter table T SET TBLPROPERTIES ('transactional'='true') > {noformat} > //we should now have bucket files 01_0 and 01_0_copy_1 > but OrcRawRecordMerger.OriginalReaderPair.next() doesn't know that there can > be copy_N files and numbers rows in each bucket from 0 thus generating > duplicate IDs > {noformat} > select ROW__ID, INPUT__FILE__NAME, a, b from T > {noformat} > produces > {noformat} > {"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0,1,2 > {"transactionid\":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0_copy_1,1,3 > {noformat} > [~owen.omalley], do you have any thoughts on a good way to handle this? > attached patch has a few changes to make Acid even recognize copy_N but this > is just a pre-requisite. The new UT demonstrates the issue. 
> Furthermore, > {noformat} > alter table T compact 'major' > select ROW__ID, INPUT__FILE__NAME, a, b from T order by b > {noformat} > produces > {noformat} > {"transactionid":0,"bucketid":1,"rowid":0} > file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommandswarehouse/nonacidorctbl/base_-9223372036854775808/bucket_1 > 1 2 > {noformat} > HIVE-16177.04.patch has TestTxnCommands.testNonAcidToAcidConversion0() > demonstrating this > This is because compactor doesn't handle copy_N files either (skips them) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
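The row-numbering bug quoted above can be sketched with a toy model. This is illustrative Python only, not Hive's actual `OrcRawRecordMerger.OriginalReaderPair` logic; every function name and data shape below is an assumption made for the sketch:

```python
# Toy model of the copy_N bug: rows in each bucket file are numbered
# from 0, so a bucket file and its _copy_N sibling get colliding
# (transactionid, bucketid, rowid) triples.

def assign_row_ids_buggy(files):
    """Number rows from 0 within every file (the reported behavior)."""
    ids = []
    for fname, rows in files:
        for rowid, row in enumerate(rows):
            ids.append(((0, 1, rowid), fname, row))  # (txnid, bucketid, rowid)
    return ids

def assign_row_ids_fixed(files):
    """Continue numbering across the copy_N files of the same bucket."""
    ids, next_rowid = [], 0
    for fname, rows in sorted(files):  # copy_N files sort after the base file here
        for row in rows:
            ids.append(((0, 1, next_rowid), fname, row))
            next_rowid += 1
    return ids

bucket_files = [("01_0", [(1, 2)]), ("01_0_copy_1", [(1, 3)])]
buggy = assign_row_ids_buggy(bucket_files)
fixed = assign_row_ids_fixed(bucket_files)
assert buggy[0][0] == buggy[1][0]          # duplicate ROW__ID, as in the bug report
assert fixed[0][0] != fixed[1][0]          # unique rowids 0 and 1
```

The fix direction sketched in `assign_row_ids_fixed` matches the discussion in the thread: keep a single running row count across the sorted files of a bucket instead of restarting at 0 per file.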
[jira] [Commented] (HIVE-16177) non Acid to acid conversion doesn't handle _copy_N files
[ https://issues.apache.org/jira/browse/HIVE-16177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081390#comment-16081390 ] Eugene Koifman commented on HIVE-16177: --- patch 18 (vs 16) adds a bunch of clarifying comments and contains very minor code changes: 1. creates a Comparator in AcidUtils to sort "original" files per Owen's suggestion and makes "isLastFileForThisBucket" in OrcRawRecordMerger.OriginalReaderPair() make more sense. > non Acid to acid conversion doesn't handle _copy_N files > > > Key: HIVE-16177 > URL: https://issues.apache.org/jira/browse/HIVE-16177 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 0.14.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Blocker > Attachments: HIVE-16177.01.patch, HIVE-16177.02.patch, > HIVE-16177.04.patch, HIVE-16177.07.patch, HIVE-16177.08.patch, > HIVE-16177.09.patch, HIVE-16177.10.patch, HIVE-16177.11.patch, > HIVE-16177.14.patch, HIVE-16177.15.patch, HIVE-16177.16.patch, > HIVE-16177.17.patch, HIVE-16177.18.patch > > > {noformat} > create table T(a int, b int) clustered by (a) into 2 buckets stored as orc > TBLPROPERTIES('transactional'='false') > insert into T(a,b) values(1,2) > insert into T(a,b) values(1,3) > alter table T SET TBLPROPERTIES ('transactional'='true') > {noformat} > //we should now have bucket files 01_0 and 01_0_copy_1 > but OrcRawRecordMerger.OriginalReaderPair.next() doesn't know that there can > be copy_N files and numbers rows in each bucket from 0 thus generating > duplicate IDs > {noformat} > select ROW__ID, INPUT__FILE__NAME, a, b from T > {noformat} > produces > {noformat} > {"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0,1,2 > 
{"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0_copy_1,1,3 > {noformat} > [~owen.omalley], do you have any thoughts on a good way to handle this? > attached patch has a few changes to make Acid even recognize copy_N but this > is just a pre-requisite. The new UT demonstrates the issue. > Furthermore, > {noformat} > alter table T compact 'major' > select ROW__ID, INPUT__FILE__NAME, a, b from T order by b > {noformat} > produces > {noformat} > {"transactionid":0,"bucketid":1,"rowid":0} > file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommandswarehouse/nonacidorctbl/base_-9223372036854775808/bucket_1 > 1 2 > {noformat} > HIVE-16177.04.patch has TestTxnCommands.testNonAcidToAcidConversion0() > demonstrating this > This is because compactor doesn't handle copy_N files either (skips them) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16888) Upgrade Calcite to 1.13 and Avatica to 1.10
[ https://issues.apache.org/jira/browse/HIVE-16888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081249#comment-16081249 ] Ashutosh Chauhan commented on HIVE-16888: - [~bslim] Can you please also review plan changes for druid test cases as I am not sure about some of them. > Upgrade Calcite to 1.13 and Avatica to 1.10 > --- > > Key: HIVE-16888 > URL: https://issues.apache.org/jira/browse/HIVE-16888 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 3.0.0 >Reporter: Remus Rusanu >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-16888.01.patch, HIVE-16888.02.patch, > HIVE-16888.03.patch, HIVE-16888.04.patch, HIVE-16888.05.patch, > HIVE-16888.06.patch, HIVE-16888.07.patch, HIVE-16888.08.patch > > > I'm creating this early to be able to ptest the current Calcite > 1.13.0-SNAPSHOT -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17071) Make hive 2.3 depend on storage-api-2.3
[ https://issues.apache.org/jira/browse/HIVE-17071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-17071: --- Attachment: HIVE-17071-branch-2.3.patch > Make hive 2.3 depend on storage-api-2.3 > --- > > Key: HIVE-17071 > URL: https://issues.apache.org/jira/browse/HIVE-17071 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong > Fix For: 2.3.0 > > Attachments: HIVE-17071-branch-2.3.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17071) Make hive 2.3 depend on storage-api-2.3
[ https://issues.apache.org/jira/browse/HIVE-17071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong reassigned HIVE-17071: -- Assignee: Pengcheng Xiong > Make hive 2.3 depend on storage-api-2.3 > --- > > Key: HIVE-17071 > URL: https://issues.apache.org/jira/browse/HIVE-17071 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Fix For: 2.3.0 > > Attachments: HIVE-17071-branch-2.3.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16177) non Acid to acid conversion doesn't handle _copy_N files
[ https://issues.apache.org/jira/browse/HIVE-16177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-16177: -- Attachment: HIVE-16177.18.patch > non Acid to acid conversion doesn't handle _copy_N files > > > Key: HIVE-16177 > URL: https://issues.apache.org/jira/browse/HIVE-16177 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 0.14.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Blocker > Attachments: HIVE-16177.01.patch, HIVE-16177.02.patch, > HIVE-16177.04.patch, HIVE-16177.07.patch, HIVE-16177.08.patch, > HIVE-16177.09.patch, HIVE-16177.10.patch, HIVE-16177.11.patch, > HIVE-16177.14.patch, HIVE-16177.15.patch, HIVE-16177.16.patch, > HIVE-16177.17.patch, HIVE-16177.18.patch > > > {noformat} > create table T(a int, b int) clustered by (a) into 2 buckets stored as orc > TBLPROPERTIES('transactional'='false') > insert into T(a,b) values(1,2) > insert into T(a,b) values(1,3) > alter table T SET TBLPROPERTIES ('transactional'='true') > {noformat} > //we should now have bucket files 01_0 and 01_0_copy_1 > but OrcRawRecordMerger.OriginalReaderPair.next() doesn't know that there can > be copy_N files and numbers rows in each bucket from 0 thus generating > duplicate IDs > {noformat} > select ROW__ID, INPUT__FILE__NAME, a, b from T > {noformat} > produces > {noformat} > {"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0,1,2 > {"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0_copy_1,1,3 > {noformat} > [~owen.omalley], do you have any thoughts on a good way to handle this? > attached patch has a few changes to make Acid even recognize copy_N but this > is just a pre-requisite. The new UT demonstrates the issue. 
> Furthermore, > {noformat} > alter table T compact 'major' > select ROW__ID, INPUT__FILE__NAME, a, b from T order by b > {noformat} > produces > {noformat} > {"transactionid":0,"bucketid":1,"rowid":0} > file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommandswarehouse/nonacidorctbl/base_-9223372036854775808/bucket_1 > 1 2 > {noformat} > HIVE-16177.04.patch has TestTxnCommands.testNonAcidToAcidConversion0() > demonstrating this > This is because compactor doesn't handle copy_N files either (skips them) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17071) Make hive 2.3 depend on storage-api-2.3
[ https://issues.apache.org/jira/browse/HIVE-17071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-17071: --- Status: Patch Available (was: Open) > Make hive 2.3 depend on storage-api-2.3 > --- > > Key: HIVE-17071 > URL: https://issues.apache.org/jira/browse/HIVE-17071 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong > Fix For: 2.3.0 > > Attachments: HIVE-17071-branch-2.3.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17066) Query78 filter wrong estimation is generating bad plan
[ https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081366#comment-16081366 ] Ashutosh Chauhan commented on HIVE-17066: - Can you update golden files and create a RB for this? > Query78 filter wrong estimation is generating bad plan > > > Key: HIVE-17066 > URL: https://issues.apache.org/jira/browse/HIVE-17066 > Project: Hive > Issue Type: Bug >Reporter: Vineet Garg >Assignee: Vineet Garg > Attachments: HIVE-17066.1.patch > > > Filter operator is estimating 1 row following a left outer join causing bad > estimates > {noformat} > Reducer 12 > Execution mode: vectorized, llap > Reduce Operator Tree: > Map Join Operator > condition map: > Left Outer Join0 to 1 > keys: > 0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > 1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, > _col8 > input vertices: > 1 Map 14 > Statistics: Num rows: 71676270660 Data size: 3727166074320 > Basic stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: _col8 is null (type: boolean) > Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE > Column stats: COMPLETE > Select Operator > expressions: _col0 (type: bigint), _col1 (type: bigint), > _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: > bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, > _col6 > Statistics: Num rows: 1 Data size: 52 Basic stats: > COMPLETE Column stats: COMPLETE > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
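The mis-estimation quoted in this message can be illustrated schematically. This is not Hive's actual statistics code; the functions and the sample left-side row count and match rate below are assumptions made purely for the sketch:

```python
# Schematic: why an "IS NULL" filter directly above a LEFT OUTER JOIN
# can be underestimated. Assumption for the sketch: the planner derives
# IS NULL selectivity from the base column's null count, which is 0
# before the join, ignoring the NULLs the outer join itself introduces.

def is_null_estimate_naive(join_rows, base_null_count):
    # Reuses the pre-join null count; the NULL-padded non-matching rows
    # are invisible to the estimate, so it bottoms out at 1 row.
    return min(join_rows, max(1, base_null_count))

def is_null_estimate_outer_aware(left_rows, match_rate):
    # Left-side rows with no match come out NULL-padded, so they all
    # survive the IS NULL filter (effectively an anti-join).
    return round(left_rows * (1 - match_rate))

join_output = 71_676_270_660   # rows out of the join, from the quoted plan
naive = is_null_estimate_naive(join_output, base_null_count=0)
aware = is_null_estimate_outer_aware(left_rows=1_000_000, match_rate=0.9)
assert naive == 1              # matches the "Num rows: 1" in the plan
assert aware == 100_000        # a far larger, more plausible estimate
```

The point of the sketch is only the shape of the error: any estimator that ignores join-generated NULLs will collapse an anti-join pattern to roughly one row, which then poisons every operator above it.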
[jira] [Commented] (HIVE-16996) Add HLL as an alternative to FM sketch to compute stats
[ https://issues.apache.org/jira/browse/HIVE-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081337#comment-16081337 ] Hive QA commented on HIVE-16996: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876501/HIVE-16966.03.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 313 failed/errored test(s), 10245 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=237) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_partition_update_status] (batchId=84) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_table_column_stats] (batchId=61) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_table_update_status] (batchId=75) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[analyze_tbl_part] (batchId=46) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_deep_filters] (batchId=85) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_filter] (batchId=8) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_groupby2] (batchId=45) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_groupby] (batchId=46) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_join_pkfk] (batchId=14) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_limit] (batchId=11) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_part] (batchId=15) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_select] (batchId=59) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_table] (batchId=20) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_union] (batchId=46) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_4] (batchId=12) 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_5] (batchId=40) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_6] (batchId=62) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_7] (batchId=61) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_8] (batchId=13) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_9] (batchId=35) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join12] (batchId=23) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join13] (batchId=76) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join_without_localtask] (batchId=1) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[avro_decimal] (batchId=65) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[avro_decimal_native] (batchId=26) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_const] (batchId=17) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_annotate_stats_groupby] (batchId=80) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_auto_join0] (batchId=15) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_auto_join1] (batchId=3) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_join0] (batchId=47) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[colstats_all_nulls] (batchId=6) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[column_names_with_leading_and_trailing_spaces] (batchId=22) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[column_pruner_multiple_children] (batchId=21) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[columnstats_partlvl] (batchId=34) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[columnstats_partlvl_dp] (batchId=49) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[columnstats_quoting] (batchId=84) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[columnstats_tbllvl] (batchId=8) 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[compute_stats_date] (batchId=42) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[confirm_initial_tbl_stats] (batchId=29) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[constant_prop_2] (batchId=27) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[correlated_join_keys] (batchId=26) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[decimal_stats] (batchId=79) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[describe_table] (batchId=41) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[display_colstats_tbllvl] (batchId=3) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exec_parallel_column_stats] (batchId=32) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[extrapolate_part_stats_full] (batchId=33) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[extrapolate_part_stats_partial] (batchId=46)
[jira] [Commented] (HIVE-13989) Extended ACLs are not handled according to specification
[ https://issues.apache.org/jira/browse/HIVE-13989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081335#comment-16081335 ] Vaibhav Gumashta commented on HIVE-13989: - [~cdrome] Thanks for the work so far. Looks like a bug we should definitely merge into master. Will you have time to address [~caritaou]'s review comments? > Extended ACLs are not handled according to specification > > > Key: HIVE-13989 > URL: https://issues.apache.org/jira/browse/HIVE-13989 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 1.2.1, 2.0.0 >Reporter: Chris Drome >Assignee: Chris Drome > Attachments: HIVE-13989.1-branch-1.patch, HIVE-13989.1.patch, > HIVE-13989-branch-1.patch > > > Hive takes two approaches to working with extended ACLs depending on whether > data is being produced via a Hive query or HCatalog APIs. A Hive query will > run an FsShell command to recursively set the extended ACLs for a directory > sub-tree. HCatalog APIs will attempt to build up the directory sub-tree > programmatically and runs some code to set the ACLs to match the parent > directory. > Some incorrect assumptions were made when implementing the extended ACLs > support. Refer to https://issues.apache.org/jira/browse/HDFS-4685 for the > design documents of extended ACLs in HDFS. These documents model the > implementation after the POSIX implementation on Linux, which can be found at > http://www.vanemery.com/Linux/ACL/POSIX_ACL_on_Linux.html. 
> The code for setting extended ACLs via HCatalog APIs is found in > HdfsUtils.java: > {code} > if (aclEnabled) { > aclStatus = sourceStatus.getAclStatus(); > if (aclStatus != null) { > LOG.trace(aclStatus.toString()); > aclEntries = aclStatus.getEntries(); > removeBaseAclEntries(aclEntries); > //the ACL api's also expect the tradition user/group/other permission > in the form of ACL > aclEntries.add(newAclEntry(AclEntryScope.ACCESS, AclEntryType.USER, > sourcePerm.getUserAction())); > aclEntries.add(newAclEntry(AclEntryScope.ACCESS, AclEntryType.GROUP, > sourcePerm.getGroupAction())); > aclEntries.add(newAclEntry(AclEntryScope.ACCESS, AclEntryType.OTHER, > sourcePerm.getOtherAction())); > } > } > {code} > We found that DEFAULT extended ACL rules were not being inherited properly by > the directory sub-tree, so the above code is incomplete because it > effectively drops the DEFAULT rules. The second problem is with the call to > {{sourcePerm.getGroupAction()}}, which is incorrect in the case of extended > ACLs. When extended ACLs are used the GROUP permission is replaced with the > extended ACL mask. So the above code will apply the wrong permissions to the > GROUP. Instead the correct GROUP permissions now need to be pulled from the > AclEntry as returned by {{getAclStatus().getEntries()}}. See the > implementation of the new method {{getDefaultAclEntries}} for details. > Similar issues exist with the HCatalog API. None of the APIs accounts for > setting extended ACLs on the directory sub-tree. The changes to the HCatalog > API allow the extended ACLs to be passed into the required methods similar to > how basic permissions are passed in. When building the directory sub-tree the > extended ACLs of the table directory are inherited by all sub-directories, > including the DEFAULT rules. 
> Replicating the problem: > Create a table to write data into (I will use acl_test as the destination and > words_text as the source) and set the ACLs as follows: > {noformat} > $ hdfs dfs -setfacl -m > default:user::rwx,default:group::r-x,default:mask::rwx,default:user:hdfs:rwx,group::r-x,user:hdfs:rwx > /user/cdrome/hive/acl_test > $ hdfs dfs -ls -d /user/cdrome/hive/acl_test > drwxrwx---+ - cdrome hdfs 0 2016-07-13 20:36 > /user/cdrome/hive/acl_test > $ hdfs dfs -getfacl -R /user/cdrome/hive/acl_test > # file: /user/cdrome/hive/acl_test > # owner: cdrome > # group: hdfs > user::rwx > user:hdfs:rwx > group::r-x > mask::rwx > other::--- > default:user::rwx > default:user:hdfs:rwx > default:group::r-x > default:mask::rwx > default:other::--- > {noformat} > Note that the basic GROUP permission is set to {{rwx}} after setting the > ACLs. The ACLs explicitly set the DEFAULT rules and a rule specifically for > the {{hdfs}} user. > Run the following query to populate the table: > {noformat} > insert into acl_test partition (dt='a', ds='b') select a, b from words_text > where dt = 'c'; > {noformat} > Note that words_text only has a single partition key. > Now examine the ACLs for the resulting directories: > {noformat} > $ hdfs dfs -getfacl -R /user/cdrome/hive/acl_test > # file: /user/cdrome/hive/acl_test > # owner: cdrome > # group: hdfs > user::rwx >
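The mask-versus-group confusion described in this message can be modeled in a few lines. This is a hypothetical sketch, not Hadoop's `FsPermission`/`AclStatus` API; the dictionary representation of an ACL is invented for the illustration:

```python
# Minimal model of the HIVE-13989 point: when a path has extended ACLs,
# the "group" slot of the classic rwx triple holds the ACL *mask*, not
# the actual group entry. Code that copies the classic group bits
# therefore copies the mask and applies the wrong group permission.

def classic_group_bits(acl):
    # What the classic permission triple would show for an ACL'd path:
    # the mask entry replaces the group slot (cf. the "rwx" group bits
    # in the getfacl output above, while group:: is really r-x).
    return acl["mask"] if "mask" in acl else acl["group"]

def real_group_entry(acl):
    # The correct value must come from the ACL entries themselves,
    # analogous to reading getAclStatus().getEntries().
    return acl["group"]

acl = {"user": "rwx", "group": "r-x", "mask": "rwx",
       "user:hdfs": "rwx", "other": "---"}
assert classic_group_bits(acl) == "rwx"   # the mask leaks into the group slot
assert real_group_entry(acl) == "r-x"     # the true group permission
```

This mirrors the replication steps above: after `setfacl` sets `mask::rwx`, the directory listing shows `rwx` for the group even though the group entry itself is `r-x`, which is exactly the value `getGroupAction()`-style code would wrongly propagate.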
[jira] [Updated] (HIVE-17070) remove .orig files from src
[ https://issues.apache.org/jira/browse/HIVE-17070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-17070: -- Status: Patch Available (was: Open) > remove .orig files from src > --- > > Key: HIVE-17070 > URL: https://issues.apache.org/jira/browse/HIVE-17070 > Project: Hive > Issue Type: Bug >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Trivial > Attachments: HIVE-17070.patch > > > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java.orig > ql/src/test/results/clientpositive/llap/vector_join30.q.out.orig -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-15144) JSON.org license is now CatX
[ https://issues.apache.org/jira/browse/HIVE-15144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081312#comment-16081312 ] Pengcheng Xiong commented on HIVE-15144: pushed to master and cherry-picked to 2.3. > JSON.org license is now CatX > > > Key: HIVE-15144 > URL: https://issues.apache.org/jira/browse/HIVE-15144 > Project: Hive > Issue Type: Bug >Reporter: Robert Kanter >Assignee: Owen O'Malley >Priority: Blocker > Fix For: 2.2.0 > > Attachments: HIVE-15144.patch, HIVE-15144.patch, HIVE-15144.patch, > HIVE-15144.patch > > > per [update resolved legal|http://www.apache.org/legal/resolved.html#json]: > {quote} > CAN APACHE PRODUCTS INCLUDE WORKS LICENSED UNDER THE JSON LICENSE? > No. As of 2016-11-03 this has been moved to the 'Category X' license list. > Prior to this, use of the JSON Java library was allowed. See Debian's page > for a list of alternatives. > {quote} > I'm not sure when this dependency was first introduced, but it looks like > it's currently used in a few places: > https://github.com/apache/hive/search?p=1=%22org.json%22=%E2%9C%93 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-15898) add Type2 SCD merge tests
[ https://issues.apache.org/jira/browse/HIVE-15898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-15898: -- Attachment: HIVE-15898.05.patch > add Type2 SCD merge tests > - > > Key: HIVE-15898 > URL: https://issues.apache.org/jira/browse/HIVE-15898 > Project: Hive > Issue Type: Sub-task > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-15898.01.patch, HIVE-15898.02.patch, > HIVE-15898.03.patch, HIVE-15898.04.patch, HIVE-15898.05.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16730) Vectorization: Schema Evolution for Text Vectorization / Complex Types
[ https://issues.apache.org/jira/browse/HIVE-16730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081330#comment-16081330 ] Matt McCline commented on HIVE-16730: - +1 LGTM > Vectorization: Schema Evolution for Text Vectorization / Complex Types > -- > > Key: HIVE-16730 > URL: https://issues.apache.org/jira/browse/HIVE-16730 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Teddy Choi >Priority: Critical > Attachments: HIVE-16730.1.patch, HIVE-16730.2.patch, > HIVE-16730.3.patch > > > With HIVE-16589: "Vectorization: Support Complex Types and GroupBy modes > PARTIAL2, FINAL, and COMPLETE for AVG" change, the tests > schema_evol_text_vec_part_all_complex.q and > schema_evol_text_vecrow_part_all_complex.q fail. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16732) Transactional tables should block LOAD DATA
[ https://issues.apache.org/jira/browse/HIVE-16732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-16732: -- Attachment: HIVE-16732.02.patch > Transactional tables should block LOAD DATA > > > Key: HIVE-16732 > URL: https://issues.apache.org/jira/browse/HIVE-16732 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-16732.01.patch, HIVE-16732.02.patch > > > This has always been the design. > see LoadSemanticAnalyzer.analyzeInternal() > StrictChecks.checkBucketing(conf); > Some examples (this is exposed by HIVE-16177) > insert_values_orig_table.q > insert_orig_table.q > insert_values_orig_table_use_metadata.q -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17070) remove .orig files from src
[ https://issues.apache.org/jira/browse/HIVE-17070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman reassigned HIVE-17070: - > remove .orig files from src > --- > > Key: HIVE-17070 > URL: https://issues.apache.org/jira/browse/HIVE-17070 > Project: Hive > Issue Type: Bug >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Trivial > > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java.orig > ql/src/test/results/clientpositive/llap/vector_join30.q.out.orig -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17067) LLAP: Add http endpoint to provide system level configurations
[ https://issues.apache.org/jira/browse/HIVE-17067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081294#comment-16081294 ] Prasanth Jayachandran commented on HIVE-17067: -- I will cache the results from sysctl and add an option to force read (refresh) if required. > LLAP: Add http endpoint to provide system level configurations > -- > > Key: HIVE-17067 > URL: https://issues.apache.org/jira/browse/HIVE-17067 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Fix For: 3.0.0 > > Attachments: HIVE-17067.1.patch > > > Add an endpoint to get kernel and network configs via sysctl. Also memory > related configs like transparent huge pages config can be added. "ulimit -a" > can be added to llap startup script as it needs a shell. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16177) non Acid to acid conversion doesn't handle _copy_N files
[ https://issues.apache.org/jira/browse/HIVE-16177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-16177: -- Attachment: HIVE-16177.17.patch > non Acid to acid conversion doesn't handle _copy_N files > > > Key: HIVE-16177 > URL: https://issues.apache.org/jira/browse/HIVE-16177 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 0.14.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Blocker > Attachments: HIVE-16177.01.patch, HIVE-16177.02.patch, > HIVE-16177.04.patch, HIVE-16177.07.patch, HIVE-16177.08.patch, > HIVE-16177.09.patch, HIVE-16177.10.patch, HIVE-16177.11.patch, > HIVE-16177.14.patch, HIVE-16177.15.patch, HIVE-16177.16.patch, > HIVE-16177.17.patch > > > {noformat} > create table T(a int, b int) clustered by (a) into 2 buckets stored as orc > TBLPROPERTIES('transactional'='false') > insert into T(a,b) values(1,2) > insert into T(a,b) values(1,3) > alter table T SET TBLPROPERTIES ('transactional'='true') > {noformat} > //we should now have bucket files 01_0 and 01_0_copy_1 > but OrcRawRecordMerger.OriginalReaderPair.next() doesn't know that there can > be copy_N files and numbers rows in each bucket from 0 thus generating > duplicate IDs > {noformat} > select ROW__ID, INPUT__FILE__NAME, a, b from T > {noformat} > produces > {noformat} > {"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0,1,2 > {"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0_copy_1,1,3 > {noformat} > [~owen.omalley], do you have any thoughts on a good way to handle this? > attached patch has a few changes to make Acid even recognize copy_N but this > is just a pre-requisite. The new UT demonstrates the issue. 
> Furthermore, > {noformat} > alter table T compact 'major' > select ROW__ID, INPUT__FILE__NAME, a, b from T order by b > {noformat} > produces > {noformat} > {"transactionid":0,"bucketid":1,"rowid":0} > file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommandswarehouse/nonacidorctbl/base_-9223372036854775808/bucket_1 > 1 2 > {noformat} > HIVE-16177.04.patch has TestTxnCommands.testNonAcidToAcidConversion0() > demonstrating this > This is because compactor doesn't handle copy_N files either (skips them) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16177) non Acid to acid conversion doesn't handle _copy_N files
[ https://issues.apache.org/jira/browse/HIVE-16177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-16177: -- Attachment: HIVE-16177.17.patch > non Acid to acid conversion doesn't handle _copy_N files > > > Key: HIVE-16177 > URL: https://issues.apache.org/jira/browse/HIVE-16177 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 0.14.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Blocker > Attachments: HIVE-16177.01.patch, HIVE-16177.02.patch, > HIVE-16177.04.patch, HIVE-16177.07.patch, HIVE-16177.08.patch, > HIVE-16177.09.patch, HIVE-16177.10.patch, HIVE-16177.11.patch, > HIVE-16177.14.patch, HIVE-16177.15.patch, HIVE-16177.16.patch, > HIVE-16177.17.patch > > > {noformat} > create table T(a int, b int) clustered by (a) into 2 buckets stored as orc > TBLPROPERTIES('transactional'='false') > insert into T(a,b) values(1,2) > insert into T(a,b) values(1,3) > alter table T SET TBLPROPERTIES ('transactional'='true') > {noformat} > //we should now have bucket files 01_0 and 01_0_copy_1 > but OrcRawRecordMerger.OriginalReaderPair.next() doesn't know that there can > be copy_N files and numbers rows in each bucket from 0 thus generating > duplicate IDs > {noformat} > select ROW__ID, INPUT__FILE__NAME, a, b from T > {noformat} > produces > {noformat} > {"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0,1,2 > {"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0_copy_1,1,3 > {noformat} > [~owen.omalley], do you have any thoughts on a good way to handle this? > attached patch has a few changes to make Acid even recognize copy_N but this > is just a pre-requisite. The new UT demonstrates the issue. 
> Furthermore, > {noformat} > alter table T compact 'major' > select ROW__ID, INPUT__FILE__NAME, a, b from T order by b > {noformat} > produces > {noformat} > {"transactionid":0,"bucketid":1,"rowid":0} > file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommandswarehouse/nonacidorctbl/base_-9223372036854775808/bucket_1 > 1 2 > {noformat} > HIVE-16177.04.patch has TestTxnCommands.testNonAcidToAcidConversion0() > demonstrating this > This is because compactor doesn't handle copy_N files either (skips them) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16177) non Acid to acid conversion doesn't handle _copy_N files
[ https://issues.apache.org/jira/browse/HIVE-16177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-16177: -- Attachment: (was: HIVE-16177.17.patch) > non Acid to acid conversion doesn't handle _copy_N files > > > Key: HIVE-16177 > URL: https://issues.apache.org/jira/browse/HIVE-16177 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 0.14.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Blocker > Attachments: HIVE-16177.01.patch, HIVE-16177.02.patch, > HIVE-16177.04.patch, HIVE-16177.07.patch, HIVE-16177.08.patch, > HIVE-16177.09.patch, HIVE-16177.10.patch, HIVE-16177.11.patch, > HIVE-16177.14.patch, HIVE-16177.15.patch, HIVE-16177.16.patch > > > {noformat} > create table T(a int, b int) clustered by (a) into 2 buckets stored as orc > TBLPROPERTIES('transactional'='false') > insert into T(a,b) values(1,2) > insert into T(a,b) values(1,3) > alter table T SET TBLPROPERTIES ('transactional'='true') > {noformat} > //we should now have bucket files 01_0 and 01_0_copy_1 > but OrcRawRecordMerger.OriginalReaderPair.next() doesn't know that there can > be copy_N files and numbers rows in each bucket from 0 thus generating > duplicate IDs > {noformat} > select ROW__ID, INPUT__FILE__NAME, a, b from T > {noformat} > produces > {noformat} > {"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0,1,2 > {"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0_copy_1,1,3 > {noformat} > [~owen.omalley], do you have any thoughts on a good way to handle this? > attached patch has a few changes to make Acid even recognize copy_N but this > is just a pre-requisite. The new UT demonstrates the issue. 
> Furthermore, > {noformat} > alter table T compact 'major' > select ROW__ID, INPUT__FILE__NAME, a, b from T order by b > {noformat} > produces > {noformat} > {"transactionid":0,"bucketid":1,"rowid":0} > file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommandswarehouse/nonacidorctbl/base_-9223372036854775808/bucket_1 > 1 2 > {noformat} > HIVE-16177.04.patch has TestTxnCommands.testNonAcidToAcidConversion0() > demonstrating this > This is because the compactor doesn't handle copy_N files either (it skips them) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
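The duplicate-ID mechanism described in HIVE-16177 above can be sketched in a few lines. This is an illustrative toy model under assumed simplifications, not Hive code; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

// Toy model of the bug: OrcRawRecordMerger.OriginalReaderPair numbers rows in
// each bucket file from 0, so a bucket that also has a _copy_N file emits the
// same (bucketid, rowid) pair twice. Numbering continuously across the files
// of a bucket keeps the ids unique.
class CopyNRowIds {
    // rowsPerFile: row counts of 01_0, 01_0_copy_1, ... within one bucket
    static List<String> assignIds(int bucket, int[] rowsPerFile, boolean restartPerFile) {
        List<String> ids = new ArrayList<>();
        long rowId = 0;
        for (int rows : rowsPerFile) {
            if (restartPerFile) {
                rowId = 0; // buggy behavior: every file starts over at rowid 0
            }
            for (int r = 0; r < rows; r++) {
                ids.add(bucket + ":" + rowId++); // "bucketid:rowid"
            }
        }
        return ids;
    }

    static boolean hasDuplicates(List<String> ids) {
        return new HashSet<>(ids).size() != ids.size();
    }
}
```

With one row in 01_0 and one in 01_0_copy_1 (as in the repro above), per-file numbering yields two rows with id "1:0", which is exactly the duplicate ROW__ID pair shown in the select output.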
[jira] [Updated] (HIVE-17070) remove .orig files from src
[ https://issues.apache.org/jira/browse/HIVE-17070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-17070: -- Attachment: HIVE-17070.patch > remove .orig files from src > --- > > Key: HIVE-17070 > URL: https://issues.apache.org/jira/browse/HIVE-17070 > Project: Hive > Issue Type: Bug >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Trivial > Attachments: HIVE-17070.patch > > > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java.orig > ql/src/test/results/clientpositive/llap/vector_join30.q.out.orig -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17067) LLAP: Add http endpoint to provide system level configurations
[ https://issues.apache.org/jira/browse/HIVE-17067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081280#comment-16081280 ] Gopal V commented on HIVE-17067: I don't have a specific security concern about the information in sysctl (maybe the kernel version). My concern is the fork + exec operation that happens here (as with YARN memory monitoring): each web hit to the endpoint triggers a fork + exec, which is a somewhat noisy operation. Most of this does not change every second (unlike, say, SNMP counters). > LLAP: Add http endpoint to provide system level configurations > -- > > Key: HIVE-17067 > URL: https://issues.apache.org/jira/browse/HIVE-17067 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Fix For: 3.0.0 > > Attachments: HIVE-17067.1.patch > > > Add an endpoint to get kernel and network configs via sysctl. Also memory > related configs like transparent huge pages config can be added. "ulimit -a" > can be added to llap startup script as it needs a shell. 
[jira] [Commented] (HIVE-16730) Vectorization: Schema Evolution for Text Vectorization / Complex Types
[ https://issues.apache.org/jira/browse/HIVE-16730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081273#comment-16081273 ] Hive QA commented on HIVE-16730: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876497/HIVE-16730.3.patch {color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10820 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=145) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver (batchId=109) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5940/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5940/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5940/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 7 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12876497 - PreCommit-HIVE-Build > Vectorization: Schema Evolution for Text Vectorization / Complex Types > -- > > Key: HIVE-16730 > URL: https://issues.apache.org/jira/browse/HIVE-16730 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Teddy Choi >Priority: Critical > Attachments: HIVE-16730.1.patch, HIVE-16730.2.patch, > HIVE-16730.3.patch > > > With HIVE-16589: "Vectorization: Support Complex Types and GroupBy modes > PARTIAL2, FINAL, and COMPLETE for AVG" change, the tests > schema_evol_text_vec_part_all_complex.q and > schema_evol_text_vecrow_part_all_complex.q fail. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17068) HCatalog: Add parquet support
[ https://issues.apache.org/jira/browse/HIVE-17068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081245#comment-16081245 ] Prasanth Jayachandran commented on HIVE-17068: -- [~masokan] Thanks for the pointer. Yes, both look the same. I will mark this jira as a duplicate. > HCatalog: Add parquet support > - > > Key: HIVE-17068 > URL: https://issues.apache.org/jira/browse/HIVE-17068 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-17068.1.patch, HIVE-17068.2.patch > > > MapredParquetOutputFormat has to support getRecordWriter() for parquet format > to be used from HCatalog. 
[jira] [Updated] (HIVE-17068) HCatalog: Add parquet support
[ https://issues.apache.org/jira/browse/HIVE-17068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-17068: - Resolution: Duplicate Status: Resolved (was: Patch Available) Dup of HIVE-8838 > HCatalog: Add parquet support > - > > Key: HIVE-17068 > URL: https://issues.apache.org/jira/browse/HIVE-17068 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-17068.1.patch, HIVE-17068.2.patch > > > MapredParquetOutputFormat has to support getRecordWriter() for parquet format > to be used from HCatalog. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16911) Upgrade groovy version to 2.4.11
[ https://issues.apache.org/jira/browse/HIVE-16911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-16911: Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) Pushed to master. Thanks Yongzhi for reviewing. > Upgrade groovy version to 2.4.11 > > > Key: HIVE-16911 > URL: https://issues.apache.org/jira/browse/HIVE-16911 > Project: Hive > Issue Type: Improvement > Components: Hive >Affects Versions: 3.0.0 >Reporter: Aihua Xu >Assignee: Aihua Xu > Fix For: 3.0.0 > > Attachments: HIVE-16911.1.patch > > > Hive currently uses groovy 2.4.4, which has a security issue > (https://access.redhat.com/security/cve/cve-2016-6814). Need to upgrade to > 2.4.8 or later. 
[jira] [Commented] (HIVE-17068) HCatalog: Add parquet support
[ https://issues.apache.org/jira/browse/HIVE-17068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081238#comment-16081238 ] Mariappan Asokan commented on HIVE-17068: - Hi Prasanth, I am wondering whether HIVE-8838 is related to this Jira. Thanks. > HCatalog: Add parquet support > - > > Key: HIVE-17068 > URL: https://issues.apache.org/jira/browse/HIVE-17068 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-17068.1.patch, HIVE-17068.2.patch > > > MapredParquetOutputFormat has to support getRecordWriter() for parquet format > to be used from HCatalog. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17068) HCatalog: Add parquet support
[ https://issues.apache.org/jira/browse/HIVE-17068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-17068: - Attachment: HIVE-17068.2.patch Enable Parquet unit tests. > HCatalog: Add parquet support > - > > Key: HIVE-17068 > URL: https://issues.apache.org/jira/browse/HIVE-17068 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-17068.1.patch, HIVE-17068.2.patch > > > MapredParquetOutputFormat has to support getRecordWriter() for parquet format > to be used from HCatalog. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16996) Add HLL as an alternative to FM sketch to compute stats
[ https://issues.apache.org/jira/browse/HIVE-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-16996: --- Attachment: HIVE-16966.03.patch > Add HLL as an alternative to FM sketch to compute stats > --- > > Key: HIVE-16996 > URL: https://issues.apache.org/jira/browse/HIVE-16996 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: Accuracy and performance comparison between HyperLogLog > and FM Sketch.docx, HIVE-16966.01.patch, HIVE-16966.02.patch, > HIVE-16966.03.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16996) Add HLL as an alternative to FM sketch to compute stats
[ https://issues.apache.org/jira/browse/HIVE-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-16996: --- Status: Patch Available (was: Open) > Add HLL as an alternative to FM sketch to compute stats > --- > > Key: HIVE-16996 > URL: https://issues.apache.org/jira/browse/HIVE-16996 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: Accuracy and performance comparison between HyperLogLog > and FM Sketch.docx, HIVE-16966.01.patch, HIVE-16966.02.patch, > HIVE-16966.03.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16996) Add HLL as an alternative to FM sketch to compute stats
[ https://issues.apache.org/jira/browse/HIVE-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-16996: --- Status: Open (was: Patch Available) > Add HLL as an alternative to FM sketch to compute stats > --- > > Key: HIVE-16996 > URL: https://issues.apache.org/jira/browse/HIVE-16996 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: Accuracy and performance comparison between HyperLogLog > and FM Sketch.docx, HIVE-16966.01.patch, HIVE-16966.02.patch, > HIVE-16966.03.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17069) Refactor OrcRawRecordMerger.ReaderPair
[ https://issues.apache.org/jira/browse/HIVE-17069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman reassigned HIVE-17069: - > Refactor OrcRawRecordMerger.ReaderPair > -- > > Key: HIVE-17069 > URL: https://issues.apache.org/jira/browse/HIVE-17069 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 3.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman > > this should be done post HIVE-16177 so as not to obscure the functional > changes completely > Make ReaderPair an interface > ReaderPairImpl - will do what ReaderPair currently does, i.e. handle the "normal" > code path > OriginalReaderPair - same as now but w/o the incomprehensible override/variable > shadowing logic. > Perhaps split it into 2 - 1 for compaction, 1 for "normal" read, with a common > base class. > Push discoverKeyBounds() into the appropriate implementation 
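The split proposed in HIVE-17069 above could look roughly like the following. This is a hedged sketch with made-up, simplified signatures (the real ReaderPair inside OrcRawRecordMerger carries ORC readers, record keys, and key bounds), not the actual Hive API.

```java
import java.util.List;

// Sketch of the proposed refactor: ReaderPair becomes an interface, the
// "normal" ACID path and the pre-ACID "original file" path get separate
// implementations, and on-the-fly id assignment lives only in the
// implementation that needs it. All names and signatures are illustrative.
interface ReaderPair {
    boolean next();      // advance to the next row; false at end of data
    long currentRowId(); // ROW__ID of the current row
}

// "Normal" code path: rows already carry persisted ids.
class ReaderPairImpl implements ReaderPair {
    private final List<Long> storedIds;
    private int pos = -1;
    ReaderPairImpl(List<Long> storedIds) { this.storedIds = storedIds; }
    public boolean next() { return ++pos < storedIds.size(); }
    public long currentRowId() { return storedIds.get(pos); }
}

// Original-file path: ids are assigned on the fly, starting from an offset a
// discoverKeyBounds()-style step would compute, with no variable shadowing
// of a shared base class.
class OriginalReaderPair implements ReaderPair {
    private final int totalRows;
    private long nextId;
    private long current = -1;
    private int consumed = 0;
    OriginalReaderPair(int totalRows, long firstId) {
        this.totalRows = totalRows;
        this.nextId = firstId;
    }
    public boolean next() {
        if (consumed >= totalRows) return false;
        consumed++;
        current = nextId++;
        return true;
    }
    public long currentRowId() { return current; }
}
```

The point of the interface is that callers merge rows from either source without caring which path produced the ids, while each path keeps its own state.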
[jira] [Updated] (HIVE-16730) Vectorization: Schema Evolution for Text Vectorization / Complex Types
[ https://issues.apache.org/jira/browse/HIVE-16730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-16730: -- Attachment: HIVE-16730.3.patch This third patch fixes a bug in LazySimpleDeserializeRead: it parsed the last column of a struct using both that column's data and the additional data that follows it. With this patch, LazySimpleDeserializeRead reads the last column of a struct from that column's data only. > Vectorization: Schema Evolution for Text Vectorization / Complex Types > -- > > Key: HIVE-16730 > URL: https://issues.apache.org/jira/browse/HIVE-16730 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Teddy Choi >Priority: Critical > Attachments: HIVE-16730.1.patch, HIVE-16730.2.patch, > HIVE-16730.3.patch > > > With HIVE-16589: "Vectorization: Support Complex Types and GroupBy modes > PARTIAL2, FINAL, and COMPLETE for AVG" change, the tests > schema_evol_text_vec_part_all_complex.q and > schema_evol_text_vecrow_part_all_complex.q fail. 
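The parsing bug described in the comment above can be illustrated with a toy delimiter layout. This sketch is hypothetical (the real LazySimpleDeserializeRead works over byte buffers with configurable separators); it only shows why the read of the last struct field must stop at the struct boundary.

```java
// Toy model of the LazySimpleDeserializeRead bug: in a record laid out as
// "<field1:field2>#<trailing data>" (':' separates struct fields, '#' ends
// the struct -- hypothetical separators), reading the last struct field to
// the end of the record wrongly swallows the trailing data; the fix is to
// bound the read at the struct's end.
class LastStructField {
    static String read(String record, boolean stopAtStructEnd) {
        int structEnd = record.indexOf('#');
        int fieldStart = record.indexOf(':') + 1;
        int end = stopAtStructEnd ? structEnd : record.length();
        return record.substring(fieldStart, end);
    }
}
```

For the record "a:b#x", the unbounded read returns "b#x" (field data plus following data) while the bounded read returns just "b".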
[jira] [Commented] (HIVE-17067) LLAP: Add http endpoint to provide system level configurations
[ https://issues.apache.org/jira/browse/HIVE-17067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081161#comment-16081161 ] Prasanth Jayachandran commented on HIVE-17067: -- Some configs already throw Permission Denied for non-root users. This exposes what a non-root user can read via sysctl -a, although no check for root access is done. Also, we don't support POST/updates to configs via the endpoint. [~gopalv] any thoughts on security here? > LLAP: Add http endpoint to provide system level configurations > -- > > Key: HIVE-17067 > URL: https://issues.apache.org/jira/browse/HIVE-17067 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Fix For: 3.0.0 > > Attachments: HIVE-17067.1.patch > > > Add an endpoint to get kernel and network configs via sysctl. Also memory > related configs like transparent huge pages config can be added. "ulimit -a" > can be added to llap startup script as it needs a shell. 
[jira] [Commented] (HIVE-17067) LLAP: Add http endpoint to provide system level configurations
[ https://issues.apache.org/jira/browse/HIVE-17067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081123#comment-16081123 ] Siddharth Seth commented on HIVE-17067: --- +1. Looks good. Does anything specific need to be looked at in terms of security? > LLAP: Add http endpoint to provide system level configurations > -- > > Key: HIVE-17067 > URL: https://issues.apache.org/jira/browse/HIVE-17067 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Fix For: 3.0.0 > > Attachments: HIVE-17067.1.patch > > > Add an endpoint to get kernel and network configs via sysctl. Also memory > related configs like transparent huge pages config can be added. "ulimit -a" > can be added to llap startup script as it needs a shell. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16973) Fetching of Delegation tokens (Kerberos) for AccumuloStorageHandler fails in HS2
[ https://issues.apache.org/jira/browse/HIVE-16973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081094#comment-16081094 ] Hive QA commented on HIVE-16973: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876479/HIVE-16973.004-branch-2.patch {color:green}SUCCESS:{color} +1 due to 4 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10582 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] (batchId=35) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[explaindenpendencydiffengs] (batchId=38) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=142) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] (batchId=139) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[explaindenpendencydiffengs] (batchId=115) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_ptf] (batchId=125) org.apache.hadoop.hive.ql.security.TestExtendedAcls.testPartition (batchId=228) org.apache.hadoop.hive.ql.security.TestFolderPermissions.testPartition (batchId=217) org.apache.hive.hcatalog.api.TestHCatClient.testTransportFailure (batchId=176) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5939/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5939/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5939/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 9 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12876479 - PreCommit-HIVE-Build > Fetching of Delegation tokens (Kerberos) for AccumuloStorageHandler fails in > HS2 > > > Key: HIVE-16973 > URL: https://issues.apache.org/jira/browse/HIVE-16973 > Project: Hive > Issue Type: Bug > Components: Accumulo Storage Handler >Reporter: Josh Elser >Assignee: Josh Elser > Attachments: HIVE-16973.001.patch, HIVE-16973.002.branch-2.patch, > HIVE-16973.003.branch-2.patch, HIVE-16973.004-branch-2.patch, > HIVE-16973.004.branch-2.patch > > > Had a report from a user that Kerberos+AccumuloStorageHandler+HS2 was broken. > Looking into it, it seems like the bit-rot got pretty bad. You'll see > something like the following: > {noformat} > Caused by: java.io.IOException: Failed to unwrap AuthenticationToken > at > org.apache.hadoop.hive.accumulo.HiveAccumuloHelper.unwrapAuthenticationToken(HiveAccumuloHelper.java:312) > > at > org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableInputFormat.getSplits(HiveAccumuloTableInputFormat.java:122) > > {noformat} > It appears that some of the code-paths changed since when I first did my > testing (or I just did poor testing) and the delegation token was never being > fetched/serialized. There also are some issues with fetching the delegation > token from Accumulo properly which were addressed in ACCUMULO-4665 > I believe it would also be best to just update the dependency to use Accumulo > 1.7 (drop 1.6 support) as it's lacking in this regard. These changes would > otherwise get much more complicated with reflection -- Accumulo has moved on > past 1.6, so let's do the same in Hive. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16690) Configure Tez cartesian product edge based on LLAP cluster size
[ https://issues.apache.org/jira/browse/HIVE-16690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhiyuan Yang updated HIVE-16690: Attachment: HIVE-16690.addendum.patch Uploaded an addendum patch which avoids accessing uninitialized LLAP cluster info (which caused an NPE). > Configure Tez cartesian product edge based on LLAP cluster size > --- > > Key: HIVE-16690 > URL: https://issues.apache.org/jira/browse/HIVE-16690 > Project: Hive > Issue Type: Bug >Reporter: Zhiyuan Yang >Assignee: Zhiyuan Yang > Attachments: HIVE-16690.1.patch, HIVE-16690.addendum.patch > > > In HIVE-14731 we are using the default value for the target parallelism of the fair > cartesian product edge. Ideally this should be set according to cluster size. > In case of LLAP it's pretty easy to get the cluster size, i.e., the number of > executors. 
[jira] [Commented] (HIVE-16973) Fetching of Delegation tokens (Kerberos) for AccumuloStorageHandler fails in HS2
[ https://issues.apache.org/jira/browse/HIVE-16973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080902#comment-16080902 ] Hive QA commented on HIVE-16973: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876465/HIVE-16973.004.branch-2.patch {color:red}ERROR:{color} -1 due to build exiting with an error Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5938/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5938/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5938/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N' 2017-07-10 19:09:02.704 + [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]] + export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'MAVEN_OPTS=-Xmx1g ' + MAVEN_OPTS='-Xmx1g ' + cd /data/hiveptest/working/ + tee /data/hiveptest/logs/PreCommit-HIVE-Build-5938/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ git = \s\v\n ]] + [[ git = \g\i\t ]] + [[ -z master ]] + [[ -d apache-github-source-source ]] + [[ ! -d apache-github-source-source/.git ]] + [[ ! 
-d apache-github-source-source ]] + date '+%Y-%m-%d %T.%3N' 2017-07-10 19:09:02.709 + cd apache-github-source-source + git fetch origin + git reset --hard HEAD HEAD is now at 7f5460d HIVE-16981: hive.optimize.bucketingsorting should compare the schema before removing RS (Pengcheng Xiong, reviewed by Ashutosh Chauhan) + git clean -f -d + git checkout master Already on 'master' Your branch is up-to-date with 'origin/master'. + git reset --hard origin/master HEAD is now at 7f5460d HIVE-16981: hive.optimize.bucketingsorting should compare the schema before removing RS (Pengcheng Xiong, reviewed by Ashutosh Chauhan) + git merge --ff-only origin/master Already up-to-date. + date '+%Y-%m-%d %T.%3N' 2017-07-10 19:09:07.749 + patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hiveptest/working/scratch/build.patch + [[ -f /data/hiveptest/working/scratch/build.patch ]] + chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh + /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch error: patch failed: accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/AccumuloStorageHandler.java:52 error: accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/AccumuloStorageHandler.java: patch does not apply error: patch failed: accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/mr/HiveAccumuloTableOutputFormat.java:125 error: accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/mr/HiveAccumuloTableOutputFormat.java: patch does not apply The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. 
ATTACHMENT ID: 12876465 - PreCommit-HIVE-Build > Fetching of Delegation tokens (Kerberos) for AccumuloStorageHandler fails in > HS2 > > > Key: HIVE-16973 > URL: https://issues.apache.org/jira/browse/HIVE-16973 > Project: Hive > Issue Type: Bug > Components: Accumulo Storage Handler >Reporter: Josh Elser >Assignee: Josh Elser > Attachments: HIVE-16973.001.patch, HIVE-16973.002.branch-2.patch, > HIVE-16973.003.branch-2.patch, HIVE-16973.004.branch-2.patch > > > Had a report from a user that Kerberos+AccumuloStorageHandler+HS2 was broken. > Looking into it, it seems like the bit-rot got pretty bad. You'll see > something like the following: > {noformat} > Caused by: java.io.IOException: Failed to unwrap AuthenticationToken > at > org.apache.hadoop.hive.accumulo.HiveAccumuloHelper.unwrapAuthenticationToken(HiveAccumuloHelper.java:312) > > at > org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableInputFormat.getSplits(HiveAccumuloTableInputFormat.java:122) > > {noformat} > It appears that some of the code-paths changed since when I first did my > testing (or I just did poor testing) and the delegation token was never being > fetched/serialized. There also are some issues with fetching the delegation > token from Accumulo properly which were addressed in ACCUMULO-4665 > I believe it would also be best to just update the
[jira] [Updated] (HIVE-16973) Fetching of Delegation tokens (Kerberos) for AccumuloStorageHandler fails in HS2
[ https://issues.apache.org/jira/browse/HIVE-16973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Elser updated HIVE-16973: -- Attachment: HIVE-16973.004-branch-2.patch > Fetching of Delegation tokens (Kerberos) for AccumuloStorageHandler fails in > HS2 > > > Key: HIVE-16973 > URL: https://issues.apache.org/jira/browse/HIVE-16973 > Project: Hive > Issue Type: Bug > Components: Accumulo Storage Handler >Reporter: Josh Elser >Assignee: Josh Elser > Attachments: HIVE-16973.001.patch, HIVE-16973.002.branch-2.patch, > HIVE-16973.003.branch-2.patch, HIVE-16973.004-branch-2.patch, > HIVE-16973.004.branch-2.patch > > > Had a report from a user that Kerberos+AccumuloStorageHandler+HS2 was broken. > Looking into it, it seems like the bit-rot got pretty bad. You'll see > something like the following: > {noformat} > Caused by: java.io.IOException: Failed to unwrap AuthenticationToken > at > org.apache.hadoop.hive.accumulo.HiveAccumuloHelper.unwrapAuthenticationToken(HiveAccumuloHelper.java:312) > > at > org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableInputFormat.getSplits(HiveAccumuloTableInputFormat.java:122) > > {noformat} > It appears that some of the code-paths changed since when I first did my > testing (or I just did poor testing) and the delegation token was never being > fetched/serialized. There also are some issues with fetching the delegation > token from Accumulo properly which were addressed in ACCUMULO-4665 > I believe it would also be best to just update the dependency to use Accumulo > 1.7 (drop 1.6 support) as it's lacking in this regard. These changes would > otherwise get much more complicated with reflection -- Accumulo has moved on > past 1.6, so let's do the same in Hive. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17068) HCatalog: Add parquet support
[ https://issues.apache.org/jira/browse/HIVE-17068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080892#comment-16080892 ] Hive QA commented on HIVE-17068: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876460/HIVE-17068.1.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10834 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=237) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=145) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5937/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5937/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5937/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 7 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12876460 - PreCommit-HIVE-Build > HCatalog: Add parquet support > - > > Key: HIVE-17068 > URL: https://issues.apache.org/jira/browse/HIVE-17068 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-17068.1.patch > > > MapredParquetOutputFormat has to support getRecordWriter() for parquet format > to be used from HCatalog. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16973) Fetching of Delegation tokens (Kerberos) for AccumuloStorageHandler fails in HS2
[ https://issues.apache.org/jira/browse/HIVE-16973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Elser updated HIVE-16973: -- Attachment: HIVE-16973.004.branch-2.patch > Fetching of Delegation tokens (Kerberos) for AccumuloStorageHandler fails in > HS2 > > > Key: HIVE-16973 > URL: https://issues.apache.org/jira/browse/HIVE-16973 > Project: Hive > Issue Type: Bug > Components: Accumulo Storage Handler >Reporter: Josh Elser >Assignee: Josh Elser > Attachments: HIVE-16973.001.patch, HIVE-16973.002.branch-2.patch, > HIVE-16973.003.branch-2.patch, HIVE-16973.004.branch-2.patch > > > Had a report from a user that Kerberos+AccumuloStorageHandler+HS2 was broken. > Looking into it, it seems like the bit-rot got pretty bad. You'll see > something like the following: > {noformat} > Caused by: java.io.IOException: Failed to unwrap AuthenticationToken > at > org.apache.hadoop.hive.accumulo.HiveAccumuloHelper.unwrapAuthenticationToken(HiveAccumuloHelper.java:312) > > at > org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableInputFormat.getSplits(HiveAccumuloTableInputFormat.java:122) > > {noformat} > It appears that some of the code-paths changed since when I first did my > testing (or I just did poor testing) and the delegation token was never being > fetched/serialized. There also are some issues with fetching the delegation > token from Accumulo properly which were addressed in ACCUMULO-4665 > I believe it would also be best to just update the dependency to use Accumulo > 1.7 (drop 1.6 support) as it's lacking in this regard. These changes would > otherwise get much more complicated with reflection -- Accumulo has moved on > past 1.6, so let's do the same in Hive. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16688) Make sure Alter Table to set transaction=true acquires X lock
[ https://issues.apache.org/jira/browse/HIVE-16688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-16688: -- Priority: Critical (was: Major) > Make sure Alter Table to set transaction=true acquires X lock > - > > Key: HIVE-16688 > URL: https://issues.apache.org/jira/browse/HIVE-16688 > Project: Hive > Issue Type: Sub-task > Components: Transactions >Affects Versions: 1.3.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > > suppose we have non-acid table with some data > An insert op starts (long running) > An alter table runs to add (transactional=true) > An update is run which will read the list of "original" files and assign IDs > on the fly which are written to a delta file. > The long running insert completes. > Another update is run which now sees a different set of "original" files and > will (most likely) assign different IDs. > Need to make sure to mutex this -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.
[ https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080778#comment-16080778 ] Hive QA commented on HIVE-4577: --- Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12728375/HIVE-4577.4.patch {color:red}ERROR:{color} -1 due to build exiting with an error Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5936/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5936/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5936/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N' 2017-07-10 18:08:49.193 + [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]] + export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'MAVEN_OPTS=-Xmx1g ' + MAVEN_OPTS='-Xmx1g ' + cd /data/hiveptest/working/ + tee /data/hiveptest/logs/PreCommit-HIVE-Build-5936/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ git = \s\v\n ]] + [[ git = \g\i\t ]] + [[ -z master ]] + [[ -d apache-github-source-source ]] + [[ ! -d apache-github-source-source/.git ]] + [[ ! 
-d apache-github-source-source ]] + date '+%Y-%m-%d %T.%3N' 2017-07-10 18:08:49.197 + cd apache-github-source-source + git fetch origin + git reset --hard HEAD HEAD is now at 7f5460d HIVE-16981: hive.optimize.bucketingsorting should compare the schema before removing RS (Pengcheng Xiong, reviewed by Ashutosh Chauhan) + git clean -f -d + git checkout master Already on 'master' Your branch is up-to-date with 'origin/master'. + git reset --hard origin/master HEAD is now at 7f5460d HIVE-16981: hive.optimize.bucketingsorting should compare the schema before removing RS (Pengcheng Xiong, reviewed by Ashutosh Chauhan) + git merge --ff-only origin/master Already up-to-date. + date '+%Y-%m-%d %T.%3N' 2017-07-10 18:08:49.895 + patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hiveptest/working/scratch/build.patch + [[ -f /data/hiveptest/working/scratch/build.patch ]] + chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh + /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch error: a/ql/src/java/org/apache/hadoop/hive/ql/processors/DfsProcessor.java: No such file or directory The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12728375 - PreCommit-HIVE-Build > hive CLI can't handle hadoop dfs command with space and quotes. 
> > > Key: HIVE-4577 > URL: https://issues.apache.org/jira/browse/HIVE-4577 > Project: Hive > Issue Type: Bug > Components: CLI >Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0 >Reporter: Bing Li >Assignee: Bing Li > Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, > HIVE-4577.3.patch.txt, HIVE-4577.4.patch > > > As designed, Hive supports hadoop dfs commands in the Hive shell, e.g. > hive> dfs -mkdir /user/biadmin/mydir; > but it behaves differently from Hadoop when the path contains spaces and quotes: > hive> dfs -mkdir "hello"; > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:40 > /user/biadmin/"hello" > hive> dfs -mkdir 'world'; > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:43 > /user/biadmin/'world' > hive> dfs -mkdir "bei jing"; > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:44 > /user/biadmin/"bei > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:44 > /user/biadmin/jing" -- This message was sent by Atlassian JIRA (v6.4.14#64029)
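The mis-created directories above come down to tokenization: the CLI splits the dfs argument line on whitespace without honoring quotes, so the quote characters survive as literal path characters and a quoted path breaks into two tokens. A minimal shell sketch of the difference (variable names are illustrative; Hive's actual fix lives in DfsProcessor.java):

```shell
cmd='dfs -mkdir "bei jing"'

# Naive whitespace split, as the buggy CLI does: the quotes survive as
# literal characters and the path breaks into two tokens.
set -- $cmd
naive_count=$#                   # dfs, -mkdir, "bei, jing" -> 4 tokens

# Quote-aware split, as a shell would do it: quotes are stripped and
# the path stays one argument.
eval "set -- $cmd"
quoted_count=$#                  # dfs, -mkdir, bei jing -> 3 tokens
last_arg=$3

echo "$naive_count vs $quoted_count tokens; path: $last_arg"
```

With the naive split, the filesystem layer sees two mkdir targets named `"bei` and `jing"`, which matches the listing in the report.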
[jira] [Commented] (HIVE-17068) HCatalog: Add parquet support
[ https://issues.apache.org/jira/browse/HIVE-17068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080771#comment-16080771 ] Prasanth Jayachandran commented on HIVE-17068: -- [~sushanth] can you please review this patch? > HCatalog: Add parquet support > - > > Key: HIVE-17068 > URL: https://issues.apache.org/jira/browse/HIVE-17068 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-17068.1.patch > > > MapredParquetOutputFormat has to support getRecordWriter() for parquet format > to be used from HCatalog. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17068) HCatalog: Add parquet support
[ https://issues.apache.org/jira/browse/HIVE-17068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-17068: - Status: Patch Available (was: Open) > HCatalog: Add parquet support > - > > Key: HIVE-17068 > URL: https://issues.apache.org/jira/browse/HIVE-17068 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-17068.1.patch > > > MapredParquetOutputFormat has to support getRecordWriter() for parquet format > to be used from HCatalog. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17068) HCatalog: Add parquet support
[ https://issues.apache.org/jira/browse/HIVE-17068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-17068: - Attachment: HIVE-17068.1.patch > HCatalog: Add parquet support > - > > Key: HIVE-17068 > URL: https://issues.apache.org/jira/browse/HIVE-17068 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-17068.1.patch > > > MapredParquetOutputFormat has to support getRecordWriter() for parquet format > to be used from HCatalog. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.
[ https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080758#comment-16080758 ] Vaibhav Gumashta commented on HIVE-4577: [~libing] thanks a lot for the patch and apologies that this went out of sight. Would you like to rebase it one more time for master? I am +1 on the changes. > hive CLI can't handle hadoop dfs command with space and quotes. > > > Key: HIVE-4577 > URL: https://issues.apache.org/jira/browse/HIVE-4577 > Project: Hive > Issue Type: Bug > Components: CLI >Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0 >Reporter: Bing Li >Assignee: Bing Li > Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, > HIVE-4577.3.patch.txt, HIVE-4577.4.patch > > > As designed, Hive supports hadoop dfs commands in the Hive shell, e.g. > hive> dfs -mkdir /user/biadmin/mydir; > but it behaves differently from Hadoop when the path contains spaces and quotes: > hive> dfs -mkdir "hello"; > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:40 > /user/biadmin/"hello" > hive> dfs -mkdir 'world'; > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:43 > /user/biadmin/'world' > hive> dfs -mkdir "bei jing"; > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:44 > /user/biadmin/"bei > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:44 > /user/biadmin/jing" -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16973) Fetching of Delegation tokens (Kerberos) for AccumuloStorageHandler fails in HS2
[ https://issues.apache.org/jira/browse/HIVE-16973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Elser updated HIVE-16973: -- Attachment: HIVE-16973.003.branch-2.patch > Fetching of Delegation tokens (Kerberos) for AccumuloStorageHandler fails in > HS2 > > > Key: HIVE-16973 > URL: https://issues.apache.org/jira/browse/HIVE-16973 > Project: Hive > Issue Type: Bug > Components: Accumulo Storage Handler >Reporter: Josh Elser >Assignee: Josh Elser > Attachments: HIVE-16973.001.patch, HIVE-16973.002.branch-2.patch, > HIVE-16973.003.branch-2.patch > > > Had a report from a user that Kerberos+AccumuloStorageHandler+HS2 was broken. > Looking into it, it seems like the bit-rot got pretty bad. You'll see > something like the following: > {noformat} > Caused by: java.io.IOException: Failed to unwrap AuthenticationToken > at > org.apache.hadoop.hive.accumulo.HiveAccumuloHelper.unwrapAuthenticationToken(HiveAccumuloHelper.java:312) > > at > org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableInputFormat.getSplits(HiveAccumuloTableInputFormat.java:122) > > {noformat} > It appears that some of the code-paths changed since when I first did my > testing (or I just did poor testing) and the delegation token was never being > fetched/serialized. There also are some issues with fetching the delegation > token from Accumulo properly which were addressed in ACCUMULO-4665 > I believe it would also be best to just update the dependency to use Accumulo > 1.7 (drop 1.6 support) as it's lacking in this regard. These changes would > otherwise get much more complicated with reflection -- Accumulo has moved on > past 1.6, so let's do the same in Hive. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16973) Fetching of Delegation tokens (Kerberos) for AccumuloStorageHandler fails in HS2
[ https://issues.apache.org/jira/browse/HIVE-16973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080734#comment-16080734 ] Josh Elser commented on HIVE-16973: --- bq. Spinning up my local install to verify the Kerberos portion wasn't affected. Local tests with Kerberos show this is fine as well. I'll need to spend the time to add a qtest that does Accumulo with Kerberos to try to prevent some regressions (this will take a day or two though). v3 patch is the same as v2 but was just a normal `git diff` patch instead of one formatted for email (e.g. git-format-patch). > Fetching of Delegation tokens (Kerberos) for AccumuloStorageHandler fails in > HS2 > > > Key: HIVE-16973 > URL: https://issues.apache.org/jira/browse/HIVE-16973 > Project: Hive > Issue Type: Bug > Components: Accumulo Storage Handler >Reporter: Josh Elser >Assignee: Josh Elser > Attachments: HIVE-16973.001.patch, HIVE-16973.002.branch-2.patch, > HIVE-16973.003.branch-2.patch > > > Had a report from a user that Kerberos+AccumuloStorageHandler+HS2 was broken. > Looking into it, it seems like the bit-rot got pretty bad. You'll see > something like the following: > {noformat} > Caused by: java.io.IOException: Failed to unwrap AuthenticationToken > at > org.apache.hadoop.hive.accumulo.HiveAccumuloHelper.unwrapAuthenticationToken(HiveAccumuloHelper.java:312) > > at > org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableInputFormat.getSplits(HiveAccumuloTableInputFormat.java:122) > > {noformat} > It appears that some of the code-paths changed since when I first did my > testing (or I just did poor testing) and the delegation token was never being > fetched/serialized. There also are some issues with fetching the delegation > token from Accumulo properly which were addressed in ACCUMULO-4665 > I believe it would also be best to just update the dependency to use Accumulo > 1.7 (drop 1.6 support) as it's lacking in this regard. 
These changes would > otherwise get much more complicated with reflection -- Accumulo has moved on > past 1.6, so let's do the same in Hive. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
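For reference, the difference between the two patch styles mentioned in the comment can be reproduced in a throwaway repository (the paths and commit messages here are made up): `git format-patch` wraps the same diff in mailbox headers, while `git diff` emits the bare diff.

```shell
set -e
cd "$(mktemp -d)"
git init -q repo && cd repo
git -c user.name=t -c user.email=t@example.com commit -q --allow-empty -m base
echo change > f.txt && git add f.txt
git -c user.name=t -c user.email=t@example.com commit -q -m 'add f.txt'

git diff HEAD~1 HEAD > plain.patch               # bare diff, no mail headers
git format-patch -1 --stdout HEAD > mail.patch   # mailbox format: From/Subject/...

head -n1 plain.patch    # starts with: diff --git ...
head -n1 mail.patch     # starts with: From <commit-sha> ...
```

Both carry the same change; tooling that expects one wrapper can choke on the other, which is why "same as v2 but a plain diff" is worth calling out.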
[jira] [Commented] (HIVE-16973) Fetching of Delegation tokens (Kerberos) for AccumuloStorageHandler fails in HS2
[ https://issues.apache.org/jira/browse/HIVE-16973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080733#comment-16080733 ] Hive QA commented on HIVE-16973: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876456/HIVE-16973.003.branch-2.patch {color:red}ERROR:{color} -1 due to build exiting with an error Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5935/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5935/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5935/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N' 2017-07-10 17:48:43.713 + [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]] + export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'MAVEN_OPTS=-Xmx1g ' + MAVEN_OPTS='-Xmx1g ' + cd /data/hiveptest/working/ + tee /data/hiveptest/logs/PreCommit-HIVE-Build-5935/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ git = \s\v\n ]] + [[ git = \g\i\t ]] + [[ -z master ]] + [[ -d apache-github-source-source ]] + [[ ! -d apache-github-source-source/.git ]] + [[ ! 
-d apache-github-source-source ]] + date '+%Y-%m-%d %T.%3N' 2017-07-10 17:48:43.716 + cd apache-github-source-source + git fetch origin + git reset --hard HEAD HEAD is now at 7f5460d HIVE-16981: hive.optimize.bucketingsorting should compare the schema before removing RS (Pengcheng Xiong, reviewed by Ashutosh Chauhan) + git clean -f -d + git checkout master Already on 'master' Your branch is up-to-date with 'origin/master'. + git reset --hard origin/master HEAD is now at 7f5460d HIVE-16981: hive.optimize.bucketingsorting should compare the schema before removing RS (Pengcheng Xiong, reviewed by Ashutosh Chauhan) + git merge --ff-only origin/master Already up-to-date. + date '+%Y-%m-%d %T.%3N' 2017-07-10 17:48:44.321 + patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hiveptest/working/scratch/build.patch + [[ -f /data/hiveptest/working/scratch/build.patch ]] + chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh + /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch error: a/accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/AccumuloConnectionParameters.java: No such file or directory error: a/accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/AccumuloStorageHandler.java: No such file or directory error: a/accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/HiveAccumuloHelper.java: No such file or directory error: a/accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/Utils.java: No such file or directory error: a/accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/mr/HiveAccumuloTableInputFormat.java: No such file or directory error: a/accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/mr/HiveAccumuloTableOutputFormat.java: No such file or directory error: a/accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/serde/CompositeAccumuloRowIdFactory.java: No such file or directory error: 
a/accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/serde/DefaultAccumuloRowIdFactory.java: No such file or directory error: a/accumulo-handler/src/test/org/apache/hadoop/hive/accumulo/TestAccumuloStorageHandler.java: No such file or directory error: a/accumulo-handler/src/test/org/apache/hadoop/hive/accumulo/TestHiveAccumuloHelper.java: No such file or directory error: a/accumulo-handler/src/test/org/apache/hadoop/hive/accumulo/mr/TestHiveAccumuloTableInputFormat.java: No such file or directory error: a/accumulo-handler/src/test/org/apache/hadoop/hive/accumulo/mr/TestHiveAccumuloTableOutputFormat.java: No such file or directory error: a/itests/qtest-accumulo/pom.xml: No such file or directory error: a/pom.xml: No such file or directory The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12876456 - PreCommit-HIVE-Build > Fetching of Delegation tokens (Kerberos) for AccumuloStorageHandler fails in > HS2 > > > Key: HIVE-16973 > URL: https://issues.apache.org/jira/browse/HIVE-16973 > Project: Hive >
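The QA bot's "does not appear to apply with p0, p1, or p2" refers to patch's -pN strip level. Because `git diff` prefixes paths with a/ and b/, -p1 is the level that should match when the listed files exist; a minimal sketch with a made-up one-line patch against one of the paths from the error output:

```shell
set -e
cd "$(mktemp -d)"
printf 'old\n' > pom.xml
cat > build.patch <<'EOF'
--- a/pom.xml
+++ b/pom.xml
@@ -1 +1 @@
-old
+new
EOF
# -p0 would look for a literal "a/pom.xml"; -p1 strips the first path
# component so the hunk lands on pom.xml.
patch -p1 < build.patch
cat pom.xml
```

The bot's failure above is therefore a stale patch (files moved or renamed since it was generated), not a wrong strip level: no -p value can map a/... paths onto files that no longer exist.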
[jira] [Assigned] (HIVE-17068) HCatalog: Add parquet support
[ https://issues.apache.org/jira/browse/HIVE-17068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran reassigned HIVE-17068: > HCatalog: Add parquet support > - > > Key: HIVE-17068 > URL: https://issues.apache.org/jira/browse/HIVE-17068 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > > MapredParquetOutputFormat has to support getRecordWriter() for parquet format > to be used from HCatalog. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16973) Fetching of Delegation tokens (Kerberos) for AccumuloStorageHandler fails in HS2
[ https://issues.apache.org/jira/browse/HIVE-16973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080716#comment-16080716 ] Hive QA commented on HIVE-16973: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876450/HIVE-16973.002.branch-2.patch {color:red}ERROR:{color} -1 due to build exiting with an error Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5934/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5934/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5934/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N' 2017-07-10 17:36:50.135 + [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]] + export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'MAVEN_OPTS=-Xmx1g ' + MAVEN_OPTS='-Xmx1g ' + cd /data/hiveptest/working/ + tee /data/hiveptest/logs/PreCommit-HIVE-Build-5934/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ git = \s\v\n ]] + [[ git = \g\i\t ]] + [[ -z master ]] + [[ -d apache-github-source-source ]] + [[ ! -d apache-github-source-source/.git ]] + [[ ! 
-d apache-github-source-source ]] + date '+%Y-%m-%d %T.%3N' 2017-07-10 17:36:50.138 + cd apache-github-source-source + git fetch origin + git reset --hard HEAD HEAD is now at 7f5460d HIVE-16981: hive.optimize.bucketingsorting should compare the schema before removing RS (Pengcheng Xiong, reviewed by Ashutosh Chauhan) + git clean -f -d + git checkout master Already on 'master' Your branch is up-to-date with 'origin/master'. + git reset --hard origin/master HEAD is now at 7f5460d HIVE-16981: hive.optimize.bucketingsorting should compare the schema before removing RS (Pengcheng Xiong, reviewed by Ashutosh Chauhan) + git merge --ff-only origin/master Already up-to-date. + date '+%Y-%m-%d %T.%3N' 2017-07-10 17:36:52.908 + patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hiveptest/working/scratch/build.patch + [[ -f /data/hiveptest/working/scratch/build.patch ]] + chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh + /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch error: a/accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/AccumuloConnectionParameters.java: No such file or directory error: a/accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/AccumuloStorageHandler.java: No such file or directory error: a/accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/HiveAccumuloHelper.java: No such file or directory error: a/accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/Utils.java: No such file or directory error: a/accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/mr/HiveAccumuloTableInputFormat.java: No such file or directory error: a/accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/mr/HiveAccumuloTableOutputFormat.java: No such file or directory error: a/accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/serde/CompositeAccumuloRowIdFactory.java: No such file or directory error: 
a/accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/serde/DefaultAccumuloRowIdFactory.java: No such file or directory error: a/accumulo-handler/src/test/org/apache/hadoop/hive/accumulo/TestAccumuloStorageHandler.java: No such file or directory error: a/accumulo-handler/src/test/org/apache/hadoop/hive/accumulo/TestHiveAccumuloHelper.java: No such file or directory error: a/accumulo-handler/src/test/org/apache/hadoop/hive/accumulo/mr/TestHiveAccumuloTableInputFormat.java: No such file or directory error: a/accumulo-handler/src/test/org/apache/hadoop/hive/accumulo/mr/TestHiveAccumuloTableOutputFormat.java: No such file or directory error: a/itests/qtest-accumulo/pom.xml: No such file or directory error: a/pom.xml: No such file or directory The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12876450 - PreCommit-HIVE-Build > Fetching of Delegation tokens (Kerberos) for AccumuloStorageHandler fails in > HS2 > > > Key: HIVE-16973 > URL: https://issues.apache.org/jira/browse/HIVE-16973 > Project: Hive >
[jira] [Updated] (HIVE-16973) Fetching of Delegation tokens (Kerberos) for AccumuloStorageHandler fails in HS2
[ https://issues.apache.org/jira/browse/HIVE-16973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Elser updated HIVE-16973: -- Attachment: HIVE-16973.002.branch-2.patch .002 Some more code consolidation/cleanup. Ran unit test and the accumulo qtests locally. Spinning up my local install to verify the Kerberos portion wasn't affected. > Fetching of Delegation tokens (Kerberos) for AccumuloStorageHandler fails in > HS2 > > > Key: HIVE-16973 > URL: https://issues.apache.org/jira/browse/HIVE-16973 > Project: Hive > Issue Type: Bug > Components: Accumulo Storage Handler >Reporter: Josh Elser >Assignee: Josh Elser > Attachments: HIVE-16973.001.patch, HIVE-16973.002.branch-2.patch > > > Had a report from a user that Kerberos+AccumuloStorageHandler+HS2 was broken. > Looking into it, it seems like the bit-rot got pretty bad. You'll see > something like the following: > {noformat} > Caused by: java.io.IOException: Failed to unwrap AuthenticationToken > at > org.apache.hadoop.hive.accumulo.HiveAccumuloHelper.unwrapAuthenticationToken(HiveAccumuloHelper.java:312) > > at > org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableInputFormat.getSplits(HiveAccumuloTableInputFormat.java:122) > > {noformat} > It appears that some of the code-paths changed since when I first did my > testing (or I just did poor testing) and the delegation token was never being > fetched/serialized. There also are some issues with fetching the delegation > token from Accumulo properly which were addressed in ACCUMULO-4665 > I believe it would also be best to just update the dependency to use Accumulo > 1.7 (drop 1.6 support) as it's lacking in this regard. These changes would > otherwise get much more complicated with reflection -- Accumulo has moved on > past 1.6, so let's do the same in Hive. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16730) Vectorization: Schema Evolution for Text Vectorization / Complex Types
[ https://issues.apache.org/jira/browse/HIVE-16730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080711#comment-16080711 ] Hive QA commented on HIVE-16730: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876439/HIVE-16730.2.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10820 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=99) org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver (batchId=100) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5933/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5933/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5933/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12876439 - PreCommit-HIVE-Build > Vectorization: Schema Evolution for Text Vectorization / Complex Types > -- > > Key: HIVE-16730 > URL: https://issues.apache.org/jira/browse/HIVE-16730 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Teddy Choi >Priority: Critical > Attachments: HIVE-16730.1.patch, HIVE-16730.2.patch > > > With HIVE-16589: "Vectorization: Support Complex Types and GroupBy modes > PARTIAL2, FINAL, and COMPLETE for AVG" change, the tests > schema_evol_text_vec_part_all_complex.q and > schema_evol_text_vecrow_part_all_complex.q fail. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17021) Support replication of concatenate operation.
[ https://issues.apache.org/jira/browse/HIVE-17021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080614#comment-16080614 ] Hive QA commented on HIVE-17021: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876435/HIVE-17021.01.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10836 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5932/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5932/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5932/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12876435 - PreCommit-HIVE-Build > Support replication of concatenate operation. > - > > Key: HIVE-17021 > URL: https://issues.apache.org/jira/browse/HIVE-17021 > Project: Hive > Issue Type: Sub-task > Components: Hive, repl >Affects Versions: 2.1.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, replication > Fix For: 3.0.0 > > Attachments: HIVE-17021.01.patch > > > We need to handle cases like ALTER TABLE ... 
CONCATENATE that also change the > files on disk, and potentially treat them similar to INSERT OVERWRITE, as it > does something equivalent to a compaction. > Note that a ConditionalTask might also be fired at the end of inserts at the > end of a tez task (or other exec engine) if appropriate HiveConf settings are > set, to automatically do this operation - these also need to be taken care of > for replication. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16730) Vectorization: Schema Evolution for Text Vectorization / Complex Types
[ https://issues.apache.org/jira/browse/HIVE-16730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080588#comment-16080588 ] Teddy Choi commented on HIVE-16730: --- This second patch fixes a bug where columns were not skipped when a deserialize reader supports reading by field. > Vectorization: Schema Evolution for Text Vectorization / Complex Types > -- > > Key: HIVE-16730 > URL: https://issues.apache.org/jira/browse/HIVE-16730 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Teddy Choi >Priority: Critical > Attachments: HIVE-16730.1.patch, HIVE-16730.2.patch > > > With HIVE-16589: "Vectorization: Support Complex Types and GroupBy modes > PARTIAL2, FINAL, and COMPLETE for AVG" change, the tests > schema_evol_text_vec_part_all_complex.q and > schema_evol_text_vecrow_part_all_complex.q fail. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16730) Vectorization: Schema Evolution for Text Vectorization / Complex Types
[ https://issues.apache.org/jira/browse/HIVE-16730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-16730: -- Attachment: HIVE-16730.2.patch > Vectorization: Schema Evolution for Text Vectorization / Complex Types > -- > > Key: HIVE-16730 > URL: https://issues.apache.org/jira/browse/HIVE-16730 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Teddy Choi >Priority: Critical > Attachments: HIVE-16730.1.patch, HIVE-16730.2.patch > > > With HIVE-16589: "Vectorization: Support Complex Types and GroupBy modes > PARTIAL2, FINAL, and COMPLETE for AVG" change, the tests > schema_evol_text_vec_part_all_complex.q and > schema_evol_text_vecrow_part_all_complex.q fail. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17021) Support replication of concatenate operation.
[ https://issues.apache.org/jira/browse/HIVE-17021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080518#comment-16080518 ] ASF GitHub Bot commented on HIVE-17021: --- GitHub user sankarh opened a pull request: https://github.com/apache/hive/pull/202 HIVE-17021: Support replication of concatenate operation. You can merge this pull request into a Git repository by running: $ git pull https://github.com/sankarh/hive HIVE-17021 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/202.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #202 commit e68853deef64037260fb951da329011ef6e5a3d5 Author: Sankar Hariappan Date: 2017-07-10T15:24:45Z HIVE-17021: Support replication of concatenate operation. > Support replication of concatenate operation. > - > > Key: HIVE-17021 > URL: https://issues.apache.org/jira/browse/HIVE-17021 > Project: Hive > Issue Type: Sub-task > Components: Hive, repl >Affects Versions: 2.1.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, replication > Fix For: 3.0.0 > > Attachments: HIVE-17021.01.patch > > > We need to handle cases like ALTER TABLE ... CONCATENATE that also change the > files on disk, and potentially treat them similar to INSERT OVERWRITE, as it > does something equivalent to a compaction. > Note that a ConditionalTask might also be fired at the end of inserts at the > end of a tez task (or other exec engine) if appropriate HiveConf settings are > set, to automatically do this operation - these also need to be taken care of > for replication. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17021) Support replication of concatenate operation.
[ https://issues.apache.org/jira/browse/HIVE-17021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-17021: Status: Patch Available (was: Open) > Support replication of concatenate operation. > - > > Key: HIVE-17021 > URL: https://issues.apache.org/jira/browse/HIVE-17021 > Project: Hive > Issue Type: Sub-task > Components: Hive, repl >Affects Versions: 2.1.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, replication > Fix For: 3.0.0 > > Attachments: HIVE-17021.01.patch > > > We need to handle cases like ALTER TABLE ... CONCATENATE that also change the > files on disk, and potentially treat them similar to INSERT OVERWRITE, as it > does something equivalent to a compaction. > Note that a ConditionalTask might also be fired at the end of inserts at the > end of a tez task (or other exec engine) if appropriate HiveConf settings are > set, to automatically do this operation - these also need to be taken care of > for replication. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Work stopped] (HIVE-17021) Support replication of concatenate operation.
[ https://issues.apache.org/jira/browse/HIVE-17021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-17021 stopped by Sankar Hariappan. --- > Support replication of concatenate operation. > - > > Key: HIVE-17021 > URL: https://issues.apache.org/jira/browse/HIVE-17021 > Project: Hive > Issue Type: Sub-task > Components: Hive, repl >Affects Versions: 2.1.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, replication > Fix For: 3.0.0 > > > We need to handle cases like ALTER TABLE ... CONCATENATE that also change the > files on disk, and potentially treat them similar to INSERT OVERWRITE, as it > does something equivalent to a compaction. > Note that a ConditionalTask might also be fired at the end of inserts at the > end of a tez task (or other exec engine) if appropriate HiveConf settings are > set, to automatically do this operation - these also need to be taken care of > for replication. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17021) Support replication of concatenate operation.
[ https://issues.apache.org/jira/browse/HIVE-17021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-17021: Attachment: HIVE-17021.01.patch Added 01.patch with test cases to verify concatenate operations. - A concatenate operation, whether from ALTER TABLE or a ConditionalTask, prepares the plan MergeOperator->MoveTask. - MergeOperator merges all the files from the given input path and pushes the merged output file to the temporary staging directory. - MoveTask moves the merged file from the temporary directory to the final warehouse data location. This task uses the loadTable and loadPartition methods to load data from the temp path to the warehouse, which is basically the Insert Overwrite flow. - Hence, CM recycle and firing of the insert event are already done in the existing code. - Just added test cases to verify it. Request [~anishek]/[~daijy]/[~sushanth]/[~thejas] to review! > Support replication of concatenate operation. > - > > Key: HIVE-17021 > URL: https://issues.apache.org/jira/browse/HIVE-17021 > Project: Hive > Issue Type: Sub-task > Components: Hive, repl >Affects Versions: 2.1.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, replication > Fix For: 3.0.0 > > Attachments: HIVE-17021.01.patch > > > We need to handle cases like ALTER TABLE ... CONCATENATE that also change the > files on disk, and potentially treat them similar to INSERT OVERWRITE, as it > does something equivalent to a compaction. > Note that a ConditionalTask might also be fired at the end of inserts at the > end of a tez task (or other exec engine) if appropriate HiveConf settings are > set, to automatically do this operation - these also need to be taken care of > for replication. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
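The MergeOperator->MoveTask flow described in the comment above can be sketched as follows. This is a toy illustration only, not Hive's actual operator code; the function names, the `merged_0` file name, and the staging layout are all invented for the sketch:

```python
import shutil
from pathlib import Path

def merge_files(input_dir: Path, staging_dir: Path) -> Path:
    """Illustrative MergeOperator step: concatenate every file under
    input_dir into one merged file in a temporary staging directory."""
    staging_dir.mkdir(parents=True, exist_ok=True)
    merged = staging_dir / "merged_0"
    with merged.open("wb") as out:
        for part in sorted(input_dir.iterdir()):
            out.write(part.read_bytes())
    return merged

def move_to_warehouse(merged: Path, warehouse_dir: Path) -> None:
    """Illustrative MoveTask step: replace the warehouse directory's
    contents with the merged file, like an INSERT OVERWRITE-style load."""
    for old in warehouse_dir.iterdir():
        old.unlink()  # clear the pre-merge files being replaced
    shutil.move(str(merged), str(warehouse_dir / merged.name))
```

Because the load step reuses the Insert Overwrite path, the replication hooks (CM recycle, insert event) fire without extra code, which is the point the comment makes.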
[jira] [Commented] (HIVE-17063) insert overwrite partition onto a external table fail when drop partition first
[ https://issues.apache.org/jira/browse/HIVE-17063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080486#comment-16080486 ] Hive QA commented on HIVE-17063: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876417/HIVE-17063.2.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10835 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=237) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_after_drop_partition] (batchId=38) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5931/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5931/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5931/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 7 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12876417 - PreCommit-HIVE-Build > insert overwrite partition onto a external table fail when drop partition > first > --- > > Key: HIVE-17063 > URL: https://issues.apache.org/jira/browse/HIVE-17063 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 1.2.2, 2.1.1, 2.2.0 >Reporter: Wang Haihua >Assignee: Wang Haihua > Attachments: HIVE-17063.1.patch, HIVE-17063.2.patch > > > The default value of {{hive.exec.stagingdir}} which is a relative path, and > also drop partition on a external table will not clear the real data. As a > result, insert overwrite partition twice will happen to fail because of the > target data to be moved has > already existed. > This happened when we reproduce partition data onto a external table. > I see the target data will not be cleared only when {{immediately generated > data}} is child of {{the target data directory}}, so my proposal is trying > to clear target file already existed finally whe doing rename {{immediately > generated data}} into {{the target data directory}} > Operation reproduced: > {code} > create external table insert_after_drop_partition(key string, val string) > partitioned by (insertdate string); > from src insert overwrite table insert_after_drop_partition partition > (insertdate='2008-01-01') select *; > alter table insert_after_drop_partition drop partition > (insertdate='2008-01-01'); > from src insert overwrite table insert_after_drop_partition partition > (insertdate='2008-01-01') select *; > {code} > Stack trace: > {code} > 2017-07-09T08:32:05,212 ERROR [f3bc51c8-2441-4689-b1c1-d60aef86c3aa main] > exec.Task: Failed with exception java.io.IOException: rename for src path: > pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/.hive-staging_hive_2017-07-09_08-32-03_840_4046825276907030554-1/-ext-1/00_0 > to dest > 
path:pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/00_0 > returned false > org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: rename > for src path: > pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/.hive-staging_hive_2017-07-09_08-32-03_840_4046825276907030554-1/-ext-1/00_0 > to dest > path:pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/00_0 > returned false > at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2992) > at > org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:3248) > at > org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1532) > at > org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1461) > at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:498) > at
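The fix the reporter proposes, clearing a pre-existing target file left behind by the dropped partition before renaming the staging file into place, can be sketched like this. The function is a hypothetical stand-in, not Hive's actual `Hive.moveFile` implementation:

```python
import os

def rename_with_overwrite(src: str, dest: str) -> None:
    """Sketch of the proposed fix: if a stale file from a dropped
    partition still occupies dest, remove it first so the rename of
    the staged file cannot fail the way the stack trace above shows."""
    if os.path.exists(dest):
        os.remove(dest)
    os.rename(src, dest)
```

Without the existence check, the second INSERT OVERWRITE in the reproduction fails because the external table's data file from the first insert survives the DROP PARTITION.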
[jira] [Commented] (HIVE-16983) getFileStatus on accessible s3a://[bucket-name]/folder: throws com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error
[ https://issues.apache.org/jira/browse/HIVE-16983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080472#comment-16080472 ] Krishna Vaidyanath commented on HIVE-16983: --- If we solve this, I propose we add the solution to the troubleshooting section. > getFileStatus on accessible s3a://[bucket-name]/folder: throws > com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon > S3; Status Code: 403; Error Code: 403 Forbidden; > - > > Key: HIVE-16983 > URL: https://issues.apache.org/jira/browse/HIVE-16983 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.1.1 > Environment: Hive 2.1.1 on Ubuntu 14.04 AMI in AWS EC2, connecting to > S3 using s3a:// protocol >Reporter: Alex Baretto >Assignee: Vlad Gudikov > Fix For: 2.1.1 > > Attachments: HIVE-16983-branch-2.1.patch > > > I've followed various published documentation on integrating Apache Hive > 2.1.1 with AWS S3 using the `s3a://` scheme, configuring `fs.s3a.access.key` > and > `fs.s3a.secret.key` for `hadoop/etc/hadoop/core-site.xml` and > `hive/conf/hive-site.xml`. > I am at the point where I am able to get `hdfs dfs -ls s3a://[bucket-name]/` > to work properly (it returns s3 ls of that bucket). So I know my creds, > bucket access, and overall Hadoop setup is valid. > hdfs dfs -ls s3a://[bucket-name]/ > > drwxrwxrwx - hdfs hdfs 0 2017-06-27 22:43 > s3a://[bucket-name]/files > ...etc. > hdfs dfs -ls s3a://[bucket-name]/files > > drwxrwxrwx - hdfs hdfs 0 2017-06-27 22:43 > s3a://[bucket-name]/files/my-csv.csv > However, when I attempt to access the same s3 resources from hive, e.g. run > any `CREATE SCHEMA` or `CREATE EXTERNAL TABLE` statements using `LOCATION > 's3a://[bucket-name]/files/'`, it fails. 
> for example: > >CREATE EXTERNAL TABLE IF NOT EXISTS mydb.my_table ( my_table_id string, > >my_tstamp timestamp, my_sig bigint ) ROW FORMAT DELIMITED FIELDS TERMINATED > >BY ',' LOCATION 's3a://[bucket-name]/files/'; > I keep getting this error: > >FAILED: Execution Error, return code 1 from > >org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: > >java.nio.file.AccessDeniedException s3a://[bucket-name]/files: getFileStatus > >on s3a://[bucket-name]/files: > >com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: > >Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: > >C9CF3F9C50EF08D1), S3 Extended Request ID: > >T2xZ87REKvhkvzf+hdPTOh7CA7paRpIp6IrMWnDqNFfDWerkZuAIgBpvxilv6USD0RSxM9ymM6I=) > This makes no sense. I have access to the bucket as one can see in the hdfs > test. And I've added the proper creds to hive-site.xml. > Anyone have any idea what's missing from this equation? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
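For reference, the credential properties the reporter describes setting use the standard s3a key names; in hive-site.xml/core-site.xml they look like the fragment below (the values are placeholders, and storing secrets in plain XML is itself discouraged in favor of a credential provider):

```xml
<property>
  <name>fs.s3a.access.key</name>
  <value>YOUR_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>YOUR_SECRET_KEY</value>
</property>
```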
[jira] [Commented] (HIVE-16983) getFileStatus on accessible s3a://[bucket-name]/folder: throws com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error
[ https://issues.apache.org/jira/browse/HIVE-16983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080465#comment-16080465 ] Krishna Vaidyanath commented on HIVE-16983: --- I've been working this case with Alex, and can tell you that the same permissions, credentials, etc. all work correctly and as expected with s3n:// and s3://. They do not work with s3a://, despite setting the properties in core-site.xml that in code and doc are expected. We went through the troubleshooting docs but they did not provide any insight or guidance to fix this problem. Vlad, you're on to something, we will test with joda 2.9.9, we have been using 2.8.1. > getFileStatus on accessible s3a://[bucket-name]/folder: throws > com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon > S3; Status Code: 403; Error Code: 403 Forbidden; > - > > Key: HIVE-16983 > URL: https://issues.apache.org/jira/browse/HIVE-16983 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.1.1 > Environment: Hive 2.1.1 on Ubuntu 14.04 AMI in AWS EC2, connecting to > S3 using s3a:// protocol >Reporter: Alex Baretto >Assignee: Vlad Gudikov > Fix For: 2.1.1 > > Attachments: HIVE-16983-branch-2.1.patch > > > I've followed various published documentation on integrating Apache Hive > 2.1.1 with AWS S3 using the `s3a://` scheme, configuring `fs.s3a.access.key` > and > `fs.s3a.secret.key` for `hadoop/etc/hadoop/core-site.xml` and > `hive/conf/hive-site.xml`. > I am at the point where I am able to get `hdfs dfs -ls s3a://[bucket-name]/` > to work properly (it returns s3 ls of that bucket). So I know my creds, > bucket access, and overall Hadoop setup is valid. > hdfs dfs -ls s3a://[bucket-name]/ > > drwxrwxrwx - hdfs hdfs 0 2017-06-27 22:43 > s3a://[bucket-name]/files > ...etc. 
> hdfs dfs -ls s3a://[bucket-name]/files > > drwxrwxrwx - hdfs hdfs 0 2017-06-27 22:43 > s3a://[bucket-name]/files/my-csv.csv > However, when I attempt to access the same s3 resources from hive, e.g. run > any `CREATE SCHEMA` or `CREATE EXTERNAL TABLE` statements using `LOCATION > 's3a://[bucket-name]/files/'`, it fails. > for example: > >CREATE EXTERNAL TABLE IF NOT EXISTS mydb.my_table ( my_table_id string, > >my_tstamp timestamp, my_sig bigint ) ROW FORMAT DELIMITED FIELDS TERMINATED > >BY ',' LOCATION 's3a://[bucket-name]/files/'; > I keep getting this error: > >FAILED: Execution Error, return code 1 from > >org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: > >java.nio.file.AccessDeniedException s3a://[bucket-name]/files: getFileStatus > >on s3a://[bucket-name]/files: > >com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: > >Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: > >C9CF3F9C50EF08D1), S3 Extended Request ID: > >T2xZ87REKvhkvzf+hdPTOh7CA7paRpIp6IrMWnDqNFfDWerkZuAIgBpvxilv6USD0RSxM9ymM6I=) > This makes no sense. I have access to the bucket as one can see in the hdfs > test. And I've added the proper creds to hive-site.xml. > Anyone have any idea what's missing from this equation? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17063) insert overwrite partition onto a external table fail when drop partition first
[ https://issues.apache.org/jira/browse/HIVE-17063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wang Haihua updated HIVE-17063: --- Status: Patch Available (was: In Progress) > insert overwrite partition onto a external table fail when drop partition > first > --- > > Key: HIVE-17063 > URL: https://issues.apache.org/jira/browse/HIVE-17063 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 2.1.1, 1.2.2, 2.2.0 >Reporter: Wang Haihua >Assignee: Wang Haihua > Attachments: HIVE-17063.1.patch, HIVE-17063.2.patch > > > The default value of {{hive.exec.stagingdir}} which is a relative path, and > also drop partition on a external table will not clear the real data. As a > result, insert overwrite partition twice will happen to fail because of the > target data to be moved has > already existed. > This happened when we reproduce partition data onto a external table. > I see the target data will not be cleared only when {{immediately generated > data}} is child of {{the target data directory}}, so my proposal is trying > to clear target file already existed finally whe doing rename {{immediately > generated data}} into {{the target data directory}} > Operation reproduced: > {code} > create external table insert_after_drop_partition(key string, val string) > partitioned by (insertdate string); > from src insert overwrite table insert_after_drop_partition partition > (insertdate='2008-01-01') select *; > alter table insert_after_drop_partition drop partition > (insertdate='2008-01-01'); > from src insert overwrite table insert_after_drop_partition partition > (insertdate='2008-01-01') select *; > {code} > Stack trace: > {code} > 2017-07-09T08:32:05,212 ERROR [f3bc51c8-2441-4689-b1c1-d60aef86c3aa main] > exec.Task: Failed with exception java.io.IOException: rename for src path: > 
pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/.hive-staging_hive_2017-07-09_08-32-03_840_4046825276907030554-1/-ext-1/00_0 > to dest > path:pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/00_0 > returned false > org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: rename > for src path: > pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/.hive-staging_hive_2017-07-09_08-32-03_840_4046825276907030554-1/-ext-1/00_0 > to dest > path:pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/00_0 > returned false > at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2992) > at > org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:3248) > at > org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1532) > at > org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1461) > at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:498) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2073) > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1744) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1453) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1171) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1161) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:335) > at > 
org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1137) > at > org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:) > at > org.apache.hadoop.hive.cli.TestCliDriver.runTest(TestCliDriver.java:120) > at > org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_after_drop_partition(TestCliDriver.java:103) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at >
[jira] [Updated] (HIVE-17063) insert overwrite partition onto a external table fail when drop partition first
[ https://issues.apache.org/jira/browse/HIVE-17063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wang Haihua updated HIVE-17063: --- Attachment: HIVE-17063.2.patch > insert overwrite partition onto a external table fail when drop partition > first > --- > > Key: HIVE-17063 > URL: https://issues.apache.org/jira/browse/HIVE-17063 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 1.2.2, 2.1.1, 2.2.0 >Reporter: Wang Haihua >Assignee: Wang Haihua > Attachments: HIVE-17063.1.patch, HIVE-17063.2.patch > > > The default value of {{hive.exec.stagingdir}} which is a relative path, and > also drop partition on a external table will not clear the real data. As a > result, insert overwrite partition twice will happen to fail because of the > target data to be moved has > already existed. > This happened when we reproduce partition data onto a external table. > I see the target data will not be cleared only when {{immediately generated > data}} is child of {{the target data directory}}, so my proposal is trying > to clear target file already existed finally whe doing rename {{immediately > generated data}} into {{the target data directory}} > Operation reproduced: > {code} > create external table insert_after_drop_partition(key string, val string) > partitioned by (insertdate string); > from src insert overwrite table insert_after_drop_partition partition > (insertdate='2008-01-01') select *; > alter table insert_after_drop_partition drop partition > (insertdate='2008-01-01'); > from src insert overwrite table insert_after_drop_partition partition > (insertdate='2008-01-01') select *; > {code} > Stack trace: > {code} > 2017-07-09T08:32:05,212 ERROR [f3bc51c8-2441-4689-b1c1-d60aef86c3aa main] > exec.Task: Failed with exception java.io.IOException: rename for src path: > 
pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/.hive-staging_hive_2017-07-09_08-32-03_840_4046825276907030554-1/-ext-1/00_0 > to dest > path:pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/00_0 > returned false > org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: rename > for src path: > pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/.hive-staging_hive_2017-07-09_08-32-03_840_4046825276907030554-1/-ext-1/00_0 > to dest > path:pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/00_0 > returned false > at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2992) > at > org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:3248) > at > org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1532) > at > org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1461) > at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:498) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2073) > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1744) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1453) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1171) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1161) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:335) > at > 
org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1137) > at > org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:) > at > org.apache.hadoop.hive.cli.TestCliDriver.runTest(TestCliDriver.java:120) > at > org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_after_drop_partition(TestCliDriver.java:103) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
[jira] [Updated] (HIVE-17063) insert overwrite partition onto a external table fail when drop partition first
[ https://issues.apache.org/jira/browse/HIVE-17063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wang Haihua updated HIVE-17063: --- Status: In Progress (was: Patch Available) > insert overwrite partition onto a external table fail when drop partition > first > --- > > Key: HIVE-17063 > URL: https://issues.apache.org/jira/browse/HIVE-17063 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 2.1.1, 1.2.2, 2.2.0 >Reporter: Wang Haihua >Assignee: Wang Haihua > Attachments: HIVE-17063.1.patch > > > The default value of {{hive.exec.stagingdir}} which is a relative path, and > also drop partition on a external table will not clear the real data. As a > result, insert overwrite partition twice will happen to fail because of the > target data to be moved has > already existed. > This happened when we reproduce partition data onto a external table. > I see the target data will not be cleared only when {{immediately generated > data}} is child of {{the target data directory}}, so my proposal is trying > to clear target file already existed finally whe doing rename {{immediately > generated data}} into {{the target data directory}} > Operation reproduced: > {code} > create external table insert_after_drop_partition(key string, val string) > partitioned by (insertdate string); > from src insert overwrite table insert_after_drop_partition partition > (insertdate='2008-01-01') select *; > alter table insert_after_drop_partition drop partition > (insertdate='2008-01-01'); > from src insert overwrite table insert_after_drop_partition partition > (insertdate='2008-01-01') select *; > {code} > Stack trace: > {code} > 2017-07-09T08:32:05,212 ERROR [f3bc51c8-2441-4689-b1c1-d60aef86c3aa main] > exec.Task: Failed with exception java.io.IOException: rename for src path: > 
pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/.hive-staging_hive_2017-07-09_08-32-03_840_4046825276907030554-1/-ext-1/00_0 > to dest > path:pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/00_0 > returned false > org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: rename > for src path: > pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/.hive-staging_hive_2017-07-09_08-32-03_840_4046825276907030554-1/-ext-1/00_0 > to dest > path:pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/00_0 > returned false > at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2992) > at > org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:3248) > at > org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1532) > at > org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1461) > at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:498) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2073) > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1744) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1453) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1171) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1161) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:335) > at > 
org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1137) > at > org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:) > at > org.apache.hadoop.hive.cli.TestCliDriver.runTest(TestCliDriver.java:120) > at > org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_after_drop_partition(TestCliDriver.java:103) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) >
[jira] [Comment Edited] (HIVE-14487) Add REBUILD statement for materialized views
[ https://issues.apache.org/jira/browse/HIVE-14487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15438566#comment-15438566 ] Jesus Camacho Rodriguez edited comment on HIVE-14487 at 7/10/17 12:33 PM: -- [~ekoifman], thanks for the feedback. That is fair point and something I had not considered yet; we do not do anything special in HIVE-14249, which would lead to inconsistent/incorrect results if a user uses the materialized view while it is being rebuilt. I guess raising an error should be enough. Then we would need to keep the state for the materialized view in the metastore? Or do you have any other idea? I can 1) create a follow-up for this, as HIVE-14249 has passed QA and is ready to go in, 2) I can add the new logic to HIVE-14249, or 3) I can remove the logic for REBUILD completely from HIVE-14249 and put it all together in a new patch. I am inclined to go with 3. What is your take? was (Author: jcamachorodriguez): [~ekoifman], thanks for the feedback. That is fair point and something I had not considered yet; we do not do anything special in HIVE-14487, which would lead to inconsistent/incorrect results if a user uses the materialized view while it is being rebuilt. I guess raising an error should be enough. Then we would need to keep the state for the materialized view in the metastore? Or do you have any other idea? I can 1) create a follow-up for this, as HIVE-14487 has passed QA and is ready to go in, 2) I can add the new logic to HIVE-14487, or 3) I can remove the logic for REBUILD completely from HIVE-14487 and put it all together in a new patch. I am inclined to go with 3. What is your take? > Add REBUILD statement for materialized views > > > Key: HIVE-14487 > URL: https://issues.apache.org/jira/browse/HIVE-14487 > Project: Hive > Issue Type: Sub-task > Components: Materialized views >Affects Versions: 2.2.0 >Reporter: Jesus Camacho Rodriguez > > Support for rebuilding existing materialized views. 
The statement is the > following: > {code:sql} > ALTER MATERIALIZED VIEW [db_name.]materialized_view_name REBUILD; > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-14487) Add REBUILD statement for materialized views
[ https://issues.apache.org/jira/browse/HIVE-14487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080263#comment-16080263 ] Jesus Camacho Rodriguez commented on HIVE-14487: [~asomani], since then I have not had the chance to work on this. We went with option 3 described above, thus the _rebuild_ option was not added in HIVE-14249. I hope to find the time to add the _REBUILD_ option for 3.0 release; in turn, contributions are welcome. Thanks > Add REBUILD statement for materialized views > > > Key: HIVE-14487 > URL: https://issues.apache.org/jira/browse/HIVE-14487 > Project: Hive > Issue Type: Sub-task > Components: Materialized views >Affects Versions: 2.2.0 >Reporter: Jesus Camacho Rodriguez > > Support for rebuilding existing materialized views. The statement is the > following: > {code:sql} > ALTER MATERIALIZED VIEW [db_name.]materialized_view_name REBUILD; > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17035) Optimizer: Lineage transform() should be invoked after rest of the optimizers are invoked
[ https://issues.apache.org/jira/browse/HIVE-17035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080239#comment-16080239 ]

Rajesh Balamohan commented on HIVE-17035:
-

RB: https://reviews.apache.org/r/60743/

> Optimizer: Lineage transform() should be invoked after rest of the optimizers
> are invoked
> -
>
> Key: HIVE-17035
> URL: https://issues.apache.org/jira/browse/HIVE-17035
> Project: Hive
> Issue Type: Bug
> Components: Logical Optimizer
> Reporter: Rajesh Balamohan
> Assignee: Rajesh Balamohan
> Priority: Minor
> Attachments: HIVE-17035.1.patch, HIVE-17035.2.patch,
> HIVE-17035.3.patch, HIVE-17035.4.patch
>
> In a fairly large query which had tens of left joins, creating the lineageInfo alone took 1500+ seconds. This is because the table had lots of columns and, in some processing, {{ReduceSinkLineage}} ended up processing 7000+ value columns, though only 50 columns were projected in the query.
> It would be good to invoke the lineage transform after the rest of the optimizers in {{Optimizer}} have been invoked. This would avoid unwanted processing and help improve the runtime.
[jira] [Commented] (HIVE-17035) Optimizer: Lineage transform() should be invoked after rest of the optimizers are invoked
[ https://issues.apache.org/jira/browse/HIVE-17035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080221#comment-16080221 ]

Hive QA commented on HIVE-17035:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876397/HIVE-17035.4.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.
{color:red}ERROR:{color} -1 due to 74 failed/errored test(s), 10834 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[join] (batchId=240)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] (batchId=46)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[extract] (batchId=3)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynamic_semijoin_reduction_3] (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynpart_sort_optimization2] (batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[reduce_deduplicate_extended] (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_orc_acid_table_update] (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_orc_acidvec_table_update] (batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_join_result_complex] (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_union_multiinsert] (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[update_all_partitioned] (batchId=154)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[infer_bucket_sort_map_operators] (batchId=168)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[infer_bucket_sort_reducers_power_two] (batchId=167)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_outer_join5] (batchId=169)
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver[infer_bucket_sort_map_operators] (batchId=86)
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver[infer_bucket_sort_reducers_power_two] (batchId=86)
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testCliDriver[mapreduce_stack_trace] (batchId=91)
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testCliDriver[mapreduce_stack_trace_turnoff] (batchId=91)
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testCliDriver[minimr_broken_pipe] (batchId=91)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=232)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[add_part_multiple] (batchId=129)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join26] (batchId=106)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join8] (batchId=136)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_smb_mapjoin_14] (batchId=125)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[bucketmapjoin5] (batchId=136)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[date_udf] (batchId=114)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[filter_join_breaktask2] (batchId=133)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[groupby4_map] (batchId=120)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[groupby4_map_skew] (batchId=124)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[groupby_cube1] (batchId=101)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[groupby_multi_single_reducer] (batchId=126)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[groupby_rollup1] (batchId=114)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[input_part2] (batchId=120)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join27] (batchId=117)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join30] (batchId=133)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join32] (batchId=108)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join32_lessSize] (batchId=103)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join33] (batchId=107)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join38] (batchId=134)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join8] (batchId=120)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join_map_ppr] (batchId=132)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[multi_insert_gby2] (batchId=116)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[multi_insert_with_join] (batchId=128)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[pcr] (batchId=125)
[jira] [Commented] (HIVE-16751) Support different types for grouping columns in GroupBy Druid queries
[ https://issues.apache.org/jira/browse/HIVE-16751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080217#comment-16080217 ]

Jesus Camacho Rodriguez commented on HIVE-16751:

[~leftylev], I think there is no need for extra documentation for this one, since it just makes the execution more efficient and is transparent to the final user. Thanks!

> Support different types for grouping columns in GroupBy Druid queries
> -
>
> Key: HIVE-16751
> URL: https://issues.apache.org/jira/browse/HIVE-16751
> Project: Hive
> Issue Type: Bug
> Components: Druid integration
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Fix For: 3.0.0
>
> Attachments: HIVE-16751.patch
>
> Calcite 1.13 pushes the EXTRACT and FLOOR functions to Druid as extraction
> functions (cf CALCITE-1758). Originally, we were assuming that all group by
> columns in a Druid query were of STRING type; however, this will no longer be
> true (the result of EXTRACT is an INT and the result of FLOOR a TIMESTAMP).
> When we upgrade to Calcite 1.13, we will need to extend the DruidSerDe to
> handle these functions.
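To illustrate the kind of query affected, here is a hedged sketch; the Druid-backed table name {{druid_tbl}} is hypothetical. In such a query, the first grouping column is an INT (from EXTRACT) and the second a TIMESTAMP (from FLOOR), rather than STRING:

{code:sql}
-- hypothetical Druid-backed table; EXTRACT yields an INT grouping column,
-- FLOOR(... TO DAY) yields a TIMESTAMP grouping column
SELECT EXTRACT(MONTH FROM `__time`), FLOOR(`__time` TO DAY), count(*)
FROM druid_tbl
GROUP BY EXTRACT(MONTH FROM `__time`), FLOOR(`__time` TO DAY);
{code}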