[jira] [Commented] (HIVE-16177) non Acid to acid conversion doesn't handle _copy_N files
[ https://issues.apache.org/jira/browse/HIVE-16177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083426#comment-16083426 ] Hive QA commented on HIVE-16177: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876712/HIVE-16177.20-branch-2.patch {color:green}SUCCESS:{color} +1 due to 9 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10584 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] (batchId=35) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[explaindenpendencydiffengs] (batchId=38) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=142) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] (batchId=139) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[explaindenpendencydiffengs] (batchId=115) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_ptf] (batchId=125) org.apache.hadoop.hive.ql.security.TestExtendedAcls.testPartition (batchId=228) org.apache.hadoop.hive.ql.security.TestFolderPermissions.testPartition (batchId=217) org.apache.hive.hcatalog.api.TestHCatClient.testTransportFailure (batchId=176) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5971/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5971/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5971/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 9 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12876712 - PreCommit-HIVE-Build > non Acid to acid conversion doesn't handle _copy_N files > > > Key: HIVE-16177 > URL: https://issues.apache.org/jira/browse/HIVE-16177 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 0.14.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Blocker > Attachments: HIVE-16177.01.patch, HIVE-16177.02.patch, > HIVE-16177.04.patch, HIVE-16177.07.patch, HIVE-16177.08.patch, > HIVE-16177.09.patch, HIVE-16177.10.patch, HIVE-16177.11.patch, > HIVE-16177.14.patch, HIVE-16177.15.patch, HIVE-16177.16.patch, > HIVE-16177.17.patch, HIVE-16177.18-branch-2.patch, HIVE-16177.18.patch, > HIVE-16177.19-branch-2.patch, HIVE-16177.20-branch-2.patch > > > {noformat} > create table T(a int, b int) clustered by (a) into 2 buckets stored as orc > TBLPROPERTIES('transactional'='false') > insert into T(a,b) values(1,2) > insert into T(a,b) values(1,3) > alter table T SET TBLPROPERTIES ('transactional'='true') > {noformat} > //we should now have bucket files 01_0 and 01_0_copy_1 > but OrcRawRecordMerger.OriginalReaderPair.next() doesn't know that there can > be copy_N files and numbers rows in each bucket from 0 thus generating > duplicate IDs > {noformat} > select ROW__ID, INPUT__FILE__NAME, a, b from T > {noformat} > produces > {noformat} > {"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0,1,2 > {"transactionid\":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0_copy_1,1,3 > {noformat} > [~owen.omalley], do you have any thoughts on a good way to handle this? > attached patch has a few changes to make Acid even recognize copy_N but this > is just a pre-requisite. The new UT demonstrates the issue. 
> Furthermore, > {noformat} > alter table T compact 'major' > select ROW__ID, INPUT__FILE__NAME, a, b from T order by b > {noformat} > produces > {noformat} > {"transactionid":0,"bucketid":1,"rowid":0} > file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommandswarehouse/nonacidorctbl/base_-9223372036854775808/bucket_1 > 1 2 > {noformat} > HIVE-16177.04.patch has TestTxnCommands.testNonAcidToAcidConversion0() > demonstrating this > This is because the compactor doesn't handle copy_N files either (it skips them) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
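The root cause above is that row numbering restarts at 0 for every copy_N file. A minimal sketch of the numbering idea, with hypothetical file names and row counts (the real logic lives in OrcRawRecordMerger.OriginalReaderPair, not in this helper):

```java
import java.util.*;

// Sketch: number rows continuously across a bucket's original files
// (base file first, then _copy_1, _copy_2, ...) instead of restarting
// at 0 per file. File names and counts are illustrative only.
class CopyFileRowIds {
    /** Returns the first row id to use for each file, in insertion (copy) order. */
    static Map<String, Long> firstRowIds(LinkedHashMap<String, Long> rowCountsByFile) {
        Map<String, Long> firstIds = new LinkedHashMap<>();
        long nextRowId = 0;
        for (Map.Entry<String, Long> e : rowCountsByFile.entrySet()) {
            firstIds.put(e.getKey(), nextRowId);
            nextRowId += e.getValue();   // the next file continues where this one ended
        }
        return firstIds;
    }

    public static void main(String[] args) {
        LinkedHashMap<String, Long> counts = new LinkedHashMap<>();
        counts.put("000001_0", 1L);          // base bucket file, 1 row
        counts.put("000001_0_copy_1", 1L);   // copy file, 1 row
        // the copy file's rows start after the base file's, so ROW__IDs no longer collide
        System.out.println(firstRowIds(counts));
    }
}
```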
[jira] [Commented] (HIVE-17018) Small table is converted to map join even the total size of small tables exceeds the threshold(hive.auto.convert.join.noconditionaltask.size)
[ https://issues.apache.org/jira/browse/HIVE-17018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083411#comment-16083411 ] Chao Sun commented on HIVE-17018: - What I'm thinking is that the new config (say A) will be a value smaller than {{spark.executor.memory}} and will be divided among all tasks in the executor (so A / {{spark.executor.cores}}). Another way is to specify A as the maximum hashtable memory for a single Spark task. This is the limit of the sum of the sizes for all hash tables in a single work (MapWork or ReduceWork). I think no change is needed for the code related to {{connectedMapJoinSize}}. > Small table is converted to map join even the total size of small tables > exceeds the threshold(hive.auto.convert.join.noconditionaltask.size) > - > > Key: HIVE-17018 > URL: https://issues.apache.org/jira/browse/HIVE-17018 > Project: Hive > Issue Type: Bug >Reporter: liyunzhang_intel >Assignee: liyunzhang_intel > Attachments: HIVE-17018_data_init.q, HIVE-17018.q, t3.txt > > > We use "hive.auto.convert.join.noconditionaltask.size" as the threshold: if > the sum of the sizes of n-1 of the tables/partitions in an n-way join is > smaller than it, the join will be converted to a map join. For example, take A join B > join C join D join E, where the big table is A(100M) and the small tables are > B(10M), C(10M), D(10M), E(10M). If we set > hive.auto.convert.join.noconditionaltask.size=20M, then in the current code E, D, and B > will be converted to map joins but C will not. In my > understanding, because hive.auto.convert.join.noconditionaltask.size can only > contain E and D, C and B should not be converted to map joins. > Let's explain more about why E can be converted to a map join. > In the current code, > [SparkMapJoinOptimizer#getConnectedMapJoinSize|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#L364] > calculates all the mapjoins in the parent path and child path. 
The search > stops when encountering [UnionOperator or > ReduceOperator|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#L381]. > C is not converted to a map join because {{(connectedMapJoinSize + > totalSize) > maxSize}} [see > code|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#L330], so the > RS before the join of C remains. When calculating whether B will be > converted to a map join, {{getConnectedMapJoinSize}} returns 0 on encountering the > [RS > |https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#409] > and causes {{(connectedMapJoinSize + totalSize) < maxSize}} to match. > [~xuefuz] or [~jxiang]: can you help see whether this is a bug or not, as you > are more familiar with SparkJoinOptimizer? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
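The two options Chao Sun describes, and the running-sum check from the issue description, can be sketched in a few lines. The method names below are hypothetical, not Hive's or Spark's actual API:

```java
// Sketch of the memory-budget options discussed above.
class MapJoinBudget {
    /** Option 1: divide a memory budget A evenly among the executor's task slots. */
    static long perTaskLimit(long budgetA, int executorCores) {
        return budgetA / executorCores;
    }

    /** A join converts only while the running sum of small-table sizes stays under the limit. */
    static boolean shouldConvert(long connectedMapJoinSize, long totalSmallTableSize, long maxSize) {
        return connectedMapJoinSize + totalSmallTableSize <= maxSize;
    }

    public static void main(String[] args) {
        long limit = 20; // MB, mirrors the 20M threshold in the example above
        // B, C, D, E are 10 MB each; E and D fit, then the budget is exhausted:
        System.out.println(shouldConvert(0, 10, limit));   // E fits
        System.out.println(shouldConvert(10, 10, limit));  // D still fits
        System.out.println(shouldConvert(20, 10, limit));  // C exceeds the limit
    }
}
```

The reported bug corresponds to the first argument being reset to 0 when a ReduceSink is encountered, which makes the check for B pass again.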
[jira] [Updated] (HIVE-17069) Refactor OrcRawRecordMerger.ReaderPair
[ https://issues.apache.org/jira/browse/HIVE-17069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-17069: -- Attachment: HIVE-17069.01.patch > Refactor OrcRawRecordMerger.ReaderPair > -- > > Key: HIVE-17069 > URL: https://issues.apache.org/jira/browse/HIVE-17069 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 3.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-17069.01.patch > > > This should be done post HIVE-16177 so as not to obscure the functional > changes completely. > Make ReaderPair an interface. > ReaderPairImpl - will do what ReaderPair currently does, i.e. handle the "normal" > code path. > OriginalReaderPair - same as now but without the incomprehensible override/variable > shadowing logic. > Perhaps split it into two: one for compaction, one for "normal" reads, with a common > base class. > Push discoverKeyBounds() into the appropriate implementation. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
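The refactoring proposed in the description can be outlined as follows; the type names come from the issue, but the method signatures are assumptions for illustration:

```java
// Outline of the proposed refactoring: ReaderPair becomes an interface with
// separate implementations for the normal and "original file" code paths.
interface ReaderPair {
    boolean next();
    void discoverKeyBounds();    // pushed into the implementations that need it
}

/** Handles the "normal" code path (what ReaderPair currently does). */
class ReaderPairImpl implements ReaderPair {
    public boolean next() { return false; }          // placeholder body
    public void discoverKeyBounds() { /* bounds for a normal read */ }
}

/** Original (pre-acid) files, without the override/variable-shadowing logic. */
class OriginalReaderPair implements ReaderPair {
    public boolean next() { return false; }          // placeholder body
    public void discoverKeyBounds() { /* could split further: compaction vs. normal read */ }
}
```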
[jira] [Commented] (HIVE-16977) Vectorization: Vectorize expressions in THEN/ELSE branches of IF/CASE WHEN
[ https://issues.apache.org/jira/browse/HIVE-16977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083403#comment-16083403 ] Teddy Choi commented on HIVE-16977: --- It looks like HIVE-16731 already solved this issue, too. It allows not only null values, but also other expressions. > Vectorization: Vectorize expressions in THEN/ELSE branches of IF/CASE WHEN > -- > > Key: HIVE-16977 > URL: https://issues.apache.org/jira/browse/HIVE-16977 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Teddy Choi >Priority: Critical > > VectorUDFAdaptor(CASE WHEN ((_col2 > 0)) THEN ((UDFToDouble(_col3) / > UDFToDouble(_col2)) BETWEEN 0. AND 1.5) ... > The expression in the THEN is not permitted. Only columns or constants are > vectorized. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16922) Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"
[ https://issues.apache.org/jira/browse/HIVE-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083365#comment-16083365 ] Rui Li commented on HIVE-16922: --- I'm +1 to fix the typo, though it needs to be marked as "breaking" change. [~libing], have you checked the latest test failures? Is any of them related to your patch? > Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim" > --- > > Key: HIVE-16922 > URL: https://issues.apache.org/jira/browse/HIVE-16922 > Project: Hive > Issue Type: Bug > Components: Thrift API >Reporter: Dudu Markovitz >Assignee: Bing Li > Attachments: HIVE-16922.1.patch, HIVE-16922.2.patch > > > https://github.com/apache/hive/blob/master/serde/if/serde.thrift > Typo in serde.thrift: > COLLECTION_DELIM = "colelction.delim" > (*colelction* instead of *collection*) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
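One common way to make such a typo fix non-breaking (an illustration only, not necessarily what the attached patches do) is to read the corrected key first and fall back to the misspelled key that old table metadata may still contain:

```java
import java.util.Map;

// Sketch of a backward-compatible fix for the "colelction.delim" typo:
// prefer the corrected key, but honor the legacy spelling for old tables.
class CollectionDelim {
    static final String COLLECTION_DELIM = "collection.delim";
    static final String LEGACY_COLLECTION_DELIM = "colelction.delim"; // historical typo

    static String getCollectionDelim(Map<String, String> serdeProps, String dflt) {
        String v = serdeProps.get(COLLECTION_DELIM);
        if (v == null) {
            v = serdeProps.get(LEGACY_COLLECTION_DELIM); // metadata written before the fix
        }
        return v != null ? v : dflt;
    }
}
```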
[jira] [Commented] (HIVE-17066) Query78 filter wrong estimatation is generating bad plan
[ https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083377#comment-16083377 ] Hive QA commented on HIVE-17066: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876732/HIVE-17066.4.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10825 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype] (batchId=157) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=145) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=232) org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver (batchId=101) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5970/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5970/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5970/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 9 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12876732 - PreCommit-HIVE-Build > Query78 filter wrong estimatation is generating bad plan > > > Key: HIVE-17066 > URL: https://issues.apache.org/jira/browse/HIVE-17066 > Project: Hive > Issue Type: Bug >Reporter: Vineet Garg >Assignee: Vineet Garg > Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch, > HIVE-17066.3.patch, HIVE-17066.4.patch > > > Filter operator is estimating 1 row following a left outer join causing bad > estimates > {noformat} > Reducer 12 > Execution mode: vectorized, llap > Reduce Operator Tree: > Map Join Operator > condition map: > Left Outer Join0 to 1 > keys: > 0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > 1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, > _col8 > input vertices: > 1 Map 14 > Statistics: Num rows: 71676270660 Data size: 3727166074320 > Basic stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: _col8 is null (type: boolean) > Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE > Column stats: COMPLETE > Select Operator > expressions: _col0 (type: bigint), _col1 (type: bigint), > _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: > bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, > _col6 > Statistics: Num rows: 1 Data size: 52 Basic stats: > COMPLETE Column stats: COMPLETE > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
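The bad plan comes from the estimate collapsing to 1 row for the {{_col8 is null}} predicate right after the left outer join. A hedged sketch of a scale-aware estimate (the null fraction here is an assumed statistic for illustration, not necessarily what the patch computes):

```java
// Sketch: estimate rows surviving an IS NULL filter after an outer join.
// The point is only that the estimate should scale with the join's output
// instead of collapsing to a single row.
class NullFilterEstimate {
    static long estimateIsNull(long joinOutputRows, double nullFraction) {
        // never estimate below 1 row, but otherwise scale with the input
        return Math.max(1L, Math.round(joinOutputRows * nullFraction));
    }
}
```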
[jira] [Commented] (HIVE-15767) Hive On Spark is not working on secure clusters from Oozie
[ https://issues.apache.org/jira/browse/HIVE-15767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083374#comment-16083374 ] Yibing Shi commented on HIVE-15767: --- [~peterceluch], can the tokens in the Oozie launcher application still be passed to the Spark job when the property {{mapreduce.job.credentials.binary}} is unset? For example, in an environment where HDFS transparent encryption is enabled, is the Spark job still able to connect to KMS servers? (The change is in {{RemoteHiveSparkClient}}. Hive on MR shouldn't be affected. Oozie actions have already made sure the tokens are added to the action configuration, which should then be passed to MR jobs). > Hive On Spark is not working on secure clusters from Oozie > -- > > Key: HIVE-15767 > URL: https://issues.apache.org/jira/browse/HIVE-15767 > Project: Hive > Issue Type: Bug > Components: Spark >Affects Versions: 1.2.1, 2.1.1 >Reporter: Peter Cseh >Assignee: Peter Cseh > Attachments: HIVE-15767-001.patch, HIVE-15767-002.patch > > > When a HiveAction is launched from Oozie with Hive On Spark enabled, we're > getting errors: > {noformat} > Caused by: java.io.IOException: Exception reading > file:/yarn/nm/usercache/yshi/appcache/application_1485271416004_0022/container_1485271416004_0022_01_02/container_tokens > at > org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:188) > at > org.apache.hadoop.mapreduce.security.TokenCache.mergeBinaryTokens(TokenCache.java:155) > {noformat} > This is caused by passing the {{mapreduce.job.credentials.binary}} property > to the Spark configuration in RemoteHiveSparkClient. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
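The fix direction described above (don't forward {{mapreduce.job.credentials.binary}} into the Spark configuration, since the launcher container's token file doesn't exist on Spark executors) can be sketched with plain maps standing in for Hadoop/Spark configuration objects; the method name is hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: the real change is in RemoteHiveSparkClient, which works with
// Hadoop Configuration / SparkConf objects rather than plain maps.
class SparkConfFilter {
    static final String CREDENTIALS_BINARY = "mapreduce.job.credentials.binary";

    static Map<String, String> toSparkConf(Map<String, String> hiveConf) {
        Map<String, String> sparkConf = new HashMap<>(hiveConf);
        // the token file path is only valid inside the launcher's container,
        // so forwarding it makes executors fail to read it
        sparkConf.remove(CREDENTIALS_BINARY);
        return sparkConf;
    }
}
```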
[jira] [Assigned] (HIVE-16977) Vectorization: Vectorize expressions in THEN/ELSE branches of IF/CASE WHEN
[ https://issues.apache.org/jira/browse/HIVE-16977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi reassigned HIVE-16977: - Assignee: Teddy Choi > Vectorization: Vectorize expressions in THEN/ELSE branches of IF/CASE WHEN > -- > > Key: HIVE-16977 > URL: https://issues.apache.org/jira/browse/HIVE-16977 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Teddy Choi >Priority: Critical > > VectorUDFAdaptor(CASE WHEN ((_col2 > 0)) THEN ((UDFToDouble(_col3) / > UDFToDouble(_col2)) BETWEEN 0. AND 1.5) ... > The expression in the THEN is not permitted. Only columns or constants are > vectorized. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17063) insert overwrite partition onto a external table fail when drop partition first
[ https://issues.apache.org/jira/browse/HIVE-17063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083357#comment-16083357 ] Wang Haihua commented on HIVE-17063: Fixed some test errors; the remaining failed tests now seem unrelated to this patch. > insert overwrite partition onto a external table fail when drop partition > first > --- > > Key: HIVE-17063 > URL: https://issues.apache.org/jira/browse/HIVE-17063 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 1.2.2, 2.1.1, 2.2.0 >Reporter: Wang Haihua >Assignee: Wang Haihua > Attachments: HIVE-17063.1.patch, HIVE-17063.2.patch, > HIVE-17063.3.patch > > > The default value of {{hive.exec.stagingdir}} is a relative path, and > dropping a partition on an external table will not clear the real data. As a > result, inserting overwrite into the same partition twice will fail because the > target data to be moved already exists. > This happened when we reproduced partition data onto an external table. 
> I see the target data is only cleared when the {{immediately generated > data}} is a child of {{the target data directory}}, so my proposal is > to clear any target file that already exists when renaming the {{immediately > generated data}} into {{the target data directory}}. > Operation reproduced: > {code} > create external table insert_after_drop_partition(key string, val string) > partitioned by (insertdate string); > from src insert overwrite table insert_after_drop_partition partition > (insertdate='2008-01-01') select *; > alter table insert_after_drop_partition drop partition > (insertdate='2008-01-01'); > from src insert overwrite table insert_after_drop_partition partition > (insertdate='2008-01-01') select *; > {code} > Stack trace: > {code} > 2017-07-09T08:32:05,212 ERROR [f3bc51c8-2441-4689-b1c1-d60aef86c3aa main] > exec.Task: Failed with exception java.io.IOException: rename for src path: > pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/.hive-staging_hive_2017-07-09_08-32-03_840_4046825276907030554-1/-ext-1/00_0 > to dest > path:pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/00_0 > returned false > org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: rename > for src path: > pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/.hive-staging_hive_2017-07-09_08-32-03_840_4046825276907030554-1/-ext-1/00_0 > to dest > path:pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/00_0 > returned false > at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2992) > at > org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:3248) > at > org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1532) > at > 
org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1461) > at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:498) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2073) > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1744) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1453) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1171) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1161) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:335) > at > org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1137) > at > org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:) > at > org.apache.hadoop.hive.cli.TestCliDriver.runTest(TestCliDriver.java:120) > at > org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_after_drop_partition(TestCliDriver.java:103) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at
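The proposal above (clear a pre-existing target file before moving the staging output into place, so the rename cannot return false) can be sketched with `java.nio.file`; Hive itself goes through Hadoop's `FileSystem` API, so this is an illustration of the idea only:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: when moving a staging file into the target directory, delete any
// leftover file of the same name first (e.g. left behind by a dropped
// partition on an external table) so the move cannot collide.
class OverwriteRename {
    static void moveIntoPlace(Path staged, Path target) throws IOException {
        Files.deleteIfExists(target);                 // leftover from the dropped partition
        Files.createDirectories(target.getParent());  // make sure the partition dir exists
        Files.move(staged, target);                   // now guaranteed not to collide
    }

    /** Self-contained demo in a temp directory; returns true if the move succeeded. */
    static boolean demo() {
        try {
            Path dir = Files.createTempDirectory("overwrite-rename");
            Path staged = Files.write(dir.resolve("staged"), "new".getBytes());
            Path target = Files.write(dir.resolve("target"), "old".getBytes());
            moveIntoPlace(staged, target);
            return Files.exists(target) && !Files.exists(staged);
        } catch (IOException e) {
            return false;
        }
    }
}
```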
[jira] [Updated] (HIVE-16975) Vectorization: Fully vectorize CAST date as TIMESTAMP so VectorUDFAdaptor is now used
[ https://issues.apache.org/jira/browse/HIVE-16975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-16975: -- Attachment: HIVE-16975.1.patch > Vectorization: Fully vectorize CAST date as TIMESTAMP so VectorUDFAdaptor is > now used > - > > Key: HIVE-16975 > URL: https://issues.apache.org/jira/browse/HIVE-16975 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Teddy Choi >Priority: Critical > Attachments: HIVE-16975.1.patch > > > Fix VectorUDFAdaptor(CAST(d_date as TIMESTAMP)) to be native. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-16975) Vectorization: Fully vectorize CAST date as TIMESTAMP so VectorUDFAdaptor is now used
[ https://issues.apache.org/jira/browse/HIVE-16975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi reassigned HIVE-16975: - Assignee: Teddy Choi > Vectorization: Fully vectorize CAST date as TIMESTAMP so VectorUDFAdaptor is > now used > - > > Key: HIVE-16975 > URL: https://issues.apache.org/jira/browse/HIVE-16975 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Teddy Choi >Priority: Critical > Attachments: HIVE-16975.1.patch > > > Fix VectorUDFAdaptor(CAST(d_date as TIMESTAMP)) to be native. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16907) "INSERT INTO" overwrite old data when destination table encapsulated by backquote
[ https://issues.apache.org/jira/browse/HIVE-16907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083356#comment-16083356 ] Rui Li commented on HIVE-16907: --- I think we should firstly decide whether we allow "dot" in table names. I prefer disallowing it, because such names are confusing and seems we have already disallowed such column names via HIVE-10120. Then {{`tdb.t1`}} is considered as backtick quoted full qualified table name (useful when db/table name contains reserved keywords), and it should be treated as {{tdb.t1}} internally. Something like {{`tdb.t1.t2`}} should be disallowed. > "INSERT INTO" overwrite old data when destination table encapsulated by > backquote > > > Key: HIVE-16907 > URL: https://issues.apache.org/jira/browse/HIVE-16907 > Project: Hive > Issue Type: Bug > Components: Parser >Affects Versions: 1.1.0, 2.1.1 >Reporter: Nemon Lou >Assignee: Bing Li > Attachments: HIVE-16907.1.patch > > > A way to reproduce: > {noformat} > create database tdb; > use tdb; > create table t1(id int); > create table t2(id int); > explain insert into `tdb.t1` select * from t2; > {noformat} > {noformat} > +---+ > | > Explain | > +---+ > | STAGE DEPENDENCIES: > | > | Stage-1 is a root stage > | > | Stage-6 depends on stages: Stage-1 , consists of Stage-3, Stage-2, > Stage-4 | > | Stage-3 > | > | Stage-0 depends on stages: Stage-3, Stage-2, Stage-5 > | > | Stage-2 > | > | Stage-4 > | > | Stage-5 depends on stages: Stage-4 > | > | > | > | STAGE PLANS: > | > | Stage: Stage-1 > | > | Map Reduce > | > | Map Operator Tree: > | > | TableScan > | > | alias: t2 > | > | Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column > stats: NONE | > | Select Operator > | > | expressions: id (type: int) > | > | outputColumnNames: _col0 > | > | Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column > stats: NONE | > | File Output Operator
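Rui Li's proposed interpretation can be sketched as a tiny name parser. The class and method names are hypothetical, not Hive's parser API: a single dot inside the backticks separates db from table, and more than one dot is rejected:

```java
// Sketch of the proposed rule for backtick-quoted names:
//   `t1`       -> (null, "t1")        unqualified table
//   `tdb.t1`   -> ("tdb", "t1")       db-qualified table
//   `tdb.t1.t2`-> rejected            dots inside names are disallowed
class QuotedTableName {
    /** Returns {dbName, tableName}; dbName is null when unqualified. */
    static String[] parse(String quoted) {
        String[] parts = quoted.split("\\.", -1);
        if (parts.length == 1) return new String[] { null, parts[0] };
        if (parts.length == 2) return parts;
        throw new IllegalArgumentException("too many dots in table name: " + quoted);
    }
}
```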
[jira] [Updated] (HIVE-16975) Vectorization: Fully vectorize CAST date as TIMESTAMP so VectorUDFAdaptor is now used
[ https://issues.apache.org/jira/browse/HIVE-16975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-16975: -- Attachment: (was: HIVE-16975.1.patch) > Vectorization: Fully vectorize CAST date as TIMESTAMP so VectorUDFAdaptor is > now used > - > > Key: HIVE-16975 > URL: https://issues.apache.org/jira/browse/HIVE-16975 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Teddy Choi >Priority: Critical > > Fix VectorUDFAdaptor(CAST(d_date as TIMESTAMP)) to be native. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16975) Vectorization: Fully vectorize CAST date as TIMESTAMP so VectorUDFAdaptor is now used
[ https://issues.apache.org/jira/browse/HIVE-16975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-16975: -- Attachment: HIVE-16975.1.patch > Vectorization: Fully vectorize CAST date as TIMESTAMP so VectorUDFAdaptor is > now used > - > > Key: HIVE-16975 > URL: https://issues.apache.org/jira/browse/HIVE-16975 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Teddy Choi >Priority: Critical > Attachments: HIVE-16975.1.patch > > > Fix VectorUDFAdaptor(CAST(d_date as TIMESTAMP)) to be native. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16975) Vectorization: Fully vectorize CAST date as TIMESTAMP so VectorUDFAdaptor is now used
[ https://issues.apache.org/jira/browse/HIVE-16975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-16975: -- Status: Patch Available (was: Open) > Vectorization: Fully vectorize CAST date as TIMESTAMP so VectorUDFAdaptor is > now used > - > > Key: HIVE-16975 > URL: https://issues.apache.org/jira/browse/HIVE-16975 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Teddy Choi >Priority: Critical > Attachments: HIVE-16975.1.patch > > > Fix VectorUDFAdaptor(CAST(d_date as TIMESTAMP)) to be native. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16832) duplicate ROW__ID possible in multi insert into transactional table
[ https://issues.apache.org/jira/browse/HIVE-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1608#comment-1608 ] Hive QA commented on HIVE-16832: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876708/HIVE-16832.21.patch {color:green}SUCCESS:{color} +1 due to 12 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 10853 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=237) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_table_stats] (batchId=50) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=60) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[acid_bucket_pruning] (batchId=139) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=145) org.apache.hadoop.hive.ql.TestTxnCommands.testNonAcidToAcidConversion01 (batchId=282) org.apache.hadoop.hive.ql.TestTxnCommands2.testNonAcidToAcidConversion02 (batchId=269) org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdate.testNonAcidToAcidConversion02 (batchId=280) org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testNonAcidToAcidConversion02 (batchId=277) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5969/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5969/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5969/ 
Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 13 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12876708 - PreCommit-HIVE-Build > duplicate ROW__ID possible in multi insert into transactional table > --- > > Key: HIVE-16832 > URL: https://issues.apache.org/jira/browse/HIVE-16832 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 2.2.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-16832.01.patch, HIVE-16832.03.patch, > HIVE-16832.04.patch, HIVE-16832.05.patch, HIVE-16832.06.patch, > HIVE-16832.08.patch, HIVE-16832.09.patch, HIVE-16832.10.patch, > HIVE-16832.11.patch, HIVE-16832.14.patch, HIVE-16832.15.patch, > HIVE-16832.16.patch, HIVE-16832.17.patch, HIVE-16832.18.patch, > HIVE-16832.19.patch, HIVE-16832.20.patch, HIVE-16832.20.patch, > HIVE-16832.21.patch > > > {noformat} > create table AcidTablePart(a int, b int) partitioned by (p string) clustered > by (a) into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true'); > create temporary table if not exists data1 (x int); > insert into data1 values (1); > from data1 >insert into AcidTablePart partition(p) select 0, 0, 'p' || x >insert into AcidTablePart partition(p='p1') select 0, 1 > {noformat} > Each branch of this multi-insert create a row in partition p1/bucket0 with > ROW__ID=(1,0,0). > The same can happen when running SQL Merge (HIVE-10924) statement that has > both Insert and Update clauses when target table has > _'transactional'='true','transactional_properties'='default'_ (see > HIVE-14035). This is so because Merge is internally run as a multi-insert > statement. > The solution relies on statement ID introduced in HIVE-11030. 
Each Insert > clause of a multi-insert gets a unique ID. > The ROW__ID.bucketId now becomes a bit-packed triplet (format version, > bucketId, statementId). > (Since ORC stores field names in the data file we can't rename > ROW__ID.bucketId). > This ensures that there are no collisions and retains the desired sort properties > of ROW__ID. > In particular _SortedDynPartitionOptimizer_ works w/o any changes even in > cases where there are fewer reducers than buckets. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
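The packed triplet described above amounts to plain bit arithmetic. The 4/12/12-bit field widths below are an assumption for illustration only; the authoritative layout is whatever codec HIVE-16832's patch defines:

```java
// Sketch of packing (format version, bucketId, statementId) into the int
// stored in ROW__ID.bucketId. Field widths are illustrative assumptions.
class PackedBucketId {
    static int pack(int version, int bucketId, int statementId) {
        return (version << 24) | (bucketId << 12) | statementId;
    }
    static int version(int packed)     { return (packed >>> 24) & 0xF;   }
    static int bucketId(int packed)    { return (packed >>> 12) & 0xFFF; }
    static int statementId(int packed) { return packed & 0xFFF;          }
}
```

Because the statement id occupies the low bits, two inserts of the same multi-insert writing to the same bucket produce distinct packed values, which is exactly what removes the ROW__ID collision.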
[jira] [Commented] (HIVE-17065) You can not successfully deploy hive clusters with Hive guidance documents
[ https://issues.apache.org/jira/browse/HIVE-17065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083322#comment-16083322 ] ZhangBing Lin commented on HIVE-17065: -- Hi [~leftylev], thank you for your suggestion! > You can not successfully deploy hive clusters with Hive guidance documents > -- > > Key: HIVE-17065 > URL: https://issues.apache.org/jira/browse/HIVE-17065 > Project: Hive > Issue Type: Improvement > Components: Documentation >Reporter: ZhangBing Lin >Priority: Minor > Attachments: screenshot-1.png > > > When I followed the official document from cwiki > [https://cwiki.apache.org/confluence/display/Hive/GettingStarted] to build > a Hive 2.1.1 single-node service, I encountered several problems: > 1. The command to create the Hive warehouse directory needs to be modified: > A. $HADOOP_HOME/bin/hadoop fs -mkdir /user/hive/warehouse > B. $HADOOP_HOME/bin/hadoop fs -mkdir -p /user/hive/warehouse > Using B instead of A might be better. > 2. The following two descriptions need to have their positions adjusted: > A. Running Hive CLI > To use the Hive command line interface (CLI) from the shell: > $ $HIVE_HOME/bin/hive > B. Running HiveServer2 and Beeline > Starting from Hive 2.1, we need to run the schematool command below as an > initialization step. For example, we can use "derby" as the db type. > $ $HIVE_HOME/bin/schematool -dbType -initSchema > When I execute the $HIVE_HOME/bin/hive command, the following error occurs: > !screenshot-1.png! > After I execute the following command, the hive command works: > $ $HIVE_HOME/bin/schematool -dbType derby -initSchema -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16370) Avro data type null not supported on partitioned tables
[ https://issues.apache.org/jira/browse/HIVE-16370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083316#comment-16083316 ] Andrew Sears commented on HIVE-16370: - This is something that can be handled in Avro by unioning the null type with another type in the avro file. [http://apache-avro.679487.n3.nabble.com/Support-for-null-in-String-primitive-types-td4025659.html] ObjectInspectorUtils.java might be updated to handle the "void" primitive category as it does in other cases. > Avro data type null not supported on partitioned tables > --- > > Key: HIVE-16370 > URL: https://issues.apache.org/jira/browse/HIVE-16370 > Project: Hive > Issue Type: Bug >Affects Versions: 1.1.0, 2.1.1 >Reporter: rui miranda >Priority: Minor > > I was attempting to create hive tables over some partitioned Avro files. It > seems the void data type (Avro null) is not supported on partitioned tables > (I could not replicate the bug on an unpartitioned table). > --- > I managed to replicate the bug on two different hive versions. 
> Hive 1.1.0-cdh5.10.0 > Hive 2.1.1-amzn-0 > > how to replicate (avro tools are required to create the avro files): > $ wget > http://mirror.serversupportforum.de/apache/avro/avro-1.8.1/java/avro-tools-1.8.1.jar > $ mkdir /tmp/avro > $ mkdir /tmp/avro/null > $ echo "{ \ > \"type\" : \"record\", \ > \"name\" : \"null_failure\", \ > \"namespace\" : \"org.apache.avro.null_failure\", \ > \"doc\":\"the purpose of this schema is to replicate the hive avro null > failure\", \ > \"fields\" : [{\"name\":\"one\", \"type\":\"null\",\"default\":null}] \ > } " > /tmp/avro/null/schema.avsc > $ echo "{\"one\":null}" > /tmp/avro/null/data.json > $ java -jar avro-tools-1.8.1.jar fromjson --schema-file > /tmp/avro/null/schema.avsc /tmp/avro/null/data.json > /tmp/avro/null/data.avro > $ hdfs dfs -mkdir /tmp/avro > $ hdfs dfs -mkdir /tmp/avro/null > $ hdfs dfs -mkdir /tmp/avro/null/schema > $ hdfs dfs -mkdir /tmp/avro/null/data > $ hdfs dfs -mkdir /tmp/avro/null/data/foo=bar > $ hdfs dfs -copyFromLocal /tmp/avro/null/schema.avsc > /tmp/avro/null/schema/schema.avsc > $ hdfs dfs -copyFromLocal /tmp/avro/null/data.avro > /tmp/avro/null/data/foo=bar/data.avro > $ hive > hive> CREATE EXTERNAL TABLE avro_null > PARTITIONED BY (foo string) > ROW FORMAT SERDE > 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' > STORED as INPUTFORMAT > 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' > OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' > LOCATION > '/tmp/avro/null/data/' > TBLPROPERTIES ( > 'avro.schema.url'='/tmp/avro/null/schema/schema.avsc') > ; > OK > Time taken: 3.127 seconds > hive> msck repair table avro_null; > OK > Partitions not in metastore: avro_null:foo=bar > Repair: Added partition to metastore avro_null:foo=bar > Time taken: 0.712 seconds, Fetched: 2 row(s) > hive> select * from avro_null; > FAILED: RuntimeException org.apache.hadoop.hive.ql.metadata.HiveException: > Failed with exception Hive internal error inside > 
isAssignableFromSettablePrimitiveOI void not supported > yet.java.lang.RuntimeException: Hive internal error inside > isAssignableFromSettablePrimitiveOI void not supported yet. > hive> select foo, count(1) from avro_null group by foo; > OK > bar 1 > Time taken: 29.806 seconds, Fetched: 1 row(s) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
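The writer-side workaround Andrew Sears points to, unioning {{null}} with another type, would look like the schema below. The record and field names here are placeholders, not taken from the reporter's reproduction; the key change is that the field's type is the union {{["null", "string"]}} rather than bare {{"null"}}, so Hive sees a concrete settable primitive type instead of void.

```json
{
  "type": "record",
  "name": "null_workaround",
  "namespace": "org.apache.avro.null_workaround",
  "doc": "Field declared as a union of null with a usable type instead of bare null",
  "fields": [
    {"name": "one", "type": ["null", "string"], "default": null}
  ]
}
```

With this schema, {{{"one": null}}} is still a valid datum, but the column maps to a nullable string rather than the unsupported void category.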
[jira] [Commented] (HIVE-15705) Event replication for constraints
[ https://issues.apache.org/jira/browse/HIVE-15705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083293#comment-16083293 ] Hive QA commented on HIVE-15705: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876706/HIVE-15705.4.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10840 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype] (batchId=157) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=145) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5968/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5968/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5968/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 7 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12876706 - PreCommit-HIVE-Build > Event replication for constraints > - > > Key: HIVE-15705 > URL: https://issues.apache.org/jira/browse/HIVE-15705 > Project: Hive > Issue Type: Sub-task > Components: repl >Reporter: Daniel Dai >Assignee: Daniel Dai > Attachments: HIVE-15705.1.patch, HIVE-15705.2.patch, > HIVE-15705.3.patch, HIVE-15705.4.patch > > > Make event replication for primary key and foreign key work. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17066) Query78 filter wrong estimatation is generating bad plan
[ https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vineet Garg updated HIVE-17066: --- Status: Open (was: Patch Available) > Query78 filter wrong estimatation is generating bad plan > > > Key: HIVE-17066 > URL: https://issues.apache.org/jira/browse/HIVE-17066 > Project: Hive > Issue Type: Bug >Reporter: Vineet Garg >Assignee: Vineet Garg > Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch, > HIVE-17066.3.patch, HIVE-17066.4.patch > > > Filter operator is estimating 1 row following a left outer join causing bad > estimates > {noformat} > Reducer 12 > Execution mode: vectorized, llap > Reduce Operator Tree: > Map Join Operator > condition map: > Left Outer Join0 to 1 > keys: > 0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > 1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, > _col8 > input vertices: > 1 Map 14 > Statistics: Num rows: 71676270660 Data size: 3727166074320 > Basic stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: _col8 is null (type: boolean) > Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE > Column stats: COMPLETE > Select Operator > expressions: _col0 (type: bigint), _col1 (type: bigint), > _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: > bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, > _col6 > Statistics: Num rows: 1 Data size: 52 Basic stats: > COMPLETE Column stats: COMPLETE > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
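The collapse from ~71 billion rows to 1 in the plan above can be illustrated with a toy arithmetic model. This is not Hive's actual statistics code: it just shows why applying the base column's null fraction to an {{IS NULL}} predicate placed after a LEFT OUTER JOIN is wrong, since after an outer join the nulls come from unmatched left rows, not from nulls stored in the right-side column. The 0.25 unmatched rate is a made-up placeholder.

```java
// Toy model of "IS NULL" selectivity after an outer join; illustrative only.
public class NullFilterEstimate {

    // Naive model: rows * null fraction, clamped to at least 1 row,
    // as the plan above appears to do.
    public static long naiveEstimate(long rows, double nullFraction) {
        return Math.max(1L, (long) (rows * nullFraction));
    }

    public static void main(String[] args) {
        long joinRows = 71_676_270_660L; // outer-join output size from the plan
        // Base stats: the right-side column itself contains no NULLs,
        // so the naive model estimates ~0 rows, clamped to 1.
        System.out.println(naiveEstimate(joinRows, 0.0));  // -> 1
        // After a LEFT OUTER JOIN the fraction should instead reflect the
        // unmatched left rows (0.25 is a placeholder match statistic).
        System.out.println(naiveEstimate(joinRows, 0.25)); // -> 17919067665
    }
}
```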
[jira] [Updated] (HIVE-17066) Query78 filter wrong estimatation is generating bad plan
[ https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vineet Garg updated HIVE-17066: --- Status: Patch Available (was: Open) > Query78 filter wrong estimatation is generating bad plan > > > Key: HIVE-17066 > URL: https://issues.apache.org/jira/browse/HIVE-17066 > Project: Hive > Issue Type: Bug >Reporter: Vineet Garg >Assignee: Vineet Garg > Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch, > HIVE-17066.3.patch, HIVE-17066.4.patch > > > Filter operator is estimating 1 row following a left outer join causing bad > estimates > {noformat} > Reducer 12 > Execution mode: vectorized, llap > Reduce Operator Tree: > Map Join Operator > condition map: > Left Outer Join0 to 1 > keys: > 0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > 1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, > _col8 > input vertices: > 1 Map 14 > Statistics: Num rows: 71676270660 Data size: 3727166074320 > Basic stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: _col8 is null (type: boolean) > Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE > Column stats: COMPLETE > Select Operator > expressions: _col0 (type: bigint), _col1 (type: bigint), > _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: > bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, > _col6 > Statistics: Num rows: 1 Data size: 52 Basic stats: > COMPLETE Column stats: COMPLETE > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16955) General Improvements To org.apache.hadoop.hive.metastore.MetaStoreUtils
[ https://issues.apache.org/jira/browse/HIVE-16955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-16955: Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) Pushed to master. Thanks, Beluga! > General Improvements To org.apache.hadoop.hive.metastore.MetaStoreUtils > --- > > Key: HIVE-16955 > URL: https://issues.apache.org/jira/browse/HIVE-16955 > Project: Hive > Issue Type: Improvement > Components: Metastore >Affects Versions: 3.0.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Minor > Fix For: 3.0.0 > > Attachments: HIVE-16955.1.patch, HIVE-16955.2.patch, > HIVE-16955.3.patch > > > # Increase Code Reuse with {{Apache Commons}} > # Improve debug logging (lowered to TRACE where appropriate) > # Add optimizations for empty {{Collection}} scenarios > # Better size {{ArrayList}} at instantiation > # Use {{StringBuilder}} instead of String concatenation > # Increase consistency of code style among similar methods > # Decrease file line count -- This message was sent by Atlassian JIRA (v6.4.14#64029)
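The kinds of changes in the list above can be sketched as below. The method names are illustrative, not the actual MetaStoreUtils signatures; the sketch shows three of the listed items: an empty-collection fast path, StringBuilder instead of String concatenation, and sizing an ArrayList at instantiation.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Illustrative sketch of the cleanup patterns; not actual MetaStoreUtils code.
public class UtilSketch {

    /** Joins names with commas, short-circuiting the empty case. */
    public static String joinNames(Collection<String> names) {
        if (names == null || names.isEmpty()) {
            return "";                           // empty-Collection optimization
        }
        StringBuilder sb = new StringBuilder();  // instead of String +=
        for (String n : names) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(n);
        }
        return sb.toString();
    }

    /** Copies into a list sized up front instead of grown incrementally. */
    public static List<String> copyNames(Collection<String> names) {
        List<String> out = new ArrayList<>(names.size()); // sized at instantiation
        out.addAll(names);
        return out;
    }
}
```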
[jira] [Updated] (HIVE-17066) Query78 filter wrong estimatation is generating bad plan
[ https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vineet Garg updated HIVE-17066: --- Attachment: HIVE-17066.4.patch > Query78 filter wrong estimatation is generating bad plan > > > Key: HIVE-17066 > URL: https://issues.apache.org/jira/browse/HIVE-17066 > Project: Hive > Issue Type: Bug >Reporter: Vineet Garg >Assignee: Vineet Garg > Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch, > HIVE-17066.3.patch, HIVE-17066.4.patch > > > Filter operator is estimating 1 row following a left outer join causing bad > estimates > {noformat} > Reducer 12 > Execution mode: vectorized, llap > Reduce Operator Tree: > Map Join Operator > condition map: > Left Outer Join0 to 1 > keys: > 0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > 1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, > _col8 > input vertices: > 1 Map 14 > Statistics: Num rows: 71676270660 Data size: 3727166074320 > Basic stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: _col8 is null (type: boolean) > Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE > Column stats: COMPLETE > Select Operator > expressions: _col0 (type: bigint), _col1 (type: bigint), > _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: > bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, > _col6 > Statistics: Num rows: 1 Data size: 52 Basic stats: > COMPLETE Column stats: COMPLETE > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17076) typo in itests/src/test/resources/testconfiguration.properties
[ https://issues.apache.org/jira/browse/HIVE-17076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-17076: -- Status: Patch Available (was: Open) > typo in itests/src/test/resources/testconfiguration.properties > -- > > Key: HIVE-17076 > URL: https://issues.apache.org/jira/browse/HIVE-17076 > Project: Hive > Issue Type: Bug >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-17076.01.patch > > > it has > {noformat} > minillap.shared.query.files=insert_into1.q,\ > insert_into2.q,\ > insert_values_orig_table.,\ > llapdecider.q,\ > {noformat} > "insert_values_orig_table.,\" is a typo which causes these to be run with > TestCliDriver > Note that there are 2 .q files that start with insert_values_orig_table -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17066) Query78 filter wrong estimatation is generating bad plan
[ https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083251#comment-16083251 ] Hive QA commented on HIVE-17066: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876709/HIVE-17066.3.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10839 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[dynamic_semijoin_user_level] (batchId=142) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype] (batchId=157) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[correlationoptimizer1] (batchId=155) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_join_tests] (batchId=159) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_joins_explain] (batchId=151) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5967/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5967/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5967/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 9 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12876709 - PreCommit-HIVE-Build > Query78 filter wrong estimatation is generating bad plan > > > Key: HIVE-17066 > URL: https://issues.apache.org/jira/browse/HIVE-17066 > Project: Hive > Issue Type: Bug >Reporter: Vineet Garg >Assignee: Vineet Garg > Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch, > HIVE-17066.3.patch > > > Filter operator is estimating 1 row following a left outer join causing bad > estimates > {noformat} > Reducer 12 > Execution mode: vectorized, llap > Reduce Operator Tree: > Map Join Operator > condition map: > Left Outer Join0 to 1 > keys: > 0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > 1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, > _col8 > input vertices: > 1 Map 14 > Statistics: Num rows: 71676270660 Data size: 3727166074320 > Basic stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: _col8 is null (type: boolean) > Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE > Column stats: COMPLETE > Select Operator > expressions: _col0 (type: bigint), _col1 (type: bigint), > _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: > bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, > _col6 > Statistics: Num rows: 1 Data size: 52 Basic stats: > COMPLETE Column stats: COMPLETE > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17066) Query78 filter wrong estimatation is generating bad plan
[ https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vineet Garg updated HIVE-17066: --- Status: Patch Available (was: Open) > Query78 filter wrong estimatation is generating bad plan > > > Key: HIVE-17066 > URL: https://issues.apache.org/jira/browse/HIVE-17066 > Project: Hive > Issue Type: Bug >Reporter: Vineet Garg >Assignee: Vineet Garg > Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch, > HIVE-17066.3.patch > > > Filter operator is estimating 1 row following a left outer join causing bad > estimates > {noformat} > Reducer 12 > Execution mode: vectorized, llap > Reduce Operator Tree: > Map Join Operator > condition map: > Left Outer Join0 to 1 > keys: > 0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > 1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, > _col8 > input vertices: > 1 Map 14 > Statistics: Num rows: 71676270660 Data size: 3727166074320 > Basic stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: _col8 is null (type: boolean) > Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE > Column stats: COMPLETE > Select Operator > expressions: _col0 (type: bigint), _col1 (type: bigint), > _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: > bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, > _col6 > Statistics: Num rows: 1 Data size: 52 Basic stats: > COMPLETE Column stats: COMPLETE > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16926) LlapTaskUmbilicalExternalClient should not start new umbilical server for every fragment request
[ https://issues.apache.org/jira/browse/HIVE-16926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083184#comment-16083184 ] Jason Dere commented on HIVE-16926: --- Maybe I can just replace pendingClients/registeredClients with a single list and the RequestInfo can keep a state to show if the request is pending/running/etc. Correct, the shared umbilical server will not be shut down. Is there any action needed on this part? I don't think anything is exposed to shut it down. > LlapTaskUmbilicalExternalClient should not start new umbilical server for > every fragment request > > > Key: HIVE-16926 > URL: https://issues.apache.org/jira/browse/HIVE-16926 > Project: Hive > Issue Type: Sub-task > Components: llap >Reporter: Jason Dere >Assignee: Jason Dere > Attachments: HIVE-16926.1.patch, HIVE-16926.2.patch, > HIVE-16926.3.patch, HIVE-16926.4.patch > > > Followup task from [~sseth] and [~sershe] after HIVE-16777. > LlapTaskUmbilicalExternalClient currently creates a new umbilical server for > every fragment request, but this is not necessary and the umbilical can be > shared. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16955) General Improvements To org.apache.hadoop.hive.metastore.MetaStoreUtils
[ https://issues.apache.org/jira/browse/HIVE-16955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083175#comment-16083175 ] Ashutosh Chauhan commented on HIVE-16955: - +1 > General Improvements To org.apache.hadoop.hive.metastore.MetaStoreUtils > --- > > Key: HIVE-16955 > URL: https://issues.apache.org/jira/browse/HIVE-16955 > Project: Hive > Issue Type: Improvement > Components: Metastore >Affects Versions: 3.0.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Minor > Attachments: HIVE-16955.1.patch, HIVE-16955.2.patch, > HIVE-16955.3.patch > > > # Increase Code Reuse with {{Apache Commons}} > # Improve debug logging (lowered to TRACE where appropriate) > # Add optimizations for empty {{Collection}} scenarios > # Better size {{ArrayList}} at instantiation > # Use {{StringBuilder}} instead of String concatenation > # Increase consistency of code style among similar methods > # Decrease file line count -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17073) Incorrect result with vectorization and SharedWorkOptimizer
[ https://issues.apache.org/jira/browse/HIVE-17073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083177#comment-16083177 ] Hive QA commented on HIVE-17073: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876697/HIVE-17073.02.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 10840 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_decimal_1] (batchId=76) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_tablesample_rows] (batchId=49) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_decimal_1] (batchId=159) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_decimal_2] (batchId=150) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_groupby_grouping_window] (batchId=159) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=232) org.apache.hadoop.hive.ql.exec.vector.TestVectorSelectOperator.testSelectOperator (batchId=272) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5966/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5966/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5966/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase 
Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 12 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12876697 - PreCommit-HIVE-Build > Incorrect result with vectorization and SharedWorkOptimizer > --- > > Key: HIVE-17073 > URL: https://issues.apache.org/jira/browse/HIVE-17073 > Project: Hive > Issue Type: Bug > Components: Vectorization >Affects Versions: 3.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-17073.01.patch, HIVE-17073.02.patch, > HIVE-17073.patch > > > We get incorrect result with vectorization and multi-output Select operator > created by SharedWorkOptimizer. It can be reproduced in the following way. > {code:title=Correct} > select count(*) as h8_30_to_9 > from src > join src1 on src.key = src1.key > where src1.value = "val_278"; > OK > 2 > {code} > {code:title=Correct} > select count(*) as h9_to_9_30 > from src > join src1 on src.key = src1.key > where src1.value = "val_255"; > OK > 2 > {code} > {code:title=Incorrect} > select * from ( > select count(*) as h8_30_to_9 > from src > join src1 on src.key = src1.key > where src1.value = "val_278") s1 > join ( > select count(*) as h9_to_9_30 > from src > join src1 on src.key = src1.key > where src1.value = "val_255") s2; > OK > 2 0 > {code} > Problem seems to be that some ds in the batch row need to be re-initialized > after they have been forwarded to each output. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17066) Query78 filter wrong estimatation is generating bad plan
[ https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vineet Garg updated HIVE-17066: --- Status: Patch Available (was: Open) Patch 3 addresses review comments > Query78 filter wrong estimatation is generating bad plan > > > Key: HIVE-17066 > URL: https://issues.apache.org/jira/browse/HIVE-17066 > Project: Hive > Issue Type: Bug >Reporter: Vineet Garg >Assignee: Vineet Garg > Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch, > HIVE-17066.3.patch > > > Filter operator is estimating 1 row following a left outer join causing bad > estimates > {noformat} > Reducer 12 > Execution mode: vectorized, llap > Reduce Operator Tree: > Map Join Operator > condition map: > Left Outer Join0 to 1 > keys: > 0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > 1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, > _col8 > input vertices: > 1 Map 14 > Statistics: Num rows: 71676270660 Data size: 3727166074320 > Basic stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: _col8 is null (type: boolean) > Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE > Column stats: COMPLETE > Select Operator > expressions: _col0 (type: bigint), _col1 (type: bigint), > _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: > bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, > _col6 > Statistics: Num rows: 1 Data size: 52 Basic stats: > COMPLETE Column stats: COMPLETE > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17066) Query78 filter wrong estimatation is generating bad plan
[ https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083151#comment-16083151 ] Ashutosh Chauhan commented on HIVE-17066: - +1 pending tests. > Query78 filter wrong estimatation is generating bad plan > > > Key: HIVE-17066 > URL: https://issues.apache.org/jira/browse/HIVE-17066 > Project: Hive > Issue Type: Bug >Reporter: Vineet Garg >Assignee: Vineet Garg > Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch, > HIVE-17066.3.patch > > > Filter operator is estimating 1 row following a left outer join causing bad > estimates > {noformat} > Reducer 12 > Execution mode: vectorized, llap > Reduce Operator Tree: > Map Join Operator > condition map: > Left Outer Join0 to 1 > keys: > 0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > 1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, > _col8 > input vertices: > 1 Map 14 > Statistics: Num rows: 71676270660 Data size: 3727166074320 > Basic stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: _col8 is null (type: boolean) > Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE > Column stats: COMPLETE > Select Operator > expressions: _col0 (type: bigint), _col1 (type: bigint), > _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: > bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, > _col6 > Statistics: Num rows: 1 Data size: 52 Basic stats: > COMPLETE Column stats: COMPLETE > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17076) typo in itests/src/test/resources/testconfiguration.properties
[ https://issues.apache.org/jira/browse/HIVE-17076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-17076: -- Attachment: HIVE-17076.01.patch > typo in itests/src/test/resources/testconfiguration.properties > -- > > Key: HIVE-17076 > URL: https://issues.apache.org/jira/browse/HIVE-17076 > Project: Hive > Issue Type: Bug >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-17076.01.patch > > > it has > {noformat} > minillap.shared.query.files=insert_into1.q,\ > insert_into2.q,\ > insert_values_orig_table.,\ > llapdecider.q,\ > {noformat} > "insert_values_orig_table.,\" is a typo which causes these to be run with > TestCliDriver > Note that there are 2 .q files that start with insert_values_orig_table -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16979) Cache UGI for metastore
[ https://issues.apache.org/jira/browse/HIVE-16979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Li updated HIVE-16979: -- Attachment: HIVE-16979.3.patch > Cache UGI for metastore > --- > > Key: HIVE-16979 > URL: https://issues.apache.org/jira/browse/HIVE-16979 > Project: Hive > Issue Type: Improvement >Reporter: Tao Li >Assignee: Tao Li > Attachments: HIVE-16979.1.patch, HIVE-16979.2.patch, > HIVE-16979.3.patch > > > FileSystem.closeAllForUGI is called per request against metastore to dispose > UGI, which involves talking to HDFS name node and is time consuming. So the > perf improvement would be caching and reusing the UGI. > Per FileSystem.closeAllForUG call could take up to 20 ms as E2E latency > against HDFS. Usually a Hive query could result in several calls against > metastore, so we can save up to 50-100 ms per hive query. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
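The caching idea in the description above, reusing one expensive per-user object instead of creating and disposing it per request, can be sketched generically as below. This is not the actual metastore patch: real UGI handling must also deal with token renewal and with when {{FileSystem.closeAllForUGI}} may finally be called, all of which is elided here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Generic per-user cache sketch; UserGroupInformation lifecycle concerns
// (token renewal, eventual closeAllForUGI) are deliberately out of scope.
public class PerUserCache<V> {
    private final Map<String, V> cache = new ConcurrentHashMap<>();
    private final Function<String, V> factory;

    public PerUserCache(Function<String, V> factory) {
        this.factory = factory;
    }

    public V get(String user) {
        // computeIfAbsent guarantees at most one instance per user
        // even under concurrent requests.
        return cache.computeIfAbsent(user, factory);
    }
}
```

A caller would construct it once with the expensive factory (e.g. creating a UGI and its FileSystem objects) and then serve every request for the same user from the cached instance, saving the per-request teardown cost the description measures at up to ~20 ms.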
[jira] [Updated] (HIVE-16177) non Acid to acid conversion doesn't handle _copy_N files
[ https://issues.apache.org/jira/browse/HIVE-16177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-16177: -- Attachment: HIVE-16177.20-branch-2.patch > non Acid to acid conversion doesn't handle _copy_N files > > > Key: HIVE-16177 > URL: https://issues.apache.org/jira/browse/HIVE-16177 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 0.14.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Blocker > Attachments: HIVE-16177.01.patch, HIVE-16177.02.patch, > HIVE-16177.04.patch, HIVE-16177.07.patch, HIVE-16177.08.patch, > HIVE-16177.09.patch, HIVE-16177.10.patch, HIVE-16177.11.patch, > HIVE-16177.14.patch, HIVE-16177.15.patch, HIVE-16177.16.patch, > HIVE-16177.17.patch, HIVE-16177.18-branch-2.patch, HIVE-16177.18.patch, > HIVE-16177.19-branch-2.patch, HIVE-16177.20-branch-2.patch > > > {noformat} > create table T(a int, b int) clustered by (a) into 2 buckets stored as orc > TBLPROPERTIES('transactional'='false') > insert into T(a,b) values(1,2) > insert into T(a,b) values(1,3) > alter table T SET TBLPROPERTIES ('transactional'='true') > {noformat} > //we should now have bucket files 01_0 and 01_0_copy_1 > but OrcRawRecordMerger.OriginalReaderPair.next() doesn't know that there can > be copy_N files and numbers rows in each bucket from 0 thus generating > duplicate IDs > {noformat} > select ROW__ID, INPUT__FILE__NAME, a, b from T > {noformat} > produces > {noformat} > {"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0,1,2 > {"transactionid\":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0_copy_1,1,3 > {noformat} > [~owen.omalley], do you have any thoughts on a good way to handle this? 
> attached patch has a few changes to make Acid even recognize copy_N but this > is just a prerequisite. The new UT demonstrates the issue. > Furthermore, > {noformat} > alter table T compact 'major' > select ROW__ID, INPUT__FILE__NAME, a, b from T order by b > {noformat} > produces > {noformat} > {"transactionid":0,"bucketid":1,"rowid":0} > file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommandswarehouse/nonacidorctbl/base_-9223372036854775808/bucket_1 > 1 2 > {noformat} > HIVE-16177.04.patch has TestTxnCommands.testNonAcidToAcidConversion0() > demonstrating this > This is because compactor doesn't handle copy_N files either (skips them) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
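The duplicate-rowid problem described above comes from numbering each file's rows from 0. One way to avoid it, sketched below, is to order a bucket's files deterministically and offset each file's first rowid by the row counts of the files before it, so the rows of 00001_0 and 00001_0_copy_1 get disjoint rowid ranges. This is purely illustrative; the real OrcRawRecordMerger logic is considerably more involved.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: assign each bucket file a starting rowid so that rows across
// 00001_0, 00001_0_copy_1, ... never collide. Illustrative only.
public class CopyNRowIds {

    /** Given fileName -> rowCount for one bucket, returns fileName -> first rowid. */
    public static Map<String, Long> firstRowIds(Map<String, Long> rowCountsByFile) {
        String[] names = rowCountsByFile.keySet().toArray(new String[0]);
        Arrays.sort(names);                 // "00001_0" sorts before "00001_0_copy_1"
        Map<String, Long> firstIds = new LinkedHashMap<>();
        long next = 0;
        for (String name : names) {
            firstIds.put(name, next);       // rowids in this file start here
            next += rowCountsByFile.get(name);
        }
        return firstIds;
    }
}
```

For the reproduction in the description (one row in each of the two files), the original file's row keeps rowid 0 and the copy's row becomes rowid 1, so the two ROW__IDs are no longer identical.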
[jira] [Updated] (HIVE-17066) Query78 filter wrong estimation is generating bad plan
[ https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vineet Garg updated HIVE-17066: --- Attachment: HIVE-17066.3.patch > Query78 filter wrong estimatation is generating bad plan > > > Key: HIVE-17066 > URL: https://issues.apache.org/jira/browse/HIVE-17066 > Project: Hive > Issue Type: Bug >Reporter: Vineet Garg >Assignee: Vineet Garg > Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch, > HIVE-17066.3.patch > > > Filter operator is estimating 1 row following a left outer join causing bad > estimates > {noformat} > Reducer 12 > Execution mode: vectorized, llap > Reduce Operator Tree: > Map Join Operator > condition map: > Left Outer Join0 to 1 > keys: > 0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > 1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, > _col8 > input vertices: > 1 Map 14 > Statistics: Num rows: 71676270660 Data size: 3727166074320 > Basic stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: _col8 is null (type: boolean) > Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE > Column stats: COMPLETE > Select Operator > expressions: _col0 (type: bigint), _col1 (type: bigint), > _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: > bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, > _col6 > Statistics: Num rows: 1 Data size: 52 Basic stats: > COMPLETE Column stats: COMPLETE > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
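The 1-row estimate above arises because `_col8 is null` after a LEFT OUTER JOIN is really an anti-join test, not an ordinary null filter. The model below is illustrative only (it is not Hive's `StatsRulesProcFactory`); the function names and the match-fraction parameter are assumptions for the sketch.

```python
# Illustrative-only model of the estimation problem. After a LEFT OUTER
# JOIN, "right_col IS NULL" selects the *unmatched* left rows, so applying
# the generic IS NULL rule (scale by the column's null fraction, floor at 1)
# collapses the estimate to 1 row whenever column stats report ~no nulls.

def naive_is_null_estimate(join_rows, null_fraction):
    # Generic IS NULL rule: rows scaled by null fraction, floored at 1.
    return max(1, int(join_rows * null_fraction))

def outer_join_aware_estimate(left_rows, match_fraction):
    # Anti-join view: the left rows that failed to match survive the filter.
    return max(1, int(left_rows * (1 - match_fraction)))
```

With the plan's 71,676,270,660 join rows and a null fraction of 0 from column stats, the naive rule yields the 1-row estimate seen above, while the anti-join view keeps a proportionate share of the outer input.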
[jira] [Assigned] (HIVE-17076) typo in itests/src/test/resources/testconfiguration.properties
[ https://issues.apache.org/jira/browse/HIVE-17076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman reassigned HIVE-17076: - Assignee: Eugene Koifman > typo in itests/src/test/resources/testconfiguration.properties > -- > > Key: HIVE-17076 > URL: https://issues.apache.org/jira/browse/HIVE-17076 > Project: Hive > Issue Type: Bug >Reporter: Eugene Koifman >Assignee: Eugene Koifman > > it has > {noformat} > minillap.shared.query.files=insert_into1.q,\ > insert_into2.q,\ > insert_values_orig_table.,\ > llapdecider.q,\ > {noformat} > "insert_values_orig_table.,\" is a typo which causes these to be run with > TestCliDriver > Note that there are 2 .q files that start with insert_values_orig_table -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17066) Query78 filter wrong estimation is generating bad plan
[ https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vineet Garg updated HIVE-17066: --- Status: Patch Available (was: Open) > Query78 filter wrong estimatation is generating bad plan > > > Key: HIVE-17066 > URL: https://issues.apache.org/jira/browse/HIVE-17066 > Project: Hive > Issue Type: Bug >Reporter: Vineet Garg >Assignee: Vineet Garg > Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch, > HIVE-17066.3.patch > > > Filter operator is estimating 1 row following a left outer join causing bad > estimates > {noformat} > Reducer 12 > Execution mode: vectorized, llap > Reduce Operator Tree: > Map Join Operator > condition map: > Left Outer Join0 to 1 > keys: > 0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > 1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, > _col8 > input vertices: > 1 Map 14 > Statistics: Num rows: 71676270660 Data size: 3727166074320 > Basic stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: _col8 is null (type: boolean) > Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE > Column stats: COMPLETE > Select Operator > expressions: _col0 (type: bigint), _col1 (type: bigint), > _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: > bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, > _col6 > Statistics: Num rows: 1 Data size: 52 Basic stats: > COMPLETE Column stats: COMPLETE > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-15705) Event replication for constraints
[ https://issues.apache.org/jira/browse/HIVE-15705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated HIVE-15705: -- Attachment: HIVE-15705.4.patch Resync with master. > Event replication for constraints > - > > Key: HIVE-15705 > URL: https://issues.apache.org/jira/browse/HIVE-15705 > Project: Hive > Issue Type: Sub-task > Components: repl >Reporter: Daniel Dai >Assignee: Daniel Dai > Attachments: HIVE-15705.1.patch, HIVE-15705.2.patch, > HIVE-15705.3.patch, HIVE-15705.4.patch > > > Make event replication for primary key and foreign key work. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17066) Query78 filter wrong estimation is generating bad plan
[ https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vineet Garg updated HIVE-17066: --- Attachment: (was: HIVE-17066.3.patch) > Query78 filter wrong estimatation is generating bad plan > > > Key: HIVE-17066 > URL: https://issues.apache.org/jira/browse/HIVE-17066 > Project: Hive > Issue Type: Bug >Reporter: Vineet Garg >Assignee: Vineet Garg > Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch > > > Filter operator is estimating 1 row following a left outer join causing bad > estimates > {noformat} > Reducer 12 > Execution mode: vectorized, llap > Reduce Operator Tree: > Map Join Operator > condition map: > Left Outer Join0 to 1 > keys: > 0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > 1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, > _col8 > input vertices: > 1 Map 14 > Statistics: Num rows: 71676270660 Data size: 3727166074320 > Basic stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: _col8 is null (type: boolean) > Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE > Column stats: COMPLETE > Select Operator > expressions: _col0 (type: bigint), _col1 (type: bigint), > _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: > bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, > _col6 > Statistics: Num rows: 1 Data size: 52 Basic stats: > COMPLETE Column stats: COMPLETE > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17019) Add support to download debugging information as an archive.
[ https://issues.apache.org/jira/browse/HIVE-17019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083111#comment-16083111 ] Siddharth Seth commented on HIVE-17019: --- Thanks for posting the patch. Will be useful to get relevant data for a query. - Change the top level package from llap-debug to tez-debug? (Works with both I believe) [~ashutoshc], [~thejas] - any recommendations on whether the code gets a top level module, or goes under an existing module. This allows downloading of various debug artifacts for a tez job - logs, metrics for llap, hiveserver2 logs (soon), tez am logs, ATS data for the query (hive and tez). - In the new pom.xml, dependency on hive-llap-server. 1) Is it required?, 2) Will need to exclude some dependent artifacts. See service/pom.xml llap-server dependency handling - LogDownloadServlet - Should this throw an error as soon as the filename pattern validation fails? - LogDownloadServlet - change to dagId/queryId validation instead - LogDownloadServlet - thread being created inside of the request handler? This should be limited outside of the request? so that only a controlled number of parallel artifact downloads can run. - LogDownloadServlet - what happens in case of aggregator failure? Exception back to the user? - LogDownloadServlet - seems to be generating the file to disk and then streaming it over. Can this be streamed over directly instead. Otherwise there's the possibility of leaking files. (Artifact.downloadIntoStream or some such?) Guessing this is complicated further by the multi-threaded artifact downloader. Alternately need to have a cleanup mechanism. - Timeout on the tests - Apache header needs to be added to files where it is missing. - Main - Please rename to something more indicative of what the tool does. - Main - Likely a follow up jira - parse using a standard library, instead of trying to parse the arguments to main directly. - Server - Enabling the artifact should be controlled via a config. 
Does not always need to be hosted in HS2 (Default disabled, at least till security can be sorted out) - Is it possible to support a timeout on the downloads? (Can be a follow up jira) - ArtifactAggregator - I believe this does 2 stages of dependent artifacts / downloads? Stage1 - download whatever it can. Information from this should be adequate for stage2 downloads ? - For the ones not implemented yet (DummyArtifact) - think it's better to just comment out the code, instead of invoking the DummyArtifacts downloader - Security - ACL enforcement required on secure clusters to make sure users can only download what they have access to. This is a must fix before this can be enabled by default. - Security - this can work around yarn restrictions on log downloads, since the files are being accessed by the hive user. Could you please add some details on cluster testing. > Add support to download debugging information as an archive. > > > Key: HIVE-17019 > URL: https://issues.apache.org/jira/browse/HIVE-17019 > Project: Hive > Issue Type: Bug >Reporter: Harish Jaiprakash >Assignee: Harish Jaiprakash > Attachments: HIVE-17019.01.patch > > > Given a queryId or dagId, get all information related to it: like, tez am, > task logs, hive ats data, tez ats data, slider am status, etc. Package it > into an archive. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
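One of the review points above is to stream the archive directly rather than writing it to disk first, so an aborted request cannot leak temp files. A minimal sketch of that idea, under stated assumptions: the artifact names and payloads are hypothetical, and a `BytesIO` stands in for the servlet's HTTP response stream.

```python
import io
import zipfile

def stream_debug_archive(out_stream, artifacts):
    """Write a zip archive straight into out_stream (no temp file).
    artifacts: iterable of (name_in_archive, bytes_payload)."""
    with zipfile.ZipFile(out_stream, mode="w",
                         compression=zipfile.ZIP_DEFLATED) as zf:
        for name, payload in artifacts:
            zf.writestr(name, payload)

# In a real servlet out_stream would be the response body; here a BytesIO
# stands in for it. The artifact names below are made up for illustration.
buf = io.BytesIO()
stream_debug_archive(buf, [("tez-am/syslog", b"log lines"),
                           ("hive/ats.json", b"{}")])
```

Streaming this way also composes with a cleanup-free failure path: if the aggregator throws mid-archive, the client sees a truncated download but nothing is left behind on the server's disk.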
[jira] [Updated] (HIVE-17066) Query78 filter wrong estimation is generating bad plan
[ https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vineet Garg updated HIVE-17066: --- Status: Open (was: Patch Available) > Query78 filter wrong estimatation is generating bad plan > > > Key: HIVE-17066 > URL: https://issues.apache.org/jira/browse/HIVE-17066 > Project: Hive > Issue Type: Bug >Reporter: Vineet Garg >Assignee: Vineet Garg > Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch > > > Filter operator is estimating 1 row following a left outer join causing bad > estimates > {noformat} > Reducer 12 > Execution mode: vectorized, llap > Reduce Operator Tree: > Map Join Operator > condition map: > Left Outer Join0 to 1 > keys: > 0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > 1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, > _col8 > input vertices: > 1 Map 14 > Statistics: Num rows: 71676270660 Data size: 3727166074320 > Basic stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: _col8 is null (type: boolean) > Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE > Column stats: COMPLETE > Select Operator > expressions: _col0 (type: bigint), _col1 (type: bigint), > _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: > bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, > _col6 > Statistics: Num rows: 1 Data size: 52 Basic stats: > COMPLETE Column stats: COMPLETE > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16832) duplicate ROW__ID possible in multi insert into transactional table
[ https://issues.apache.org/jira/browse/HIVE-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-16832: -- Attachment: HIVE-16832.21.patch patch 21 addresses Gopal's comments > duplicate ROW__ID possible in multi insert into transactional table > --- > > Key: HIVE-16832 > URL: https://issues.apache.org/jira/browse/HIVE-16832 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 2.2.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-16832.01.patch, HIVE-16832.03.patch, > HIVE-16832.04.patch, HIVE-16832.05.patch, HIVE-16832.06.patch, > HIVE-16832.08.patch, HIVE-16832.09.patch, HIVE-16832.10.patch, > HIVE-16832.11.patch, HIVE-16832.14.patch, HIVE-16832.15.patch, > HIVE-16832.16.patch, HIVE-16832.17.patch, HIVE-16832.18.patch, > HIVE-16832.19.patch, HIVE-16832.20.patch, HIVE-16832.20.patch, > HIVE-16832.21.patch > > > {noformat} > create table AcidTablePart(a int, b int) partitioned by (p string) clustered > by (a) into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true'); > create temporary table if not exists data1 (x int); > insert into data1 values (1); > from data1 >insert into AcidTablePart partition(p) select 0, 0, 'p' || x >insert into AcidTablePart partition(p='p1') select 0, 1 > {noformat} > Each branch of this multi-insert creates a row in partition p1/bucket0 with > ROW__ID=(1,0,0). > The same can happen when running a SQL Merge (HIVE-10924) statement that has > both Insert and Update clauses when target table has > _'transactional'='true','transactional_properties'='default'_ (see > HIVE-14035). This is so because Merge is internally run as a multi-insert > statement. > The solution relies on statement ID introduced in HIVE-11030. Each Insert > clause of a multi-insert gets a unique ID. > The ROW__ID.bucketId now becomes a bit-packed triplet (format version, > bucketId, statementId).
> (Since ORC stores field names in the data file we can't rename > ROW__ID.bucketId). > This ensures that there are no collisions and retains desired sort properties > of ROW__ID. > In particular _SortedDynPartitionOptimizer_ works w/o any changes even in > cases where there are fewer reducers than buckets. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
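The bit-packed triplet described above can be sketched as follows. Note the field widths here are assumptions chosen for illustration, not Hive's actual `BucketCodec` layout; the point is that two insert clauses writing to the same bucket get distinct statementIds (so their ROW__IDs can no longer collide), and that packing bucketId above statementId preserves sorting by bucket first.

```python
# Illustrative packing of (version, bucketId, statementId) into one int.
# The widths below are assumptions for the sketch, not Hive's on-disk format.
VERSION_BITS, BUCKET_BITS, STMT_BITS = 3, 12, 12

def pack(version, bucket_id, statement_id):
    assert bucket_id < (1 << BUCKET_BITS) and statement_id < (1 << STMT_BITS)
    return ((version << (BUCKET_BITS + STMT_BITS))
            | (bucket_id << STMT_BITS)
            | statement_id)

def unpack(packed):
    stmt = packed & ((1 << STMT_BITS) - 1)
    bucket = (packed >> STMT_BITS) & ((1 << BUCKET_BITS) - 1)
    version = packed >> (BUCKET_BITS + STMT_BITS)
    return version, bucket, stmt
```

Because the statementId occupies the low bits, rows from the same bucket still sort together, which is the "desired sort properties" point made above.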
[jira] [Updated] (HIVE-17066) Query78 filter wrong estimation is generating bad plan
[ https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vineet Garg updated HIVE-17066: --- Attachment: (was: HIVE-17066.3.patch) > Query78 filter wrong estimatation is generating bad plan > > > Key: HIVE-17066 > URL: https://issues.apache.org/jira/browse/HIVE-17066 > Project: Hive > Issue Type: Bug >Reporter: Vineet Garg >Assignee: Vineet Garg > Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch, > HIVE-17066.3.patch > > > Filter operator is estimating 1 row following a left outer join causing bad > estimates > {noformat} > Reducer 12 > Execution mode: vectorized, llap > Reduce Operator Tree: > Map Join Operator > condition map: > Left Outer Join0 to 1 > keys: > 0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > 1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, > _col8 > input vertices: > 1 Map 14 > Statistics: Num rows: 71676270660 Data size: 3727166074320 > Basic stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: _col8 is null (type: boolean) > Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE > Column stats: COMPLETE > Select Operator > expressions: _col0 (type: bigint), _col1 (type: bigint), > _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: > bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, > _col6 > Statistics: Num rows: 1 Data size: 52 Basic stats: > COMPLETE Column stats: COMPLETE > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-15051) Test framework integration with findbugs, rat checks etc.
[ https://issues.apache.org/jira/browse/HIVE-15051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083050#comment-16083050 ] Lefty Leverenz commented on HIVE-15051: --- Thanks for the explanations, [~pvary]. I used your own wording and tinkered a bit with paragraph 3 -- feel free to re-edit whether for meaning or for style preferences. * [Running Yetus | https://cwiki.apache.org/confluence/display/Hive/Running+Yetus] > Test framework integration with findbugs, rat checks etc. > - > > Key: HIVE-15051 > URL: https://issues.apache.org/jira/browse/HIVE-15051 > Project: Hive > Issue Type: Sub-task > Components: Testing Infrastructure >Reporter: Peter Vary >Assignee: Peter Vary > Fix For: 3.0.0 > > Attachments: beeline.out, HIVE-15051.02.patch, HIVE-15051.patch, > Interim.patch, ql.out > > > Find a way to integrate code analysis tools like findbugs, rat checks to > PreCommit tests, thus removing the burden from reviewers to check the code > style and other checks which could be done by code. > Might worth to take a look on Yetus, but keep in mind the Hive has a specific > parallel test framework. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17073) Incorrect result with vectorization and SharedWorkOptimizer
[ https://issues.apache.org/jira/browse/HIVE-17073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083089#comment-16083089 ] Hive QA commented on HIVE-17073: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876697/HIVE-17073.02.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 10840 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite] (batchId=237) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_decimal_1] (batchId=76) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_tablesample_rows] (batchId=49) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_decimal_1] (batchId=159) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_decimal_2] (batchId=150) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_groupby_grouping_window] (batchId=159) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=145) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=99) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hadoop.hive.ql.exec.vector.TestVectorSelectOperator.testSelectOperator (batchId=272) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5965/testReport Console output: 
https://builds.apache.org/job/PreCommit-HIVE-Build/5965/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5965/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 14 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12876697 - PreCommit-HIVE-Build > Incorrect result with vectorization and SharedWorkOptimizer > --- > > Key: HIVE-17073 > URL: https://issues.apache.org/jira/browse/HIVE-17073 > Project: Hive > Issue Type: Bug > Components: Vectorization >Affects Versions: 3.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-17073.01.patch, HIVE-17073.02.patch, > HIVE-17073.patch > > > We get incorrect result with vectorization and multi-output Select operator > created by SharedWorkOptimizer. It can be reproduced in the following way. > {code:title=Correct} > select count(*) as h8_30_to_9 > from src > join src1 on src.key = src1.key > where src1.value = "val_278"; > OK > 2 > {code} > {code:title=Correct} > select count(*) as h9_to_9_30 > from src > join src1 on src.key = src1.key > where src1.value = "val_255"; > OK > 2 > {code} > {code:title=Incorrect} > select * from ( > select count(*) as h8_30_to_9 > from src > join src1 on src.key = src1.key > where src1.value = "val_278") s1 > join ( > select count(*) as h9_to_9_30 > from src > join src1 on src.key = src1.key > where src1.value = "val_255") s2; > OK > 2 0 > {code} > Problem seems to be that some ds in the batch row need to be re-initialized > after they have been forwarded to each output. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
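The suspected cause stated at the end of the issue (batch state needing re-initialization after being forwarded to each output) can be modeled with a toy example. This is not Hive's `VectorizedRowBatch` API; the class and method names are invented for illustration only.

```python
# Toy model of the hazard: a shared row batch forwarded to two children of a
# multi-output Select, where each child applies its own selection in place.
# Without resetting the selection between children, the second child only
# sees the first child's filtered view - matching the "2 0" wrong result.
class Batch:
    def __init__(self, rows):
        self.rows = rows
        self.selected = list(range(len(rows)))  # indices currently in play

    def reset_selection(self):
        self.selected = list(range(len(self.rows)))

def filter_in_place(batch, pred):
    batch.selected = [i for i in batch.selected if pred(batch.rows[i])]
    return [batch.rows[i] for i in batch.selected]

batch = Batch(["val_278", "val_255", "val_278"])
first = filter_in_place(batch, lambda v: v == "val_278")
# BUG if forwarded as-is: the second output starts from the filtered view.
buggy = filter_in_place(batch, lambda v: v == "val_255")
batch.reset_selection()  # the fix: re-initialize before the next output
second = filter_in_place(batch, lambda v: v == "val_255")
```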
[jira] [Updated] (HIVE-15144) JSON.org license is now CatX
[ https://issues.apache.org/jira/browse/HIVE-15144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-15144: --- Fix Version/s: 3.0.0 2.3.0 > JSON.org license is now CatX > > > Key: HIVE-15144 > URL: https://issues.apache.org/jira/browse/HIVE-15144 > Project: Hive > Issue Type: Bug >Reporter: Robert Kanter >Assignee: Owen O'Malley >Priority: Blocker > Fix For: 2.2.0, 2.3.0, 3.0.0 > > Attachments: HIVE-15144.patch, HIVE-15144.patch, HIVE-15144.patch, > HIVE-15144.patch > > > per [update resolved legal|http://www.apache.org/legal/resolved.html#json]: > {quote} > CAN APACHE PRODUCTS INCLUDE WORKS LICENSED UNDER THE JSON LICENSE? > No. As of 2016-11-03 this has been moved to the 'Category X' license list. > Prior to this, use of the JSON Java library was allowed. See Debian's page > for a list of alternatives. > {quote} > I'm not sure when this dependency was first introduced, but it looks like > it's currently used in a few places: > https://github.com/apache/hive/search?p=1=%22org.json%22=%E2%9C%93 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-15144) JSON.org license is now CatX
[ https://issues.apache.org/jira/browse/HIVE-15144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-15144: --- Resolution: Fixed Status: Resolved (was: Patch Available) > JSON.org license is now CatX > > > Key: HIVE-15144 > URL: https://issues.apache.org/jira/browse/HIVE-15144 > Project: Hive > Issue Type: Bug >Reporter: Robert Kanter >Assignee: Owen O'Malley >Priority: Blocker > Fix For: 2.2.0, 2.3.0, 3.0.0 > > Attachments: HIVE-15144.patch, HIVE-15144.patch, HIVE-15144.patch, > HIVE-15144.patch > > > per [update resolved legal|http://www.apache.org/legal/resolved.html#json]: > {quote} > CAN APACHE PRODUCTS INCLUDE WORKS LICENSED UNDER THE JSON LICENSE? > No. As of 2016-11-03 this has been moved to the 'Category X' license list. > Prior to this, use of the JSON Java library was allowed. See Debian's page > for a list of alternatives. > {quote} > I'm not sure when this dependency was first introduced, but it looks like > it's currently used in a few places: > https://github.com/apache/hive/search?p=1=%22org.json%22=%E2%9C%93 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17066) Query78 filter wrong estimation is generating bad plan
[ https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vineet Garg updated HIVE-17066: --- Status: Open (was: Patch Available) > Query78 filter wrong estimatation is generating bad plan > > > Key: HIVE-17066 > URL: https://issues.apache.org/jira/browse/HIVE-17066 > Project: Hive > Issue Type: Bug >Reporter: Vineet Garg >Assignee: Vineet Garg > Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch, > HIVE-17066.3.patch > > > Filter operator is estimating 1 row following a left outer join causing bad > estimates > {noformat} > Reducer 12 > Execution mode: vectorized, llap > Reduce Operator Tree: > Map Join Operator > condition map: > Left Outer Join0 to 1 > keys: > 0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > 1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, > _col8 > input vertices: > 1 Map 14 > Statistics: Num rows: 71676270660 Data size: 3727166074320 > Basic stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: _col8 is null (type: boolean) > Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE > Column stats: COMPLETE > Select Operator > expressions: _col0 (type: bigint), _col1 (type: bigint), > _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: > bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, > _col6 > Statistics: Num rows: 1 Data size: 52 Basic stats: > COMPLETE Column stats: COMPLETE > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17073) Incorrect result with vectorization and SharedWorkOptimizer
[ https://issues.apache.org/jira/browse/HIVE-17073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083077#comment-16083077 ] Gopal V commented on HIVE-17073: [~jcamachorodriguez]: the boolean can be replaced with the changes from HIVE-16821 > Incorrect result with vectorization and SharedWorkOptimizer > --- > > Key: HIVE-17073 > URL: https://issues.apache.org/jira/browse/HIVE-17073 > Project: Hive > Issue Type: Bug > Components: Vectorization >Affects Versions: 3.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-17073.01.patch, HIVE-17073.02.patch, > HIVE-17073.patch > > > We get incorrect result with vectorization and multi-output Select operator > created by SharedWorkOptimizer. It can be reproduced in the following way. > {code:title=Correct} > select count(*) as h8_30_to_9 > from src > join src1 on src.key = src1.key > where src1.value = "val_278"; > OK > 2 > {code} > {code:title=Correct} > select count(*) as h9_to_9_30 > from src > join src1 on src.key = src1.key > where src1.value = "val_255"; > OK > 2 > {code} > {code:title=Incorrect} > select * from ( > select count(*) as h8_30_to_9 > from src > join src1 on src.key = src1.key > where src1.value = "val_278") s1 > join ( > select count(*) as h9_to_9_30 > from src > join src1 on src.key = src1.key > where src1.value = "val_255") s2; > OK > 2 0 > {code} > Problem seems to be that some ds in the batch row need to be re-initialized > after they have been forwarded to each output. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16751) Support different types for grouping columns in GroupBy Druid queries
[ https://issues.apache.org/jira/browse/HIVE-16751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083022#comment-16083022 ] Lefty Leverenz commented on HIVE-16751: --- Okay, thanks Jesús. > Support different types for grouping columns in GroupBy Druid queries > - > > Key: HIVE-16751 > URL: https://issues.apache.org/jira/browse/HIVE-16751 > Project: Hive > Issue Type: Bug > Components: Druid integration >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Fix For: 3.0.0 > > Attachments: HIVE-16751.patch > > > Calcite 1.13 pushes EXTRACT and FLOOR function to Druid as an extraction > function (cf CALCITE-1758). Originally, we were assuming that all group by > columns in a druid query were of STRING type; however, this will not true > anymore (result of EXTRACT is an INT and result of FLOOR a TIMESTAMP). > When we upgrade to Calcite 1.13, we will need to extend the DruidSerDe to > handle these functions. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17018) Small table is converted to map join even the total size of small tables exceeds the threshold(hive.auto.convert.join.noconditionaltask.size)
[ https://issues.apache.org/jira/browse/HIVE-17018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083058#comment-16083058 ] liyunzhang_intel commented on HIVE-17018: - [~csun]: {quote}A better way might be to have a separate config just for HoS, and maybe a limit on small table memory per executor.{quote} What confuses me is how to do this. The original code calculates whether the total map join size in the same stage exceeds the threshold or not. Should we now create a new configuration that derives the threshold for all small tables from spark.executor.memory, so that if the total size of the small tables in a stage is bigger than spark.executor.memory they are not allowed into the same stage, while {{hive.auto.convert.join.noconditionaltask.size}} keeps governing the total map join size of small tables in the query? > Small table is converted to map join even the total size of small tables > exceeds the threshold(hive.auto.convert.join.noconditionaltask.size) > - > > Key: HIVE-17018 > URL: https://issues.apache.org/jira/browse/HIVE-17018 > Project: Hive > Issue Type: Bug >Reporter: liyunzhang_intel >Assignee: liyunzhang_intel > Attachments: HIVE-17018_data_init.q, HIVE-17018.q, t3.txt > > > We use "hive.auto.convert.join.noconditionaltask.size" as the threshold: if > the sum of the sizes of n-1 of the tables/partitions in an n-way join is > smaller than it, the join will be converted to a map join. For example, in A > join B join C join D join E, the big table is A(100M) and the small tables > are B(10M), C(10M), D(10M), E(10M). If we set > hive.auto.convert.join.noconditionaltask.size=20M, the current code converts > E, D, and B to map joins but not C. In my understanding, because > hive.auto.convert.join.noconditionaltask.size can only accommodate E and D, > C and B should not be converted to map joins. > Let's explain more why E can be converted to map join. 
> In the current code, > [SparkMapJoinOptimizer#getConnectedMapJoinSize|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#L364] > calculates all the mapjoins in the parent path and child path. The search > stops when encountering [UnionOperator or > ReduceOperator|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#L381]. > C is not converted to a map join because {{(connectedMapJoinSize + > totalSize) > maxSize}} [see > code|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#L330]. The > RS before the join of C remains. When calculating whether B will be > converted to a map join, {{getConnectedMapJoinSize}} returns 0 as it encounters the > [RS > |https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#409] > and that causes {{(connectedMapJoinSize + totalSize) < maxSize}} to match. > [~xuefuz] or [~jxiang]: can you help see whether this is a bug or not, as you > are more familiar with SparkJoinOptimizer? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
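The threshold behavior described in this comment can be sketched as a small standalone simulation. This is illustrative Java only, not Hive's actual SparkMapJoinOptimizer code, and the class name is made up: the running connected-size total resets when the search stops at the ReduceSink left behind by a rejected join, which is how B slips under the threshold even though E, D, and B together exceed hive.auto.convert.join.noconditionaltask.size.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative simulation (not Hive code) of the accumulation/reset behavior
// described above. Small tables are visited in order; when one is rejected,
// its ReduceSink remains, and the next getConnectedMapJoinSize() search stops
// at that RS boundary, effectively resetting the accumulated size to 0.
public class MapJoinThresholdSketch {
    static final long MAX_SIZE = 20L * 1024 * 1024; // 20M, as in the example

    // Returns which small tables end up converted to map joins.
    public static List<String> convert(LinkedHashMap<String, Long> smallTables) {
        List<String> converted = new ArrayList<>();
        long connectedMapJoinSize = 0;
        for (Map.Entry<String, Long> e : smallTables.entrySet()) {
            long totalSize = e.getValue();
            if (connectedMapJoinSize + totalSize <= MAX_SIZE) {
                converted.add(e.getKey());
                connectedMapJoinSize += totalSize;
            } else {
                // Rejected (like C): the remaining RS resets the running total.
                connectedMapJoinSize = 0;
            }
        }
        return converted;
    }

    public static void main(String[] args) {
        LinkedHashMap<String, Long> tables = new LinkedHashMap<>();
        long tenM = 10L * 1024 * 1024;
        tables.put("E", tenM);
        tables.put("D", tenM);
        tables.put("C", tenM);
        tables.put("B", tenM);
        // E and D fit under 20M; C is rejected and resets the total; B slips in.
        System.out.println(convert(tables)); // [E, D, B]
    }
}
```

Run as-is, this reproduces the E, D, B outcome from the issue description, matching the commenter's reading that B is only converted because the accumulated size was reset at the RS.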
[jira] [Commented] (HIVE-17065) You can not successfully deploy hive clusters with Hive guidance documents
[ https://issues.apache.org/jira/browse/HIVE-17065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083012#comment-16083012 ] Lefty Leverenz commented on HIVE-17065: --- [~linzhangbing], it's easy to get edit privileges for the Hive wiki: * [About This Wiki -- How to get permission to edit | https://cwiki.apache.org/confluence/display/Hive/AboutThisWiki#AboutThisWiki-Howtogetpermissiontoedit] * [About This Wiki -- How to edit the Hive wiki | https://cwiki.apache.org/confluence/display/Hive/AboutThisWiki#AboutThisWiki-HowtoedittheHivewiki] But the modifications required are beyond my expertise. > You can not successfully deploy hive clusters with Hive guidance documents > -- > > Key: HIVE-17065 > URL: https://issues.apache.org/jira/browse/HIVE-17065 > Project: Hive > Issue Type: Improvement > Components: Documentation >Reporter: ZhangBing Lin >Priority: Minor > Attachments: screenshot-1.png > > > When I followed the official document from cwiki > [https://cwiki.apache.org/confluence/display/Hive/GettingStarted] to build a > Hive 2.1.1 single-node service, I encountered several problems: > 1. The command to create the Hive warehouse directory needs to be modified: > A. $HADOOP_HOME/bin/hadoop fs -mkdir /user/hive/warehouse > B. $HADOOP_HOME/bin/hadoop fs -mkdir -p /user/hive/warehouse > Using B instead of A might be better. > 2. The following two sections need their order adjusted: > A. Running Hive CLI > To use the Hive command line interface (CLI) from the shell: > $ $HIVE_HOME/bin/hive > B. Running HiveServer2 and Beeline > Starting from Hive 2.1, we need to run the schematool command below as an > initialization step. For example, we can use "derby" as db type. > $ $HIVE_HOME/bin/schematool -dbType -initSchema > When I execute the $HIVE_HOME/bin/hive command, the following error occurs: > !screenshot-1.png! 
> When I execute the following command first, the hive command then runs > without the problem: > $HIVE_HOME/bin/schematool -dbType derby -initSchema -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17073) Incorrect result with vectorization and SharedWorkOptimizer
[ https://issues.apache.org/jira/browse/HIVE-17073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083011#comment-16083011 ] Hive QA commented on HIVE-17073: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876680/HIVE-17073.01.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10840 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed] (batchId=237) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=99) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hadoop.hive.ql.exec.vector.TestVectorSelectOperator.testSelectOperator (batchId=272) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5964/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5964/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5964/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 8 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12876680 - PreCommit-HIVE-Build > Incorrect result with vectorization and SharedWorkOptimizer > --- > > Key: HIVE-17073 > URL: https://issues.apache.org/jira/browse/HIVE-17073 > Project: Hive > Issue Type: Bug > Components: Vectorization >Affects Versions: 3.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-17073.01.patch, HIVE-17073.02.patch, > HIVE-17073.patch > > > We get incorrect result with vectorization and multi-output Select operator > created by SharedWorkOptimizer. It can be reproduced in the following way. > {code:title=Correct} > select count(*) as h8_30_to_9 > from src > join src1 on src.key = src1.key > where src1.value = "val_278"; > OK > 2 > {code} > {code:title=Correct} > select count(*) as h9_to_9_30 > from src > join src1 on src.key = src1.key > where src1.value = "val_255"; > OK > 2 > {code} > {code:title=Incorrect} > select * from ( > select count(*) as h8_30_to_9 > from src > join src1 on src.key = src1.key > where src1.value = "val_278") s1 > join ( > select count(*) as h9_to_9_30 > from src > join src1 on src.key = src1.key > where src1.value = "val_255") s2; > OK > 2 0 > {code} > Problem seems to be that some ds in the batch row need to be re-initialized > after they have been forwarded to each output. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17073) Incorrect result with vectorization and SharedWorkOptimizer
[ https://issues.apache.org/jira/browse/HIVE-17073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082920#comment-16082920 ] Matt McCline commented on HIVE-17073: - Ok, LGTM +1 tests pending. > Incorrect result with vectorization and SharedWorkOptimizer > --- > > Key: HIVE-17073 > URL: https://issues.apache.org/jira/browse/HIVE-17073 > Project: Hive > Issue Type: Bug > Components: Vectorization >Affects Versions: 3.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-17073.01.patch, HIVE-17073.02.patch, > HIVE-17073.patch > > > We get incorrect result with vectorization and multi-output Select operator > created by SharedWorkOptimizer. It can be reproduced in the following way. > {code:title=Correct} > select count(*) as h8_30_to_9 > from src > join src1 on src.key = src1.key > where src1.value = "val_278"; > OK > 2 > {code} > {code:title=Correct} > select count(*) as h9_to_9_30 > from src > join src1 on src.key = src1.key > where src1.value = "val_255"; > OK > 2 > {code} > {code:title=Incorrect} > select * from ( > select count(*) as h8_30_to_9 > from src > join src1 on src.key = src1.key > where src1.value = "val_278") s1 > join ( > select count(*) as h9_to_9_30 > from src > join src1 on src.key = src1.key > where src1.value = "val_255") s2; > OK > 2 0 > {code} > Problem seems to be that some ds in the batch row need to be re-initialized > after they have been forwarded to each output. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16926) LlapTaskUmbilicalExternalClient should not start new umbilical server for every fragment request
[ https://issues.apache.org/jira/browse/HIVE-16926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082995#comment-16082995 ] Siddharth Seth commented on HIVE-16926: --- Functionally, looks good to me. Minor comments. - umbilicalServer.umbilicalProtocol.pendingClients.putIfAbsent -> Would be a little cleaner to add a method for this, similar to unregisterClient. - {code} + for (String key : umbilicalImpl.pendingClients.keySet()) { +LlapTaskUmbilicalExternalClient client = umbilicalImpl.pendingClients.get(key);{code} Replace with an iterator over the entrySet to avoid the get() ? Also, this pattern is repeated in heartbeat and nodeHeartbeat - could likely be a method. If I'm not mistaken, the shared umbilical server will not be shut down ever? Maybe in a follow up - some of the static classes could be split out. > LlapTaskUmbilicalExternalClient should not start new umbilical server for > every fragment request > > > Key: HIVE-16926 > URL: https://issues.apache.org/jira/browse/HIVE-16926 > Project: Hive > Issue Type: Sub-task > Components: llap >Reporter: Jason Dere >Assignee: Jason Dere > Attachments: HIVE-16926.1.patch, HIVE-16926.2.patch, > HIVE-16926.3.patch, HIVE-16926.4.patch > > > Followup task from [~sseth] and [~sershe] after HIVE-16777. > LlapTaskUmbilicalExternalClient currently creates a new umbilical server for > every fragment request, but this is not necessary and the umbilical can be > shared. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
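The entrySet() refactor suggested in the review above can be sketched as follows. Client and the map are simplified stand-ins for LlapTaskUmbilicalExternalClient and the pendingClients field mentioned in the comment, not the real LLAP types.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of iterating entrySet() instead of keySet() + get(). The types here
// are placeholders: "Client" stands in for LlapTaskUmbilicalExternalClient.
public class EntrySetIterationSketch {
    static class Client {
        final String id;
        Client(String id) { this.id = id; }
    }

    // Before: each key costs an extra hash lookup via get().
    static int heartbeatWithKeySet(Map<String, Client> pendingClients) {
        int visited = 0;
        for (String key : pendingClients.keySet()) {
            Client client = pendingClients.get(key); // second lookup per key
            if (client != null) visited++;           // ... heartbeat handling ...
        }
        return visited;
    }

    // After: key and value come from the same entry, no second lookup. This
    // shared loop could also be extracted into one method reused by both
    // heartbeat() and nodeHeartbeat(), as the review suggests.
    static int heartbeatWithEntrySet(Map<String, Client> pendingClients) {
        int visited = 0;
        for (Map.Entry<String, Client> entry : pendingClients.entrySet()) {
            Client client = entry.getValue();        // ... heartbeat handling ...
            if (client != null) visited++;
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<String, Client> pending = new ConcurrentHashMap<>();
        pending.put("fragment-1", new Client("fragment-1"));
        pending.put("fragment-2", new Client("fragment-2"));
        System.out.println(heartbeatWithEntrySet(pending)); // 2
    }
}
```

Both methods visit the same clients; the entrySet form simply avoids the redundant per-key lookup.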
[jira] [Commented] (HIVE-15144) JSON.org license is now CatX
[ https://issues.apache.org/jira/browse/HIVE-15144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082980#comment-16082980 ] Lefty Leverenz commented on HIVE-15144: --- [~pxiong], please update the fix versions to 2.3.0 and 3.0.0. Thanks. > JSON.org license is now CatX > > > Key: HIVE-15144 > URL: https://issues.apache.org/jira/browse/HIVE-15144 > Project: Hive > Issue Type: Bug >Reporter: Robert Kanter >Assignee: Owen O'Malley >Priority: Blocker > Fix For: 2.2.0 > > Attachments: HIVE-15144.patch, HIVE-15144.patch, HIVE-15144.patch, > HIVE-15144.patch > > > per [update resolved legal|http://www.apache.org/legal/resolved.html#json]: > {quote} > CAN APACHE PRODUCTS INCLUDE WORKS LICENSED UNDER THE JSON LICENSE? > No. As of 2016-11-03 this has been moved to the 'Category X' license list. > Prior to this, use of the JSON Java library was allowed. See Debian's page > for a list of alternatives. > {quote} > I'm not sure when this dependency was first introduced, but it looks like > it's currently used in a few places: > https://github.com/apache/hive/search?p=1=%22org.json%22=%E2%9C%93 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-15767) Hive On Spark is not working on secure clusters from Oozie
[ https://issues.apache.org/jira/browse/HIVE-15767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082948#comment-16082948 ] Peter Cseh commented on HIVE-15767: --- This happens with HiveCLI, not with HS2. The exception is coming from the spark driver. When the HiveCLI is executed from shell, the mapreduce.job.credentials.binary is empty in the configuration as spark-submit is called from the RemoteClient. When it's executed from Oozie's LauncherMapper, Hive picks up this property from the Oozie launcher's configuration which is correct, but passes it to Spark. Spark runs in yarn-cluster mode so the Spark driver gets its own container (which may be on another machine). It looks for the credential files in the folder where the Oozie Launcher ran. That's on a different machine, so it can't pick up the container_tokens file, which leaves the spark driver with no tokens, so it fails. I don't know how Hive-on-MR works in this regard, but we had no similar issues with the HiveAction before, so I assume it works differently. I don't think it's possible to reproduce it using MiniClusters as the local folders will be available in the test so the Spark driver will be able to access it. 
> Hive On Spark is not working on secure clusters from Oozie > -- > > Key: HIVE-15767 > URL: https://issues.apache.org/jira/browse/HIVE-15767 > Project: Hive > Issue Type: Bug > Components: Spark >Affects Versions: 1.2.1, 2.1.1 >Reporter: Peter Cseh >Assignee: Peter Cseh > Attachments: HIVE-15767-001.patch, HIVE-15767-002.patch > > > When a HiveAction is launched form Oozie with Hive On Spark enabled, we're > getting errors: > {noformat} > Caused by: java.io.IOException: Exception reading > file:/yarn/nm/usercache/yshi/appcache/application_1485271416004_0022/container_1485271416004_0022_01_02/container_tokens > at > org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:188) > at > org.apache.hadoop.mapreduce.security.TokenCache.mergeBinaryTokens(TokenCache.java:155) > {noformat} > This is caused by passing the {{mapreduce.job.credentials.binary}} property > to the Spark configuration in RemoteHiveSparkClient. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16832) duplicate ROW__ID possible in multi insert into transactional table
[ https://issues.apache.org/jira/browse/HIVE-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082885#comment-16082885 ] Gopal V commented on HIVE-16832: bq. Suppose you populate a partition via 100 inserts and 1M rows. So you have 100 OTIDs. Yeah, this was an optimization for the possibility that you're doing an "update every row" merge which would otherwise cause a massive memory jump in deletes (& overflow the 2G limit on arrays). bq. Perhaps simply relying on the "push down" to delete deltas is enough and we are better off just keeping 3 arrays Yes, it might be better - I've yet to really look into the delete distribution for a regular CDC workload. The push-down into deletes is a big win anyway. Not too worried about the extra size here. > duplicate ROW__ID possible in multi insert into transactional table > --- > > Key: HIVE-16832 > URL: https://issues.apache.org/jira/browse/HIVE-16832 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 2.2.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-16832.01.patch, HIVE-16832.03.patch, > HIVE-16832.04.patch, HIVE-16832.05.patch, HIVE-16832.06.patch, > HIVE-16832.08.patch, HIVE-16832.09.patch, HIVE-16832.10.patch, > HIVE-16832.11.patch, HIVE-16832.14.patch, HIVE-16832.15.patch, > HIVE-16832.16.patch, HIVE-16832.17.patch, HIVE-16832.18.patch, > HIVE-16832.19.patch, HIVE-16832.20.patch, HIVE-16832.20.patch > > > {noformat} > create table AcidTablePart(a int, b int) partitioned by (p string) clustered > by (a) into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true'); > create temporary table if not exists data1 (x int); > insert into data1 values (1); > from data1 >insert into AcidTablePart partition(p) select 0, 0, 'p' || x >insert into AcidTablePart partition(p='p1') select 0, 1 > {noformat} > Each branch of this multi-insert creates a row in partition p1/bucket0 with > ROW__ID=(1,0,0). 
> The same can happen when running a SQL Merge (HIVE-10924) statement that has > both Insert and Update clauses when the target table has > _'transactional'='true','transactional_properties'='default'_ (see > HIVE-14035). This is so because Merge is internally run as a multi-insert > statement. > The solution relies on the statement ID introduced in HIVE-11030. Each Insert > clause of a multi-insert gets a unique ID. > The ROW__ID.bucketId now becomes a bit-packed triplet (format version, > bucketId, statementId). > (Since ORC stores field names in the data file we can't rename > ROW__ID.bucketId). > This ensures that there are no collisions and retains desired sort properties > of ROW__ID. > In particular _SortedDynPartitionOptimizer_ works w/o any changes even in > cases where there are fewer reducers than buckets. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
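The bit-packed triplet described in the comment above can be illustrated with a minimal pack/unpack sketch. The field widths chosen here (12 bits each for bucketId and statementId) and the class name are assumptions for illustration only; the actual layout used by Hive's codec may differ.

```java
// Minimal sketch of packing (format version, bucketId, statementId) into one
// int, as the issue description outlines for ROW__ID.bucketId. Field widths
// are illustrative assumptions, not Hive's exact bit layout.
public class BucketIdPackSketch {
    static final int STMT_BITS = 12;
    static final int BUCKET_BITS = 12;

    static int pack(int version, int bucketId, int statementId) {
        return (version << (BUCKET_BITS + STMT_BITS))
             | (bucketId << STMT_BITS)
             | statementId;
    }

    static int version(int packed)     { return packed >>> (BUCKET_BITS + STMT_BITS); }
    static int bucketId(int packed)    { return (packed >>> STMT_BITS) & ((1 << BUCKET_BITS) - 1); }
    static int statementId(int packed) { return packed & ((1 << STMT_BITS) - 1); }

    public static void main(String[] args) {
        // Two insert clauses of a multi-insert writing the same bucket now get
        // distinct packed values because the statement id differs, which is
        // exactly what removes the ROW__ID collision.
        int branch0 = pack(1, 0, 0);
        int branch1 = pack(1, 0, 1);
        System.out.println(branch0 != branch1); // true
    }
}
```

Packing the version into the high bits keeps the value's sort order dominated by bucketId before statementId, matching the "retains desired sort properties" point in the description.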
[jira] [Comment Edited] (HIVE-17073) Incorrect result with vectorization and SharedWorkOptimizer
[ https://issues.apache.org/jira/browse/HIVE-17073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082852#comment-16082852 ] Matt McCline edited comment on HIVE-17073 at 7/11/17 7:49 PM: -- Great work! I think you need to add some code to TableScanOperator -- it handles VectorizedRowBatch as pass-through, too. It has a forward call in it. Probably add an instanceof check at beginning of method and use it. And, LLAP drives in VRBs, too. Not sure where at the moment. Might just be via InputFileFormat. was (Author: mmccline): Great work! I think you need to add some code to TableScanOperator -- it handles VectorizedRowBatch as pass-through, too. It has a forward call in it. Probably add an instanceof check at beginning of method and use it. And, LLAP drives in VRBs, too. Not sure where at the moment. > Incorrect result with vectorization and SharedWorkOptimizer > --- > > Key: HIVE-17073 > URL: https://issues.apache.org/jira/browse/HIVE-17073 > Project: Hive > Issue Type: Bug > Components: Vectorization >Affects Versions: 3.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-17073.01.patch, HIVE-17073.patch > > > We get incorrect result with vectorization and multi-output Select operator > created by SharedWorkOptimizer. It can be reproduced in the following way. 
> {code:title=Correct} > select count(*) as h8_30_to_9 > from src > join src1 on src.key = src1.key > where src1.value = "val_278"; > OK > 2 > {code} > {code:title=Correct} > select count(*) as h9_to_9_30 > from src > join src1 on src.key = src1.key > where src1.value = "val_255"; > OK > 2 > {code} > {code:title=Incorrect} > select * from ( > select count(*) as h8_30_to_9 > from src > join src1 on src.key = src1.key > where src1.value = "val_278") s1 > join ( > select count(*) as h9_to_9_30 > from src > join src1 on src.key = src1.key > where src1.value = "val_255") s2; > OK > 2 0 > {code} > Problem seems to be that some ds in the batch row need to be re-initialized > after they have been forwarded to each output. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
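The re-initialization problem described in this issue (state mutated by the first downstream operator leaking into what the second one sees) can be shown with a toy batch. This is not Hive's VectorizedRowBatch, just a minimal stand-in demonstrating the save/restore that the fix calls for when one operator forwards the same batch to multiple outputs.

```java
import java.util.Arrays;

// Toy illustration (not Hive code) of forwarding one shared batch to two
// consumers: the first consumer filters the batch in place, so its selection
// state must be restored before forwarding to the second consumer.
public class SharedBatchSketch {
    static class Batch {
        int size = 4;
        boolean selectedInUse = false;
        int[] selected = new int[4];
    }

    // A downstream filter that keeps only even row indices, mutating the batch.
    static void filterEvens(Batch b) {
        int n = 0;
        for (int i = 0; i < b.size; i++) {
            if (i % 2 == 0) b.selected[n++] = i;
        }
        b.selectedInUse = true;
        b.size = n;
    }

    // Forward to two consumers, restoring batch state in between; without the
    // restore, the second consumer would see only the filtered rows.
    static int[] forwardToBoth(Batch b) {
        int savedSize = b.size;
        boolean savedSelectedInUse = b.selectedInUse;

        filterEvens(b);                     // consumer 1 mutates the batch
        int firstSeen = b.size;

        b.size = savedSize;                 // restore before consumer 2
        b.selectedInUse = savedSelectedInUse;
        int secondSeen = b.size;

        return new int[] { firstSeen, secondSeen };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(forwardToBoth(new Batch()))); // [2, 4]
    }
}
```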
[jira] [Updated] (HIVE-17073) Incorrect result with vectorization and SharedWorkOptimizer
[ https://issues.apache.org/jira/browse/HIVE-17073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-17073: --- Attachment: HIVE-17073.02.patch > Incorrect result with vectorization and SharedWorkOptimizer > --- > > Key: HIVE-17073 > URL: https://issues.apache.org/jira/browse/HIVE-17073 > Project: Hive > Issue Type: Bug > Components: Vectorization >Affects Versions: 3.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-17073.01.patch, HIVE-17073.02.patch, > HIVE-17073.patch > > > We get incorrect result with vectorization and multi-output Select operator > created by SharedWorkOptimizer. It can be reproduced in the following way. > {code:title=Correct} > select count(*) as h8_30_to_9 > from src > join src1 on src.key = src1.key > where src1.value = "val_278"; > OK > 2 > {code} > {code:title=Correct} > select count(*) as h9_to_9_30 > from src > join src1 on src.key = src1.key > where src1.value = "val_255"; > OK > 2 > {code} > {code:title=Incorrect} > select * from ( > select count(*) as h8_30_to_9 > from src > join src1 on src.key = src1.key > where src1.value = "val_278") s1 > join ( > select count(*) as h9_to_9_30 > from src > join src1 on src.key = src1.key > where src1.value = "val_255") s2; > OK > 2 0 > {code} > Problem seems to be that some ds in the batch row need to be re-initialized > after they have been forwarded to each output. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-8838) Support Parquet through HCatalog
[ https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082904#comment-16082904 ] Hive QA commented on HIVE-8838: --- Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876668/HIVE-8838.3.patch {color:green}SUCCESS:{color} +1 due to 6 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10873 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) org.apache.hive.jdbc.TestJdbcWithMiniHS2.testConcurrentStatements (batchId=226) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5963/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5963/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5963/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12876668 - PreCommit-HIVE-Build > Support Parquet through HCatalog > > > Key: HIVE-8838 > URL: https://issues.apache.org/jira/browse/HIVE-8838 > Project: Hive > Issue Type: Bug >Reporter: Brock Noland >Assignee: Adam Szita > Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch, > HIVE-8838.3.patch > > > Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16177) non Acid to acid conversion doesn't handle _copy_N files
[ https://issues.apache.org/jira/browse/HIVE-16177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-16177: -- Attachment: HIVE-16177.19-branch-2.patch > non Acid to acid conversion doesn't handle _copy_N files > > > Key: HIVE-16177 > URL: https://issues.apache.org/jira/browse/HIVE-16177 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 0.14.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Blocker > Attachments: HIVE-16177.01.patch, HIVE-16177.02.patch, > HIVE-16177.04.patch, HIVE-16177.07.patch, HIVE-16177.08.patch, > HIVE-16177.09.patch, HIVE-16177.10.patch, HIVE-16177.11.patch, > HIVE-16177.14.patch, HIVE-16177.15.patch, HIVE-16177.16.patch, > HIVE-16177.17.patch, HIVE-16177.18-branch-2.patch, HIVE-16177.18.patch, > HIVE-16177.19-branch-2.patch > > > {noformat} > create table T(a int, b int) clustered by (a) into 2 buckets stored as orc > TBLPROPERTIES('transactional'='false') > insert into T(a,b) values(1,2) > insert into T(a,b) values(1,3) > alter table T SET TBLPROPERTIES ('transactional'='true') > {noformat} > //we should now have bucket files 01_0 and 01_0_copy_1 > but OrcRawRecordMerger.OriginalReaderPair.next() doesn't know that there can > be copy_N files and numbers rows in each bucket from 0 thus generating > duplicate IDs > {noformat} > select ROW__ID, INPUT__FILE__NAME, a, b from T > {noformat} > produces > {noformat} > {"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0,1,2 > {"transactionid\":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0_copy_1,1,3 > {noformat} > [~owen.omalley], do you have any thoughts on a good way to handle this? > attached patch has a few changes to make Acid even recognize copy_N but this > is just a pre-requisite. 
The new UT demonstrates the issue. > Furthermore, > {noformat} > alter table T compact 'major' > select ROW__ID, INPUT__FILE__NAME, a, b from T order by b > {noformat} > produces > {noformat} > {"transactionid":0,"bucketid":1,"rowid":0} > file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommandswarehouse/nonacidorctbl/base_-9223372036854775808/bucket_1 > 1 2 > {noformat} > HIVE-16177.04.patch has TestTxnCommands.testNonAcidToAcidConversion0() > demonstrating this > This is because compactor doesn't handle copy_N files either (skips them) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17073) Incorrect result with vectorization and SharedWorkOptimizer
[ https://issues.apache.org/jira/browse/HIVE-17073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082852#comment-16082852 ] Matt McCline commented on HIVE-17073: - Great work! I think you need to add some code to TableScanOperator -- it handles VectorizedRowBatch as pass-through, too. It has a forward call in it. Probably add an instanceof check at beginning of method and use it. And, LLAP drives in VRBs, too. Not sure where at the moment. > Incorrect result with vectorization and SharedWorkOptimizer > --- > > Key: HIVE-17073 > URL: https://issues.apache.org/jira/browse/HIVE-17073 > Project: Hive > Issue Type: Bug > Components: Vectorization >Affects Versions: 3.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-17073.01.patch, HIVE-17073.patch > > > We get incorrect result with vectorization and multi-output Select operator > created by SharedWorkOptimizer. It can be reproduced in the following way. > {code:title=Correct} > select count(*) as h8_30_to_9 > from src > join src1 on src.key = src1.key > where src1.value = "val_278"; > OK > 2 > {code} > {code:title=Correct} > select count(*) as h9_to_9_30 > from src > join src1 on src.key = src1.key > where src1.value = "val_255"; > OK > 2 > {code} > {code:title=Incorrect} > select * from ( > select count(*) as h8_30_to_9 > from src > join src1 on src.key = src1.key > where src1.value = "val_278") s1 > join ( > select count(*) as h9_to_9_30 > from src > join src1 on src.key = src1.key > where src1.value = "val_255") s2; > OK > 2 0 > {code} > Problem seems to be that some ds in the batch row need to be re-initialized > after they have been forwarded to each output. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-8838) Support Parquet through HCatalog
[ https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082858#comment-16082858 ] Aihua Xu commented on HIVE-8838: Thanks [~szita] The patch looks good to me. +1. > Support Parquet through HCatalog > > > Key: HIVE-8838 > URL: https://issues.apache.org/jira/browse/HIVE-8838 > Project: Hive > Issue Type: Bug >Reporter: Brock Noland >Assignee: Adam Szita > Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch, > HIVE-8838.3.patch > > > Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-8838) Support Parquet through HCatalog
[ https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082847#comment-16082847 ] Adam Szita commented on HIVE-8838: -- [~aihuaxu] those are unrelated, they are flaky/failing without this patch too, see e.g this build from yesterday: https://builds.apache.org/job/PreCommit-HIVE-Build/5937/testReport/ > Support Parquet through HCatalog > > > Key: HIVE-8838 > URL: https://issues.apache.org/jira/browse/HIVE-8838 > Project: Hive > Issue Type: Bug >Reporter: Brock Noland >Assignee: Adam Szita > Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch, > HIVE-8838.3.patch > > > Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16177) non Acid to acid conversion doesn't handle _copy_N files
[ https://issues.apache.org/jira/browse/HIVE-16177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082814#comment-16082814 ] Hive QA commented on HIVE-16177: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876662/HIVE-16177.18-branch-2.patch {color:green}SUCCESS:{color} +1 due to 9 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 10584 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] (batchId=35) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[explaindenpendencydiffengs] (batchId=38) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=59) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=142) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] (batchId=139) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[explaindenpendencydiffengs] (batchId=115) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_ptf] (batchId=125) org.apache.hadoop.hive.ql.TestTxnCommands.testNonAcidToAcidConversion01 (batchId=278) org.apache.hadoop.hive.ql.TestTxnCommands2.testNonAcidToAcidConversion02 (batchId=266) org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdate.testNonAcidToAcidConversion02 (batchId=276) org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testNonAcidToAcidConversion02 (batchId=273) org.apache.hadoop.hive.ql.security.TestExtendedAcls.testPartition (batchId=228) org.apache.hadoop.hive.ql.security.TestFolderPermissions.testPartition (batchId=217) org.apache.hive.hcatalog.api.TestHCatClient.testTransportFailure (batchId=176) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5962/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5962/console Test logs: 
http://104.198.109.242/logs/PreCommit-HIVE-Build-5962/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 14 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12876662 - PreCommit-HIVE-Build > non Acid to acid conversion doesn't handle _copy_N files > > > Key: HIVE-16177 > URL: https://issues.apache.org/jira/browse/HIVE-16177 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 0.14.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Blocker > Attachments: HIVE-16177.01.patch, HIVE-16177.02.patch, > HIVE-16177.04.patch, HIVE-16177.07.patch, HIVE-16177.08.patch, > HIVE-16177.09.patch, HIVE-16177.10.patch, HIVE-16177.11.patch, > HIVE-16177.14.patch, HIVE-16177.15.patch, HIVE-16177.16.patch, > HIVE-16177.17.patch, HIVE-16177.18-branch-2.patch, HIVE-16177.18.patch > > > {noformat} > create table T(a int, b int) clustered by (a) into 2 buckets stored as orc > TBLPROPERTIES('transactional'='false') > insert into T(a,b) values(1,2) > insert into T(a,b) values(1,3) > alter table T SET TBLPROPERTIES ('transactional'='true') > {noformat} > //we should now have bucket files 01_0 and 01_0_copy_1 > but OrcRawRecordMerger.OriginalReaderPair.next() doesn't know that there can > be copy_N files and numbers rows in each bucket from 0 thus generating > duplicate IDs > {noformat} > select ROW__ID, INPUT__FILE__NAME, a, b from T > {noformat} > produces > {noformat} > {"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0,1,2 > 
{"transactionid\":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0_copy_1,1,3 > {noformat} > [~owen.omalley], do you have any thoughts on a good way to handle this? > attached patch has a few changes to make Acid even recognize copy_N but this > is just a pre-requisite. The new UT demonstrates the issue. > Futhermore, > {noformat} > alter table T compact 'major' > select ROW__ID, INPUT__FILE__NAME, a, b from T order by b > {noformat} > produces > {noformat} > {"transactionid":0,"bucketid":1,"rowid":0} > file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommandswarehouse/nonacidorctbl/base_-9223372036854775808/bucket_1 > 1 2 > {noformat} > HIVE-16177.04.patch has TestTxnCommands.testNonAcidToAcidConversion0() > demonstrating this > This is
[jira] [Updated] (HIVE-17073) Incorrect result with vectorization and SharedWorkOptimizer
[ https://issues.apache.org/jira/browse/HIVE-17073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-17073: --- Attachment: (was: HIVE-17073.01.patch) > Incorrect result with vectorization and SharedWorkOptimizer > --- > > Key: HIVE-17073 > URL: https://issues.apache.org/jira/browse/HIVE-17073 > Project: Hive > Issue Type: Bug > Components: Vectorization >Affects Versions: 3.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-17073.01.patch, HIVE-17073.patch > > > We get incorrect result with vectorization and multi-output Select operator > created by SharedWorkOptimizer. It can be reproduced in the following way. > {code:title=Correct} > select count(*) as h8_30_to_9 > from src > join src1 on src.key = src1.key > where src1.value = "val_278"; > OK > 2 > {code} > {code:title=Correct} > select count(*) as h9_to_9_30 > from src > join src1 on src.key = src1.key > where src1.value = "val_255"; > OK > 2 > {code} > {code:title=Incorrect} > select * from ( > select count(*) as h8_30_to_9 > from src > join src1 on src.key = src1.key > where src1.value = "val_278") s1 > join ( > select count(*) as h9_to_9_30 > from src > join src1 on src.key = src1.key > where src1.value = "val_255") s2; > OK > 2 0 > {code} > Problem seems to be that some ds in the batch row need to be re-initialized > after they have been forwarded to each output. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17073) Incorrect result with vectorization and SharedWorkOptimizer
[ https://issues.apache.org/jira/browse/HIVE-17073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-17073: --- Attachment: HIVE-17073.01.patch > Incorrect result with vectorization and SharedWorkOptimizer > --- > > Key: HIVE-17073 > URL: https://issues.apache.org/jira/browse/HIVE-17073 > Project: Hive > Issue Type: Bug > Components: Vectorization >Affects Versions: 3.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-17073.01.patch, HIVE-17073.patch > > > We get incorrect result with vectorization and multi-output Select operator > created by SharedWorkOptimizer. It can be reproduced in the following way. > {code:title=Correct} > select count(*) as h8_30_to_9 > from src > join src1 on src.key = src1.key > where src1.value = "val_278"; > OK > 2 > {code} > {code:title=Correct} > select count(*) as h9_to_9_30 > from src > join src1 on src.key = src1.key > where src1.value = "val_255"; > OK > 2 > {code} > {code:title=Incorrect} > select * from ( > select count(*) as h8_30_to_9 > from src > join src1 on src.key = src1.key > where src1.value = "val_278") s1 > join ( > select count(*) as h9_to_9_30 > from src > join src1 on src.key = src1.key > where src1.value = "val_255") s2; > OK > 2 0 > {code} > Problem seems to be that some ds in the batch row need to be re-initialized > after they have been forwarded to each output. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
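The suspected mechanism — batch state mutated in place before being forwarded to a second output — can be made concrete with a toy analogy. This is a hypothetical Python sketch, not Hive's VectorizedRowBatch API: a single shared batch is narrowed by the first consumer, so the second consumer sees already-filtered rows unless the selection is re-initialized between outputs.

```python
# Toy stand-in for a vectorized row batch shared by two downstream operators.
class Batch:
    def __init__(self, rows):
        self.rows = rows
        self.selected = list(range(len(rows)))  # indices of "live" rows

def filter_in_place(batch, pred):
    # Mimics a vectorized filter: it narrows batch.selected in place
    # and returns how many rows survived.
    batch.selected = [i for i in batch.selected if pred(batch.rows[i])]
    return len(batch.selected)

batch = Batch(["val_278", "val_255", "val_278", "val_255"])

# One shared batch is forwarded to both count(*) branches.
saved = list(batch.selected)
c1 = filter_in_place(batch, lambda v: v == "val_278")        # 2, correct
c2_wrong = filter_in_place(batch, lambda v: v == "val_255")  # 0, sees leftovers

batch.selected = saved  # re-initialize the batch state between outputs
c2_right = filter_in_place(batch, lambda v: v == "val_255")  # 2, correct
print(c1, c2_wrong, c2_right)
```

Without the re-initialization step, the second branch reports 0 rows, mirroring the incorrect "2 0" result in the report.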
[jira] [Commented] (HIVE-17063) insert overwrite partition onto a external table fail when drop partition first
[ https://issues.apache.org/jira/browse/HIVE-17063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082722#comment-16082722 ] Hive QA commented on HIVE-17063: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876658/HIVE-17063.3.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10840 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=232) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5961/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5961/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5961/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12876658 - PreCommit-HIVE-Build > insert overwrite partition onto a external table fail when drop partition > first > --- > > Key: HIVE-17063 > URL: https://issues.apache.org/jira/browse/HIVE-17063 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 1.2.2, 2.1.1, 2.2.0 >Reporter: Wang Haihua >Assignee: Wang Haihua > Attachments: HIVE-17063.1.patch, HIVE-17063.2.patch, > HIVE-17063.3.patch > > > The default value of {{hive.exec.stagingdir}} is a relative path, and > dropping a partition on an external table does not clear the real data. As a > result, insert overwrite partition twice will fail because the > target data to be moved > already exists. > This happened when we reproduced partition data onto an external table. > I see the target data will not be cleared only when {{immediately generated > data}} is a child of {{the target data directory}}, so my proposal is > to clear the target file that already exists when renaming {{immediately > generated data}} into {{the target data directory}}. > Operation reproduced: > {code} > create external table insert_after_drop_partition(key string, val string) > partitioned by (insertdate string); > from src insert overwrite table insert_after_drop_partition partition > (insertdate='2008-01-01') select *; > alter table insert_after_drop_partition drop partition > (insertdate='2008-01-01'); > from src insert overwrite table insert_after_drop_partition partition > (insertdate='2008-01-01') select *; > {code} > Stack trace: > {code} > 2017-07-09T08:32:05,212 ERROR [f3bc51c8-2441-4689-b1c1-d60aef86c3aa main] > exec.Task: Failed with exception java.io.IOException: rename for src path: > pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/.hive-staging_hive_2017-07-09_08-32-03_840_4046825276907030554-1/-ext-1/00_0 > to dest > 
path:pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/00_0 > returned false > org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: rename > for src path: > pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/.hive-staging_hive_2017-07-09_08-32-03_840_4046825276907030554-1/-ext-1/00_0 > to dest > path:pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/00_0 > returned false > at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2992) > at > org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:3248) > at > org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1532) > at > org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1461) > at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:498) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197) > at >
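The failure mode in the stack trace — a rename that refuses to clobber an existing destination — can be sketched with local files. This is a hedged illustration with hypothetical helper names (HDFS's FileSystem.rename similarly returns false rather than overwriting an existing destination), showing the proposed fix of clearing the leftover target before the move:

```python
import os
import tempfile

def rename_like_hdfs(src, dst):
    # Mimic HDFS FileSystem.rename: refuse to clobber an existing destination
    # and report failure via the return value instead of raising.
    if os.path.exists(dst):
        return False
    os.rename(src, dst)
    return True

def move_with_overwrite(src, dst):
    # Proposed fix direction: remove the stale target left behind by the
    # dropped partition, then rename the staged file into place.
    if os.path.exists(dst):
        os.remove(dst)
    return rename_like_hdfs(src, dst)

d = tempfile.mkdtemp()
old = os.path.join(d, "part_file")            # leftover data in the partition dir
staged = os.path.join(d, "part_file.staged")  # freshly written staging output
with open(old, "w") as f: f.write("old")
with open(staged, "w") as f: f.write("new")

print(rename_like_hdfs(staged, old))     # False: the reported failure
print(move_with_overwrite(staged, old))  # True: stale target cleared first
```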
[jira] [Commented] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.
[ https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082711#comment-16082711 ] Vaibhav Gumashta commented on HIVE-4577: Thanks [~libing]. Looks like some test failures might need a look. > hive CLI can't handle hadoop dfs command with space and quotes. > > > Key: HIVE-4577 > URL: https://issues.apache.org/jira/browse/HIVE-4577 > Project: Hive > Issue Type: Bug > Components: CLI >Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0 >Reporter: Bing Li >Assignee: Bing Li > Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, > HIVE-4577.3.patch.txt, HIVE-4577.4.patch, HIVE-4577.5.patch > > > As designed, Hive supports hadoop dfs commands in the hive shell, like > hive> dfs -mkdir /user/biadmin/mydir; > but it behaves differently from hadoop when the path contains spaces or quotes > hive> dfs -mkdir "hello"; > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:40 > /user/biadmin/"hello" > hive> dfs -mkdir 'world'; > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:43 > /user/biadmin/'world' > hive> dfs -mkdir "bei jing"; > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:44 > /user/biadmin/"bei > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:44 > /user/biadmin/jing" -- This message was sent by Atlassian JIRA (v6.4.14#64029)
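The bug amounts to passing the raw, still-quoted tokens straight through to mkdir. A sketch of shell-style tokenization, using Python's shlex purely as a stand-in for whatever tokenizer the CLI would need (an assumption for illustration, not the patch's actual implementation):

```python
import shlex

def parse_dfs_args(line):
    # Strip the command terminator, then tokenize with POSIX shell rules:
    # quotes group words and are removed, so "bei jing" stays one argument.
    return shlex.split(line.rstrip(";"))

print(parse_dfs_args('dfs -mkdir "hello";'))     # ['dfs', '-mkdir', 'hello']
print(parse_dfs_args("dfs -mkdir 'world';"))     # ['dfs', '-mkdir', 'world']
print(parse_dfs_args('dfs -mkdir "bei jing";'))  # ['dfs', '-mkdir', 'bei jing']
```

With naive whitespace splitting, `"bei jing"` becomes two arguments with literal quote characters, which is exactly the pair of bogus directories shown in the report.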
[jira] [Commented] (HIVE-8838) Support Parquet through HCatalog
[ https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082682#comment-16082682 ] Aihua Xu commented on HIVE-8838: [~szita] and [~sushanth] How about the tests related to TestHCatClient? Looks like those test failures are related? > Support Parquet through HCatalog > > > Key: HIVE-8838 > URL: https://issues.apache.org/jira/browse/HIVE-8838 > Project: Hive > Issue Type: Bug >Reporter: Brock Noland >Assignee: Adam Szita > Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch, > HIVE-8838.3.patch > > > Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-8838) Support Parquet through HCatalog
[ https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-8838: --- Attachment: HIVE-8838.3.patch > Support Parquet through HCatalog > > > Key: HIVE-8838 > URL: https://issues.apache.org/jira/browse/HIVE-8838 > Project: Hive > Issue Type: Bug >Reporter: Brock Noland >Assignee: Sushanth Sowmyan > Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch, > HIVE-8838.3.patch > > > Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-8838) Support Parquet through HCatalog
[ https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan reassigned HIVE-8838: -- Assignee: Sushanth Sowmyan (was: Adam Szita) > Support Parquet through HCatalog > > > Key: HIVE-8838 > URL: https://issues.apache.org/jira/browse/HIVE-8838 > Project: Hive > Issue Type: Bug >Reporter: Brock Noland >Assignee: Sushanth Sowmyan > Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch > > > Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-8838) Support Parquet through HCatalog
[ https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan reassigned HIVE-8838: -- Assignee: Adam Szita (was: Sushanth Sowmyan) > Support Parquet through HCatalog > > > Key: HIVE-8838 > URL: https://issues.apache.org/jira/browse/HIVE-8838 > Project: Hive > Issue Type: Bug >Reporter: Brock Noland >Assignee: Adam Szita > Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch, > HIVE-8838.3.patch > > > Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-8838) Support Parquet through HCatalog
[ https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082668#comment-16082668 ] Sushanth Sowmyan commented on HIVE-8838: +1 Thanks for adding parquet support to HCat! This has been a long time coming. :) > Support Parquet through HCatalog > > > Key: HIVE-8838 > URL: https://issues.apache.org/jira/browse/HIVE-8838 > Project: Hive > Issue Type: Bug >Reporter: Brock Noland >Assignee: Adam Szita > Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch > > > Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-8838) Support Parquet through HCatalog
[ https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082672#comment-16082672 ] Sushanth Sowmyan commented on HIVE-8838: (also, note : I'm ignoring the reported test failures above, since they're all known to be flaky tests, and have been fixed elsewhere. However, we should run unit tests once more, in case there are other code changes in the last 10-ish days. Thus, I'm going to re-upload a .3.patch identical to the .2.patch so that ptest kicks off.) > Support Parquet through HCatalog > > > Key: HIVE-8838 > URL: https://issues.apache.org/jira/browse/HIVE-8838 > Project: Hive > Issue Type: Bug >Reporter: Brock Noland >Assignee: Adam Szita > Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch > > > Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16177) non Acid to acid conversion doesn't handle _copy_N files
[ https://issues.apache.org/jira/browse/HIVE-16177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-16177: -- Attachment: HIVE-16177.18-branch-2.patch > non Acid to acid conversion doesn't handle _copy_N files > > > Key: HIVE-16177 > URL: https://issues.apache.org/jira/browse/HIVE-16177 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 0.14.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Blocker > Attachments: HIVE-16177.01.patch, HIVE-16177.02.patch, > HIVE-16177.04.patch, HIVE-16177.07.patch, HIVE-16177.08.patch, > HIVE-16177.09.patch, HIVE-16177.10.patch, HIVE-16177.11.patch, > HIVE-16177.14.patch, HIVE-16177.15.patch, HIVE-16177.16.patch, > HIVE-16177.17.patch, HIVE-16177.18-branch-2.patch, HIVE-16177.18.patch > > > {noformat} > create table T(a int, b int) clustered by (a) into 2 buckets stored as orc > TBLPROPERTIES('transactional'='false') > insert into T(a,b) values(1,2) > insert into T(a,b) values(1,3) > alter table T SET TBLPROPERTIES ('transactional'='true') > {noformat} > //we should now have bucket files 01_0 and 01_0_copy_1 > but OrcRawRecordMerger.OriginalReaderPair.next() doesn't know that there can > be copy_N files and numbers rows in each bucket from 0 thus generating > duplicate IDs > {noformat} > select ROW__ID, INPUT__FILE__NAME, a, b from T > {noformat} > produces > {noformat} > {"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0,1,2 > {"transactionid\":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0_copy_1,1,3 > {noformat} > [~owen.omalley], do you have any thoughts on a good way to handle this? > attached patch has a few changes to make Acid even recognize copy_N but this > is just a pre-requisite. The new UT demonstrates the issue. 
> Furthermore, > {noformat} > alter table T compact 'major' > select ROW__ID, INPUT__FILE__NAME, a, b from T order by b > {noformat} > produces > {noformat} > {"transactionid":0,"bucketid":1,"rowid":0} > file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommandswarehouse/nonacidorctbl/base_-9223372036854775808/bucket_1 > 1 2 > {noformat} > HIVE-16177.04.patch has TestTxnCommands.testNonAcidToAcidConversion0() > demonstrating this > This is because compactor doesn't handle copy_N files either (skips them) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16907) "INSERT INTO" overwrite old data when destination table encapsulated by backquote
[ https://issues.apache.org/jira/browse/HIVE-16907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082649#comment-16082649 ] Pengcheng Xiong commented on HIVE-16907: thanks [~nemon] for discovering this and thanks [~libing] for the patch. However, it seems to me that although hive parse "`tdb.t1`" as a whole table name in AST, when it really processes it, it treats it as tdb.t1. Can u check other db's behavior, e.g., oracle and postgres, mysql for this? I doubt that there is a bug for table name when it contains "dot" in current hive. > "INSERT INTO" overwrite old data when destination table encapsulated by > backquote > > > Key: HIVE-16907 > URL: https://issues.apache.org/jira/browse/HIVE-16907 > Project: Hive > Issue Type: Bug > Components: Parser >Affects Versions: 1.1.0, 2.1.1 >Reporter: Nemon Lou >Assignee: Bing Li > Attachments: HIVE-16907.1.patch > > > A way to reproduce: > {noformat} > create database tdb; > use tdb; > create table t1(id int); > create table t2(id int); > explain insert into `tdb.t1` select * from t2; > {noformat} > {noformat} > +---+ > | > Explain | > +---+ > | STAGE DEPENDENCIES: > | > | Stage-1 is a root stage > | > | Stage-6 depends on stages: Stage-1 , consists of Stage-3, Stage-2, > Stage-4 | > | Stage-3 > | > | Stage-0 depends on stages: Stage-3, Stage-2, Stage-5 > | > | Stage-2 > | > | Stage-4 > | > | Stage-5 depends on stages: Stage-4 > | > | > | > | STAGE PLANS: > | > | Stage: Stage-1 > | > | Map Reduce > | > | Map Operator Tree: > | > | TableScan > | > | alias: t2 > | > | Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column > stats: NONE | > | Select Operator > | > | expressions: id (type: int) > | > | outputColumnNames: _col0 > | > | Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column > stats: NONE | > | File Output Operator >
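The distinction raised in the comment can be made concrete with a toy resolver. This hypothetical Python sketch (not Hive's parser) shows the two readings: a backquoted `tdb.t1` is a single identifier that happens to contain a dot, while an unquoted tdb.t1 is a db-qualified reference — treating the former as the latter is exactly what overwrites the wrong table:

```python
def resolve_table(name, current_db="default"):
    # Hypothetical resolution rules for illustration only.
    if name.startswith("`") and name.endswith("`"):
        # The whole quoted string is one table name, dot included,
        # living in the current database.
        return (current_db, name[1:-1])
    if "." in name:
        # Unquoted dot separates database from table.
        db, table = name.split(".", 1)
        return (db, table)
    return (current_db, name)

print(resolve_table("`tdb.t1`"))  # ('default', 'tdb.t1') -- one identifier
print(resolve_table("tdb.t1"))    # ('tdb', 't1')         -- db-qualified
```

The bug report suggests Hive keeps the first reading in the AST but silently falls back to the second during processing.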
[jira] [Updated] (HIVE-17070) remove .orig files from src
[ https://issues.apache.org/jira/browse/HIVE-17070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-17070: -- Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) committed to master (3.0) thanks Jason for the review > remove .orig files from src > --- > > Key: HIVE-17070 > URL: https://issues.apache.org/jira/browse/HIVE-17070 > Project: Hive > Issue Type: Bug >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Trivial > Fix For: 3.0.0 > > Attachments: HIVE-17070.patch > > > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java.orig > ql/src/test/results/clientpositive/llap/vector_join30.q.out.orig -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17073) Incorrect result with vectorization and SharedWorkOptimizer
[ https://issues.apache.org/jira/browse/HIVE-17073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082605#comment-16082605 ] Matt McCline commented on HIVE-17073: - Also there is another forward at the top of the VectorSelectOperator.process method, too. > Incorrect result with vectorization and SharedWorkOptimizer > --- > > Key: HIVE-17073 > URL: https://issues.apache.org/jira/browse/HIVE-17073 > Project: Hive > Issue Type: Bug > Components: Vectorization >Affects Versions: 3.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-17073.patch > > > We get incorrect result with vectorization and multi-output Select operator > created by SharedWorkOptimizer. It can be reproduced in the following way. > {code:title=Correct} > select count(*) as h8_30_to_9 > from src > join src1 on src.key = src1.key > where src1.value = "val_278"; > OK > 2 > {code} > {code:title=Correct} > select count(*) as h9_to_9_30 > from src > join src1 on src.key = src1.key > where src1.value = "val_255"; > OK > 2 > {code} > {code:title=Incorrect} > select * from ( > select count(*) as h8_30_to_9 > from src > join src1 on src.key = src1.key > where src1.value = "val_278") s1 > join ( > select count(*) as h9_to_9_30 > from src > join src1 on src.key = src1.key > where src1.value = "val_255") s2; > OK > 2 0 > {code} > Problem seems to be that some ds in the batch row need to be re-initialized > after they have been forwarded to each output. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17075) unstable stats in q files
[ https://issues.apache.org/jira/browse/HIVE-17075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman reassigned HIVE-17075: - > unstable stats in q files > - > > Key: HIVE-17075 > URL: https://issues.apache.org/jira/browse/HIVE-17075 > Project: Hive > Issue Type: Bug >Reporter: Eugene Koifman >Assignee: Pengcheng Xiong > > stats recorded in explain plan in .out files are sometimes unstable > here is 1 concrete example (HIVE-15898 1st run is patch 6, 2nd run is patch 7) > {noformat} > [10:23 AM] Eugene Koifman: 1st run > [10:23 AM] Eugene Koifman: > https://builds.apache.org/job/PreCommit-HIVE-Build/5951/testReport/org.apache.hadoop.hive.cli/TestMi... > [10:23 AM] Eugene Koifman: 316c316 > < Statistics: Num rows: 45 Data size: 4560 Basic stats: > COMPLETE Column stats: NONE > --- > > Statistics: Num rows: 45 Data size: 4571 Basic stats: > >COMPLETE Column stats: NONE > [10:23 AM] Eugene Koifman: this is the 1st diff - it says that actual result > was 4560 and expected was 4571 > [10:24 AM] Eugene Koifman: here is a 2nd run (the only difference is that I > update the .out file) > [10:24 AM] Eugene Koifman: > https://builds.apache.org/job/PreCommit-HIVE-Build/5956/testReport/org.apache.hadoop.hive.cli/TestMi... > [10:24 AM] Eugene Koifman: 316c316 > < Statistics: Num rows: 45 Data size: 4573 Basic stats: > COMPLETE Column stats: NONE > --- > > Statistics: Num rows: 45 Data size: 4560 Basic stats: > >COMPLETE Column stats: NONE > [10:25 AM] Eugene Koifman: is the 1st diff in the 2nd run > {noformat} > The actual value from each run is different. 
> Complete output from patch 6 run > {noformat} > Client Execution succeeded but contained differences (error code = 1) after > executing sqlmerge_type2_scd.q > 316c316 > < Statistics: Num rows: 45 Data size: 4560 Basic stats: > COMPLETE Column stats: NONE > --- > > Statistics: Num rows: 45 Data size: 4571 Basic stats: > > COMPLETE Column stats: NONE > 319c319 > < Statistics: Num rows: 45 Data size: 4560 Basic stats: > COMPLETE Column stats: NONE > --- > > Statistics: Num rows: 45 Data size: 4571 Basic stats: > > COMPLETE Column stats: NONE > 324c324 > < Statistics: Num rows: 45 Data size: 4560 Basic stats: > COMPLETE Column stats: NONE > --- > > Statistics: Num rows: 45 Data size: 4571 Basic stats: > > COMPLETE Column stats: NONE > 347c347 > < Statistics: Num rows: 22 Data size: 4560 Basic stats: > COMPLETE Column stats: NONE > --- > > Statistics: Num rows: 22 Data size: 4571 Basic stats: > > COMPLETE Column stats: NONE > 350c350 > < Statistics: Num rows: 22 Data size: 4560 Basic stats: > COMPLETE Column stats: NONE > --- > > Statistics: Num rows: 22 Data size: 4571 Basic stats: > > COMPLETE Column stats: NONE > 355c355 > < Statistics: Num rows: 22 Data size: 4560 Basic stats: > COMPLETE Column stats: NONE > --- > > Statistics: Num rows: 22 Data size: 4571 Basic stats: > > COMPLETE Column stats: NONE > 369c369 > < Statistics: Num rows: 49 Data size: 5016 Basic stats: > COMPLETE Column stats: NONE > --- > > Statistics: Num rows: 49 Data size: 5028 Basic stats: > > COMPLETE Column stats: NONE > 373c373 > < Statistics: Num rows: 49 Data size: 5016 Basic stats: > COMPLETE Column stats: NONE > --- > > Statistics: Num rows: 49 Data size: 5028 Basic stats: > > COMPLETE Column stats: NONE > 378c378 > < Statistics: Num rows: 49 Data size: 5016 Basic stats: > COMPLETE Column stats: NONE > --- > > Statistics: Num rows: 49 Data size: 5028 Basic stats: > > COMPLETE Column stats: NONE > 390c390 > < Statis > {noformat} > From patch 7 run > {noformat} > 316c316 > < Statistics: 
Num rows: 45 Data size: 4573 Basic stats: > COMPLETE Column stats: NONE > --- > > Statistics: Num rows: 45 Data size: 4560 Basic stats: > > COMPLETE Column stats: NONE > 319c319 > < Statistics: Num rows: 45 Data size: 4573 Basic stats: > COMPLETE Column stats: NONE > --- > > Statistics: Num rows: 45 Data size: 4560 Basic stats: > > COMPLETE Column stats: NONE > 324c324 > < Statistics: Num rows: 45 Data size: 4573 Basic stats: > COMPLETE Column stats: NONE > --- > > Statistics: Num rows: 45 Data size: 4560 Basic stats: > > COMPLETE Column stats: NONE > 347c347 > < Statistics: Num rows: 22 Data size: 4573 Basic stats: > COMPLETE
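One common way to stabilize golden-file diffs like the ones above — offered only as a sketch of a possible approach, not something this issue decided on — is to mask the volatile byte counts before comparing, since Num rows is stable across runs while Data size drifts:

```python
import re

# Replace the raw byte count with a placeholder before diffing .q.out files.
DATA_SIZE = re.compile(r"Data size: \d+")

def normalize(line):
    return DATA_SIZE.sub("Data size: ###", line)

a = "Statistics: Num rows: 45 Data size: 4560 Basic stats: COMPLETE Column stats: NONE"
b = "Statistics: Num rows: 45 Data size: 4571 Basic stats: COMPLETE Column stats: NONE"
print(normalize(a) == normalize(b))  # True -- the flaky diff no longer fires
```

A genuine row-count or plan change would still show up in the diff; only the unstable size estimate is hidden.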
[jira] [Updated] (HIVE-17063) insert overwrite partition onto a external table fail when drop partition first
[ https://issues.apache.org/jira/browse/HIVE-17063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wang Haihua updated HIVE-17063: --- Status: In Progress (was: Patch Available) > insert overwrite partition onto a external table fail when drop partition > first > --- > > Key: HIVE-17063 > URL: https://issues.apache.org/jira/browse/HIVE-17063 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 2.1.1, 1.2.2, 2.2.0 >Reporter: Wang Haihua >Assignee: Wang Haihua > Attachments: HIVE-17063.1.patch, HIVE-17063.2.patch, > HIVE-17063.3.patch > > > The default value of {{hive.exec.stagingdir}} is a relative path, and > dropping a partition on an external table does not clear the real data. As a > result, insert overwrite partition twice will fail because the > target data to be moved > already exists. > This happened when we reproduced partition data onto an external table. > I see the target data will not be cleared only when {{immediately generated > data}} is a child of {{the target data directory}}, so my proposal is > to clear the target file that already exists when renaming {{immediately > generated data}} into {{the target data directory}}. > Operation reproduced: > {code} > create external table insert_after_drop_partition(key string, val string) > partitioned by (insertdate string); > from src insert overwrite table insert_after_drop_partition partition > (insertdate='2008-01-01') select *; > alter table insert_after_drop_partition drop partition > (insertdate='2008-01-01'); > from src insert overwrite table insert_after_drop_partition partition > (insertdate='2008-01-01') select *; > {code} > Stack trace: > {code} > 2017-07-09T08:32:05,212 ERROR [f3bc51c8-2441-4689-b1c1-d60aef86c3aa main] > exec.Task: Failed with exception java.io.IOException: rename for src path: > 
pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/.hive-staging_hive_2017-07-09_08-32-03_840_4046825276907030554-1/-ext-1/00_0 > to dest > path:pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/00_0 > returned false > org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: rename > for src path: > pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/.hive-staging_hive_2017-07-09_08-32-03_840_4046825276907030554-1/-ext-1/00_0 > to dest > path:pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/00_0 > returned false > at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2992) > at > org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:3248) > at > org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1532) > at > org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1461) > at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:498) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2073) > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1744) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1453) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1171) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1161) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:335) > at > 
org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1137) > at > org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:) > at > org.apache.hadoop.hive.cli.TestCliDriver.runTest(TestCliDriver.java:120) > at > org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_after_drop_partition(TestCliDriver.java:103) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at >
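The failure above comes down to a rename whose destination already exists: DROP PARTITION on an external table leaves the old 00_0 file on disk, so the second INSERT OVERWRITE's move out of the staging directory returns false. A minimal sketch of the proposed remedy (clearing a stale target before renaming) using plain java.nio in place of Hive's Warehouse/FileSystem APIs; the method names here are illustrative, not Hive's actual implementation:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class OverwriteRename {
    // Sketch of the proposed fix: when moving a staging file into the
    // partition directory, remove a leftover file of the same name first
    // so the rename cannot fail on stale data left by DROP PARTITION.
    static boolean moveReplacing(Path src, Path dest) {
        try {
            Files.deleteIfExists(dest); // clear the stale target, if any
            Files.move(src, dest);      // now the rename succeeds
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    // Self-contained demo: a stale "00_0" already sits in the target
    // directory, and the staging file replaces it.
    static boolean demo() {
        try {
            Path dir = Files.createTempDirectory("ow");
            Path src = Files.write(dir.resolve("staging"), "new".getBytes());
            Path dest = Files.write(dir.resolve("00_0"), "stale".getBytes());
            boolean ok = moveReplacing(src, dest);
            return ok
                && new String(Files.readAllBytes(dest)).equals("new")
                && !Files.exists(src);
        } catch (IOException e) {
            return false;
        }
    }
}
```

In Hive itself this logic would presumably live near Hive.moveFile(), guarded so that only files being replaced by the overwrite are deleted.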
[jira] [Updated] (HIVE-17063) insert overwrite partition onto an external table fails when the partition is dropped first
[ https://issues.apache.org/jira/browse/HIVE-17063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wang Haihua updated HIVE-17063: --- Status: Patch Available (was: In Progress) > insert overwrite partition onto a external table fail when drop partition > first > --- > > Key: HIVE-17063 > URL: https://issues.apache.org/jira/browse/HIVE-17063 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 2.1.1, 1.2.2, 2.2.0 >Reporter: Wang Haihua >Assignee: Wang Haihua > Attachments: HIVE-17063.1.patch, HIVE-17063.2.patch, > HIVE-17063.3.patch > > > The default value of {{hive.exec.stagingdir}} which is a relative path, and > also drop partition on a external table will not clear the real data. As a > result, insert overwrite partition twice will happen to fail because of the > target data to be moved has > already existed. > This happened when we reproduce partition data onto a external table. > I see the target data will not be cleared only when {{immediately generated > data}} is child of {{the target data directory}}, so my proposal is trying > to clear target file already existed finally whe doing rename {{immediately > generated data}} into {{the target data directory}} > Operation reproduced: > {code} > create external table insert_after_drop_partition(key string, val string) > partitioned by (insertdate string); > from src insert overwrite table insert_after_drop_partition partition > (insertdate='2008-01-01') select *; > alter table insert_after_drop_partition drop partition > (insertdate='2008-01-01'); > from src insert overwrite table insert_after_drop_partition partition > (insertdate='2008-01-01') select *; > {code} > Stack trace: > {code} > 2017-07-09T08:32:05,212 ERROR [f3bc51c8-2441-4689-b1c1-d60aef86c3aa main] > exec.Task: Failed with exception java.io.IOException: rename for src path: > 
pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/.hive-staging_hive_2017-07-09_08-32-03_840_4046825276907030554-1/-ext-1/00_0 > to dest > path:pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/00_0 > returned false > org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: rename > for src path: > pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/.hive-staging_hive_2017-07-09_08-32-03_840_4046825276907030554-1/-ext-1/00_0 > to dest > path:pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/00_0 > returned false > at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2992) > at > org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:3248) > at > org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1532) > at > org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1461) > at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:498) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2073) > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1744) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1453) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1171) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1161) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:335) > at > 
org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1137) > at > org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:) > at > org.apache.hadoop.hive.cli.TestCliDriver.runTest(TestCliDriver.java:120) > at > org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_after_drop_partition(TestCliDriver.java:103) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at >
[jira] [Updated] (HIVE-17063) insert overwrite partition onto an external table fails when the partition is dropped first
[ https://issues.apache.org/jira/browse/HIVE-17063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wang Haihua updated HIVE-17063: --- Attachment: HIVE-17063.3.patch > insert overwrite partition onto a external table fail when drop partition > first > --- > > Key: HIVE-17063 > URL: https://issues.apache.org/jira/browse/HIVE-17063 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 1.2.2, 2.1.1, 2.2.0 >Reporter: Wang Haihua >Assignee: Wang Haihua > Attachments: HIVE-17063.1.patch, HIVE-17063.2.patch, > HIVE-17063.3.patch > > > The default value of {{hive.exec.stagingdir}} which is a relative path, and > also drop partition on a external table will not clear the real data. As a > result, insert overwrite partition twice will happen to fail because of the > target data to be moved has > already existed. > This happened when we reproduce partition data onto a external table. > I see the target data will not be cleared only when {{immediately generated > data}} is child of {{the target data directory}}, so my proposal is trying > to clear target file already existed finally whe doing rename {{immediately > generated data}} into {{the target data directory}} > Operation reproduced: > {code} > create external table insert_after_drop_partition(key string, val string) > partitioned by (insertdate string); > from src insert overwrite table insert_after_drop_partition partition > (insertdate='2008-01-01') select *; > alter table insert_after_drop_partition drop partition > (insertdate='2008-01-01'); > from src insert overwrite table insert_after_drop_partition partition > (insertdate='2008-01-01') select *; > {code} > Stack trace: > {code} > 2017-07-09T08:32:05,212 ERROR [f3bc51c8-2441-4689-b1c1-d60aef86c3aa main] > exec.Task: Failed with exception java.io.IOException: rename for src path: > 
pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/.hive-staging_hive_2017-07-09_08-32-03_840_4046825276907030554-1/-ext-1/00_0 > to dest > path:pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/00_0 > returned false > org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: rename > for src path: > pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/.hive-staging_hive_2017-07-09_08-32-03_840_4046825276907030554-1/-ext-1/00_0 > to dest > path:pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/00_0 > returned false > at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2992) > at > org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:3248) > at > org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1532) > at > org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1461) > at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:498) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2073) > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1744) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1453) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1171) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1161) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:335) > at > 
org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1137) > at > org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:) > at > org.apache.hadoop.hive.cli.TestCliDriver.runTest(TestCliDriver.java:120) > at > org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_after_drop_partition(TestCliDriver.java:103) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at >
[jira] [Commented] (HIVE-16907) "INSERT INTO" overwrites old data when the destination table is encapsulated by backquotes
[ https://issues.apache.org/jira/browse/HIVE-16907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082608#comment-16082608 ] Hive QA commented on HIVE-16907: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876643/HIVE-16907.1.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10838 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5960/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5960/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5960/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12876643 - PreCommit-HIVE-Build > "INSERT INTO" overwrite old data when destination table encapsulated by > backquote > > > Key: HIVE-16907 > URL: https://issues.apache.org/jira/browse/HIVE-16907 > Project: Hive > Issue Type: Bug > Components: Parser >Affects Versions: 1.1.0, 2.1.1 >Reporter: Nemon Lou >Assignee: Bing Li > Attachments: HIVE-16907.1.patch > > > A way to reproduce: > {noformat} > create database tdb; > use tdb; > create table t1(id int); > create table t2(id int); > explain insert into `tdb.t1` select * from t2; > {noformat} > {noformat} > +---+ > | > Explain | > +---+ > | STAGE DEPENDENCIES: > | > | Stage-1 is a root stage > | > | Stage-6 depends on stages: Stage-1 , consists of Stage-3, Stage-2, > Stage-4 | > | Stage-3 > | > | Stage-0 depends on stages: Stage-3, Stage-2, Stage-5 > | > | Stage-2 > | > | Stage-4 > | > | Stage-5 depends on stages: Stage-4 > | > | > | > | STAGE PLANS: > | > | Stage: Stage-1 > | > | Map Reduce > | > | Map Operator Tree: > | > | TableScan > | > |
[jira] [Commented] (HIVE-17073) Incorrect result with vectorization and SharedWorkOptimizer
[ https://issues.apache.org/jira/browse/HIVE-17073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082593#comment-16082593 ] Matt McCline commented on HIVE-17073: - [~jcamachorodriguez] Thank you for jumping in with a solution. The invariant for a VectorizedRowBatch are that the selected array is always allocated. For efficiency, I think we want to pre-allocate a saveSelected array of VectorizedRowBatch.DEFAULT_SIZE elements in initializeOp. When # children > 1, then re-allocate that save array *only* if the vrb.size > than current array size. Use System.arraycopy into and out of saveSelected instead of Arrays.copyOf since the later method allocates a new object. > Incorrect result with vectorization and SharedWorkOptimizer > --- > > Key: HIVE-17073 > URL: https://issues.apache.org/jira/browse/HIVE-17073 > Project: Hive > Issue Type: Bug > Components: Vectorization >Affects Versions: 3.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-17073.patch > > > We get incorrect result with vectorization and multi-output Select operator > created by SharedWorkOptimizer. It can be reproduced in the following way. > {code:title=Correct} > select count(*) as h8_30_to_9 > from src > join src1 on src.key = src1.key > where src1.value = "val_278"; > OK > 2 > {code} > {code:title=Correct} > select count(*) as h9_to_9_30 > from src > join src1 on src.key = src1.key > where src1.value = "val_255"; > OK > 2 > {code} > {code:title=Incorrect} > select * from ( > select count(*) as h8_30_to_9 > from src > join src1 on src.key = src1.key > where src1.value = "val_278") s1 > join ( > select count(*) as h9_to_9_30 > from src > join src1 on src.key = src1.key > where src1.value = "val_255") s2; > OK > 2 0 > {code} > Problem seems to be that some ds in the batch row need to be re-initialized > after they have been forwarded to each output. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
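The save/restore pattern described in the comment can be sketched with plain int arrays standing in for the batch's selected array. The saveSelected buffer, the lazy growth check, and the use of System.arraycopy (rather than Arrays.copyOf, which allocates a fresh array on every call) follow the comment, but this is an illustration, not the actual operator code:

```java
public class SelectedSave {
    // Mirrors VectorizedRowBatch.DEFAULT_SIZE: pre-allocate once (as would
    // happen in initializeOp) so the common case never allocates per batch.
    static final int DEFAULT_SIZE = 1024;
    private int[] saveSelected = new int[DEFAULT_SIZE];

    // Snapshot the selected array before forwarding to multiple children.
    void save(int[] selected, int size) {
        // Grow the save buffer only when the batch is larger than what we have.
        if (size > saveSelected.length) {
            saveSelected = new int[size];
        }
        // System.arraycopy reuses the existing buffer; no new object.
        System.arraycopy(selected, 0, saveSelected, 0, size);
    }

    // Restore before forwarding to the next child, since a child operator
    // may have scribbled on the selected array.
    void restore(int[] selected, int size) {
        System.arraycopy(saveSelected, 0, selected, 0, size);
    }
}
```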
[jira] [Commented] (HIVE-17070) remove .orig files from src
[ https://issues.apache.org/jira/browse/HIVE-17070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082595#comment-16082595 ] Jason Dere commented on HIVE-17070: --- +1 > remove .orig files from src > --- > > Key: HIVE-17070 > URL: https://issues.apache.org/jira/browse/HIVE-17070 > Project: Hive > Issue Type: Bug >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Trivial > Attachments: HIVE-17070.patch > > > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java.orig > ql/src/test/results/clientpositive/llap/vector_join30.q.out.orig -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16732) Transactional tables should block LOAD DATA
[ https://issues.apache.org/jira/browse/HIVE-16732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082519#comment-16082519 ] Eugene Koifman commented on HIVE-16732: --- patch 3 committed to master (3.0) > Transactional tables should block LOAD DATA > > > Key: HIVE-16732 > URL: https://issues.apache.org/jira/browse/HIVE-16732 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-16732.01.patch, HIVE-16732.02.patch, > HIVE-16732.03.patch > > > This has always been the design. > see LoadSemanticAnalyzer.analyzeInternal() > StrictChecks.checkBucketing(conf); > Some examples (this is exposed by HIVE-16177) > insert_values_orig_table.q > insert_orig_table.q > insert_values_orig_table_use_metadata.q -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16732) Transactional tables should block LOAD DATA
[ https://issues.apache.org/jira/browse/HIVE-16732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082479#comment-16082479 ] Hive QA commented on HIVE-16732: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876625/HIVE-16732.03.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10839 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed] (batchId=237) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype] (batchId=157) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5957/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5957/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5957/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12876625 - PreCommit-HIVE-Build > Transactional tables should block LOAD DATA > > > Key: HIVE-16732 > URL: https://issues.apache.org/jira/browse/HIVE-16732 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-16732.01.patch, HIVE-16732.02.patch, > HIVE-16732.03.patch > > > This has always been the design. > see LoadSemanticAnalyzer.analyzeInternal() > StrictChecks.checkBucketing(conf); > Some examples (this is exposed by HIVE-16177) > insert_values_orig_table.q > insert_orig_table.q > insert_values_orig_table_use_metadata.q -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-15767) Hive On Spark is not working on secure clusters from Oozie
[ https://issues.apache.org/jira/browse/HIVE-15767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082411#comment-16082411 ] Sahil Takiar commented on HIVE-15767: - Overall LGTM. Just a few questions: * Are these errors thrown by HiveServer2 or by the HoS Remote Driver? * Is that same thing required for Hive-on-MR? * Is it possible to add a test for this? > Hive On Spark is not working on secure clusters from Oozie > -- > > Key: HIVE-15767 > URL: https://issues.apache.org/jira/browse/HIVE-15767 > Project: Hive > Issue Type: Bug > Components: Spark >Affects Versions: 1.2.1, 2.1.1 >Reporter: Peter Cseh >Assignee: Peter Cseh > Attachments: HIVE-15767-001.patch, HIVE-15767-002.patch > > > When a HiveAction is launched form Oozie with Hive On Spark enabled, we're > getting errors: > {noformat} > Caused by: java.io.IOException: Exception reading > file:/yarn/nm/usercache/yshi/appcache/application_1485271416004_0022/container_1485271416004_0022_01_02/container_tokens > at > org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:188) > at > org.apache.hadoop.mapreduce.security.TokenCache.mergeBinaryTokens(TokenCache.java:155) > {noformat} > This is caused by passing the {{mapreduce.job.credentials.binary}} property > to the Spark configuration in RemoteHiveSparkClient. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
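Since the root cause is the {{mapreduce.job.credentials.binary}} pointer leaking into the Spark configuration (the file it names only exists in the launcher's container), the fix amounts to filtering that key out before handing the configuration to the remote driver. A hedged sketch with a plain Map standing in for Hive/Spark configuration objects; the method name is invented for illustration:

```java
import java.util.HashMap;
import java.util.Map;

public class SparkConfFilter {
    // Drop the credentials-file pointer from the configuration handed to the
    // remote Spark driver: the path it names belongs to the Oozie launcher
    // container and does not exist where the driver runs.
    static Map<String, String> stripCredentialsBinary(Map<String, String> hiveConf) {
        Map<String, String> sparkConf = new HashMap<>(hiveConf); // leave the original intact
        sparkConf.remove("mapreduce.job.credentials.binary");
        return sparkConf;
    }
}
```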
[jira] [Commented] (HIVE-16907) "INSERT INTO" overwrites old data when the destination table is encapsulated by backquotes
[ https://issues.apache.org/jira/browse/HIVE-16907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082427#comment-16082427 ] Bing Li commented on HIVE-16907: [~pxiong] and [~ashutoshc] Could I get your comments on the patch? Thank you. > "INSERT INTO" overwrite old data when destination table encapsulated by > backquote > > > Key: HIVE-16907 > URL: https://issues.apache.org/jira/browse/HIVE-16907 > Project: Hive > Issue Type: Bug > Components: Parser >Affects Versions: 1.1.0, 2.1.1 >Reporter: Nemon Lou >Assignee: Bing Li > Attachments: HIVE-16907.1.patch > > > A way to reproduce: > {noformat} > create database tdb; > use tdb; > create table t1(id int); > create table t2(id int); > explain insert into `tdb.t1` select * from t2; > {noformat} > {noformat} > +---+ > | > Explain | > +---+ > | STAGE DEPENDENCIES: > | > | Stage-1 is a root stage > | > | Stage-6 depends on stages: Stage-1 , consists of Stage-3, Stage-2, > Stage-4 | > | Stage-3 > | > | Stage-0 depends on stages: Stage-3, Stage-2, Stage-5 > | > | Stage-2 > | > | Stage-4 > | > | Stage-5 depends on stages: Stage-4 > | > | > | > | STAGE PLANS: > | > | Stage: Stage-1 > | > | Map Reduce > | > | Map Operator Tree: > | > | TableScan > | > | alias: t2 > | > | Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column > stats: NONE | > | Select Operator > | > | expressions: id (type: int) > | > | outputColumnNames: _col0 > | > | Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column > stats: NONE | > | File Output Operator > | > | compressed: false > | > | Statistics: Num rows: 0 Data size: 0 Basic stats: NONE > Column stats: NONE
[jira] [Updated] (HIVE-16907) "INSERT INTO" overwrites old data when the destination table is encapsulated by backquotes
[ https://issues.apache.org/jira/browse/HIVE-16907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li updated HIVE-16907: --- Status: Patch Available (was: In Progress) > "INSERT INTO" overwrite old data when destination table encapsulated by > backquote > > > Key: HIVE-16907 > URL: https://issues.apache.org/jira/browse/HIVE-16907 > Project: Hive > Issue Type: Bug > Components: Parser >Affects Versions: 2.1.1, 1.1.0 >Reporter: Nemon Lou >Assignee: Bing Li > Attachments: HIVE-16907.1.patch > > > A way to reproduce: > {noformat} > create database tdb; > use tdb; > create table t1(id int); > create table t2(id int); > explain insert into `tdb.t1` select * from t2; > {noformat} > {noformat} > +---+ > | > Explain | > +---+ > | STAGE DEPENDENCIES: > | > | Stage-1 is a root stage > | > | Stage-6 depends on stages: Stage-1 , consists of Stage-3, Stage-2, > Stage-4 | > | Stage-3 > | > | Stage-0 depends on stages: Stage-3, Stage-2, Stage-5 > | > | Stage-2 > | > | Stage-4 > | > | Stage-5 depends on stages: Stage-4 > | > | > | > | STAGE PLANS: > | > | Stage: Stage-1 > | > | Map Reduce > | > | Map Operator Tree: > | > | TableScan > | > | alias: t2 > | > | Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column > stats: NONE | > | Select Operator > | > | expressions: id (type: int) > | > | outputColumnNames: _col0 > | > | Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column > stats: NONE | > | File Output Operator > | > | compressed: false > | > | Statistics: Num rows: 0 Data size: 0 Basic stats: NONE > Column stats: NONE | > | table:
[jira] [Updated] (HIVE-16907) "INSERT INTO" overwrites old data when the destination table is encapsulated by backquotes
[ https://issues.apache.org/jira/browse/HIVE-16907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li updated HIVE-16907: --- Attachment: HIVE-16907.1.patch The patch is created based on the latest master branch. > "INSERT INTO" overwrite old data when destination table encapsulated by > backquote > > > Key: HIVE-16907 > URL: https://issues.apache.org/jira/browse/HIVE-16907 > Project: Hive > Issue Type: Bug > Components: Parser >Affects Versions: 1.1.0, 2.1.1 >Reporter: Nemon Lou >Assignee: Bing Li > Attachments: HIVE-16907.1.patch > > > A way to reproduce: > {noformat} > create database tdb; > use tdb; > create table t1(id int); > create table t2(id int); > explain insert into `tdb.t1` select * from t2; > {noformat} > {noformat} > +---+ > | > Explain | > +---+ > | STAGE DEPENDENCIES: > | > | Stage-1 is a root stage > | > | Stage-6 depends on stages: Stage-1 , consists of Stage-3, Stage-2, > Stage-4 | > | Stage-3 > | > | Stage-0 depends on stages: Stage-3, Stage-2, Stage-5 > | > | Stage-2 > | > | Stage-4 > | > | Stage-5 depends on stages: Stage-4 > | > | > | > | STAGE PLANS: > | > | Stage: Stage-1 > | > | Map Reduce > | > | Map Operator Tree: > | > | TableScan > | > | alias: t2 > | > | Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column > stats: NONE | > | Select Operator > | > | expressions: id (type: int) > | > | outputColumnNames: _col0 > | > | Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column > stats: NONE | > | File Output Operator > | > | compressed: false > | > | Statistics: Num rows: 0 Data size: 0 Basic stats: NONE > Column stats: NONE
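The reproduction shows {{`tdb.t1`}} inside backquotes being treated as a single table name in the current database, so the insert targets the wrong table. Splitting a dotted, backquoted identifier into (database, table) is the behavior the report implies; the sketch below is illustrative only and ignores corner cases Hive's parser must handle (escaped backquotes, multiple dots):

```java
public class QualifiedName {
    // Split a table reference that may carry a database qualifier.
    // Returns {database, table}; database is null for an unqualified name,
    // meaning "resolve against the current database".
    static String[] split(String name) {
        int dot = name.indexOf('.');
        if (dot < 0) {
            return new String[] { null, name };
        }
        return new String[] { name.substring(0, dot), name.substring(dot + 1) };
    }
}
```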
[jira] [Commented] (HIVE-15898) add Type2 SCD merge tests
[ https://issues.apache.org/jira/browse/HIVE-15898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082387#comment-16082387 ] Hive QA commented on HIVE-15898: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876624/HIVE-15898.07.patch {color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10839 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[zero_rows_blobstore] (batchId=240) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sqlmerge] (batchId=159) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sqlmerge_type2_scd] (batchId=144) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5956/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5956/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5956/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 7 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12876624 - PreCommit-HIVE-Build > add Type2 SCD merge tests > - > > Key: HIVE-15898 > URL: https://issues.apache.org/jira/browse/HIVE-15898 > Project: Hive > Issue Type: Sub-task > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-15898.01.patch, HIVE-15898.02.patch, > HIVE-15898.03.patch, HIVE-15898.04.patch, HIVE-15898.05.patch, > HIVE-15898.06.patch, HIVE-15898.07.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16357) Failed folder creation when creating a new table is reported incorrectly
[ https://issues.apache.org/jira/browse/HIVE-16357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082334#comment-16082334 ] Peter Vary commented on HIVE-16357: --- Hi [~zsombor.klara], Thanks for the patch! One nit: boolean successs - too much 's' :) 2 more interesting question: - Do we need to send out a notification about an unsuccessful event with empty list of tables? - The same change might be applied to the other events as well... [~zsombor.klara], [~mohitsabharwal]: What do you think? Thanks, Peter > Failed folder creation when creating a new table is reported incorrectly > > > Key: HIVE-16357 > URL: https://issues.apache.org/jira/browse/HIVE-16357 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 2.3.0, 3.0.0 >Reporter: Barna Zsombor Klara >Assignee: Barna Zsombor Klara >Priority: Minor > Fix For: 3.0.0 > > Attachments: HIVE-16357.01.patch > > > If the directory for a Hive table could not be created, them the HMS will > throw a metaexception: > {code} > if (tblPath != null) { > if (!wh.isDir(tblPath)) { > if (!wh.mkdirs(tblPath, true)) { > throw new MetaException(tblPath > + " is not a directory or unable to create one"); > } > madeDir = true; > } > } > {code} > However in the finally block we always try to call the > DbNotificationListener, which in turn will also throw an exception because > the directory is missing, overwriting the initial exception with a > FileNotFoundException. 
> Actual stacktrace seen by the caller: > {code} > 2017-04-03T05:58:00,128 ERROR [pool-7-thread-2] metastore.RetryingHMSHandler: > MetaException(message:java.lang.RuntimeException: > java.io.FileNotFoundException: File file:/.../0 does not exist) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newMetaException(HiveMetaStore.java:6074) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1496) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107) > at com.sun.proxy.$Proxy28.create_table_with_environment_context(Unknown > Source) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_table_with_environment_context.getResult(ThriftHiveMetastore.java:11125) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_table_with_environment_context.getResult(ThriftHiveMetastore.java:11109) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) > at > org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110) > at > org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at > org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118) > at > 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: File > file:/.../0 does not exist > at > org.apache.hive.hcatalog.listener.DbNotificationListener$FileIterator.(DbNotificationListener.java:203) > at > org.apache.hive.hcatalog.listener.DbNotificationListener.onCreateTable(DbNotificationListener.java:137) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:1463) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1482) > ... 20 more > Caused by: java.io.FileNotFoundException: File file:/.../0 does not exist > at > org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:429) >
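The masking happens because the DbNotificationListener runs in a finally block and its FileNotFoundException replaces the original MetaException. A common way to keep the primary failure visible is to record it before the finally runs and suppress (or merely log) listener errors whenever a primary exception exists. A minimal sketch with generic exceptions standing in for MetaException; all names are illustrative:

```java
public class ListenerNotify {
    interface Listener { void onCreateTable() throws Exception; }

    // Run the main work, then notify the listener in finally -- but never let
    // a listener failure replace the primary exception.
    static void createTable(Runnable mkdirs, Listener listener) throws Exception {
        Exception primary = null;
        try {
            mkdirs.run(); // may throw (e.g. directory could not be created)
        } catch (Exception e) {
            primary = e;
            throw e;
        } finally {
            try {
                listener.onCreateTable(); // may fail if the directory is missing
            } catch (Exception e) {
                if (primary == null) throw e; // surface only when nothing else failed
                // otherwise: log and let the original exception propagate
            }
        }
    }

    // Both mkdirs and the listener fail: caller must see the mkdirs error.
    static String demoBothFail() {
        try {
            createTable(
                () -> { throw new RuntimeException("mkdir failed"); },
                () -> { throw new RuntimeException("listener: file not found"); });
        } catch (Exception e) {
            return e.getMessage();
        }
        return "no exception";
    }

    // Only the listener fails: its error is the one that surfaces.
    static String demoListenerOnly() {
        try {
            createTable(() -> { },
                () -> { throw new RuntimeException("listener failed"); });
        } catch (Exception e) {
            return e.getMessage();
        }
        return "no exception";
    }
}
```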
[jira] [Updated] (HIVE-15767) Hive On Spark is not working on secure clusters from Oozie
[ https://issues.apache.org/jira/browse/HIVE-15767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Cseh updated HIVE-15767: -- Attachment: HIVE-15767-002.patch Addressing typo > Hive On Spark is not working on secure clusters from Oozie > -- > > Key: HIVE-15767 > URL: https://issues.apache.org/jira/browse/HIVE-15767 > Project: Hive > Issue Type: Bug > Components: Spark >Affects Versions: 1.2.1, 2.1.1 >Reporter: Peter Cseh >Assignee: Peter Cseh > Attachments: HIVE-15767-001.patch, HIVE-15767-002.patch > > > When a HiveAction is launched form Oozie with Hive On Spark enabled, we're > getting errors: > {noformat} > Caused by: java.io.IOException: Exception reading > file:/yarn/nm/usercache/yshi/appcache/application_1485271416004_0022/container_1485271416004_0022_01_02/container_tokens > at > org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:188) > at > org.apache.hadoop.mapreduce.security.TokenCache.mergeBinaryTokens(TokenCache.java:155) > {noformat} > This is caused by passing the {{mapreduce.job.credentials.binary}} property > to the Spark configuration in RemoteHiveSparkClient. -- This message was sent by Atlassian JIRA (v6.4.14#64029)