[jira] [Commented] (HIVE-16177) non Acid to acid conversion doesn't handle _copy_N files

2017-07-11 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083426#comment-16083426
 ] 

Hive QA commented on HIVE-16177:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876712/HIVE-16177.20-branch-2.patch

{color:green}SUCCESS:{color} +1 due to 9 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10584 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] (batchId=35)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[explaindenpendencydiffengs]
 (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=142)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=139)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[explaindenpendencydiffengs]
 (batchId=115)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_ptf] 
(batchId=125)
org.apache.hadoop.hive.ql.security.TestExtendedAcls.testPartition (batchId=228)
org.apache.hadoop.hive.ql.security.TestFolderPermissions.testPartition 
(batchId=217)
org.apache.hive.hcatalog.api.TestHCatClient.testTransportFailure (batchId=176)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5971/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5971/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5971/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876712 - PreCommit-HIVE-Build

> non Acid to acid conversion doesn't handle _copy_N files
> 
>
> Key: HIVE-16177
> URL: https://issues.apache.org/jira/browse/HIVE-16177
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 0.14.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-16177.01.patch, HIVE-16177.02.patch, 
> HIVE-16177.04.patch, HIVE-16177.07.patch, HIVE-16177.08.patch, 
> HIVE-16177.09.patch, HIVE-16177.10.patch, HIVE-16177.11.patch, 
> HIVE-16177.14.patch, HIVE-16177.15.patch, HIVE-16177.16.patch, 
> HIVE-16177.17.patch, HIVE-16177.18-branch-2.patch, HIVE-16177.18.patch, 
> HIVE-16177.19-branch-2.patch, HIVE-16177.20-branch-2.patch
>
>
> {noformat}
> create table T(a int, b int) clustered by (a)  into 2 buckets stored as orc 
> TBLPROPERTIES('transactional'='false')
> insert into T(a,b) values(1,2)
> insert into T(a,b) values(1,3)
> alter table T SET TBLPROPERTIES ('transactional'='true')
> {noformat}
> // we should now have bucket files 01_0 and 01_0_copy_1
> but OrcRawRecordMerger.OriginalReaderPair.next() doesn't know that there can 
> be copy_N files, and it numbers the rows in each bucket from 0, thus 
> generating duplicate IDs
> {noformat}
> select ROW__ID, INPUT__FILE__NAME, a, b from T
> {noformat}
> produces 
> {noformat}
> {"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0,1,2
> {"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0_copy_1,1,3
> {noformat}
> [~owen.omalley], do you have any thoughts on a good way to handle this?
> The attached patch has a few changes to make Acid even recognize copy_N 
> files, but this is just a prerequisite.  The new UT demonstrates the issue.
> Furthermore,
> {noformat}
> alter table T compact 'major'
> select ROW__ID, INPUT__FILE__NAME, a, b from T order by b
> {noformat}
> produces 
> {noformat}
> {"transactionid":0,"bucketid":1,"rowid":0}
> file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommandswarehouse/nonacidorctbl/base_-9223372036854775808/bucket_1
> 1   2
> {noformat}
> HIVE-16177.04.patch has TestTxnCommands.testNonAcidToAcidConversion0() 
> demonstrating this.
> This is because the compactor doesn't handle copy_N files either (it skips 
> them).
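The numbering problem above can be illustrated with a small sketch (plain Python, not Hive's actual OrcRawRecordMerger code): restarting the row number at 0 in every bucket file collides for copy_N files, while carrying an offset across copies of the same bucket keeps the generated IDs unique.

```python
def naive_row_ids(bucket_files):
    """bucket_files: {filename: row_count}; every file restarts rowid at 0,
    mimicking the buggy behavior described in the issue."""
    ids = []
    for name, rows in bucket_files.items():
        bucket = int(name.split("_")[0])       # e.g. "01_0_copy_1" -> bucket 1
        for rowid in range(rows):
            ids.append((0, bucket, rowid))     # (transactionid, bucketid, rowid)
    return ids

def offset_row_ids(bucket_files):
    """Continue rowid numbering across copy_N files of the same bucket."""
    ids, next_rowid = [], {}
    for name, rows in bucket_files.items():
        bucket = int(name.split("_")[0])
        start = next_rowid.get(bucket, 0)
        for rowid in range(start, start + rows):
            ids.append((0, bucket, rowid))
        next_rowid[bucket] = start + rows
    return ids

files = {"01_0": 1, "01_0_copy_1": 1}          # one row per file, same bucket
assert len(set(naive_row_ids(files))) == 1     # collision: both rows get rowid 0
assert len(set(offset_row_ids(files))) == 2    # unique: rowids 0 and 1
```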



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17018) Small table is converted to map join even when the total size of small tables exceeds the threshold (hive.auto.convert.join.noconditionaltask.size)

2017-07-11 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083411#comment-16083411
 ] 

Chao Sun commented on HIVE-17018:
-

What I'm thinking is that the new config (say A) will be a value smaller than 
{{spark.executor.memory}} and will be divided among all tasks in the executor 
(so A / {{spark.executor.cores}}).
Another way is to specify A as the maximum hashtable memory for a single Spark 
task. This is the limit of the sum of the sizes for all hash tables in a single 
work (MapWork or ReduceWork). I think no change is needed for the code related 
to {{connectedMapJoinSize}}.
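The division proposed above can be sketched as follows (the function name and the byte figures are illustrative assumptions, not real Hive or Spark behavior): the new limit A is split evenly across the tasks that run concurrently in one executor.

```python
def per_task_hashtable_budget(a_bytes, executor_cores):
    """A is chosen smaller than spark.executor.memory; each of the
    spark.executor.cores concurrent tasks gets an equal share of it."""
    if executor_cores <= 0:
        raise ValueError("executor_cores must be positive")
    return a_bytes // executor_cores

# e.g. A = 2 GiB shared by 4 concurrent tasks -> 512 MiB per task
assert per_task_hashtable_budget(2 * 1024**3, 4) == 512 * 1024**2
```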

> Small table is converted to map join even when the total size of small tables 
> exceeds the threshold (hive.auto.convert.join.noconditionaltask.size)
> -
>
> Key: HIVE-17018
> URL: https://issues.apache.org/jira/browse/HIVE-17018
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Attachments: HIVE-17018_data_init.q, HIVE-17018.q, t3.txt
>
>
>  We use "hive.auto.convert.join.noconditionaltask.size" as the threshold: if 
> the sum of the sizes of n-1 of the tables/partitions in an n-way join is 
> smaller than this value, the join is converted to a map join. For example, 
> take A join B join C join D join E, where the big table is A (100M) and the 
> small tables are B (10M), C (10M), D (10M), and E (10M). If we set 
> hive.auto.convert.join.noconditionaltask.size=20M, the current code converts 
> E, D, and B to map joins, but not C. In my understanding, because 
> hive.auto.convert.join.noconditionaltask.size can only accommodate E and D, 
> neither C nor B should be converted to a map join.
> Let's explain in more detail why B can be converted to a map join.
> in current code, 
> [SparkMapJoinOptimizer#getConnectedMapJoinSize|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#L364]
>  calculates all the mapjoins  in the parent path and child path. The search 
> stops when encountering [UnionOperator or 
> ReduceOperator|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#L381].
> C is not converted to a map join because {{(connectedMapJoinSize + 
> totalSize) > maxSize}} [see 
> code|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#L330], 
> so the RS before the join of C remains. When calculating whether B will be 
> converted to a map join, {{getConnectedMapJoinSize}} returns 0 because it 
> encounters the 
> [RS|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#409], 
> which causes {{(connectedMapJoinSize + totalSize) < maxSize}} to match.
> [~xuefuz] or [~jxiang]: could you help determine whether this is a bug, as 
> you are more familiar with SparkMapJoinOptimizer?
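A toy model (an illustrative assumption, not SparkMapJoinOptimizer itself) of the behavior the reporter expects: keep a running total of small-table sizes over the connected map-join chain and never reset it when one table fails to convert, so only the tables that actually fit under the threshold become map joins.

```python
def expected_conversions(small_tables, threshold):
    """small_tables: [(name, size)] in the order the optimizer visits them.
    Returns the names whose cumulative size stays within the threshold."""
    converted, connected = [], 0
    for name, size in small_tables:
        if connected + size <= threshold:   # would the chain still fit?
            converted.append(name)
            connected += size
        # note: no reset when a table fails to convert; the bug described
        # above is that the leftover RS resets this running sum to 0
    return converted

MB = 1024**2
tables = [("E", 10*MB), ("D", 10*MB), ("C", 10*MB), ("B", 10*MB)]
assert expected_conversions(tables, 20*MB) == ["E", "D"]  # C and B stay shuffle joins
```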



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17069) Refactor OrcRawRecordMerger.ReaderPair

2017-07-11 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17069:
--
Attachment: HIVE-17069.01.patch

> Refactor OrcRawRecordMerger.ReaderPair
> --
>
> Key: HIVE-17069
> URL: https://issues.apache.org/jira/browse/HIVE-17069
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-17069.01.patch
>
>
> This should be done post HIVE-16177 so as not to completely obscure the 
> functional changes.
> Make ReaderPair an interface.
> ReaderPairImpl will do what ReaderPair currently does, i.e. handle the 
> "normal" code path.
> OriginalReaderPair stays the same as now, but without the incomprehensible 
> override/variable-shadowing logic.
> Perhaps split it into two, one for compaction and one for the "normal" read, 
> with a common base class.
> Push discoverKeyBounds() into the appropriate implementation.
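A structural sketch of the proposed refactor (in Python for brevity; the real code is Java inside OrcRawRecordMerger, and all method bodies below are placeholders, not the actual logic):

```python
from abc import ABC, abstractmethod

class ReaderPair(ABC):
    """The proposed interface: each implementation owns its key-bounds logic."""

    @abstractmethod
    def discover_key_bounds(self):
        """Each implementation decides its own min/max key range."""

    @abstractmethod
    def next(self):
        """Advance to the next record."""

class ReaderPairImpl(ReaderPair):
    """The 'normal' code path, doing what ReaderPair currently does."""
    def discover_key_bounds(self):
        return ("min-key", "max-key")   # placeholder
    def next(self):
        return None                     # placeholder

class OriginalReaderPair(ReaderPair):
    """Pre-acid (original) files, with no override/variable shadowing."""
    def discover_key_bounds(self):
        return (None, None)             # original files carry no acid keys
    def next(self):
        return None                     # placeholder

assert isinstance(ReaderPairImpl(), ReaderPair)
assert isinstance(OriginalReaderPair(), ReaderPair)
```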



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16977) Vectorization: Vectorize expressions in THEN/ELSE branches of IF/CASE WHEN

2017-07-11 Thread Teddy Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083403#comment-16083403
 ] 

Teddy Choi commented on HIVE-16977:
---

It looks like HIVE-16731 already solved this issue, too. It allows not only 
null values, but also other expressions.

> Vectorization: Vectorize expressions in THEN/ELSE branches of IF/CASE WHEN
> --
>
> Key: HIVE-16977
> URL: https://issues.apache.org/jira/browse/HIVE-16977
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Teddy Choi
>Priority: Critical
>
> VectorUDFAdaptor(CASE WHEN ((_col2 > 0)) THEN ((UDFToDouble(_col3) / 
> UDFToDouble(_col2)) BETWEEN 0. AND 1.5) ...
> The expression in the THEN is not permitted.   Only columns or constants are 
> vectorized.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16922) Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"

2017-07-11 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083365#comment-16083365
 ] 

Rui Li commented on HIVE-16922:
---

I'm +1 on fixing the typo, though it needs to be marked as a "breaking" change.
[~libing], have you checked the latest test failures? Are any of them related to 
your patch?

> Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"
> ---
>
> Key: HIVE-16922
> URL: https://issues.apache.org/jira/browse/HIVE-16922
> Project: Hive
>  Issue Type: Bug
>  Components: Thrift API
>Reporter: Dudu Markovitz
>Assignee: Bing Li
> Attachments: HIVE-16922.1.patch, HIVE-16922.2.patch
>
>
> https://github.com/apache/hive/blob/master/serde/if/serde.thrift
> Typo in serde.thrift: 
> COLLECTION_DELIM = "colelction.delim"
> (*colelction* instead of *collection*)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17066) Query78 filter wrong estimation is generating bad plan

2017-07-11 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083377#comment-16083377
 ] 

Hive QA commented on HIVE-17066:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876732/HIVE-17066.4.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10825 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=232)
org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver
 (batchId=101)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5970/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5970/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5970/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876732 - PreCommit-HIVE-Build

> Query78 filter wrong estimation is generating bad plan
> 
>
> Key: HIVE-17066
> URL: https://issues.apache.org/jira/browse/HIVE-17066
> Project: Hive
>  Issue Type: Bug
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch, 
> HIVE-17066.3.patch, HIVE-17066.4.patch
>
>
> The Filter operator estimates 1 row following a left outer join, causing bad 
> downstream estimates:
> {noformat}
> Reducer 12 
> Execution mode: vectorized, llap
> Reduce Operator Tree:
>   Map Join Operator
> condition map:
>  Left Outer Join0 to 1
> keys:
>   0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
>   1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, 
> _col8
> input vertices:
>   1 Map 14
> Statistics: Num rows: 71676270660 Data size: 3727166074320 
> Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
>   predicate: _col8 is null (type: boolean)
>   Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   Select Operator
> expressions: _col0 (type: bigint), _col1 (type: bigint), 
> _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: 
> bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, 
> _col6
> Statistics: Num rows: 1 Data size: 52 Basic stats: 
> COMPLETE Column stats: COMPLETE
> {noformat}
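A hedged sketch of the estimation issue (plain Python, not Hive's actual stats annotation code): after a left outer join, the rows surviving an IS NULL predicate on a right-side column are exactly the unmatched rows, so an estimate based on the column's null fraction is far safer than collapsing billions of rows down to 1.

```python
def is_null_estimate(join_rows, null_fraction):
    """Estimate the output of an IS-NULL filter from the column's null
    fraction, never letting a huge input collapse to a single row unless
    the null fraction really is (near) zero."""
    return max(1, int(join_rows * null_fraction))

rows_in = 71_676_270_660                       # the join cardinality in the plan above
assert is_null_estimate(rows_in, 0.0) == 1     # the degenerate case shown in the plan
assert is_null_estimate(rows_in, 0.5) == 35_838_135_330  # with 50% unmatched rows
```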



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-15767) Hive On Spark is not working on secure clusters from Oozie

2017-07-11 Thread Yibing Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083374#comment-16083374
 ] 

Yibing Shi commented on HIVE-15767:
---

[~peterceluch], can the tokens in the Oozie launcher application still be passed 
to the Spark job when the property {{mapreduce.job.credentials.binary}} is 
unset? For example, in an environment where HDFS transparent encryption is 
enabled, is the Spark job still able to connect to the KMS servers?

(The change is in {{RemoteHiveSparkClient}}, so Hive on MR shouldn't be 
affected. Oozie actions already make sure the tokens are added to the action 
configuration, which is then passed to MR jobs.)

> Hive On Spark is not working on secure clusters from Oozie
> --
>
> Key: HIVE-15767
> URL: https://issues.apache.org/jira/browse/HIVE-15767
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.2.1, 2.1.1
>Reporter: Peter Cseh
>Assignee: Peter Cseh
> Attachments: HIVE-15767-001.patch, HIVE-15767-002.patch
>
>
> When a HiveAction is launched from Oozie with Hive on Spark enabled, we're 
> getting errors:
> {noformat}
> Caused by: java.io.IOException: Exception reading 
> file:/yarn/nm/usercache/yshi/appcache/application_1485271416004_0022/container_1485271416004_0022_01_02/container_tokens
> at 
> org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:188)
> at 
> org.apache.hadoop.mapreduce.security.TokenCache.mergeBinaryTokens(TokenCache.java:155)
> {noformat}
> This is caused by passing the {{mapreduce.job.credentials.binary}} property 
> to the Spark configuration in RemoteHiveSparkClient.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-16977) Vectorization: Vectorize expressions in THEN/ELSE branches of IF/CASE WHEN

2017-07-11 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi reassigned HIVE-16977:
-

Assignee: Teddy Choi

> Vectorization: Vectorize expressions in THEN/ELSE branches of IF/CASE WHEN
> --
>
> Key: HIVE-16977
> URL: https://issues.apache.org/jira/browse/HIVE-16977
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Teddy Choi
>Priority: Critical
>
> VectorUDFAdaptor(CASE WHEN ((_col2 > 0)) THEN ((UDFToDouble(_col3) / 
> UDFToDouble(_col2)) BETWEEN 0. AND 1.5) ...
> The expression in the THEN is not permitted.   Only columns or constants are 
> vectorized.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17063) insert overwrite partition onto an external table fails when the partition is dropped first

2017-07-11 Thread Wang Haihua (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083357#comment-16083357
 ] 

Wang Haihua commented on HIVE-17063:


Fixed some test errors; the remaining failed tests now seem unrelated to this 
patch.

> insert overwrite partition onto an external table fails when the partition 
> is dropped first
> ---
>
> Key: HIVE-17063
> URL: https://issues.apache.org/jira/browse/HIVE-17063
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.2.2, 2.1.1, 2.2.0
>Reporter: Wang Haihua
>Assignee: Wang Haihua
> Attachments: HIVE-17063.1.patch, HIVE-17063.2.patch, 
> HIVE-17063.3.patch
>
>
> The default value of {{hive.exec.stagingdir}} is a relative path, and 
> dropping a partition on an external table does not clear the real data. As a 
> result, the second insert overwrite of a partition fails because the target 
> data to be moved already exists.
> This happened when we reproduced partition data onto an external table.
> I see the target data is not cleared only when the {{immediately generated 
> data}} is a child of {{the target data directory}}, so my proposal is to 
> clear a target file that already exists when renaming the {{immediately 
> generated data}} into {{the target data directory}}.
> Operation reproduced:
> {code}
> create external table insert_after_drop_partition(key string, val string) 
> partitioned by (insertdate string);
> from src insert overwrite table insert_after_drop_partition partition 
> (insertdate='2008-01-01') select *;
> alter table insert_after_drop_partition drop partition 
> (insertdate='2008-01-01');
> from src insert overwrite table insert_after_drop_partition partition 
> (insertdate='2008-01-01') select *;
> {code}
> Stack trace:
> {code}
> 2017-07-09T08:32:05,212 ERROR [f3bc51c8-2441-4689-b1c1-d60aef86c3aa main] 
> exec.Task: Failed with exception java.io.IOException: rename for src path: 
> pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/.hive-staging_hive_2017-07-09_08-32-03_840_4046825276907030554-1/-ext-1/00_0
>  to dest 
> path:pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/00_0
>  returned false
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: rename 
> for src path: 
> pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/.hive-staging_hive_2017-07-09_08-32-03_840_4046825276907030554-1/-ext-1/00_0
>  to dest 
> path:pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/00_0
>  returned false
> at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2992)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:3248)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1532)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1461)
> at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:498)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2073)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1744)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1453)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1171)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1161)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:335)
> at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1137)
> at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:)
> at 
> org.apache.hadoop.hive.cli.TestCliDriver.runTest(TestCliDriver.java:120)
> at 
> org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_after_drop_partition(TestCliDriver.java:103)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at 
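The proposal in the description (clear a pre-existing target file before the rename) can be sketched as follows, using Python's local filesystem calls as a stand-in for Hadoop's FileSystem API, which is what Hive's move path actually uses:

```python
import os
import tempfile

def replace_file(src, dest):
    # Delete a leftover destination file before renaming, so the second
    # INSERT OVERWRITE does not fail on data that survived DROP PARTITION.
    if os.path.exists(dest):
        os.remove(dest)
    os.rename(src, dest)   # the rename can no longer collide

d = tempfile.mkdtemp()
dest = os.path.join(d, "bucket_file")
with open(dest, "w") as f:
    f.write("old")         # stale data left behind by DROP PARTITION
src = os.path.join(d, "staging_file")
with open(src, "w") as f:
    f.write("new")         # freshly written by the INSERT OVERWRITE
replace_file(src, dest)
assert open(dest).read() == "new"
```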

[jira] [Updated] (HIVE-16975) Vectorization: Fully vectorize CAST date as TIMESTAMP so VectorUDFAdaptor is now used

2017-07-11 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi updated HIVE-16975:
--
Attachment: HIVE-16975.1.patch

> Vectorization: Fully vectorize CAST date as TIMESTAMP so VectorUDFAdaptor is 
> now used
> -
>
> Key: HIVE-16975
> URL: https://issues.apache.org/jira/browse/HIVE-16975
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Teddy Choi
>Priority: Critical
> Attachments: HIVE-16975.1.patch
>
>
> Fix VectorUDFAdaptor(CAST(d_date as TIMESTAMP)) to be native.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-16975) Vectorization: Fully vectorize CAST date as TIMESTAMP so VectorUDFAdaptor is now used

2017-07-11 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi reassigned HIVE-16975:
-

Assignee: Teddy Choi

> Vectorization: Fully vectorize CAST date as TIMESTAMP so VectorUDFAdaptor is 
> now used
> -
>
> Key: HIVE-16975
> URL: https://issues.apache.org/jira/browse/HIVE-16975
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Teddy Choi
>Priority: Critical
> Attachments: HIVE-16975.1.patch
>
>
> Fix VectorUDFAdaptor(CAST(d_date as TIMESTAMP)) to be native.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16907) "INSERT INTO" overwrites old data when the destination table is encapsulated by backquotes

2017-07-11 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083356#comment-16083356
 ] 

Rui Li commented on HIVE-16907:
---

I think we should first decide whether we allow dots in table names. I prefer 
disallowing them, because such names are confusing, and it seems we have 
already disallowed such column names via HIVE-10120. Then {{`tdb.t1`}} is 
considered a backtick-quoted fully qualified table name (useful when the 
db/table name contains reserved keywords), and it should be treated as 
{{tdb.t1}} internally. Something like {{`tdb.t1.t2`}} should be disallowed.
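The rule proposed above can be sketched as a hypothetical helper (not Hive's actual parser; the function and its defaulting behavior are assumptions for illustration): one dot means db.table, more than one dot is rejected.

```python
def resolve_quoted_name(quoted, current_db="default"):
    """Resolve the contents of a backtick-quoted identifier into
    (database, table), rejecting names with more than one dot."""
    parts = quoted.split(".")
    if len(parts) == 1:
        return (current_db, parts[0])   # `t1` -> table in the current db
    if len(parts) == 2:
        return (parts[0], parts[1])     # `tdb.t1` -> db tdb, table t1
    raise ValueError("dots are not allowed inside db/table names: " + quoted)

assert resolve_quoted_name("tdb.t1") == ("tdb", "t1")
assert resolve_quoted_name("t2", "tdb") == ("tdb", "t2")
try:
    resolve_quoted_name("tdb.t1.t2")    # should be disallowed
    assert False
except ValueError:
    pass
```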

>  "INSERT INTO" overwrites old data when the destination table is 
> encapsulated by backquotes 
> 
>
> Key: HIVE-16907
> URL: https://issues.apache.org/jira/browse/HIVE-16907
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 1.1.0, 2.1.1
>Reporter: Nemon Lou
>Assignee: Bing Li
> Attachments: HIVE-16907.1.patch
>
>
> A way to reproduce:
> {noformat}
> create database tdb;
> use tdb;
> create table t1(id int);
> create table t2(id int);
> explain insert into `tdb.t1` select * from t2;
> {noformat}
> {noformat}
> +---+
> |  
> Explain  |
> +---+
> | STAGE DEPENDENCIES: 
>   |
> |   Stage-1 is a root stage   
>   |
> |   Stage-6 depends on stages: Stage-1 , consists of Stage-3, Stage-2, 
> Stage-4  |
> |   Stage-3   
>   |
> |   Stage-0 depends on stages: Stage-3, Stage-2, Stage-5  
>   |
> |   Stage-2   
>   |
> |   Stage-4   
>   |
> |   Stage-5 depends on stages: Stage-4
>   |
> | 
>   |
> | STAGE PLANS:
>   |
> |   Stage: Stage-1
>   |
> | Map Reduce  
>   |
> |   Map Operator Tree:
>   |
> |   TableScan 
>   |
> | alias: t2   
>   |
> | Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE |
> | Select Operator 
>   |
> |   expressions: id (type: int)   
>   |
> |   outputColumnNames: _col0  
>   |
> |   Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE   |
> |   File Output Operator 

[jira] [Updated] (HIVE-16975) Vectorization: Fully vectorize CAST date as TIMESTAMP so VectorUDFAdaptor is now used

2017-07-11 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi updated HIVE-16975:
--
Attachment: (was: HIVE-16975.1.patch)

> Vectorization: Fully vectorize CAST date as TIMESTAMP so VectorUDFAdaptor is 
> now used
> -
>
> Key: HIVE-16975
> URL: https://issues.apache.org/jira/browse/HIVE-16975
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Teddy Choi
>Priority: Critical
>
> Fix VectorUDFAdaptor(CAST(d_date as TIMESTAMP)) to be native.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16975) Vectorization: Fully vectorize CAST date as TIMESTAMP so VectorUDFAdaptor is now used

2017-07-11 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi updated HIVE-16975:
--
Attachment: HIVE-16975.1.patch

> Vectorization: Fully vectorize CAST date as TIMESTAMP so VectorUDFAdaptor is 
> now used
> -
>
> Key: HIVE-16975
> URL: https://issues.apache.org/jira/browse/HIVE-16975
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Teddy Choi
>Priority: Critical
> Attachments: HIVE-16975.1.patch
>
>
> Fix VectorUDFAdaptor(CAST(d_date as TIMESTAMP)) to be native.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16975) Vectorization: Fully vectorize CAST date as TIMESTAMP so VectorUDFAdaptor is now used

2017-07-11 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi updated HIVE-16975:
--
Status: Patch Available  (was: Open)

> Vectorization: Fully vectorize CAST date as TIMESTAMP so VectorUDFAdaptor is 
> now used
> -
>
> Key: HIVE-16975
> URL: https://issues.apache.org/jira/browse/HIVE-16975
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Teddy Choi
>Priority: Critical
> Attachments: HIVE-16975.1.patch
>
>
> Fix VectorUDFAdaptor(CAST(d_date as TIMESTAMP)) to be native.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16832) duplicate ROW__ID possible in multi insert into transactional table

2017-07-11 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1608#comment-1608
 ] 

Hive QA commented on HIVE-16832:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876708/HIVE-16832.21.patch

{color:green}SUCCESS:{color} +1 due to 12 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 10853 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=237)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_table_stats] 
(batchId=50)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=60)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[acid_bucket_pruning]
 (batchId=139)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
org.apache.hadoop.hive.ql.TestTxnCommands.testNonAcidToAcidConversion01 
(batchId=282)
org.apache.hadoop.hive.ql.TestTxnCommands2.testNonAcidToAcidConversion02 
(batchId=269)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdate.testNonAcidToAcidConversion02
 (batchId=280)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testNonAcidToAcidConversion02
 (batchId=277)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5969/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5969/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5969/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876708 - PreCommit-HIVE-Build

> duplicate ROW__ID possible in multi insert into transactional table
> ---
>
> Key: HIVE-16832
> URL: https://issues.apache.org/jira/browse/HIVE-16832
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-16832.01.patch, HIVE-16832.03.patch, 
> HIVE-16832.04.patch, HIVE-16832.05.patch, HIVE-16832.06.patch, 
> HIVE-16832.08.patch, HIVE-16832.09.patch, HIVE-16832.10.patch, 
> HIVE-16832.11.patch, HIVE-16832.14.patch, HIVE-16832.15.patch, 
> HIVE-16832.16.patch, HIVE-16832.17.patch, HIVE-16832.18.patch, 
> HIVE-16832.19.patch, HIVE-16832.20.patch, HIVE-16832.20.patch, 
> HIVE-16832.21.patch
>
>
> {noformat}
>  create table AcidTablePart(a int, b int) partitioned by (p string) clustered 
> by (a) into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true');
>  create temporary table if not exists data1 (x int);
>  insert into data1 values (1);
>  from data1
>insert into AcidTablePart partition(p) select 0, 0, 'p' || x
>insert into AcidTablePart partition(p='p1') select 0, 1
> {noformat}
> Each branch of this multi-insert creates a row in partition p1/bucket0 with 
> ROW__ID=(1,0,0).
> The same can happen when running a SQL Merge (HIVE-10924) statement that has 
> both Insert and Update clauses when the target table has 
> _'transactional'='true','transactional_properties'='default'_  (see 
> HIVE-14035).  This is so because Merge is internally run as a multi-insert 
> statement.
> The solution relies on the statement ID introduced in HIVE-11030.  Each Insert 
> clause of a multi-insert gets a unique ID.
> The ROW__ID.bucketId now becomes a bit-packed triplet (format version, 
> bucketId, statementId).
> (Since ORC stores field names in the data file we can't rename 
> ROW__ID.bucketId).
> This ensures that there are no collisions and retains desired sort properties 
> of ROW__ID.
> In particular _SortedDynPartitionOptimizer_ works w/o any changes even in 
> cases where there are fewer reducers than buckets.
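The bit-packed triplet above can be sketched in Python. The field widths and ordering here are illustrative assumptions for the sketch, not necessarily the exact layout Hive's codec uses:

```python
# Pack (formatVersion, bucketId, statementId) into one int, in the spirit of
# the ROW__ID.bucketId change described above. The widths below are
# assumptions for this sketch, not Hive's actual layout.
VERSION_BITS, BUCKET_BITS, STMT_BITS = 3, 12, 12

def pack(version, bucket_id, statement_id):
    assert 0 <= version < (1 << VERSION_BITS)
    assert 0 <= bucket_id < (1 << BUCKET_BITS)
    assert 0 <= statement_id < (1 << STMT_BITS)
    return ((version << (BUCKET_BITS + STMT_BITS))
            | (bucket_id << STMT_BITS)
            | statement_id)

def unpack(packed):
    statement_id = packed & ((1 << STMT_BITS) - 1)
    bucket_id = (packed >> STMT_BITS) & ((1 << BUCKET_BITS) - 1)
    version = packed >> (BUCKET_BITS + STMT_BITS)
    return version, bucket_id, statement_id
```

With distinct statement IDs, the two Insert branches of a multi-insert produce distinct bucketId values even when writing the same physical bucket, so ROW__ID stays collision-free.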



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17065) You can not successfully deploy hive clusters with Hive guidance documents

2017-07-11 Thread ZhangBing Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083322#comment-16083322
 ] 

ZhangBing Lin commented on HIVE-17065:
--

Hi [~leftylev], thank you for your suggestion!

> You can not successfully deploy hive clusters with Hive guidance documents
> --
>
> Key: HIVE-17065
> URL: https://issues.apache.org/jira/browse/HIVE-17065
> Project: Hive
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: ZhangBing Lin
>Priority: Minor
> Attachments: screenshot-1.png
>
>
> When I followed the official document from cwiki 
> [https://cwiki.apache.org/confluence/display/Hive/GettingStarted] to build a 
> Hive 2.1.1 single-node service, I encountered several problems:
> 1. The following command for creating the Hive warehouse directory needs to 
> be modified:
>   A. $HADOOP_HOME/bin/hadoop fs -mkdir /user/hive/warehouse
>   B. $HADOOP_HOME/bin/hadoop fs -mkdir -p /user/hive/warehouse
> Using B instead of A might be better.
> 2. The following two sections need to be reordered:
>  A. Running Hive CLI
> To use the Hive command line interface (CLI) from the shell:
>    $ $HIVE_HOME/bin/hive
>  B. Running HiveServer2 and Beeline
> Starting from Hive 2.1, we need to run the schematool command below as an 
> initialization step. For example, we can use "derby" as the db type.
>    $ $HIVE_HOME/bin/schematool -dbType  -initSchema
> When I execute the $HIVE_HOME/bin/hive command, the following error occurs:
> !screenshot-1.png!
> The hive command works only after I first execute:
> $ $HIVE_HOME/bin/schematool -dbType derby -initSchema



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16370) Avro data type null not supported on partitioned tables

2017-07-11 Thread Andrew Sears (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083316#comment-16083316
 ] 

Andrew Sears commented on HIVE-16370:
-

This is something that can be handled in Avro by unioning the null type with 
another type in the avro file.

[http://apache-avro.679487.n3.nabble.com/Support-for-null-in-String-primitive-types-td4025659.html]

ObjectInspectorUtils.java might be updated to handle "void" primitive category 
as it does in other cases.
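As a sketch of that workaround, the record from the reproduction below could declare a union of null with a second type instead of a bare null; the choice of "string" as the second branch is an assumption for illustration:

```python
import json

# Variant of the reproduction's schema with the bare "null" field type
# replaced by a ["null", "string"] union; "string" is an arbitrary choice
# for this sketch, and the default stays null so existing records validate.
schema = {
    "type": "record",
    "name": "null_failure",
    "namespace": "org.apache.avro.null_failure",
    "fields": [
        {"name": "one", "type": ["null", "string"], "default": None},
    ],
}
print(json.dumps(schema, indent=2))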

> Avro data type null not supported on partitioned tables
> ---
>
> Key: HIVE-16370
> URL: https://issues.apache.org/jira/browse/HIVE-16370
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0, 2.1.1
>Reporter: rui miranda
>Priority: Minor
>
> I was attempting to create Hive tables over some partitioned Avro files. It 
> seems the void data type (Avro null) is not supported on partitioned tables 
> (I could not reproduce the bug on an un-partitioned table).
> ---
> I managed to reproduce the bug on two different Hive versions.
> Hive 1.1.0-cdh5.10.0
> Hive 2.1.1-amzn-0
> 
> how to replicate (avro tools are required to create the avro files):
> $ wget 
> http://mirror.serversupportforum.de/apache/avro/avro-1.8.1/java/avro-tools-1.8.1.jar
> $ mkdir /tmp/avro
> $ mkdir /tmp/avro/null
> $ echo "{ \
>   \"type\" : \"record\", \
>   \"name\" : \"null_failure\", \
>   \"namespace\" : \"org.apache.avro.null_failure\", \
>   \"doc\":\"the purpose of this schema is to replicate the hive avro null 
> failure\", \
>   \"fields\" : [{\"name\":\"one\", \"type\":\"null\",\"default\":null}] \
> } " > /tmp/avro/null/schema.avsc
> $ echo "{\"one\":null}" > /tmp/avro/null/data.json
> $ java -jar avro-tools-1.8.1.jar fromjson --schema-file 
> /tmp/avro/null/schema.avsc /tmp/avro/null/data.json > /tmp/avro/null/data.avro
> $ hdfs dfs -mkdir /tmp/avro
> $ hdfs dfs -mkdir /tmp/avro/null
> $ hdfs dfs -mkdir /tmp/avro/null/schema
> $ hdfs dfs -mkdir /tmp/avro/null/data
> $ hdfs dfs -mkdir /tmp/avro/null/data/foo=bar
> $ hdfs dfs -copyFromLocal /tmp/avro/null/schema.avsc 
> /tmp/avro/null/schema/schema.avsc
> $ hdfs dfs -copyFromLocal /tmp/avro/null/data.avro 
> /tmp/avro/null/data/foo=bar/data.avro
> $ hive 
> hive> CREATE EXTERNAL TABLE avro_null
> PARTITIONED BY (foo string)
>   ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
>   STORED as INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
>   OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
> LOCATION
> '/tmp/avro/null/data/'
>   TBLPROPERTIES (
> 'avro.schema.url'='/tmp/avro/null/schema/schema.avsc')
> ;
> OK
> Time taken: 3.127 seconds
> hive> msck repair table avro_null;
> OK
> Partitions not in metastore:  avro_null:foo=bar
> Repair: Added partition to metastore avro_null:foo=bar
> Time taken: 0.712 seconds, Fetched: 2 row(s)
> hive> select * from avro_null;
> FAILED: RuntimeException org.apache.hadoop.hive.ql.metadata.HiveException: 
> Failed with exception Hive internal error inside 
> isAssignableFromSettablePrimitiveOI void not supported 
> yet.java.lang.RuntimeException: Hive internal error inside 
> isAssignableFromSettablePrimitiveOI void not supported yet.
> hive> select foo, count(1)  from avro_null group by foo;
> OK
> bar   1
> Time taken: 29.806 seconds, Fetched: 1 row(s)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-15705) Event replication for constraints

2017-07-11 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083293#comment-16083293
 ] 

Hive QA commented on HIVE-15705:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876706/HIVE-15705.4.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10840 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5968/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5968/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5968/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876706 - PreCommit-HIVE-Build

> Event replication for constraints
> -
>
> Key: HIVE-15705
> URL: https://issues.apache.org/jira/browse/HIVE-15705
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-15705.1.patch, HIVE-15705.2.patch, 
> HIVE-15705.3.patch, HIVE-15705.4.patch
>
>
> Make event replication for primary key and foreign key work.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17066) Query78 filter wrong estimation is generating bad plan

2017-07-11 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17066:
---
Status: Open  (was: Patch Available)

> Query78 filter wrong estimation is generating bad plan
> 
>
> Key: HIVE-17066
> URL: https://issues.apache.org/jira/browse/HIVE-17066
> Project: Hive
>  Issue Type: Bug
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch, 
> HIVE-17066.3.patch, HIVE-17066.4.patch
>
>
> Filter operator is estimating 1 row following a left outer join, causing bad 
> estimates.
> {noformat}
> Reducer 12 
> Execution mode: vectorized, llap
> Reduce Operator Tree:
>   Map Join Operator
> condition map:
>  Left Outer Join0 to 1
> keys:
>   0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
>   1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, 
> _col8
> input vertices:
>   1 Map 14
> Statistics: Num rows: 71676270660 Data size: 3727166074320 
> Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
>   predicate: _col8 is null (type: boolean)
>   Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   Select Operator
> expressions: _col0 (type: bigint), _col1 (type: bigint), 
> _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: 
> bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, 
> _col6
> Statistics: Num rows: 1 Data size: 52 Basic stats: 
> COMPLETE Column stats: COMPLETE
> {noformat}
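To make the mis-estimate concrete, here is a toy sketch (illustrative only, not Hive's estimator): an "IS NULL" predicate on a right-side column after a LEFT OUTER JOIN should scale with the non-matching left rows, not collapse to one row.

```python
# Toy cardinality estimators for "col IS NULL" after a LEFT OUTER JOIN.
# Numbers and logic are illustrative assumptions, not Hive's actual code.

def is_null_estimate_naive(join_rows):
    # Mirrors the bad plan: collapse to a single row regardless of input size.
    return 1

def is_null_estimate_outer_aware(left_rows, match_fraction):
    # Left rows without a match survive the outer join with NULLs in the
    # right-side columns, so they all pass the IS NULL filter.
    return int(left_rows * (1 - match_fraction))
```

For a large join output with mostly non-matching rows, the two estimates differ by many orders of magnitude, which is exactly what makes the downstream plan bad.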



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17066) Query78 filter wrong estimation is generating bad plan

2017-07-11 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17066:
---
Status: Patch Available  (was: Open)

> Query78 filter wrong estimation is generating bad plan
> 
>
> Key: HIVE-17066
> URL: https://issues.apache.org/jira/browse/HIVE-17066
> Project: Hive
>  Issue Type: Bug
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch, 
> HIVE-17066.3.patch, HIVE-17066.4.patch
>
>
> Filter operator is estimating 1 row following a left outer join, causing bad 
> estimates.
> {noformat}
> Reducer 12 
> Execution mode: vectorized, llap
> Reduce Operator Tree:
>   Map Join Operator
> condition map:
>  Left Outer Join0 to 1
> keys:
>   0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
>   1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, 
> _col8
> input vertices:
>   1 Map 14
> Statistics: Num rows: 71676270660 Data size: 3727166074320 
> Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
>   predicate: _col8 is null (type: boolean)
>   Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   Select Operator
> expressions: _col0 (type: bigint), _col1 (type: bigint), 
> _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: 
> bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, 
> _col6
> Statistics: Num rows: 1 Data size: 52 Basic stats: 
> COMPLETE Column stats: COMPLETE
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16955) General Improvements To org.apache.hadoop.hive.metastore.MetaStoreUtils

2017-07-11 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-16955:

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks, Beluga!

> General Improvements To org.apache.hadoop.hive.metastore.MetaStoreUtils
> ---
>
> Key: HIVE-16955
> URL: https://issues.apache.org/jira/browse/HIVE-16955
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HIVE-16955.1.patch, HIVE-16955.2.patch, 
> HIVE-16955.3.patch
>
>
> # Increase Code Reuse with {{Apache Commons}}
> # Improve debug logging (lowered to TRACE where appropriate)
> # Add optimizations for empty {{Collection}} scenarios
> # Better size {{ArrayList}} at instantiation
> # Use {{StringBuilder}} instead of String concatenation
> # Increase consistency of code style among similar methods
> # Decrease file line count



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17066) Query78 filter wrong estimation is generating bad plan

2017-07-11 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17066:
---
Attachment: HIVE-17066.4.patch

> Query78 filter wrong estimation is generating bad plan
> 
>
> Key: HIVE-17066
> URL: https://issues.apache.org/jira/browse/HIVE-17066
> Project: Hive
>  Issue Type: Bug
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch, 
> HIVE-17066.3.patch, HIVE-17066.4.patch
>
>
> Filter operator is estimating 1 row following a left outer join, causing bad 
> estimates.
> {noformat}
> Reducer 12 
> Execution mode: vectorized, llap
> Reduce Operator Tree:
>   Map Join Operator
> condition map:
>  Left Outer Join0 to 1
> keys:
>   0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
>   1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, 
> _col8
> input vertices:
>   1 Map 14
> Statistics: Num rows: 71676270660 Data size: 3727166074320 
> Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
>   predicate: _col8 is null (type: boolean)
>   Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   Select Operator
> expressions: _col0 (type: bigint), _col1 (type: bigint), 
> _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: 
> bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, 
> _col6
> Statistics: Num rows: 1 Data size: 52 Basic stats: 
> COMPLETE Column stats: COMPLETE
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17076) typo in itests/src/test/resources/testconfiguration.properties

2017-07-11 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17076:
--
Status: Patch Available  (was: Open)

> typo in itests/src/test/resources/testconfiguration.properties
> --
>
> Key: HIVE-17076
> URL: https://issues.apache.org/jira/browse/HIVE-17076
> Project: Hive
>  Issue Type: Bug
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-17076.01.patch
>
>
> it has 
> {noformat}
> minillap.shared.query.files=insert_into1.q,\
>   insert_into2.q,\
>   insert_values_orig_table.,\
>   llapdecider.q,\
> {noformat}
>  "insert_values_orig_table.,\" is a typo which causes these to be run with 
> TestCliDriver
> Note that there are 2 .q files that start with insert_values_orig_table
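This class of typo could be caught mechanically. A minimal sketch of such a check (a hypothetical helper, not part of Hive's actual test infrastructure):

```python
# Flag entries in a *.query.files property value that do not end in ".q",
# e.g. the "insert_values_orig_table.," typo described above.
def bad_query_entries(prop_value):
    entries = [e.strip() for e in prop_value.split(",") if e.strip()]
    return [e for e in entries if not e.endswith(".q")]
```

Running it over the snippet above would flag exactly the malformed entry, while valid entries like insert_into1.q pass.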



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17066) Query78 filter wrong estimation is generating bad plan

2017-07-11 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083251#comment-16083251
 ] 

Hive QA commented on HIVE-17066:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876709/HIVE-17066.3.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10839 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[dynamic_semijoin_user_level]
 (batchId=142)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[correlationoptimizer1]
 (batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_join_tests]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_joins_explain]
 (batchId=151)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5967/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5967/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5967/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876709 - PreCommit-HIVE-Build

> Query78 filter wrong estimation is generating bad plan
> 
>
> Key: HIVE-17066
> URL: https://issues.apache.org/jira/browse/HIVE-17066
> Project: Hive
>  Issue Type: Bug
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch, 
> HIVE-17066.3.patch
>
>
> Filter operator is estimating 1 row following a left outer join, causing bad 
> estimates.
> {noformat}
> Reducer 12 
> Execution mode: vectorized, llap
> Reduce Operator Tree:
>   Map Join Operator
> condition map:
>  Left Outer Join0 to 1
> keys:
>   0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
>   1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, 
> _col8
> input vertices:
>   1 Map 14
> Statistics: Num rows: 71676270660 Data size: 3727166074320 
> Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
>   predicate: _col8 is null (type: boolean)
>   Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   Select Operator
> expressions: _col0 (type: bigint), _col1 (type: bigint), 
> _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: 
> bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, 
> _col6
> Statistics: Num rows: 1 Data size: 52 Basic stats: 
> COMPLETE Column stats: COMPLETE
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17066) Query78 filter wrong estimation is generating bad plan

2017-07-11 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17066:
---
Status: Patch Available  (was: Open)

> Query78 filter wrong estimation is generating bad plan
> 
>
> Key: HIVE-17066
> URL: https://issues.apache.org/jira/browse/HIVE-17066
> Project: Hive
>  Issue Type: Bug
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch, 
> HIVE-17066.3.patch
>
>
> Filter operator is estimating 1 row following a left outer join, causing bad 
> estimates.
> {noformat}
> Reducer 12 
> Execution mode: vectorized, llap
> Reduce Operator Tree:
>   Map Join Operator
> condition map:
>  Left Outer Join0 to 1
> keys:
>   0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
>   1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, 
> _col8
> input vertices:
>   1 Map 14
> Statistics: Num rows: 71676270660 Data size: 3727166074320 
> Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
>   predicate: _col8 is null (type: boolean)
>   Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   Select Operator
> expressions: _col0 (type: bigint), _col1 (type: bigint), 
> _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: 
> bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, 
> _col6
> Statistics: Num rows: 1 Data size: 52 Basic stats: 
> COMPLETE Column stats: COMPLETE
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16926) LlapTaskUmbilicalExternalClient should not start new umbilical server for every fragment request

2017-07-11 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083184#comment-16083184
 ] 

Jason Dere commented on HIVE-16926:
---

Maybe I can just replace pendingClients/registeredClients with a single list, 
and RequestInfo can keep a state showing whether the request is 
pending/running/etc.
Correct, the shared umbilical server will not be shut down. Is there any action 
needed on this part? I don't think anything is exposed to shut it down.
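The single-list idea could look roughly like this sketch. All names here are hypothetical illustrations, not the actual LlapTaskUmbilicalExternalClient API:

```python
from enum import Enum, auto
import threading

class RequestState(Enum):
    PENDING = auto()
    RUNNING = auto()
    DONE = auto()

class RequestInfo:
    """Hypothetical per-fragment record; the state field replaces membership
    in separate pending/registered collections."""
    def __init__(self, fragment_id):
        self.fragment_id = fragment_id
        self.state = RequestState.PENDING

class RequestRegistry:
    """One registry shared by all fragments of the shared umbilical server."""
    def __init__(self):
        self._lock = threading.Lock()
        self._requests = {}

    def submit(self, fragment_id):
        with self._lock:
            info = RequestInfo(fragment_id)
            self._requests[fragment_id] = info
            return info

    def mark_running(self, fragment_id):
        with self._lock:
            self._requests[fragment_id].state = RequestState.RUNNING

    def by_state(self, state):
        with self._lock:
            return [r for r in self._requests.values() if r.state is state]
```

A lookup by state then replaces iterating two separate client lists.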

> LlapTaskUmbilicalExternalClient should not start new umbilical server for 
> every fragment request
> 
>
> Key: HIVE-16926
> URL: https://issues.apache.org/jira/browse/HIVE-16926
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-16926.1.patch, HIVE-16926.2.patch, 
> HIVE-16926.3.patch, HIVE-16926.4.patch
>
>
> Followup task from [~sseth] and [~sershe] after HIVE-16777.
> LlapTaskUmbilicalExternalClient currently creates a new umbilical server for 
> every fragment request, but this is not necessary and the umbilical can be 
> shared.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16955) General Improvements To org.apache.hadoop.hive.metastore.MetaStoreUtils

2017-07-11 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083175#comment-16083175
 ] 

Ashutosh Chauhan commented on HIVE-16955:
-

+1

> General Improvements To org.apache.hadoop.hive.metastore.MetaStoreUtils
> ---
>
> Key: HIVE-16955
> URL: https://issues.apache.org/jira/browse/HIVE-16955
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Attachments: HIVE-16955.1.patch, HIVE-16955.2.patch, 
> HIVE-16955.3.patch
>
>
> # Increase Code Reuse with {{Apache Commons}}
> # Improve debug logging (lowered to TRACE where appropriate)
> # Add optimizations for empty {{Collection}} scenarios
> # Better size {{ArrayList}} at instantiation
> # Use {{StringBuilder}} instead of String concatenation
> # Increase consistency of code style among similar methods
> # Decrease file line count



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17073) Incorrect result with vectorization and SharedWorkOptimizer

2017-07-11 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083177#comment-16083177
 ] 

Hive QA commented on HIVE-17073:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876697/HIVE-17073.02.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 10840 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_decimal_1] 
(batchId=76)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_tablesample_rows] 
(batchId=49)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_decimal_1]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_decimal_2]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_groupby_grouping_window]
 (batchId=159)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=232)
org.apache.hadoop.hive.ql.exec.vector.TestVectorSelectOperator.testSelectOperator
 (batchId=272)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5966/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5966/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5966/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876697 - PreCommit-HIVE-Build

> Incorrect result with vectorization and SharedWorkOptimizer
> ---
>
> Key: HIVE-17073
> URL: https://issues.apache.org/jira/browse/HIVE-17073
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-17073.01.patch, HIVE-17073.02.patch, 
> HIVE-17073.patch
>
>
> We get an incorrect result with vectorization and the multi-output Select 
> operator created by SharedWorkOptimizer. It can be reproduced in the 
> following way.
> {code:title=Correct}
> select count(*) as h8_30_to_9
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_278";
> OK
> 2
> {code}
> {code:title=Correct}
> select count(*) as h9_to_9_30
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_255";
> OK
> 2
> {code}
> {code:title=Incorrect}
> select * from (
>   select count(*) as h8_30_to_9
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_278") s1
> join (
>   select count(*) as h9_to_9_30
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_255") s2;
> OK
> 2 0
> {code}
> The problem seems to be that some data structures in the row batch need to be 
> re-initialized after they have been forwarded to each output.
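The hazard can be sketched in plain Python (illustrative only, not Hive's actual VectorizedRowBatch code): an operator that forwards one mutable batch to several children must restore whatever it mutated, or the next child observes the previous projection's leftovers.

```python
# Forward one shared, mutable batch to several children, restoring the
# batch between children. Dropping the restore line reproduces the
# "second output sees wrong data" symptom described above.
def forward(batch, children, project):
    for consume in children:
        saved = batch.copy()    # snapshot the state the projection mutates
        project(batch)          # e.g. overwrite a column in place
        consume(batch.copy())   # child consumes the projected batch
        batch[:] = saved        # re-initialize before the next child
```

Each child then sees the projection applied to the original batch, rather than to the previous child's already-projected output.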



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17066) Query78 filter wrong estimation is generating bad plan

2017-07-11 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17066:
---
Status: Patch Available  (was: Open)

Patch 3 addresses review comments

> Query78 filter wrong estimation is generating bad plan
> 
>
> Key: HIVE-17066
> URL: https://issues.apache.org/jira/browse/HIVE-17066
> Project: Hive
>  Issue Type: Bug
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch, 
> HIVE-17066.3.patch
>
>
> Filter operator is estimating 1 row following a left outer join, causing bad 
> estimates.
> {noformat}
> Reducer 12 
> Execution mode: vectorized, llap
> Reduce Operator Tree:
>   Map Join Operator
> condition map:
>  Left Outer Join0 to 1
> keys:
>   0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
>   1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, 
> _col8
> input vertices:
>   1 Map 14
> Statistics: Num rows: 71676270660 Data size: 3727166074320 
> Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
>   predicate: _col8 is null (type: boolean)
>   Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   Select Operator
> expressions: _col0 (type: bigint), _col1 (type: bigint), 
> _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: 
> bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, 
> _col6
> Statistics: Num rows: 1 Data size: 52 Basic stats: 
> COMPLETE Column stats: COMPLETE
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17066) Query78 filter wrong estimation is generating bad plan

2017-07-11 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083151#comment-16083151
 ] 

Ashutosh Chauhan commented on HIVE-17066:
-

+1 pending tests.

> Query78 filter wrong estimation is generating bad plan
> 
>
> Key: HIVE-17066
> URL: https://issues.apache.org/jira/browse/HIVE-17066
> Project: Hive
>  Issue Type: Bug
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch, 
> HIVE-17066.3.patch
>
>
> Filter operator is estimating 1 row following a left outer join, causing bad 
> estimates.
> {noformat}
> Reducer 12 
> Execution mode: vectorized, llap
> Reduce Operator Tree:
>   Map Join Operator
> condition map:
>  Left Outer Join0 to 1
> keys:
>   0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
>   1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, 
> _col8
> input vertices:
>   1 Map 14
> Statistics: Num rows: 71676270660 Data size: 3727166074320 
> Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
>   predicate: _col8 is null (type: boolean)
>   Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   Select Operator
> expressions: _col0 (type: bigint), _col1 (type: bigint), 
> _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: 
> bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, 
> _col6
> Statistics: Num rows: 1 Data size: 52 Basic stats: 
> COMPLETE Column stats: COMPLETE
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17076) typo in itests/src/test/resources/testconfiguration.properties

2017-07-11 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17076:
--
Attachment: HIVE-17076.01.patch

> typo in itests/src/test/resources/testconfiguration.properties
> --
>
> Key: HIVE-17076
> URL: https://issues.apache.org/jira/browse/HIVE-17076
> Project: Hive
>  Issue Type: Bug
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-17076.01.patch
>
>
> it has 
> {noformat}
> minillap.shared.query.files=insert_into1.q,\
>   insert_into2.q,\
>   insert_values_orig_table.,\
>   llapdecider.q,\
> {noformat}
>  "insert_values_orig_table.,\" is a typo that causes these tests to be run 
> with TestCliDriver instead.
> Note that there are two .q files whose names start with insert_values_orig_table.
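Entries like this one can be caught mechanically by checking that every item in the list ends in ".q". The snippet below is a hypothetical sanity check, not part of Hive's build:

```python
# Flag test-configuration entries that do not end in ".q"
# (hypothetical check over the entries quoted above).
entries = ["insert_into1.q", "insert_into2.q",
           "insert_values_orig_table.", "llapdecider.q"]

bad = [e for e in entries if not e.endswith(".q")]
assert bad == ["insert_values_orig_table."]
```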



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16979) Cache UGI for metastore

2017-07-11 Thread Tao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Li updated HIVE-16979:
--
Attachment: HIVE-16979.3.patch

> Cache UGI for metastore
> ---
>
> Key: HIVE-16979
> URL: https://issues.apache.org/jira/browse/HIVE-16979
> Project: Hive
>  Issue Type: Improvement
>Reporter: Tao Li
>Assignee: Tao Li
> Attachments: HIVE-16979.1.patch, HIVE-16979.2.patch, 
> HIVE-16979.3.patch
>
>
> FileSystem.closeAllForUGI is called once per request against the metastore to 
> dispose of the UGI, which involves talking to the HDFS name node and is time 
> consuming. The proposed perf improvement is to cache and reuse the UGI.
> Each FileSystem.closeAllForUGI call can take up to 20 ms of end-to-end latency 
> against HDFS. A Hive query usually results in several calls against the 
> metastore, so we can save up to 50-100 ms per Hive query.
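The optimization amounts to keeping one handle per user instead of creating and disposing of it on every request. A minimal sketch of that caching pattern (plain Python with hypothetical names; the real change would live in Hive's metastore code against Hadoop's UserGroupInformation):

```python
import threading

class UgiCache:
    """Cache one UGI-like handle per user so the expensive
    create/dispose cycle is not paid on every metastore request."""
    def __init__(self, factory):
        self._factory = factory   # expensive per-user constructor
        self._cache = {}
        self._lock = threading.Lock()

    def get(self, user):
        with self._lock:
            if user not in self._cache:
                self._cache[user] = self._factory(user)
            return self._cache[user]

created = []
cache = UgiCache(lambda user: created.append(user) or object())

h1 = cache.get("alice")
h2 = cache.get("alice")   # reused: the factory runs only once per user
assert h1 is h2
assert created == ["alice"]
```

A real implementation would also need an eviction/invalidation policy (e.g. on credential expiry), which this sketch omits.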



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16177) non Acid to acid conversion doesn't handle _copy_N files

2017-07-11 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16177:
--
Attachment: HIVE-16177.20-branch-2.patch

> non Acid to acid conversion doesn't handle _copy_N files
> 
>
> Key: HIVE-16177
> URL: https://issues.apache.org/jira/browse/HIVE-16177
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 0.14.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-16177.01.patch, HIVE-16177.02.patch, 
> HIVE-16177.04.patch, HIVE-16177.07.patch, HIVE-16177.08.patch, 
> HIVE-16177.09.patch, HIVE-16177.10.patch, HIVE-16177.11.patch, 
> HIVE-16177.14.patch, HIVE-16177.15.patch, HIVE-16177.16.patch, 
> HIVE-16177.17.patch, HIVE-16177.18-branch-2.patch, HIVE-16177.18.patch, 
> HIVE-16177.19-branch-2.patch, HIVE-16177.20-branch-2.patch
>
>
> {noformat}
> create table T(a int, b int) clustered by (a)  into 2 buckets stored as orc 
> TBLPROPERTIES('transactional'='false')
> insert into T(a,b) values(1,2)
> insert into T(a,b) values(1,3)
> alter table T SET TBLPROPERTIES ('transactional'='true')
> {noformat}
> We should now have bucket files 01_0 and 01_0_copy_1, but 
> OrcRawRecordMerger.OriginalReaderPair.next() doesn't know that copy_N files 
> can exist and numbers the rows in each bucket file from 0, thus generating 
> duplicate IDs.
> {noformat}
> select ROW__ID, INPUT__FILE__NAME, a, b from T
> {noformat}
> produces 
> {noformat}
> {"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0,1,2
> {"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0_copy_1,1,3
> {noformat}
> [~owen.omalley], do you have any thoughts on a good way to handle this?
> The attached patch has a few changes to make Acid recognize copy_N files at 
> all, but this is just a prerequisite.  The new unit test demonstrates the issue.
> Furthermore,
> {noformat}
> alter table T compact 'major'
> select ROW__ID, INPUT__FILE__NAME, a, b from T order by b
> {noformat}
> produces 
> {noformat}
> {"transactionid":0,"bucketid":1,"rowid":0}
> file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommandswarehouse/nonacidorctbl/base_-9223372036854775808/bucket_1
> 1   2
> {noformat}
> HIVE-16177.04.patch has TestTxnCommands.testNonAcidToAcidConversion0() 
> demonstrating this.
> This happens because the compactor doesn't handle copy_N files either (it 
> skips them).
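The collision described above can be illustrated with a small sketch (plain Python over a hypothetical file layout): numbering rows from 0 within each physical file yields colliding ROW__IDs, while carrying a running offset across the copy_N files of a bucket keeps them unique.

```python
# Sketch of the ROW__ID collision described above (hypothetical data).
# Each assigned ID is a tuple (transactionid, bucketid, rowid);
# transaction 0 marks pre-conversion (originally non-ACID) rows.

def assign_row_ids(files, per_file_numbering=True):
    """files: list of (filename, row_count) for one bucket."""
    ids = []
    offset = 0
    for _name, count in files:
        start = 0 if per_file_numbering else offset
        for r in range(count):
            ids.append((0, 1, start + r))
        offset += count
    return ids

bucket_files = [("01_0", 1), ("01_0_copy_1", 1)]

# Numbering each file from 0 (the buggy behavior) duplicates (0, 1, 0):
buggy = assign_row_ids(bucket_files, per_file_numbering=True)
assert buggy == [(0, 1, 0), (0, 1, 0)]

# Offsetting each copy_N file by the rows before it keeps IDs unique:
fixed = assign_row_ids(bucket_files, per_file_numbering=False)
assert fixed == [(0, 1, 0), (0, 1, 1)]
```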



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17066) Query78 filter wrong estimatation is generating bad plan

2017-07-11 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17066:
---
Attachment: HIVE-17066.3.patch

> Query78 filter wrong estimatation is generating bad plan
> 
>
> Key: HIVE-17066
> URL: https://issues.apache.org/jira/browse/HIVE-17066
> Project: Hive
>  Issue Type: Bug
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch, 
> HIVE-17066.3.patch
>
>
> The Filter operator is estimating 1 row following a left outer join, causing 
> bad estimates.
> {noformat}
> Reducer 12 
> Execution mode: vectorized, llap
> Reduce Operator Tree:
>   Map Join Operator
> condition map:
>  Left Outer Join0 to 1
> keys:
>   0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
>   1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, 
> _col8
> input vertices:
>   1 Map 14
> Statistics: Num rows: 71676270660 Data size: 3727166074320 
> Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
>   predicate: _col8 is null (type: boolean)
>   Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   Select Operator
> expressions: _col0 (type: bigint), _col1 (type: bigint), 
> _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: 
> bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, 
> _col6
> Statistics: Num rows: 1 Data size: 52 Basic stats: 
> COMPLETE Column stats: COMPLETE
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17076) typo in itests/src/test/resources/testconfiguration.properties

2017-07-11 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-17076:
-

Assignee: Eugene Koifman

> typo in itests/src/test/resources/testconfiguration.properties
> --
>
> Key: HIVE-17076
> URL: https://issues.apache.org/jira/browse/HIVE-17076
> Project: Hive
>  Issue Type: Bug
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> it has 
> {noformat}
> minillap.shared.query.files=insert_into1.q,\
>   insert_into2.q,\
>   insert_values_orig_table.,\
>   llapdecider.q,\
> {noformat}
>  "insert_values_orig_table.,\" is a typo that causes these tests to be run 
> with TestCliDriver instead.
> Note that there are two .q files whose names start with insert_values_orig_table.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17066) Query78 filter wrong estimatation is generating bad plan

2017-07-11 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17066:
---
Status: Patch Available  (was: Open)

> Query78 filter wrong estimatation is generating bad plan
> 
>
> Key: HIVE-17066
> URL: https://issues.apache.org/jira/browse/HIVE-17066
> Project: Hive
>  Issue Type: Bug
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch, 
> HIVE-17066.3.patch
>
>
> The Filter operator is estimating 1 row following a left outer join, causing 
> bad estimates.
> {noformat}
> Reducer 12 
> Execution mode: vectorized, llap
> Reduce Operator Tree:
>   Map Join Operator
> condition map:
>  Left Outer Join0 to 1
> keys:
>   0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
>   1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, 
> _col8
> input vertices:
>   1 Map 14
> Statistics: Num rows: 71676270660 Data size: 3727166074320 
> Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
>   predicate: _col8 is null (type: boolean)
>   Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   Select Operator
> expressions: _col0 (type: bigint), _col1 (type: bigint), 
> _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: 
> bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, 
> _col6
> Statistics: Num rows: 1 Data size: 52 Basic stats: 
> COMPLETE Column stats: COMPLETE
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-15705) Event replication for constraints

2017-07-11 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-15705:
--
Attachment: HIVE-15705.4.patch

Resync with master.

> Event replication for constraints
> -
>
> Key: HIVE-15705
> URL: https://issues.apache.org/jira/browse/HIVE-15705
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-15705.1.patch, HIVE-15705.2.patch, 
> HIVE-15705.3.patch, HIVE-15705.4.patch
>
>
> Make event replication for primary key and foreign key work.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17066) Query78 filter wrong estimatation is generating bad plan

2017-07-11 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17066:
---
Attachment: HIVE-17066.3.patch

> Query78 filter wrong estimatation is generating bad plan
> 
>
> Key: HIVE-17066
> URL: https://issues.apache.org/jira/browse/HIVE-17066
> Project: Hive
>  Issue Type: Bug
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch, 
> HIVE-17066.3.patch
>
>
> The Filter operator is estimating 1 row following a left outer join, causing 
> bad estimates.
> {noformat}
> Reducer 12 
> Execution mode: vectorized, llap
> Reduce Operator Tree:
>   Map Join Operator
> condition map:
>  Left Outer Join0 to 1
> keys:
>   0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
>   1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, 
> _col8
> input vertices:
>   1 Map 14
> Statistics: Num rows: 71676270660 Data size: 3727166074320 
> Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
>   predicate: _col8 is null (type: boolean)
>   Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   Select Operator
> expressions: _col0 (type: bigint), _col1 (type: bigint), 
> _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: 
> bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, 
> _col6
> Statistics: Num rows: 1 Data size: 52 Basic stats: 
> COMPLETE Column stats: COMPLETE
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17066) Query78 filter wrong estimatation is generating bad plan

2017-07-11 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17066:
---
Attachment: (was: HIVE-17066.3.patch)

> Query78 filter wrong estimatation is generating bad plan
> 
>
> Key: HIVE-17066
> URL: https://issues.apache.org/jira/browse/HIVE-17066
> Project: Hive
>  Issue Type: Bug
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch
>
>
> The Filter operator is estimating 1 row following a left outer join, causing 
> bad estimates.
> {noformat}
> Reducer 12 
> Execution mode: vectorized, llap
> Reduce Operator Tree:
>   Map Join Operator
> condition map:
>  Left Outer Join0 to 1
> keys:
>   0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
>   1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, 
> _col8
> input vertices:
>   1 Map 14
> Statistics: Num rows: 71676270660 Data size: 3727166074320 
> Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
>   predicate: _col8 is null (type: boolean)
>   Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   Select Operator
> expressions: _col0 (type: bigint), _col1 (type: bigint), 
> _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: 
> bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, 
> _col6
> Statistics: Num rows: 1 Data size: 52 Basic stats: 
> COMPLETE Column stats: COMPLETE
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17019) Add support to download debugging information as an archive.

2017-07-11 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083111#comment-16083111
 ] 

Siddharth Seth commented on HIVE-17019:
---

Thanks for posting the patch. It will be useful for getting the relevant data 
for a query.
- Change the top-level package from llap-debug to tez-debug? (It works with 
both, I believe.) [~ashutoshc], [~thejas] - any recommendations on whether the 
code gets a top-level module or goes under an existing module? This allows 
downloading various debug artifacts for a Tez job: logs, metrics for LLAP, 
hiveserver2 logs (soon), Tez AM logs, and ATS data for the query (Hive and Tez).
- In the new pom.xml, there is a dependency on hive-llap-server. 1) Is it 
required? 2) Some dependent artifacts will need to be excluded; see the 
llap-server dependency handling in service/pom.xml.
- LogDownloadServlet - should this throw an error as soon as the filename 
pattern validation fails?
- LogDownloadServlet - change to dagId/queryId validation instead.
- LogDownloadServlet - a thread is being created inside the request handler? 
Thread creation should be bounded outside the request, so that only a 
controlled number of parallel artifact downloads can run.
- LogDownloadServlet - what happens in case of an aggregator failure? Is an 
exception sent back to the user?
- LogDownloadServlet - it seems to generate the file on disk and then stream it 
over. Can it be streamed directly instead? Otherwise there's the possibility of 
leaking files. (Artifact.downloadIntoStream or some such?) Guessing this is 
complicated further by the multi-threaded artifact downloader. Alternately, a 
cleanup mechanism is needed.
- Add a timeout to the tests.
- The Apache license header needs to be added to the files where it is missing.
- Main - please rename it to something more indicative of what the tool does.
- Main - likely a follow-up jira: parse the arguments using a standard library 
instead of parsing them in main directly.
- Server - enabling the artifact should be controlled via a config; it does not 
always need to be hosted in HS2 (default disabled, at least until security can 
be sorted out).
- Is it possible to support a timeout on the downloads? (Can be a follow-up 
jira.)
- ArtifactAggregator - I believe this does 2 stages of dependent artifact 
downloads? Stage 1 downloads whatever it can, and the information from it 
should be adequate for the stage 2 downloads?
- For the ones not implemented yet (DummyArtifact), I think it's better to just 
comment out the code instead of invoking the DummyArtifact downloader.
- Security - ACL enforcement is required on secure clusters to make sure users 
can only download what they have access to. This is a must-fix before this can 
be enabled by default.
- Security - this can work around YARN restrictions on log downloads, since the 
files are being accessed by the hive user.
Could you please add some details on cluster testing?
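The suggestion to stream the archive rather than staging it on disk can be sketched with Python's zipfile writing straight into an in-memory stream (in a servlet this would be the HTTP response's output stream; the function name and artifact shapes here are hypothetical):

```python
import io
import zipfile

def stream_artifacts(artifacts, out_stream):
    """Write (name, bytes) artifacts into a zip directly on out_stream,
    so no temporary file can be leaked if the request is aborted."""
    with zipfile.ZipFile(out_stream, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in artifacts:
            zf.writestr(name, data)

# Simulate the response stream with an in-memory buffer:
buf = io.BytesIO()
stream_artifacts([("am.log", b"log line\n"), ("ats.json", b"{}")], buf)

# The receiver gets a well-formed archive without any on-disk staging:
with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
    assert sorted(zf.namelist()) == ["am.log", "ats.json"]
```

With multi-threaded downloaders, each artifact would still need to be complete before its entry is written, but the archive itself never touches local disk.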

> Add support to download debugging information as an archive.
> 
>
> Key: HIVE-17019
> URL: https://issues.apache.org/jira/browse/HIVE-17019
> Project: Hive
>  Issue Type: Bug
>Reporter: Harish Jaiprakash
>Assignee: Harish Jaiprakash
> Attachments: HIVE-17019.01.patch
>
>
> Given a queryId or dagId, get all information related to it: Tez AM and 
> task logs, Hive ATS data, Tez ATS data, Slider AM status, etc. Package it 
> into an archive.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17066) Query78 filter wrong estimatation is generating bad plan

2017-07-11 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17066:
---
Attachment: HIVE-17066.3.patch

> Query78 filter wrong estimatation is generating bad plan
> 
>
> Key: HIVE-17066
> URL: https://issues.apache.org/jira/browse/HIVE-17066
> Project: Hive
>  Issue Type: Bug
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch, 
> HIVE-17066.3.patch
>
>
> The Filter operator is estimating 1 row following a left outer join, causing 
> bad estimates.
> {noformat}
> Reducer 12 
> Execution mode: vectorized, llap
> Reduce Operator Tree:
>   Map Join Operator
> condition map:
>  Left Outer Join0 to 1
> keys:
>   0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
>   1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, 
> _col8
> input vertices:
>   1 Map 14
> Statistics: Num rows: 71676270660 Data size: 3727166074320 
> Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
>   predicate: _col8 is null (type: boolean)
>   Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   Select Operator
> expressions: _col0 (type: bigint), _col1 (type: bigint), 
> _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: 
> bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, 
> _col6
> Statistics: Num rows: 1 Data size: 52 Basic stats: 
> COMPLETE Column stats: COMPLETE
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17066) Query78 filter wrong estimatation is generating bad plan

2017-07-11 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17066:
---
Status: Open  (was: Patch Available)

> Query78 filter wrong estimatation is generating bad plan
> 
>
> Key: HIVE-17066
> URL: https://issues.apache.org/jira/browse/HIVE-17066
> Project: Hive
>  Issue Type: Bug
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch
>
>
> The Filter operator is estimating 1 row following a left outer join, causing 
> bad estimates.
> {noformat}
> Reducer 12 
> Execution mode: vectorized, llap
> Reduce Operator Tree:
>   Map Join Operator
> condition map:
>  Left Outer Join0 to 1
> keys:
>   0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
>   1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, 
> _col8
> input vertices:
>   1 Map 14
> Statistics: Num rows: 71676270660 Data size: 3727166074320 
> Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
>   predicate: _col8 is null (type: boolean)
>   Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   Select Operator
> expressions: _col0 (type: bigint), _col1 (type: bigint), 
> _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: 
> bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, 
> _col6
> Statistics: Num rows: 1 Data size: 52 Basic stats: 
> COMPLETE Column stats: COMPLETE
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16832) duplicate ROW__ID possible in multi insert into transactional table

2017-07-11 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16832:
--
Attachment: HIVE-16832.21.patch

patch 21 addresses Gopal's comments

> duplicate ROW__ID possible in multi insert into transactional table
> ---
>
> Key: HIVE-16832
> URL: https://issues.apache.org/jira/browse/HIVE-16832
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-16832.01.patch, HIVE-16832.03.patch, 
> HIVE-16832.04.patch, HIVE-16832.05.patch, HIVE-16832.06.patch, 
> HIVE-16832.08.patch, HIVE-16832.09.patch, HIVE-16832.10.patch, 
> HIVE-16832.11.patch, HIVE-16832.14.patch, HIVE-16832.15.patch, 
> HIVE-16832.16.patch, HIVE-16832.17.patch, HIVE-16832.18.patch, 
> HIVE-16832.19.patch, HIVE-16832.20.patch, HIVE-16832.20.patch, 
> HIVE-16832.21.patch
>
>
> {noformat}
>  create table AcidTablePart(a int, b int) partitioned by (p string) clustered 
> by (a) into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true');
>  create temporary table if not exists data1 (x int);
>  insert into data1 values (1);
>  from data1
>insert into AcidTablePart partition(p) select 0, 0, 'p' || x
>insert into AcidTablePart partition(p='p1') select 0, 1
> {noformat}
> Each branch of this multi-insert creates a row in partition p1/bucket0 with 
> ROW__ID=(1,0,0).
> The same can happen when running a SQL Merge (HIVE-10924) statement that has 
> both Insert and Update clauses when the target table has 
> _'transactional'='true','transactional_properties'='default'_  (see 
> HIVE-14035).  This is so because Merge is internally run as a multi-insert 
> statement.
> The solution relies on the statement ID introduced in HIVE-11030.  Each Insert 
> clause of a multi-insert gets a unique ID.
> ROW__ID.bucketId now becomes a bit-packed triplet (format version, 
> bucketId, statementId).
> (Since ORC stores field names in the data file, we can't rename 
> ROW__ID.bucketId.)
> This ensures that there are no collisions and retains the desired sort 
> properties of ROW__ID.
> In particular, _SortedDynPartitionOptimizer_ works w/o any changes even in 
> cases where there are fewer reducers than buckets.
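The bit-packed triplet idea can be sketched as follows; the field widths below are made up purely for illustration and are not Hive's actual layout (which lives in Hive's bucket-encoding code):

```python
# Hypothetical packing of (version, bucketId, statementId) into one int.
# Bit widths are illustrative only, not Hive's real layout.
VERSION_BITS, BUCKET_BITS, STMT_BITS = 3, 12, 12

def pack(version, bucket, stmt):
    assert bucket < (1 << BUCKET_BITS) and stmt < (1 << STMT_BITS)
    return (version << (BUCKET_BITS + STMT_BITS)) | (bucket << STMT_BITS) | stmt

def unpack(packed):
    return (packed >> (BUCKET_BITS + STMT_BITS),
            (packed >> STMT_BITS) & ((1 << BUCKET_BITS) - 1),
            packed & ((1 << STMT_BITS) - 1))

# Two insert clauses writing to the same bucket no longer collide,
# because the statement ID distinguishes their ROW__ID.bucketId values:
a = pack(1, 0, 0)   # first insert clause, bucket 0
b = pack(1, 0, 1)   # second insert clause, bucket 0
assert a != b
assert unpack(a) == (1, 0, 0) and unpack(b) == (1, 0, 1)
```

Because the field named bucketId in the ORC file now carries the whole triplet, no schema rename is needed, and ordering by the packed value still groups rows by bucket first.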



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17066) Query78 filter wrong estimatation is generating bad plan

2017-07-11 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17066:
---
Attachment: (was: HIVE-17066.3.patch)

> Query78 filter wrong estimatation is generating bad plan
> 
>
> Key: HIVE-17066
> URL: https://issues.apache.org/jira/browse/HIVE-17066
> Project: Hive
>  Issue Type: Bug
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch, 
> HIVE-17066.3.patch
>
>
> The Filter operator is estimating 1 row following a left outer join, causing 
> bad estimates.
> {noformat}
> Reducer 12 
> Execution mode: vectorized, llap
> Reduce Operator Tree:
>   Map Join Operator
> condition map:
>  Left Outer Join0 to 1
> keys:
>   0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
>   1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, 
> _col8
> input vertices:
>   1 Map 14
> Statistics: Num rows: 71676270660 Data size: 3727166074320 
> Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
>   predicate: _col8 is null (type: boolean)
>   Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   Select Operator
> expressions: _col0 (type: bigint), _col1 (type: bigint), 
> _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: 
> bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, 
> _col6
> Statistics: Num rows: 1 Data size: 52 Basic stats: 
> COMPLETE Column stats: COMPLETE
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-15051) Test framework integration with findbugs, rat checks etc.

2017-07-11 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083050#comment-16083050
 ] 

Lefty Leverenz commented on HIVE-15051:
---

Thanks for the explanations, [~pvary].  I used your own wording and tinkered a 
bit with paragraph 3 -- feel free to re-edit whether for meaning or for style 
preferences.

* [Running Yetus | 
https://cwiki.apache.org/confluence/display/Hive/Running+Yetus]

> Test framework integration with findbugs, rat checks etc.
> -
>
> Key: HIVE-15051
> URL: https://issues.apache.org/jira/browse/HIVE-15051
> Project: Hive
>  Issue Type: Sub-task
>  Components: Testing Infrastructure
>Reporter: Peter Vary
>Assignee: Peter Vary
> Fix For: 3.0.0
>
> Attachments: beeline.out, HIVE-15051.02.patch, HIVE-15051.patch, 
> Interim.patch, ql.out
>
>
> Find a way to integrate code analysis tools like findbugs and rat checks into 
> the PreCommit tests, thus removing from reviewers the burden of checking code 
> style and other things that could be checked automatically.
> It might be worth taking a look at Yetus, but keep in mind that Hive has a 
> specific parallel test framework.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17073) Incorrect result with vectorization and SharedWorkOptimizer

2017-07-11 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083089#comment-16083089
 ] 

Hive QA commented on HIVE-17073:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876697/HIVE-17073.02.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 10840 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite]
 (batchId=237)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_decimal_1] 
(batchId=76)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_tablesample_rows] 
(batchId=49)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_decimal_1]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_decimal_2]
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_groupby_grouping_window]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hadoop.hive.ql.exec.vector.TestVectorSelectOperator.testSelectOperator
 (batchId=272)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5965/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5965/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5965/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876697 - PreCommit-HIVE-Build

> Incorrect result with vectorization and SharedWorkOptimizer
> ---
>
> Key: HIVE-17073
> URL: https://issues.apache.org/jira/browse/HIVE-17073
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-17073.01.patch, HIVE-17073.02.patch, 
> HIVE-17073.patch
>
>
> We get an incorrect result with vectorization and the multi-output Select 
> operator created by SharedWorkOptimizer. It can be reproduced in the following 
> way.
> {code:title=Correct}
> select count(*) as h8_30_to_9
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_278";
> OK
> 2
> {code}
> {code:title=Correct}
> select count(*) as h9_to_9_30
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_255";
> OK
> 2
> {code}
> {code:title=Incorrect}
> select * from (
>   select count(*) as h8_30_to_9
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_278") s1
> join (
>   select count(*) as h9_to_9_30
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_255") s2;
> OK
> 2 0
> {code}
> The problem seems to be that some data structures in the row batch need to be 
> re-initialized after the batch has been forwarded to each output.
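The failure mode (a shared batch mutated in place by one consumer before the next sees it) can be mimicked with a toy sketch, using a plain list in place of a vectorized row batch; the consumer names are hypothetical:

```python
# Toy model of forwarding one mutable batch to two outputs.
def filtering_consumer(b):
    """Filters in place, like a vectorized filter compacting a batch."""
    b[:] = [x for x in b if x > 1]

def counting_consumer(b):
    return list(b)

# Forwarding the same object: the second consumer sees the first
# consumer's in-place edits, so its input is corrupted.
batch = [1, 2, 3]
filtering_consumer(batch)
assert counting_consumer(batch) == [2, 3]   # row 1 was silently lost

# Re-initializing (here: copying) the batch per output avoids this:
batch = [1, 2, 3]
filtering_consumer(list(batch))             # first output works on a copy
assert counting_consumer(batch) == [1, 2, 3]
```

In the real vectorized path the fix is to restore the batch's selection state between outputs rather than copy it, but the aliasing hazard is the same.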



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-15144) JSON.org license is now CatX

2017-07-11 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15144:
---
Fix Version/s: 3.0.0
   2.3.0

> JSON.org license is now CatX
> 
>
> Key: HIVE-15144
> URL: https://issues.apache.org/jira/browse/HIVE-15144
> Project: Hive
>  Issue Type: Bug
>Reporter: Robert Kanter
>Assignee: Owen O'Malley
>Priority: Blocker
> Fix For: 2.2.0, 2.3.0, 3.0.0
>
> Attachments: HIVE-15144.patch, HIVE-15144.patch, HIVE-15144.patch, 
> HIVE-15144.patch
>
>
> per [update resolved legal|http://www.apache.org/legal/resolved.html#json]:
> {quote}
> CAN APACHE PRODUCTS INCLUDE WORKS LICENSED UNDER THE JSON LICENSE?
> No. As of 2016-11-03 this has been moved to the 'Category X' license list. 
> Prior to this, use of the JSON Java library was allowed. See Debian's page 
> for a list of alternatives.
> {quote}
> I'm not sure when this dependency was first introduced, but it looks like 
> it's currently used in a few places:
> https://github.com/apache/hive/search?p=1=%22org.json%22=%E2%9C%93



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-15144) JSON.org license is now CatX

2017-07-11 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15144:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> JSON.org license is now CatX
> 
>
> Key: HIVE-15144
> URL: https://issues.apache.org/jira/browse/HIVE-15144
> Project: Hive
>  Issue Type: Bug
>Reporter: Robert Kanter
>Assignee: Owen O'Malley
>Priority: Blocker
> Fix For: 2.2.0, 2.3.0, 3.0.0
>
> Attachments: HIVE-15144.patch, HIVE-15144.patch, HIVE-15144.patch, 
> HIVE-15144.patch
>
>
> per [update resolved legal|http://www.apache.org/legal/resolved.html#json]:
> {quote}
> CAN APACHE PRODUCTS INCLUDE WORKS LICENSED UNDER THE JSON LICENSE?
> No. As of 2016-11-03 this has been moved to the 'Category X' license list. 
> Prior to this, use of the JSON Java library was allowed. See Debian's page 
> for a list of alternatives.
> {quote}
> I'm not sure when this dependency was first introduced, but it looks like 
> it's currently used in a few places:
> https://github.com/apache/hive/search?p=1=%22org.json%22=%E2%9C%93





[jira] [Updated] (HIVE-17066) Query78 filter wrong estimatation is generating bad plan

2017-07-11 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17066:
---
Status: Open  (was: Patch Available)

> Query78 filter wrong estimatation is generating bad plan
> 
>
> Key: HIVE-17066
> URL: https://issues.apache.org/jira/browse/HIVE-17066
> Project: Hive
>  Issue Type: Bug
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch, 
> HIVE-17066.3.patch
>
>
> The Filter operator estimates 1 row following a left outer join, causing bad 
> estimates downstream
> {noformat}
> Reducer 12 
> Execution mode: vectorized, llap
> Reduce Operator Tree:
>   Map Join Operator
> condition map:
>  Left Outer Join0 to 1
> keys:
>   0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
>   1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, 
> _col8
> input vertices:
>   1 Map 14
> Statistics: Num rows: 71676270660 Data size: 3727166074320 
> Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
>   predicate: _col8 is null (type: boolean)
>   Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   Select Operator
> expressions: _col0 (type: bigint), _col1 (type: bigint), 
> _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: 
> bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, 
> _col6
> Statistics: Num rows: 1 Data size: 52 Basic stats: 
> COMPLETE Column stats: COMPLETE
> {noformat}





[jira] [Commented] (HIVE-17073) Incorrect result with vectorization and SharedWorkOptimizer

2017-07-11 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083077#comment-16083077
 ] 

Gopal V commented on HIVE-17073:


[~jcamachorodriguez]: the boolean can be replaced with the changes from 
HIVE-16821

> Incorrect result with vectorization and SharedWorkOptimizer
> ---
>
> Key: HIVE-17073
> URL: https://issues.apache.org/jira/browse/HIVE-17073
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-17073.01.patch, HIVE-17073.02.patch, 
> HIVE-17073.patch
>
>
> We get incorrect result with vectorization and multi-output Select operator 
> created by SharedWorkOptimizer. It can be reproduced in the following way.
> {code:title=Correct}
> select count(*) as h8_30_to_9
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_278";
> OK
> 2
> {code}
> {code:title=Correct}
> select count(*) as h9_to_9_30
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_255";
> OK
> 2
> {code}
> {code:title=Incorrect}
> select * from (
>   select count(*) as h8_30_to_9
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_278") s1
> join (
>   select count(*) as h9_to_9_30
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_255") s2;
> OK
> 2 0
> {code}
> Problem seems to be that some data structures in the row batch need to be 
> re-initialized after they have been forwarded to each output.
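The re-initialization the description hints at can be sketched as follows. This is a hedged illustration only: when one operator forwards the same batch to several children, per-batch state mutated by one child must be restored before the next child sees it. The `Batch` and `Child` types here are simplified stand-ins, not Hive's actual VectorizedRowBatch API.

```java
// Simplified stand-ins for a vectorized row batch and a downstream operator;
// not Hive's actual classes.
public class BatchForwardSketch {
    static class Batch {
        int size;
        int[] selected;
        boolean selectedInUse;
    }

    interface Child {
        void consume(Batch b); // may filter the batch in place
    }

    // Forward one batch to every child, restoring the mutable per-batch
    // state between children so each child sees the original rows.
    static void forwardToAll(Batch b, Child[] children) {
        int size = b.size;
        boolean inUse = b.selectedInUse;
        int[] sel = b.selected.clone();
        for (Child c : children) {
            c.consume(b);
            b.size = size;
            b.selectedInUse = inUse;
            System.arraycopy(sel, 0, b.selected, 0, sel.length);
        }
    }
}
```

Without the restore step inside the loop, the second child would only see the rows the first child left selected, which matches the `2 0` result reported above.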





[jira] [Commented] (HIVE-16751) Support different types for grouping columns in GroupBy Druid queries

2017-07-11 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083022#comment-16083022
 ] 

Lefty Leverenz commented on HIVE-16751:
---

Okay, thanks Jesús.

> Support different types for grouping columns in GroupBy Druid queries
> -
>
> Key: HIVE-16751
> URL: https://issues.apache.org/jira/browse/HIVE-16751
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Fix For: 3.0.0
>
> Attachments: HIVE-16751.patch
>
>
> Calcite 1.13 pushes the EXTRACT and FLOOR functions to Druid as extraction 
> functions (cf. CALCITE-1758). Originally, we assumed that all group by 
> columns in a Druid query were of STRING type; however, this will no longer be 
> true (the result of EXTRACT is an INT and the result of FLOOR a TIMESTAMP).
> When we upgrade to Calcite 1.13, we will need to extend the DruidSerDe to 
> handle these functions.





[jira] [Updated] (HIVE-17066) Query78 filter wrong estimatation is generating bad plan

2017-07-11 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-17066:
---
Status: Open  (was: Patch Available)

> Query78 filter wrong estimatation is generating bad plan
> 
>
> Key: HIVE-17066
> URL: https://issues.apache.org/jira/browse/HIVE-17066
> Project: Hive
>  Issue Type: Bug
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch, 
> HIVE-17066.3.patch
>
>
> The Filter operator estimates 1 row following a left outer join, causing bad 
> estimates downstream
> {noformat}
> Reducer 12 
> Execution mode: vectorized, llap
> Reduce Operator Tree:
>   Map Join Operator
> condition map:
>  Left Outer Join0 to 1
> keys:
>   0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
>   1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 
> (type: bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, 
> _col8
> input vertices:
>   1 Map 14
> Statistics: Num rows: 71676270660 Data size: 3727166074320 
> Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
>   predicate: _col8 is null (type: boolean)
>   Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   Select Operator
> expressions: _col0 (type: bigint), _col1 (type: bigint), 
> _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: 
> bigint)
> outputColumnNames: _col0, _col1, _col3, _col4, _col5, 
> _col6
> Statistics: Num rows: 1 Data size: 52 Basic stats: 
> COMPLETE Column stats: COMPLETE
> {noformat}





[jira] [Commented] (HIVE-17018) Small table is converted to map join even the total size of small tables exceeds the threshold(hive.auto.convert.join.noconditionaltask.size)

2017-07-11 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083058#comment-16083058
 ] 

liyunzhang_intel commented on HIVE-17018:
-

[~csun]:
{quote}A better way might be to have a separate config just for HoS, and maybe 
a limit on small table memory per executor.{quote}
What confuses me is how to do this. The original code calculates whether the 
total map join size in the same stage exceeds the threshold. Do you mean we 
should create a new configuration that derives the small-table threshold from 
spark.executor.memory, so that if the total size of small tables in the same 
stage is bigger than spark.executor.memory, those small tables are not allowed 
into the same stage, while {{hive.auto.convert.join.noconditionaltask.size}} 
remains the limit on the total map join size of small tables in the query?

> Small table is converted to map join even the total size of small tables 
> exceeds the threshold(hive.auto.convert.join.noconditionaltask.size)
> -
>
> Key: HIVE-17018
> URL: https://issues.apache.org/jira/browse/HIVE-17018
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Attachments: HIVE-17018_data_init.q, HIVE-17018.q, t3.txt
>
>
> We use "hive.auto.convert.join.noconditionaltask.size" as the threshold: if 
> the sum of the sizes of n-1 of the tables/partitions in an n-way join is 
> smaller than it, the join will be converted to a map join. For example, take 
> A join B join C join D join E, where the big table is A (100M) and the small 
> tables are B (10M), C (10M), D (10M), E (10M), with 
> hive.auto.convert.join.noconditionaltask.size=20M. In the current code E, D, 
> and B are converted to map joins but C is not. In my understanding, because 
> hive.auto.convert.join.noconditionaltask.size can only accommodate E and D, 
> neither C nor B should be converted to a map join.
> Let's explain in more detail why E can be converted to a map join.
> In the current code, 
> [SparkMapJoinOptimizer#getConnectedMapJoinSize|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#L364]
> calculates all the map joins in the parent path and child path. The search 
> stops when encountering a [UnionOperator or 
> ReduceOperator|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#L381].
> C is not converted to a map join because {{(connectedMapJoinSize + 
> totalSize) > maxSize}} [see 
> code|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#L330], so the 
> RS before the join of C remains. When calculating whether B should be 
> converted to a map join, {{getConnectedMapJoinSize}} returns 0 on encountering 
> that [RS 
> |https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#409]
> , which causes {{(connectedMapJoinSize + totalSize) < maxSize}} to match.
> [~xuefuz] or [~jxiang]: could you help check whether this is a bug, as you 
> are more familiar with SparkJoinOptimizer?
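The threshold rule the description relies on can be sketched like this. This is a hedged illustration of the rule, not Hive's actual SparkMapJoinOptimizer code; the method and parameter names are made up for clarity.

```java
import java.util.List;

// Hedged sketch of the threshold rule described above: an n-way join is
// converted to a map join only when the combined size of its n-1 small
// tables stays under hive.auto.convert.join.noconditionaltask.size.
public class MapJoinThresholdSketch {
    static boolean canConvertToMapJoin(List<Long> smallTableSizes, long noConditionalTaskSize) {
        long total = 0;
        for (long size : smallTableSizes) {
            total += size; // sum the sizes of all small-table inputs
        }
        return total <= noConditionalTaskSize;
    }
}
```

With the example above (B, C, D, E at 10M each and a 20M threshold), the combined 40M exceeds the threshold, so under this reading the four small tables should not all be map-joined in one stage.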





[jira] [Commented] (HIVE-17065) You can not successfully deploy hive clusters with Hive guidance documents

2017-07-11 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083012#comment-16083012
 ] 

Lefty Leverenz commented on HIVE-17065:
---

[~linzhangbing], it's easy to get edit privileges for the Hive wiki:

* [About This Wiki -- How to get permission to edit | 
https://cwiki.apache.org/confluence/display/Hive/AboutThisWiki#AboutThisWiki-Howtogetpermissiontoedit]
* [About This Wiki -- How to edit the Hive wiki | 
https://cwiki.apache.org/confluence/display/Hive/AboutThisWiki#AboutThisWiki-HowtoedittheHivewiki]

But the modifications required are beyond my expertise.

> You can not successfully deploy hive clusters with Hive guidance documents
> --
>
> Key: HIVE-17065
> URL: https://issues.apache.org/jira/browse/HIVE-17065
> Project: Hive
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: ZhangBing Lin
>Priority: Minor
> Attachments: screenshot-1.png
>
>
> When I follow the official document from cwiki 
> [https://cwiki.apache.org/confluence/display/Hive/GettingStarted] to build a 
> Hive 2.1.1 single-node service, I encountered several problems:
> 1. The following command to create the Hive warehouse directory needs to be modified:
>   A. $HADOOP_HOME/bin/hadoop fs -mkdir /user/hive/warehouse
>   B. $HADOOP_HOME/bin/hadoop fs -mkdir -p /user/hive/warehouse
> Using B instead of A might be better.
> 2. The positions of the following two descriptions need to be adjusted:
>  A. Running Hive CLI
> To use the Hive command line interface (CLI) from the shell:
>    $ $HIVE_HOME/bin/hive
>  B. Running HiveServer2 and Beeline
> Starting from Hive 2.1, we need to run the schematool command below as an 
> initialization step. For example, we can use "derby" as db type.
>    $ $HIVE_HOME/bin/schematool -dbType  -initSchema
> When I execute the $HIVE_HOME/bin/hive command, the following error occurs:
> !screenshot-1.png!
> The problem is solved after I first run the following command and then run hive:
> $ $HIVE_HOME/bin/schematool -dbType derby -initSchema





[jira] [Commented] (HIVE-17073) Incorrect result with vectorization and SharedWorkOptimizer

2017-07-11 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083011#comment-16083011
 ] 

Hive QA commented on HIVE-17073:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876680/HIVE-17073.01.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10840 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed]
 (batchId=237)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hadoop.hive.ql.exec.vector.TestVectorSelectOperator.testSelectOperator
 (batchId=272)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5964/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5964/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5964/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876680 - PreCommit-HIVE-Build

> Incorrect result with vectorization and SharedWorkOptimizer
> ---
>
> Key: HIVE-17073
> URL: https://issues.apache.org/jira/browse/HIVE-17073
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-17073.01.patch, HIVE-17073.02.patch, 
> HIVE-17073.patch
>
>
> We get incorrect result with vectorization and multi-output Select operator 
> created by SharedWorkOptimizer. It can be reproduced in the following way.
> {code:title=Correct}
> select count(*) as h8_30_to_9
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_278";
> OK
> 2
> {code}
> {code:title=Correct}
> select count(*) as h9_to_9_30
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_255";
> OK
> 2
> {code}
> {code:title=Incorrect}
> select * from (
>   select count(*) as h8_30_to_9
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_278") s1
> join (
>   select count(*) as h9_to_9_30
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_255") s2;
> OK
> 2 0
> {code}
> Problem seems to be that some data structures in the row batch need to be 
> re-initialized after they have been forwarded to each output.





[jira] [Commented] (HIVE-17073) Incorrect result with vectorization and SharedWorkOptimizer

2017-07-11 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082920#comment-16082920
 ] 

Matt McCline commented on HIVE-17073:
-

Ok, LGTM +1 tests pending.

> Incorrect result with vectorization and SharedWorkOptimizer
> ---
>
> Key: HIVE-17073
> URL: https://issues.apache.org/jira/browse/HIVE-17073
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-17073.01.patch, HIVE-17073.02.patch, 
> HIVE-17073.patch
>
>
> We get incorrect result with vectorization and multi-output Select operator 
> created by SharedWorkOptimizer. It can be reproduced in the following way.
> {code:title=Correct}
> select count(*) as h8_30_to_9
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_278";
> OK
> 2
> {code}
> {code:title=Correct}
> select count(*) as h9_to_9_30
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_255";
> OK
> 2
> {code}
> {code:title=Incorrect}
> select * from (
>   select count(*) as h8_30_to_9
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_278") s1
> join (
>   select count(*) as h9_to_9_30
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_255") s2;
> OK
> 2 0
> {code}
> Problem seems to be that some data structures in the row batch need to be 
> re-initialized after they have been forwarded to each output.





[jira] [Commented] (HIVE-16926) LlapTaskUmbilicalExternalClient should not start new umbilical server for every fragment request

2017-07-11 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082995#comment-16082995
 ] 

Siddharth Seth commented on HIVE-16926:
---

Functionally, looks good to me. Minor comments.
- umbilicalServer.umbilicalProtocol.pendingClients.putIfAbsent -> Would be a 
little cleaner to add a method for this, similar to unregisterClient.
- {code} +  for (String key : umbilicalImpl.pendingClients.keySet()) {
+LlapTaskUmbilicalExternalClient client = 
umbilicalImpl.pendingClients.get(key);{code}
Replace with an iterator over the entrySet to avoid the get() ?
Also, this pattern is repeated in heartbeat and nodeHeartbeat - could likely be 
a method.

If I'm not mistaken, the shared umbilical server will not be shut down ever?

Maybe in a follow up - some of the static classes could be split out.
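The entrySet suggestion in the comment above can be illustrated like this. The map here is a generic stand-in for pendingClients, not the actual LlapTaskUmbilicalExternalClient code.

```java
import java.util.Map;

// Iterating entrySet yields key and value together, avoiding the extra
// hash lookup that keySet() + get(key) performs for every entry.
public class EntrySetIterationSketch {
    static int countNonNull(Map<String, Object> pendingClients) {
        int n = 0;
        for (Map.Entry<String, Object> e : pendingClients.entrySet()) {
            if (e.getValue() != null) {
                n++; // use e.getKey() / e.getValue() directly here
            }
        }
        return n;
    }
}
```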


> LlapTaskUmbilicalExternalClient should not start new umbilical server for 
> every fragment request
> 
>
> Key: HIVE-16926
> URL: https://issues.apache.org/jira/browse/HIVE-16926
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-16926.1.patch, HIVE-16926.2.patch, 
> HIVE-16926.3.patch, HIVE-16926.4.patch
>
>
> Followup task from [~sseth] and [~sershe] after HIVE-16777.
> LlapTaskUmbilicalExternalClient currently creates a new umbilical server for 
> every fragment request, but this is not necessary and the umbilical can be 
> shared.





[jira] [Commented] (HIVE-15144) JSON.org license is now CatX

2017-07-11 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082980#comment-16082980
 ] 

Lefty Leverenz commented on HIVE-15144:
---

[~pxiong], please update the fix versions to 2.3.0 and 3.0.0.  Thanks.

> JSON.org license is now CatX
> 
>
> Key: HIVE-15144
> URL: https://issues.apache.org/jira/browse/HIVE-15144
> Project: Hive
>  Issue Type: Bug
>Reporter: Robert Kanter
>Assignee: Owen O'Malley
>Priority: Blocker
> Fix For: 2.2.0
>
> Attachments: HIVE-15144.patch, HIVE-15144.patch, HIVE-15144.patch, 
> HIVE-15144.patch
>
>
> per [update resolved legal|http://www.apache.org/legal/resolved.html#json]:
> {quote}
> CAN APACHE PRODUCTS INCLUDE WORKS LICENSED UNDER THE JSON LICENSE?
> No. As of 2016-11-03 this has been moved to the 'Category X' license list. 
> Prior to this, use of the JSON Java library was allowed. See Debian's page 
> for a list of alternatives.
> {quote}
> I'm not sure when this dependency was first introduced, but it looks like 
> it's currently used in a few places:
> https://github.com/apache/hive/search?p=1=%22org.json%22=%E2%9C%93





[jira] [Commented] (HIVE-15767) Hive On Spark is not working on secure clusters from Oozie

2017-07-11 Thread Peter Cseh (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082948#comment-16082948
 ] 

Peter Cseh commented on HIVE-15767:
---

This happens with HiveCLI, not with HS2.
The exception is coming from the Spark driver.

When the HiveCLI is executed from a shell, mapreduce.job.credentials.binary 
is empty in the configuration, as spark-submit is called from the RemoteClient.
When it's executed from Oozie's LauncherMapper, Hive picks up this property 
from the Oozie launcher's configuration, which is correct, but passes it on to 
Spark. Spark runs in yarn-cluster mode, so the Spark driver gets its own 
container (which may be on another machine) and looks for the credential files 
in the folder where the Oozie launcher ran. That folder is on a different 
machine, so the driver can't pick up the container_tokens file, which leaves 
it with no tokens, and it fails.

I don't know how Hive-on-MR works in this regard, but we had no similar issues 
with the HiveAction before, so I assume it works differently.

I don't think it's possible to reproduce this using MiniClusters, as the local 
folders will be available in the test, so the Spark driver will be able to 
access them.

> Hive On Spark is not working on secure clusters from Oozie
> --
>
> Key: HIVE-15767
> URL: https://issues.apache.org/jira/browse/HIVE-15767
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.2.1, 2.1.1
>Reporter: Peter Cseh
>Assignee: Peter Cseh
> Attachments: HIVE-15767-001.patch, HIVE-15767-002.patch
>
>
> When a HiveAction is launched form Oozie with Hive On Spark enabled, we're 
> getting errors:
> {noformat}
> Caused by: java.io.IOException: Exception reading 
> file:/yarn/nm/usercache/yshi/appcache/application_1485271416004_0022/container_1485271416004_0022_01_02/container_tokens
> at 
> org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:188)
> at 
> org.apache.hadoop.mapreduce.security.TokenCache.mergeBinaryTokens(TokenCache.java:155)
> {noformat}
> This is caused by passing the {{mapreduce.job.credentials.binary}} property 
> to the Spark configuration in RemoteHiveSparkClient.





[jira] [Commented] (HIVE-16832) duplicate ROW__ID possible in multi insert into transactional table

2017-07-11 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082885#comment-16082885
 ] 

Gopal V commented on HIVE-16832:


bq. Suppose you populate a partition via 100 inserts and 1M rows. So you have 
100 OTIDs.

Yeah, this was an optimization for the possibility that you're doing an "update 
every row" merge which would otherwise cause a massive memory jump in deletes 
(& overflow the 2G limit on arrays).

bq. Perhaps simply relying on the "push down" to delete deltas is enough and we 
are better off just keeping 3 arrays

Yes, it might be better - I've yet to really look into the delete distribution 
for a regular CDC workload. The push-down into deletes is a big win anyway.

Not too worried about the extra size here.

> duplicate ROW__ID possible in multi insert into transactional table
> ---
>
> Key: HIVE-16832
> URL: https://issues.apache.org/jira/browse/HIVE-16832
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-16832.01.patch, HIVE-16832.03.patch, 
> HIVE-16832.04.patch, HIVE-16832.05.patch, HIVE-16832.06.patch, 
> HIVE-16832.08.patch, HIVE-16832.09.patch, HIVE-16832.10.patch, 
> HIVE-16832.11.patch, HIVE-16832.14.patch, HIVE-16832.15.patch, 
> HIVE-16832.16.patch, HIVE-16832.17.patch, HIVE-16832.18.patch, 
> HIVE-16832.19.patch, HIVE-16832.20.patch, HIVE-16832.20.patch
>
>
> {noformat}
>  create table AcidTablePart(a int, b int) partitioned by (p string) clustered 
> by (a) into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true');
>  create temporary table if not exists data1 (x int);
>  insert into data1 values (1);
>  from data1
>insert into AcidTablePart partition(p) select 0, 0, 'p' || x
>insert into AcidTablePart partition(p='p1') select 0, 1
> {noformat}
> Each branch of this multi-insert creates a row in partition p1/bucket0 with 
> ROW__ID=(1,0,0).
> The same can happen when running SQL Merge (HIVE-10924) statement that has 
> both Insert and Update clauses when target table has 
> _'transactional'='true','transactional_properties'='default'_  (see 
> HIVE-14035).  This is so because Merge is internally run as a multi-insert 
> statement.
> The solution relies on the statement ID introduced in HIVE-11030.  Each Insert 
> clause of a multi-insert gets a unique ID.
> The ROW__ID.bucketId now becomes a bit-packed triplet (format version, 
> bucketId, statementId).
> (Since ORC stores field names in the data file we can't rename 
> ROW__ID.bucketId).
> This ensures that there are no collisions and retains desired sort properties 
> of ROW__ID.
> In particular _SortedDynPartitionOptimizer_ works w/o any changes even in 
> cases where there are fewer reducers than buckets.
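The bit-packed triplet idea can be sketched as follows. This is an illustration only: the field widths used here (12-bit bucket, 12-bit statement) are assumptions for the sketch, not Hive's actual ROW__ID.bucketId layout, and the class name is made up.

```java
// Illustrative sketch: packs (format version, bucketId, statementId) into a
// single int, as the description above outlines. Field widths are assumed
// for illustration and do not reflect Hive's real encoding.
public class BucketIdPackSketch {
    static final int BUCKET_BITS = 12;
    static final int STMT_BITS = 12;

    static int pack(int version, int bucketId, int statementId) {
        return (version << (BUCKET_BITS + STMT_BITS))
                | (bucketId << STMT_BITS)
                | statementId;
    }

    static int bucketOf(int packed) {
        return (packed >>> STMT_BITS) & ((1 << BUCKET_BITS) - 1);
    }

    static int statementOf(int packed) {
        return packed & ((1 << STMT_BITS) - 1);
    }
}
```

The point of the encoding is that two inserts writing to the same bucket in the same transaction still produce distinct packed values, because their statement IDs differ.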





[jira] [Comment Edited] (HIVE-17073) Incorrect result with vectorization and SharedWorkOptimizer

2017-07-11 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082852#comment-16082852
 ] 

Matt McCline edited comment on HIVE-17073 at 7/11/17 7:49 PM:
--

Great work!

I think you need to add some code to TableScanOperator -- it handles 
VectorizedRowBatch as pass-through, too.  It has a forward call in it.  
Probably add an instanceof check at beginning of method and use it.

And, LLAP drives in VRBs, too.  Not sure where at the moment.  Might just be 
via InputFileFormat.




was (Author: mmccline):
Great work!

I think you need to add some code to TableScanOperator -- it handles 
VectorizedRowBatch as pass-through, too.  It has a forward call in it.  
Probably add an instanceof check at beginning of method and use it.

And, LLAP drives in VRBs, too.  Not sure where at the moment.



> Incorrect result with vectorization and SharedWorkOptimizer
> ---
>
> Key: HIVE-17073
> URL: https://issues.apache.org/jira/browse/HIVE-17073
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-17073.01.patch, HIVE-17073.patch
>
>
> We get incorrect result with vectorization and multi-output Select operator 
> created by SharedWorkOptimizer. It can be reproduced in the following way.
> {code:title=Correct}
> select count(*) as h8_30_to_9
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_278";
> OK
> 2
> {code}
> {code:title=Correct}
> select count(*) as h9_to_9_30
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_255";
> OK
> 2
> {code}
> {code:title=Incorrect}
> select * from (
>   select count(*) as h8_30_to_9
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_278") s1
> join (
>   select count(*) as h9_to_9_30
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_255") s2;
> OK
> 2 0
> {code}
> Problem seems to be that some data structures in the row batch need to be 
> re-initialized after they have been forwarded to each output.





[jira] [Updated] (HIVE-17073) Incorrect result with vectorization and SharedWorkOptimizer

2017-07-11 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-17073:
---
Attachment: HIVE-17073.02.patch

> Incorrect result with vectorization and SharedWorkOptimizer
> ---
>
> Key: HIVE-17073
> URL: https://issues.apache.org/jira/browse/HIVE-17073
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-17073.01.patch, HIVE-17073.02.patch, 
> HIVE-17073.patch
>
>
> We get incorrect result with vectorization and multi-output Select operator 
> created by SharedWorkOptimizer. It can be reproduced in the following way.
> {code:title=Correct}
> select count(*) as h8_30_to_9
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_278";
> OK
> 2
> {code}
> {code:title=Correct}
> select count(*) as h9_to_9_30
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_255";
> OK
> 2
> {code}
> {code:title=Incorrect}
> select * from (
>   select count(*) as h8_30_to_9
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_278") s1
> join (
>   select count(*) as h9_to_9_30
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_255") s2;
> OK
> 2 0
> {code}
> Problem seems to be that some data structures in the row batch need to be 
> re-initialized after they have been forwarded to each output.





[jira] [Commented] (HIVE-8838) Support Parquet through HCatalog

2017-07-11 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082904#comment-16082904
 ] 

Hive QA commented on HIVE-8838:
---



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876668/HIVE-8838.3.patch

{color:green}SUCCESS:{color} +1 due to 6 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10873 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testConcurrentStatements (batchId=226)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5963/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5963/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5963/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876668 - PreCommit-HIVE-Build

> Support Parquet through HCatalog
> 
>
> Key: HIVE-8838
> URL: https://issues.apache.org/jira/browse/HIVE-8838
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Adam Szita
> Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch, 
> HIVE-8838.3.patch
>
>
> Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog.





[jira] [Updated] (HIVE-16177) non Acid to acid conversion doesn't handle _copy_N files

2017-07-11 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16177:
--
Attachment: HIVE-16177.19-branch-2.patch

> non Acid to acid conversion doesn't handle _copy_N files
> 
>
> Key: HIVE-16177
> URL: https://issues.apache.org/jira/browse/HIVE-16177
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 0.14.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-16177.01.patch, HIVE-16177.02.patch, 
> HIVE-16177.04.patch, HIVE-16177.07.patch, HIVE-16177.08.patch, 
> HIVE-16177.09.patch, HIVE-16177.10.patch, HIVE-16177.11.patch, 
> HIVE-16177.14.patch, HIVE-16177.15.patch, HIVE-16177.16.patch, 
> HIVE-16177.17.patch, HIVE-16177.18-branch-2.patch, HIVE-16177.18.patch, 
> HIVE-16177.19-branch-2.patch
>
>
> {noformat}
> create table T(a int, b int) clustered by (a)  into 2 buckets stored as orc 
> TBLPROPERTIES('transactional'='false')
> insert into T(a,b) values(1,2)
> insert into T(a,b) values(1,3)
> alter table T SET TBLPROPERTIES ('transactional'='true')
> {noformat}
> // we should now have bucket files 01_0 and 01_0_copy_1,
> but OrcRawRecordMerger.OriginalReaderPair.next() doesn't know that there can 
> be copy_N files, and it numbers rows in each bucket from 0, thus generating 
> duplicate IDs.
> {noformat}
> select ROW__ID, INPUT__FILE__NAME, a, b from T
> {noformat}
> produces 
> {noformat}
> {"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0,1,2
> {"transactionid\":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0_copy_1,1,3
> {noformat}
> [~owen.omalley], do you have any thoughts on a good way to handle this?
> The attached patch has a few changes to make Acid recognize copy_N files, 
> but this is just a prerequisite.  The new UT demonstrates the issue.
> Furthermore,
> {noformat}
> alter table T compact 'major'
> select ROW__ID, INPUT__FILE__NAME, a, b from T order by b
> {noformat}
> produces 
> {noformat}
> {"transactionid":0,"bucketid":1,"rowid":0}
> file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommandswarehouse/nonacidorctbl/base_-9223372036854775808/bucket_1
> 1   2
> {noformat}
> HIVE-16177.04.patch has TestTxnCommands.testNonAcidToAcidConversion0() 
> demonstrating this
> This is because compactor doesn't handle copy_N files either (skips them)
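The duplicate-rowid behavior quoted above can be sketched outside Hive. Below is an illustrative Python sketch (Hive's actual fix would live in Java, around OrcRawRecordMerger); the simplified filename pattern and helper names are assumptions for illustration, not Hive internals. The idea is to treat the _copy_N suffix as an ordering key and give each file a distinct starting rowid:

```python
import re

# Hypothetical helper (not Hive's code): recognize the _copy_N suffix that
# appears when a second INSERT writes the same bucket, e.g. 000001_0 and
# 000001_0_copy_1.
COPY_N = re.compile(r"^(\d+_\d+)(?:_copy_(\d+))?$")

def copy_index(filename: str) -> int:
    """Return 0 for the base bucket file, N for a _copy_N file."""
    m = COPY_N.match(filename)
    if m is None:
        raise ValueError(f"not a bucket file: {filename}")
    return int(m.group(2) or 0)

def assign_row_ids(files_to_counts: dict) -> dict:
    """Assign each file a starting rowid so ids stay unique across the
    base file and its copies, ordered by copy index."""
    start = {}
    next_id = 0
    for name in sorted(files_to_counts, key=copy_index):
        start[name] = next_id
        next_id += files_to_counts[name]
    return start
```

With numbering like this, the rows of a copy file start where the previous file's rows end, so no two rows in the bucket share a rowid.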





[jira] [Commented] (HIVE-17073) Incorrect result with vectorization and SharedWorkOptimizer

2017-07-11 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082852#comment-16082852
 ] 

Matt McCline commented on HIVE-17073:
-

Great work!

I think you need to add some code to TableScanOperator as well -- it handles 
VectorizedRowBatch as a pass-through, too, and has a forward call in it. 
Probably add an instanceof check at the beginning of the method and use it.

Also, LLAP drives in VectorizedRowBatches, too; I'm not sure where at the moment.
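The instanceof-check suggestion above can be illustrated with a toy sketch. This is hedged Python pseudocode of the pattern, not Hive's Java operator code; the VectorizedRowBatch stand-in class and its reset fields are invented for illustration:

```python
class VectorizedRowBatch:
    """Toy stand-in for a vectorized batch with mutable per-forward state."""
    def __init__(self, rows):
        self.rows = rows
        self.selected_in_use = False  # transient state a consumer may set
        self.selected = []

class PassThroughOperator:
    """Toy pass-through operator forwarding each input to all children."""
    def __init__(self, children):
        self.children = children

    def process(self, row):
        if isinstance(row, VectorizedRowBatch):  # the suggested instanceof check
            for child in self.children:
                # re-initialize per-forward state that the previous child
                # may have mutated on the shared batch object
                row.selected_in_use = False
                row.selected = []
                child(row)
        else:
            for child in self.children:
                child(row)
```

The key point is that the per-forward state is restored before each child runs, so one child's consumption of the shared batch cannot leak into the next.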



> Incorrect result with vectorization and SharedWorkOptimizer
> ---
>
> Key: HIVE-17073
> URL: https://issues.apache.org/jira/browse/HIVE-17073
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-17073.01.patch, HIVE-17073.patch
>
>
> We get incorrect result with vectorization and multi-output Select operator 
> created by SharedWorkOptimizer. It can be reproduced in the following way.
> {code:title=Correct}
> select count(*) as h8_30_to_9
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_278";
> OK
> 2
> {code}
> {code:title=Correct}
> select count(*) as h9_to_9_30
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_255";
> OK
> 2
> {code}
> {code:title=Incorrect}
> select * from (
>   select count(*) as h8_30_to_9
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_278") s1
> join (
>   select count(*) as h9_to_9_30
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_255") s2;
> OK
> 2 0
> {code}
> The problem seems to be that some data structures in the row batch need to 
> be re-initialized after they have been forwarded to each output.
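A toy model of how sharing one mutable batch between the two subqueries could produce the "2 0" result quoted above (illustrative Python only, not Hive code): the first consumer applies its filter by shrinking the shared batch in place, so the second consumer sees already-filtered data.

```python
class Batch:
    """Toy stand-in for a shared, mutable row batch."""
    def __init__(self, values):
        self.values = values

def count_matching(batch, predicate):
    # destructive filter: mutates the shared batch in place
    batch.values = [v for v in batch.values if predicate(v)]
    return len(batch.values)

batch = Batch(["val_278", "val_278", "val_255", "val_255"])
s1 = count_matching(batch, lambda v: v == "val_278")  # 2
s2 = count_matching(batch, lambda v: v == "val_255")  # 0 -- the data is already gone
```

Re-initializing (or copying) the batch before each consumer runs would restore the expected "2 2".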





[jira] [Commented] (HIVE-8838) Support Parquet through HCatalog

2017-07-11 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082858#comment-16082858
 ] 

Aihua Xu commented on HIVE-8838:


Thanks, [~szita]. The patch looks good to me. +1.

> Support Parquet through HCatalog
> 
>
> Key: HIVE-8838
> URL: https://issues.apache.org/jira/browse/HIVE-8838
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Adam Szita
> Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch, 
> HIVE-8838.3.patch
>
>
> Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog.





[jira] [Commented] (HIVE-8838) Support Parquet through HCatalog

2017-07-11 Thread Adam Szita (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082847#comment-16082847
 ] 

Adam Szita commented on HIVE-8838:
--

[~aihuaxu] those are unrelated; they are flaky/failing without this patch too. 
See e.g. this build from yesterday: 
https://builds.apache.org/job/PreCommit-HIVE-Build/5937/testReport/

> Support Parquet through HCatalog
> 
>
> Key: HIVE-8838
> URL: https://issues.apache.org/jira/browse/HIVE-8838
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Adam Szita
> Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch, 
> HIVE-8838.3.patch
>
>
> Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog.





[jira] [Commented] (HIVE-16177) non Acid to acid conversion doesn't handle _copy_N files

2017-07-11 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082814#comment-16082814
 ] 

Hive QA commented on HIVE-16177:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876662/HIVE-16177.18-branch-2.patch

{color:green}SUCCESS:{color} +1 due to 9 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 10584 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] (batchId=35)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[explaindenpendencydiffengs]
 (batchId=38)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=59)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=142)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] 
(batchId=139)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[explaindenpendencydiffengs]
 (batchId=115)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_ptf] 
(batchId=125)
org.apache.hadoop.hive.ql.TestTxnCommands.testNonAcidToAcidConversion01 
(batchId=278)
org.apache.hadoop.hive.ql.TestTxnCommands2.testNonAcidToAcidConversion02 
(batchId=266)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdate.testNonAcidToAcidConversion02
 (batchId=276)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testNonAcidToAcidConversion02
 (batchId=273)
org.apache.hadoop.hive.ql.security.TestExtendedAcls.testPartition (batchId=228)
org.apache.hadoop.hive.ql.security.TestFolderPermissions.testPartition 
(batchId=217)
org.apache.hive.hcatalog.api.TestHCatClient.testTransportFailure (batchId=176)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5962/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5962/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5962/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876662 - PreCommit-HIVE-Build

> non Acid to acid conversion doesn't handle _copy_N files
> 
>
> Key: HIVE-16177
> URL: https://issues.apache.org/jira/browse/HIVE-16177
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 0.14.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-16177.01.patch, HIVE-16177.02.patch, 
> HIVE-16177.04.patch, HIVE-16177.07.patch, HIVE-16177.08.patch, 
> HIVE-16177.09.patch, HIVE-16177.10.patch, HIVE-16177.11.patch, 
> HIVE-16177.14.patch, HIVE-16177.15.patch, HIVE-16177.16.patch, 
> HIVE-16177.17.patch, HIVE-16177.18-branch-2.patch, HIVE-16177.18.patch
>
>
> {noformat}
> create table T(a int, b int) clustered by (a)  into 2 buckets stored as orc 
> TBLPROPERTIES('transactional'='false')
> insert into T(a,b) values(1,2)
> insert into T(a,b) values(1,3)
> alter table T SET TBLPROPERTIES ('transactional'='true')
> {noformat}
> // we should now have bucket files 01_0 and 01_0_copy_1,
> but OrcRawRecordMerger.OriginalReaderPair.next() doesn't know that there can 
> be copy_N files, and it numbers rows in each bucket from 0, thus generating 
> duplicate IDs.
> {noformat}
> select ROW__ID, INPUT__FILE__NAME, a, b from T
> {noformat}
> produces 
> {noformat}
> {"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0,1,2
> {"transactionid\":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0_copy_1,1,3
> {noformat}
> [~owen.omalley], do you have any thoughts on a good way to handle this?
> The attached patch has a few changes to make Acid recognize copy_N files, 
> but this is just a prerequisite.  The new UT demonstrates the issue.
> Furthermore,
> {noformat}
> alter table T compact 'major'
> select ROW__ID, INPUT__FILE__NAME, a, b from T order by b
> {noformat}
> produces 
> {noformat}
> {"transactionid":0,"bucketid":1,"rowid":0}
> file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommandswarehouse/nonacidorctbl/base_-9223372036854775808/bucket_1
> 1   2
> {noformat}
> HIVE-16177.04.patch has TestTxnCommands.testNonAcidToAcidConversion0() 
> demonstrating this
> This is 

[jira] [Updated] (HIVE-17073) Incorrect result with vectorization and SharedWorkOptimizer

2017-07-11 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-17073:
---
Attachment: (was: HIVE-17073.01.patch)

> Incorrect result with vectorization and SharedWorkOptimizer
> ---
>
> Key: HIVE-17073
> URL: https://issues.apache.org/jira/browse/HIVE-17073
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-17073.01.patch, HIVE-17073.patch
>
>
> We get incorrect result with vectorization and multi-output Select operator 
> created by SharedWorkOptimizer. It can be reproduced in the following way.
> {code:title=Correct}
> select count(*) as h8_30_to_9
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_278";
> OK
> 2
> {code}
> {code:title=Correct}
> select count(*) as h9_to_9_30
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_255";
> OK
> 2
> {code}
> {code:title=Incorrect}
> select * from (
>   select count(*) as h8_30_to_9
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_278") s1
> join (
>   select count(*) as h9_to_9_30
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_255") s2;
> OK
> 2 0
> {code}
> The problem seems to be that some data structures in the row batch need to 
> be re-initialized after they have been forwarded to each output.





[jira] [Updated] (HIVE-17073) Incorrect result with vectorization and SharedWorkOptimizer

2017-07-11 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-17073:
---
Attachment: HIVE-17073.01.patch

> Incorrect result with vectorization and SharedWorkOptimizer
> ---
>
> Key: HIVE-17073
> URL: https://issues.apache.org/jira/browse/HIVE-17073
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-17073.01.patch, HIVE-17073.patch
>
>
> We get incorrect result with vectorization and multi-output Select operator 
> created by SharedWorkOptimizer. It can be reproduced in the following way.
> {code:title=Correct}
> select count(*) as h8_30_to_9
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_278";
> OK
> 2
> {code}
> {code:title=Correct}
> select count(*) as h9_to_9_30
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_255";
> OK
> 2
> {code}
> {code:title=Incorrect}
> select * from (
>   select count(*) as h8_30_to_9
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_278") s1
> join (
>   select count(*) as h9_to_9_30
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_255") s2;
> OK
> 2 0
> {code}
> The problem seems to be that some data structures in the row batch need to 
> be re-initialized after they have been forwarded to each output.





[jira] [Updated] (HIVE-17073) Incorrect result with vectorization and SharedWorkOptimizer

2017-07-11 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-17073:
---
Attachment: HIVE-17073.01.patch

> Incorrect result with vectorization and SharedWorkOptimizer
> ---
>
> Key: HIVE-17073
> URL: https://issues.apache.org/jira/browse/HIVE-17073
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-17073.01.patch, HIVE-17073.patch
>
>
> We get incorrect result with vectorization and multi-output Select operator 
> created by SharedWorkOptimizer. It can be reproduced in the following way.
> {code:title=Correct}
> select count(*) as h8_30_to_9
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_278";
> OK
> 2
> {code}
> {code:title=Correct}
> select count(*) as h9_to_9_30
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_255";
> OK
> 2
> {code}
> {code:title=Incorrect}
> select * from (
>   select count(*) as h8_30_to_9
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_278") s1
> join (
>   select count(*) as h9_to_9_30
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_255") s2;
> OK
> 2 0
> {code}
> The problem seems to be that some data structures in the row batch need to 
> be re-initialized after they have been forwarded to each output.





[jira] [Commented] (HIVE-17063) insert overwrite partition onto a external table fail when drop partition first

2017-07-11 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082722#comment-16082722
 ] 

Hive QA commented on HIVE-17063:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876658/HIVE-17063.3.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10840 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=232)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5961/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5961/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5961/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876658 - PreCommit-HIVE-Build

> insert overwrite partition onto a external table fail when drop partition 
> first
> ---
>
> Key: HIVE-17063
> URL: https://issues.apache.org/jira/browse/HIVE-17063
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.2.2, 2.1.1, 2.2.0
>Reporter: Wang Haihua
>Assignee: Wang Haihua
> Attachments: HIVE-17063.1.patch, HIVE-17063.2.patch, 
> HIVE-17063.3.patch
>
>
> The default value of {{hive.exec.stagingdir}} is a relative path, and 
> dropping a partition of an external table does not clear the real data. As a 
> result, the second of two insert-overwrite-partition operations can fail 
> because the target data to be moved already exists.
> This happened when we reproduced partition data onto an external table. 
> I see the target data is only cleared when the {{immediately generated 
> data}} is a child of {{the target data directory}}, so my proposal is to 
> clear any already-existing target file when renaming the {{immediately 
> generated data}} into {{the target data directory}}.
> Operation reproduced:
> {code}
> create external table insert_after_drop_partition(key string, val string) 
> partitioned by (insertdate string);
> from src insert overwrite table insert_after_drop_partition partition 
> (insertdate='2008-01-01') select *;
> alter table insert_after_drop_partition drop partition 
> (insertdate='2008-01-01');
> from src insert overwrite table insert_after_drop_partition partition 
> (insertdate='2008-01-01') select *;
> {code}
> Stack trace:
> {code}
> 2017-07-09T08:32:05,212 ERROR [f3bc51c8-2441-4689-b1c1-d60aef86c3aa main] 
> exec.Task: Failed with exception java.io.IOException: rename for src path: 
> pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/.hive-staging_hive_2017-07-09_08-32-03_840_4046825276907030554-1/-ext-1/00_0
>  to dest 
> path:pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/00_0
>  returned false
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: rename 
> for src path: 
> pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/.hive-staging_hive_2017-07-09_08-32-03_840_4046825276907030554-1/-ext-1/00_0
>  to dest 
> path:pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/00_0
>  returned false
> at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2992)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:3248)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1532)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1461)
> at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:498)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
> at 
> 
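The failure mode in the quoted description -- a rename colliding with a stale target file -- and the proposed remedy can be sketched as follows (illustrative Python; the real change would belong in Hive's Java move logic, and the function name here is invented):

```python
import os
import shutil

def rename_with_overwrite(src: str, dest: str) -> None:
    """Move src to dest, first removing any stale dest left behind by an
    earlier drop-partition that did not delete the external table's data.
    Illustrative sketch only, not Hive's actual Hive.moveFile() code."""
    if os.path.exists(dest):
        os.remove(dest)       # clear the stale target first
    shutil.move(src, dest)    # now the rename cannot collide
```

Using this instead of a bare rename lets the second insert overwrite succeed even when the earlier drop-partition left the old file behind.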

[jira] [Commented] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.

2017-07-11 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082711#comment-16082711
 ] 

Vaibhav Gumashta commented on HIVE-4577:


Thanks [~libing]. Looks like some test failures might need a look. 

> hive CLI can't handle hadoop dfs command  with space and quotes.
> 
>
> Key: HIVE-4577
> URL: https://issues.apache.org/jira/browse/HIVE-4577
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0
>Reporter: Bing Li
>Assignee: Bing Li
> Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, 
> HIVE-4577.3.patch.txt, HIVE-4577.4.patch, HIVE-4577.5.patch
>
>
> As design, hive could support hadoop dfs command in hive shell, like 
> hive> dfs -mkdir /user/biadmin/mydir;
> but has different behavior with hadoop if the path contains space and quotes
> hive> dfs -mkdir "hello"; 
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:40 
> /user/biadmin/"hello"
> hive> dfs -mkdir 'world';
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:43 
> /user/biadmin/'world'
> hive> dfs -mkdir "bei jing";
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/"bei
> drwxr-xr-x   - biadmin supergroup  0 2013-04-23 09:44 
> /user/biadmin/jing"
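One possible way to get the expected behavior -- shown here as an illustrative Python sketch using shlex, which is not necessarily the approach the attached patches take -- is to tokenize the dfs command the way a shell would, so quotes group words instead of becoming part of the path:

```python
import shlex

def tokenize_dfs(command: str) -> list:
    """Split a dfs command with shell-style quoting: quoted phrases stay
    one argument and the quote characters are stripped.
    Illustrative helper, not Hive CLI's actual Java implementation."""
    return shlex.split(command)
```

With this tokenization, `dfs -mkdir "bei jing"` yields the single path argument `bei jing` rather than the two literal tokens `"bei` and `jing"` shown above.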





[jira] [Commented] (HIVE-8838) Support Parquet through HCatalog

2017-07-11 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082682#comment-16082682
 ] 

Aihua Xu commented on HIVE-8838:


[~szita] and [~sushanth], how about the tests related to TestHCatClient? It 
looks like those test failures are related.

> Support Parquet through HCatalog
> 
>
> Key: HIVE-8838
> URL: https://issues.apache.org/jira/browse/HIVE-8838
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Adam Szita
> Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch, 
> HIVE-8838.3.patch
>
>
> Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog.





[jira] [Updated] (HIVE-8838) Support Parquet through HCatalog

2017-07-11 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-8838:
---
Attachment: HIVE-8838.3.patch

> Support Parquet through HCatalog
> 
>
> Key: HIVE-8838
> URL: https://issues.apache.org/jira/browse/HIVE-8838
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch, 
> HIVE-8838.3.patch
>
>
> Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog.





[jira] [Assigned] (HIVE-8838) Support Parquet through HCatalog

2017-07-11 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan reassigned HIVE-8838:
--

Assignee: Sushanth Sowmyan  (was: Adam Szita)

> Support Parquet through HCatalog
> 
>
> Key: HIVE-8838
> URL: https://issues.apache.org/jira/browse/HIVE-8838
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch
>
>
> Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog.





[jira] [Assigned] (HIVE-8838) Support Parquet through HCatalog

2017-07-11 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan reassigned HIVE-8838:
--

Assignee: Adam Szita  (was: Sushanth Sowmyan)

> Support Parquet through HCatalog
> 
>
> Key: HIVE-8838
> URL: https://issues.apache.org/jira/browse/HIVE-8838
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Adam Szita
> Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch, 
> HIVE-8838.3.patch
>
>
> Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog.





[jira] [Commented] (HIVE-8838) Support Parquet through HCatalog

2017-07-11 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082668#comment-16082668
 ] 

Sushanth Sowmyan commented on HIVE-8838:


+1

Thanks for adding parquet support to HCat! This has been a long time coming. :)

> Support Parquet through HCatalog
> 
>
> Key: HIVE-8838
> URL: https://issues.apache.org/jira/browse/HIVE-8838
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Adam Szita
> Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch
>
>
> Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog.





[jira] [Commented] (HIVE-8838) Support Parquet through HCatalog

2017-07-11 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082672#comment-16082672
 ] 

Sushanth Sowmyan commented on HIVE-8838:


(also, note : I'm ignoring the reported test failures above, since they're all 
known to be flaky tests, and have been fixed elsewhere. However, we should run 
unit tests once more, in case there are other code changes in the last 10-ish 
days. Thus, I'm going to re-upload a .3.patch identical to the .2.patch so that 
ptest kicks off.)

> Support Parquet through HCatalog
> 
>
> Key: HIVE-8838
> URL: https://issues.apache.org/jira/browse/HIVE-8838
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Adam Szita
> Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch
>
>
> Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog.





[jira] [Updated] (HIVE-16177) non Acid to acid conversion doesn't handle _copy_N files

2017-07-11 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16177:
--
Attachment: HIVE-16177.18-branch-2.patch

> non Acid to acid conversion doesn't handle _copy_N files
> 
>
> Key: HIVE-16177
> URL: https://issues.apache.org/jira/browse/HIVE-16177
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 0.14.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-16177.01.patch, HIVE-16177.02.patch, 
> HIVE-16177.04.patch, HIVE-16177.07.patch, HIVE-16177.08.patch, 
> HIVE-16177.09.patch, HIVE-16177.10.patch, HIVE-16177.11.patch, 
> HIVE-16177.14.patch, HIVE-16177.15.patch, HIVE-16177.16.patch, 
> HIVE-16177.17.patch, HIVE-16177.18-branch-2.patch, HIVE-16177.18.patch
>
>
> {noformat}
> create table T(a int, b int) clustered by (a)  into 2 buckets stored as orc 
> TBLPROPERTIES('transactional'='false')
> insert into T(a,b) values(1,2)
> insert into T(a,b) values(1,3)
> alter table T SET TBLPROPERTIES ('transactional'='true')
> {noformat}
> // we should now have bucket files 01_0 and 01_0_copy_1,
> but OrcRawRecordMerger.OriginalReaderPair.next() doesn't know that there can 
> be copy_N files, and it numbers rows in each bucket from 0, thus generating 
> duplicate IDs.
> {noformat}
> select ROW__ID, INPUT__FILE__NAME, a, b from T
> {noformat}
> produces 
> {noformat}
> {"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0,1,2
> {"transactionid\":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0_copy_1,1,3
> {noformat}
> [~owen.omalley], do you have any thoughts on a good way to handle this?
> The attached patch has a few changes to make Acid recognize copy_N files, 
> but this is just a prerequisite.  The new UT demonstrates the issue.
> Furthermore,
> {noformat}
> alter table T compact 'major'
> select ROW__ID, INPUT__FILE__NAME, a, b from T order by b
> {noformat}
> produces 
> {noformat}
> {"transactionid":0,"bucketid":1,"rowid":0}
> file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommandswarehouse/nonacidorctbl/base_-9223372036854775808/bucket_1
> 1   2
> {noformat}
> HIVE-16177.04.patch has TestTxnCommands.testNonAcidToAcidConversion0() 
> demonstrating this
> This is because compactor doesn't handle copy_N files either (skips them)





[jira] [Commented] (HIVE-16907) "INSERT INTO" overwrite old data when destination table encapsulated by backquote

2017-07-11 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082649#comment-16082649
 ] 

Pengcheng Xiong commented on HIVE-16907:


Thanks, [~nemon], for discovering this, and thanks, [~libing], for the patch. 
However, it seems to me that although Hive parses "`tdb.t1`" as a whole table 
name in the AST, when it actually processes it, it treats it as tdb.t1. Can you 
check other databases' behavior for this, e.g., Oracle, PostgreSQL, and MySQL? 
I suspect current Hive has a bug for table names that contain a dot.
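The distinction under discussion can be sketched as follows (illustrative Python, not Hive's parser; the helper is invented for illustration): a fully backquoted name should resolve to a single table, while an unquoted dotted name is a database-qualified reference.

```python
def resolve_table_name(name):
    """Return a (database, table) pair. A fully backquoted name is a
    single table in the current database, even if it contains a dot.
    Illustrative only; Hive's real resolution happens during AST analysis."""
    if name.startswith("`") and name.endswith("`") and len(name) > 2:
        return (None, name[1:-1])       # one table literally named "tdb.t1"
    if "." in name:
        db, table = name.split(".", 1)  # database-qualified reference
        return (db, table)
    return (None, name)
```

Under this rule, `insert into `tdb.t1`` would target a table named "tdb.t1" in the current database instead of silently rewriting to table t1 in database tdb.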

>  "INSERT INTO"  overwrite old data when destination table encapsulated by 
> backquote 
> 
>
> Key: HIVE-16907
> URL: https://issues.apache.org/jira/browse/HIVE-16907
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 1.1.0, 2.1.1
>Reporter: Nemon Lou
>Assignee: Bing Li
> Attachments: HIVE-16907.1.patch
>
>
> A way to reproduce:
> {noformat}
> create database tdb;
> use tdb;
> create table t1(id int);
> create table t2(id int);
> explain insert into `tdb.t1` select * from t2;
> {noformat}
> {noformat}
> +---+
> |  
> Explain  |
> +---+
> | STAGE DEPENDENCIES: 
>   |
> |   Stage-1 is a root stage   
>   |
> |   Stage-6 depends on stages: Stage-1 , consists of Stage-3, Stage-2, 
> Stage-4  |
> |   Stage-3   
>   |
> |   Stage-0 depends on stages: Stage-3, Stage-2, Stage-5  
>   |
> |   Stage-2   
>   |
> |   Stage-4   
>   |
> |   Stage-5 depends on stages: Stage-4
>   |
> | 
>   |
> | STAGE PLANS:
>   |
> |   Stage: Stage-1
>   |
> | Map Reduce  
>   |
> |   Map Operator Tree:
>   |
> |   TableScan 
>   |
> | alias: t2   
>   |
> | Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE |
> | Select Operator 
>   |
> |   expressions: id (type: int)   
>   |
> |   outputColumnNames: _col0  
>   |
> |   Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column 
> stats: NONE   |
> |   File Output Operator  
>   

[jira] [Updated] (HIVE-17070) remove .orig files from src

2017-07-11 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17070:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to master (3.0).
Thanks Jason for the review.

> remove .orig files from src
> ---
>
> Key: HIVE-17070
> URL: https://issues.apache.org/jira/browse/HIVE-17070
> Project: Hive
>  Issue Type: Bug
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Trivial
> Fix For: 3.0.0
>
> Attachments: HIVE-17070.patch
>
>
> common/src/java/org/apache/hadoop/hive/conf/HiveConf.java.orig
> ql/src/test/results/clientpositive/llap/vector_join30.q.out.orig





[jira] [Commented] (HIVE-17073) Incorrect result with vectorization and SharedWorkOptimizer

2017-07-11 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082605#comment-16082605
 ] 

Matt McCline commented on HIVE-17073:
-

There is also another forward at the top of the VectorSelectOperator.process 
method.

> Incorrect result with vectorization and SharedWorkOptimizer
> ---
>
> Key: HIVE-17073
> URL: https://issues.apache.org/jira/browse/HIVE-17073
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-17073.patch
>
>
> We get an incorrect result with vectorization and the multi-output Select 
> operator created by SharedWorkOptimizer. It can be reproduced as follows.
> {code:title=Correct}
> select count(*) as h8_30_to_9
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_278";
> OK
> 2
> {code}
> {code:title=Correct}
> select count(*) as h9_to_9_30
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_255";
> OK
> 2
> {code}
> {code:title=Incorrect}
> select * from (
>   select count(*) as h8_30_to_9
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_278") s1
> join (
>   select count(*) as h9_to_9_30
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_255") s2;
> OK
> 2 0
> {code}
> The problem seems to be that some data structures in the row batch need to be 
> re-initialized after the batch has been forwarded to each output.





[jira] [Assigned] (HIVE-17075) unstable stats in q files

2017-07-11 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-17075:
-


> unstable stats in q files
> -
>
> Key: HIVE-17075
> URL: https://issues.apache.org/jira/browse/HIVE-17075
> Project: Hive
>  Issue Type: Bug
>Reporter: Eugene Koifman
>Assignee: Pengcheng Xiong
>
> Stats recorded in explain plans in .out files are sometimes unstable. 
> Here is one concrete example (HIVE-15898: 1st run is patch 6, 2nd run is patch 7).
> {noformat}
> [10:23 AM] Eugene Koifman: 1st run
> [10:23 AM] Eugene Koifman: 
> https://builds.apache.org/job/PreCommit-HIVE-Build/5951/testReport/org.apache.hadoop.hive.cli/TestMi...
> [10:23 AM] Eugene Koifman: 316c316
> <   Statistics: Num rows: 45 Data size: 4560 Basic stats: 
> COMPLETE Column stats: NONE
> ---
> >   Statistics: Num rows: 45 Data size: 4571 Basic stats: 
> >COMPLETE Column stats: NONE
> [10:23 AM] Eugene Koifman: this is the 1st diff - it says that actual result 
> was 4560 and expected was 4571
> [10:24 AM] Eugene Koifman: here is a 2nd run (the only difference is that I 
> update the .out file)
> [10:24 AM] Eugene Koifman: 
> https://builds.apache.org/job/PreCommit-HIVE-Build/5956/testReport/org.apache.hadoop.hive.cli/TestMi...
> [10:24 AM] Eugene Koifman: 316c316
> <   Statistics: Num rows: 45 Data size: 4573 Basic stats: 
> COMPLETE Column stats: NONE
> ---
> >   Statistics: Num rows: 45 Data size: 4560 Basic stats: 
> >COMPLETE Column stats: NONE
> [10:25 AM] Eugene Koifman: this is the 1st diff in the 2nd run
> {noformat}
> The actual value from each run is different.
> Complete output from patch 6 run 
> {noformat}
> Client Execution succeeded but contained differences (error code = 1) after 
> executing sqlmerge_type2_scd.q 
> 316c316
> <   Statistics: Num rows: 45 Data size: 4560 Basic stats: 
> COMPLETE Column stats: NONE
> ---
> >   Statistics: Num rows: 45 Data size: 4571 Basic stats: 
> > COMPLETE Column stats: NONE
> 319c319
> < Statistics: Num rows: 45 Data size: 4560 Basic stats: 
> COMPLETE Column stats: NONE
> ---
> > Statistics: Num rows: 45 Data size: 4571 Basic stats: 
> > COMPLETE Column stats: NONE
> 324c324
> <   Statistics: Num rows: 45 Data size: 4560 Basic stats: 
> COMPLETE Column stats: NONE
> ---
> >   Statistics: Num rows: 45 Data size: 4571 Basic stats: 
> > COMPLETE Column stats: NONE
> 347c347
> <   Statistics: Num rows: 22 Data size: 4560 Basic stats: 
> COMPLETE Column stats: NONE
> ---
> >   Statistics: Num rows: 22 Data size: 4571 Basic stats: 
> > COMPLETE Column stats: NONE
> 350c350
> < Statistics: Num rows: 22 Data size: 4560 Basic stats: 
> COMPLETE Column stats: NONE
> ---
> > Statistics: Num rows: 22 Data size: 4571 Basic stats: 
> > COMPLETE Column stats: NONE
> 355c355
> <   Statistics: Num rows: 22 Data size: 4560 Basic stats: 
> COMPLETE Column stats: NONE
> ---
> >   Statistics: Num rows: 22 Data size: 4571 Basic stats: 
> > COMPLETE Column stats: NONE
> 369c369
> < Statistics: Num rows: 49 Data size: 5016 Basic stats: 
> COMPLETE Column stats: NONE
> ---
> > Statistics: Num rows: 49 Data size: 5028 Basic stats: 
> > COMPLETE Column stats: NONE
> 373c373
> <   Statistics: Num rows: 49 Data size: 5016 Basic stats: 
> COMPLETE Column stats: NONE
> ---
> >   Statistics: Num rows: 49 Data size: 5028 Basic stats: 
> > COMPLETE Column stats: NONE
> 378c378
> < Statistics: Num rows: 49 Data size: 5016 Basic stats: 
> COMPLETE Column stats: NONE
> ---
> > Statistics: Num rows: 49 Data size: 5028 Basic stats: 
> > COMPLETE Column stats: NONE
> 390c390
> < Statis
> {noformat}
> From patch 7 run 
> {noformat}
> 316c316
> <   Statistics: Num rows: 45 Data size: 4573 Basic stats: 
> COMPLETE Column stats: NONE
> ---
> >   Statistics: Num rows: 45 Data size: 4560 Basic stats: 
> > COMPLETE Column stats: NONE
> 319c319
> < Statistics: Num rows: 45 Data size: 4573 Basic stats: 
> COMPLETE Column stats: NONE
> ---
> > Statistics: Num rows: 45 Data size: 4560 Basic stats: 
> > COMPLETE Column stats: NONE
> 324c324
> <   Statistics: Num rows: 45 Data size: 4573 Basic stats: 
> COMPLETE Column stats: NONE
> ---
> >   Statistics: Num rows: 45 Data size: 4560 Basic stats: 
> > COMPLETE Column stats: NONE
> 347c347
> <   Statistics: Num rows: 22 Data size: 4573 Basic stats: 
> COMPLETE 

[jira] [Updated] (HIVE-17063) insert overwrite partition onto a external table fail when drop partition first

2017-07-11 Thread Wang Haihua (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang Haihua updated HIVE-17063:
---
Status: In Progress  (was: Patch Available)

> insert overwrite partition onto a external table fail when drop partition 
> first
> ---
>
> Key: HIVE-17063
> URL: https://issues.apache.org/jira/browse/HIVE-17063
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.1.1, 1.2.2, 2.2.0
>Reporter: Wang Haihua
>Assignee: Wang Haihua
> Attachments: HIVE-17063.1.patch, HIVE-17063.2.patch, 
> HIVE-17063.3.patch
>
>
> The default value of {{hive.exec.stagingdir}} is a relative path, and dropping 
> a partition on an external table does not remove the underlying data. As a 
> result, running insert overwrite partition a second time fails because the 
> target data to be moved already exists.
> This happened when we regenerated partition data on an external table. 
> I see the target data is cleared only when the {{immediately generated 
> data}} is a child of {{the target data directory}}, so my proposal is to 
> delete any pre-existing target file when renaming the {{immediately 
> generated data}} into {{the target data directory}}.
> Operation reproduced:
> {code}
> create external table insert_after_drop_partition(key string, val string) 
> partitioned by (insertdate string);
> from src insert overwrite table insert_after_drop_partition partition 
> (insertdate='2008-01-01') select *;
> alter table insert_after_drop_partition drop partition 
> (insertdate='2008-01-01');
> from src insert overwrite table insert_after_drop_partition partition 
> (insertdate='2008-01-01') select *;
> {code}
> Stack trace:
> {code}
> 2017-07-09T08:32:05,212 ERROR [f3bc51c8-2441-4689-b1c1-d60aef86c3aa main] 
> exec.Task: Failed with exception java.io.IOException: rename for src path: 
> pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/.hive-staging_hive_2017-07-09_08-32-03_840_4046825276907030554-1/-ext-1/00_0
>  to dest 
> path:pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/00_0
>  returned false
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: rename 
> for src path: 
> pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/.hive-staging_hive_2017-07-09_08-32-03_840_4046825276907030554-1/-ext-1/00_0
>  to dest 
> path:pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/00_0
>  returned false
> at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2992)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:3248)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1532)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1461)
> at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:498)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2073)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1744)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1453)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1171)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1161)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:335)
> at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1137)
> at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:)
> at 
> org.apache.hadoop.hive.cli.TestCliDriver.runTest(TestCliDriver.java:120)
> at 
> org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_after_drop_partition(TestCliDriver.java:103)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at 
> 
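The proposal above, clear a pre-existing target before the rename, can be sketched with java.nio.file standing in for Hadoop's FileSystem, whose rename() simply returns false when the destination already exists. This is illustrative only and is not the actual patch:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class OverwriteRename {
    /**
     * Move a staging file onto its target path, replacing any file left
     * behind by an earlier insert. REPLACE_EXISTING captures the essence of
     * the proposed fix: remove the stale target instead of letting the
     * rename fail against it.
     */
    static void moveReplacing(Path src, Path dest) throws IOException {
        Files.createDirectories(dest.getParent());
        Files.move(src, dest, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

With Hadoop's API the equivalent step would be deleting the destination (or using an overwriting rename) before moving the staging file into the partition directory.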

[jira] [Updated] (HIVE-17063) insert overwrite partition onto a external table fail when drop partition first

2017-07-11 Thread Wang Haihua (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang Haihua updated HIVE-17063:
---
Status: Patch Available  (was: In Progress)

> insert overwrite partition onto a external table fail when drop partition 
> first
> ---
>
> Key: HIVE-17063
> URL: https://issues.apache.org/jira/browse/HIVE-17063
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.1.1, 1.2.2, 2.2.0
>Reporter: Wang Haihua
>Assignee: Wang Haihua
> Attachments: HIVE-17063.1.patch, HIVE-17063.2.patch, 
> HIVE-17063.3.patch
>
>
> The default value of {{hive.exec.stagingdir}} is a relative path, and dropping 
> a partition on an external table does not remove the underlying data. As a 
> result, running insert overwrite partition a second time fails because the 
> target data to be moved already exists.
> This happened when we regenerated partition data on an external table. 
> I see the target data is cleared only when the {{immediately generated 
> data}} is a child of {{the target data directory}}, so my proposal is to 
> delete any pre-existing target file when renaming the {{immediately 
> generated data}} into {{the target data directory}}.
> Operation reproduced:
> {code}
> create external table insert_after_drop_partition(key string, val string) 
> partitioned by (insertdate string);
> from src insert overwrite table insert_after_drop_partition partition 
> (insertdate='2008-01-01') select *;
> alter table insert_after_drop_partition drop partition 
> (insertdate='2008-01-01');
> from src insert overwrite table insert_after_drop_partition partition 
> (insertdate='2008-01-01') select *;
> {code}
> Stack trace:
> {code}
> 2017-07-09T08:32:05,212 ERROR [f3bc51c8-2441-4689-b1c1-d60aef86c3aa main] 
> exec.Task: Failed with exception java.io.IOException: rename for src path: 
> pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/.hive-staging_hive_2017-07-09_08-32-03_840_4046825276907030554-1/-ext-1/00_0
>  to dest 
> path:pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/00_0
>  returned false
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: rename 
> for src path: 
> pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/.hive-staging_hive_2017-07-09_08-32-03_840_4046825276907030554-1/-ext-1/00_0
>  to dest 
> path:pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/00_0
>  returned false
> at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2992)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:3248)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1532)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1461)
> at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:498)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2073)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1744)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1453)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1171)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1161)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:335)
> at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1137)
> at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:)
> at 
> org.apache.hadoop.hive.cli.TestCliDriver.runTest(TestCliDriver.java:120)
> at 
> org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_after_drop_partition(TestCliDriver.java:103)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at 
> 

[jira] [Updated] (HIVE-17063) insert overwrite partition onto a external table fail when drop partition first

2017-07-11 Thread Wang Haihua (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang Haihua updated HIVE-17063:
---
Attachment: HIVE-17063.3.patch

> insert overwrite partition onto a external table fail when drop partition 
> first
> ---
>
> Key: HIVE-17063
> URL: https://issues.apache.org/jira/browse/HIVE-17063
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.2.2, 2.1.1, 2.2.0
>Reporter: Wang Haihua
>Assignee: Wang Haihua
> Attachments: HIVE-17063.1.patch, HIVE-17063.2.patch, 
> HIVE-17063.3.patch
>
>
> The default value of {{hive.exec.stagingdir}} is a relative path, and dropping 
> a partition on an external table does not remove the underlying data. As a 
> result, running insert overwrite partition a second time fails because the 
> target data to be moved already exists.
> This happened when we regenerated partition data on an external table. 
> I see the target data is cleared only when the {{immediately generated 
> data}} is a child of {{the target data directory}}, so my proposal is to 
> delete any pre-existing target file when renaming the {{immediately 
> generated data}} into {{the target data directory}}.
> Operation reproduced:
> {code}
> create external table insert_after_drop_partition(key string, val string) 
> partitioned by (insertdate string);
> from src insert overwrite table insert_after_drop_partition partition 
> (insertdate='2008-01-01') select *;
> alter table insert_after_drop_partition drop partition 
> (insertdate='2008-01-01');
> from src insert overwrite table insert_after_drop_partition partition 
> (insertdate='2008-01-01') select *;
> {code}
> Stack trace:
> {code}
> 2017-07-09T08:32:05,212 ERROR [f3bc51c8-2441-4689-b1c1-d60aef86c3aa main] 
> exec.Task: Failed with exception java.io.IOException: rename for src path: 
> pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/.hive-staging_hive_2017-07-09_08-32-03_840_4046825276907030554-1/-ext-1/00_0
>  to dest 
> path:pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/00_0
>  returned false
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: rename 
> for src path: 
> pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/.hive-staging_hive_2017-07-09_08-32-03_840_4046825276907030554-1/-ext-1/00_0
>  to dest 
> path:pfile:/data/haihua/official/hive/itests/qtest/target/warehouse/insert_after_drop_partition/insertdate=2008-01-01/00_0
>  returned false
> at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2992)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:3248)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1532)
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1461)
> at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:498)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2073)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1744)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1453)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1171)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1161)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:335)
> at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1137)
> at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:)
> at 
> org.apache.hadoop.hive.cli.TestCliDriver.runTest(TestCliDriver.java:120)
> at 
> org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_after_drop_partition(TestCliDriver.java:103)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at 
> 

[jira] [Commented] (HIVE-16907) "INSERT INTO" overwrite old data when destination table encapsulated by backquote

2017-07-11 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082608#comment-16082608
 ] 

Hive QA commented on HIVE-16907:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876643/HIVE-16907.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10838 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5960/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5960/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5960/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876643 - PreCommit-HIVE-Build

>  "INSERT INTO"  overwrite old data when destination table encapsulated by 
> backquote 
> 
>
> Key: HIVE-16907
> URL: https://issues.apache.org/jira/browse/HIVE-16907
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 1.1.0, 2.1.1
>Reporter: Nemon Lou
>Assignee: Bing Li
> Attachments: HIVE-16907.1.patch
>
>
> A way to reproduce:
> {noformat}
> create database tdb;
> use tdb;
> create table t1(id int);
> create table t2(id int);
> explain insert into `tdb.t1` select * from t2;
> {noformat}
> {noformat}
> +---+
> |  
> Explain  |
> +---+
> | STAGE DEPENDENCIES: 
>   |
> |   Stage-1 is a root stage   
>   |
> |   Stage-6 depends on stages: Stage-1 , consists of Stage-3, Stage-2, 
> Stage-4  |
> |   Stage-3   
>   |
> |   Stage-0 depends on stages: Stage-3, Stage-2, Stage-5  
>   |
> |   Stage-2   
>   |
> |   Stage-4   
>   |
> |   Stage-5 depends on stages: Stage-4
>   |
> | 
>   |
> | STAGE PLANS:
>   |
> |   Stage: Stage-1
>   |
> | Map Reduce  
>   |
> |   Map Operator Tree:
>   |
> |   TableScan 
>   |
> |  

[jira] [Commented] (HIVE-17073) Incorrect result with vectorization and SharedWorkOptimizer

2017-07-11 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082593#comment-16082593
 ] 

Matt McCline commented on HIVE-17073:
-

[~jcamachorodriguez] Thank you for jumping in with a solution.

The invariant for a VectorizedRowBatch is that the selected array is always 
allocated.

For efficiency, I think we want to pre-allocate a saveSelected array of 
VectorizedRowBatch.DEFAULT_SIZE elements in initializeOp. When # children > 1, 
re-allocate that save array *only* if vrb.size is greater than the current 
array size. Use System.arraycopy into and out of saveSelected instead of 
Arrays.copyOf, since the latter method allocates a new object.
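The save/restore scheme described above can be sketched as follows. Class and field names here are hypothetical, chosen only to mirror the comment; the real change lives in VectorSelectOperator:

```java
public class SelectedArraySaver {
    // Stand-in for VectorizedRowBatch.DEFAULT_SIZE; pre-allocated once so the
    // common case does no allocation per batch.
    private static final int DEFAULT_SIZE = 1024;
    private int[] saveSelected = new int[DEFAULT_SIZE];

    /** Snapshot batch.selected before forwarding to the first child. */
    void save(int[] selected, int size) {
        if (size > saveSelected.length) {
            saveSelected = new int[size]; // grow only for oversized batches
        }
        System.arraycopy(selected, 0, saveSelected, 0, size);
    }

    /** Restore batch.selected before forwarding to the next child. */
    void restore(int[] selected, int size) {
        System.arraycopy(saveSelected, 0, selected, 0, size);
    }
}
```

System.arraycopy writes into the existing buffers in both directions, whereas Arrays.copyOf would allocate a fresh array on every call, which is what the comment is trying to avoid on the hot per-batch path.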

> Incorrect result with vectorization and SharedWorkOptimizer
> ---
>
> Key: HIVE-17073
> URL: https://issues.apache.org/jira/browse/HIVE-17073
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-17073.patch
>
>
> We get an incorrect result with vectorization and the multi-output Select 
> operator created by SharedWorkOptimizer. It can be reproduced as follows.
> {code:title=Correct}
> select count(*) as h8_30_to_9
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_278";
> OK
> 2
> {code}
> {code:title=Correct}
> select count(*) as h9_to_9_30
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_255";
> OK
> 2
> {code}
> {code:title=Incorrect}
> select * from (
>   select count(*) as h8_30_to_9
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_278") s1
> join (
>   select count(*) as h9_to_9_30
>   from src
>   join src1 on src.key = src1.key
>   where src1.value = "val_255") s2;
> OK
> 2 0
> {code}
> The problem seems to be that some data structures in the row batch need to be 
> re-initialized after the batch has been forwarded to each output.





[jira] [Commented] (HIVE-17070) remove .orig files from src

2017-07-11 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082595#comment-16082595
 ] 

Jason Dere commented on HIVE-17070:
---

+1

> remove .orig files from src
> ---
>
> Key: HIVE-17070
> URL: https://issues.apache.org/jira/browse/HIVE-17070
> Project: Hive
>  Issue Type: Bug
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Trivial
> Attachments: HIVE-17070.patch
>
>
> common/src/java/org/apache/hadoop/hive/conf/HiveConf.java.orig
> ql/src/test/results/clientpositive/llap/vector_join30.q.out.orig





[jira] [Commented] (HIVE-16732) Transactional tables should block LOAD DATA

2017-07-11 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082519#comment-16082519
 ] 

Eugene Koifman commented on HIVE-16732:
---

patch 3 committed to master (3.0)

> Transactional tables should block LOAD DATA 
> 
>
> Key: HIVE-16732
> URL: https://issues.apache.org/jira/browse/HIVE-16732
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-16732.01.patch, HIVE-16732.02.patch, 
> HIVE-16732.03.patch
>
>
> This has always been the design.
> see LoadSemanticAnalyzer.analyzeInternal()
> StrictChecks.checkBucketing(conf);
> Some examples (this is exposed by HIVE-16177)
> insert_values_orig_table.q
>  insert_orig_table.q
>  insert_values_orig_table_use_metadata.q
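The design point, that LOAD DATA bypasses Hive's write path and therefore must be rejected for transactional tables, can be sketched as a semantic-analysis guard. Names and the exception class here are illustrative, not Hive's actual API:

```java
public class LoadDataCheck {
    static class SemanticException extends RuntimeException {
        SemanticException(String msg) { super(msg); }
    }

    /**
     * Hypothetical guard run during analysis of a LOAD DATA statement:
     * files copied in directly would not carry ACID metadata (transaction
     * ids, row ids), so the statement is rejected for ACID tables.
     */
    static void checkLoadAllowed(boolean isTransactional, String tableName) {
        if (isTransactional) {
            throw new SemanticException(
                "LOAD DATA is not supported on transactional table " + tableName);
        }
    }
}
```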





[jira] [Commented] (HIVE-16732) Transactional tables should block LOAD DATA

2017-07-11 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082479#comment-16082479
 ] 

Hive QA commented on HIVE-16732:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876625/HIVE-16732.03.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10839 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed]
 (batchId=237)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=157)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5957/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5957/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5957/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876625 - PreCommit-HIVE-Build

> Transactional tables should block LOAD DATA 
> 
>
> Key: HIVE-16732
> URL: https://issues.apache.org/jira/browse/HIVE-16732
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-16732.01.patch, HIVE-16732.02.patch, 
> HIVE-16732.03.patch
>
>
> This has always been the design.
> see LoadSemanticAnalyzer.analyzeInternal()
> StrictChecks.checkBucketing(conf);
> Some examples (this is exposed by HIVE-16177)
> insert_values_orig_table.q
>  insert_orig_table.q
>  insert_values_orig_table_use_metadata.q





[jira] [Commented] (HIVE-15767) Hive On Spark is not working on secure clusters from Oozie

2017-07-11 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082411#comment-16082411
 ] 

Sahil Takiar commented on HIVE-15767:
-

Overall LGTM. Just a few questions:
* Are these errors thrown by HiveServer2 or by the HoS Remote Driver?
* Is that same thing required for Hive-on-MR?
* Is it possible to add a test for this?

> Hive On Spark is not working on secure clusters from Oozie
> --
>
> Key: HIVE-15767
> URL: https://issues.apache.org/jira/browse/HIVE-15767
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.2.1, 2.1.1
>Reporter: Peter Cseh
>Assignee: Peter Cseh
> Attachments: HIVE-15767-001.patch, HIVE-15767-002.patch
>
>
> When a HiveAction is launched from Oozie with Hive On Spark enabled, we get 
> errors:
> {noformat}
> Caused by: java.io.IOException: Exception reading 
> file:/yarn/nm/usercache/yshi/appcache/application_1485271416004_0022/container_1485271416004_0022_01_02/container_tokens
> at 
> org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:188)
> at 
> org.apache.hadoop.mapreduce.security.TokenCache.mergeBinaryTokens(TokenCache.java:155)
> {noformat}
> This is caused by passing the {{mapreduce.job.credentials.binary}} property 
> to the Spark configuration in RemoteHiveSparkClient.
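A hedged sketch of the fix direction implied by the description above: drop the token-file property before the configuration reaches Spark. The class and method names are invented for illustration; the actual change belongs in RemoteHiveSparkClient:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: copy a Hadoop-style configuration, removing the
// property that points at a container-local token file, so the remote
// Spark driver does not try to re-read a file that only existed in the
// launcher's container.
public class CredentialsPropertyFilter {

    static final String CREDENTIALS_BINARY = "mapreduce.job.credentials.binary";

    static Map<String, String> withoutCredentialsBinary(Map<String, String> conf) {
        Map<String, String> filtered = new HashMap<>(conf);
        filtered.remove(CREDENTIALS_BINARY);
        return filtered;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(CREDENTIALS_BINARY, "/tmp/container_tokens");
        conf.put("hive.execution.engine", "spark");
        System.out.println(withoutCredentialsBinary(conf).keySet());
    }
}
```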



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16907) "INSERT INTO" overwrite old data when destination table encapsulated by backquote

2017-07-11 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082427#comment-16082427
 ] 

Bing Li commented on HIVE-16907:


[~pxiong] and [~ashutoshc] Could I get your comments on the patch? Thank you.

>  "INSERT INTO"  overwrite old data when destination table encapsulated by 
> backquote 
> 
>
> Key: HIVE-16907
> URL: https://issues.apache.org/jira/browse/HIVE-16907
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 1.1.0, 2.1.1
>Reporter: Nemon Lou
>Assignee: Bing Li
> Attachments: HIVE-16907.1.patch
>
>
> A way to reproduce:
> {noformat}
> create database tdb;
> use tdb;
> create table t1(id int);
> create table t2(id int);
> explain insert into `tdb.t1` select * from t2;
> {noformat}
> {noformat}
> +---+
> | Explain |
> +---+
> | STAGE DEPENDENCIES: |
> |   Stage-1 is a root stage |
> |   Stage-6 depends on stages: Stage-1 , consists of Stage-3, Stage-2, Stage-4 |
> |   Stage-3 |
> |   Stage-0 depends on stages: Stage-3, Stage-2, Stage-5 |
> |   Stage-2 |
> |   Stage-4 |
> |   Stage-5 depends on stages: Stage-4 |
> | |
> | STAGE PLANS: |
> |   Stage: Stage-1 |
> |     Map Reduce |
> |       Map Operator Tree: |
> |           TableScan |
> |             alias: t2 |
> |             Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE |
> |             Select Operator |
> |               expressions: id (type: int) |
> |               outputColumnNames: _col0 |
> |               Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE |
> |               File Output Operator |
> |                 compressed: false |
> |                 Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE |
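The heart of the bug is how a backquoted, dot-qualified name is resolved. As a sketch only (an invented helper, not Hive's parser code), `tdb.t1` inside backquotes should still split into database and table rather than being treated as a single table name in the current database:

```java
// Hypothetical helper illustrating the expected resolution of a
// (possibly backquoted) qualified table name. Not Hive parser code.
public class QualifiedName {

    // Returns {database, table}; database is null for an unqualified name.
    static String[] resolve(String name) {
        String unquoted = (name.length() >= 2 && name.startsWith("`") && name.endsWith("`"))
            ? name.substring(1, name.length() - 1)
            : name;
        int dot = unquoted.indexOf('.');
        if (dot < 0) {
            return new String[] { null, unquoted };
        }
        return new String[] { unquoted.substring(0, dot), unquoted.substring(dot + 1) };
    }

    public static void main(String[] args) {
        String[] parts = resolve("`tdb.t1`");
        System.out.println(parts[0] + "." + parts[1]);
    }
}
```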

[jira] [Updated] (HIVE-16907) "INSERT INTO" overwrite old data when destination table encapsulated by backquote

2017-07-11 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-16907:
---
Status: Patch Available  (was: In Progress)

>  "INSERT INTO"  overwrite old data when destination table encapsulated by 
> backquote 
> 
>
> Key: HIVE-16907
> URL: https://issues.apache.org/jira/browse/HIVE-16907
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 2.1.1, 1.1.0
>Reporter: Nemon Lou
>Assignee: Bing Li
> Attachments: HIVE-16907.1.patch
>
>
> A way to reproduce:
> {noformat}
> create database tdb;
> use tdb;
> create table t1(id int);
> create table t2(id int);
> explain insert into `tdb.t1` select * from t2;
> {noformat}
> {noformat}
> +---+
> | Explain |
> +---+
> | STAGE DEPENDENCIES: |
> |   Stage-1 is a root stage |
> |   Stage-6 depends on stages: Stage-1 , consists of Stage-3, Stage-2, Stage-4 |
> |   Stage-3 |
> |   Stage-0 depends on stages: Stage-3, Stage-2, Stage-5 |
> |   Stage-2 |
> |   Stage-4 |
> |   Stage-5 depends on stages: Stage-4 |
> | |
> | STAGE PLANS: |
> |   Stage: Stage-1 |
> |     Map Reduce |
> |       Map Operator Tree: |
> |           TableScan |
> |             alias: t2 |
> |             Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE |
> |             Select Operator |
> |               expressions: id (type: int) |
> |               outputColumnNames: _col0 |
> |               Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE |
> |               File Output Operator |
> |                 compressed: false |
> |                 Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE |
> |                 table:

[jira] [Updated] (HIVE-16907) "INSERT INTO" overwrite old data when destination table encapsulated by backquote

2017-07-11 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-16907:
---
Attachment: HIVE-16907.1.patch

The patch is created based on the latest master branch.

>  "INSERT INTO"  overwrite old data when destination table encapsulated by 
> backquote 
> 
>
> Key: HIVE-16907
> URL: https://issues.apache.org/jira/browse/HIVE-16907
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 1.1.0, 2.1.1
>Reporter: Nemon Lou
>Assignee: Bing Li
> Attachments: HIVE-16907.1.patch
>
>
> A way to reproduce:
> {noformat}
> create database tdb;
> use tdb;
> create table t1(id int);
> create table t2(id int);
> explain insert into `tdb.t1` select * from t2;
> {noformat}
> {noformat}
> +---+
> | Explain |
> +---+
> | STAGE DEPENDENCIES: |
> |   Stage-1 is a root stage |
> |   Stage-6 depends on stages: Stage-1 , consists of Stage-3, Stage-2, Stage-4 |
> |   Stage-3 |
> |   Stage-0 depends on stages: Stage-3, Stage-2, Stage-5 |
> |   Stage-2 |
> |   Stage-4 |
> |   Stage-5 depends on stages: Stage-4 |
> | |
> | STAGE PLANS: |
> |   Stage: Stage-1 |
> |     Map Reduce |
> |       Map Operator Tree: |
> |           TableScan |
> |             alias: t2 |
> |             Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE |
> |             Select Operator |
> |               expressions: id (type: int) |
> |               outputColumnNames: _col0 |
> |               Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE |
> |               File Output Operator |
> |                 compressed: false |
> |                 Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE |

[jira] [Commented] (HIVE-15898) add Type2 SCD merge tests

2017-07-11 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082387#comment-16082387
 ] 

Hive QA commented on HIVE-15898:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12876624/HIVE-15898.07.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10839 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[zero_rows_blobstore] (batchId=240)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sqlmerge] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sqlmerge_type2_scd] (batchId=144)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5956/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5956/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5956/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12876624 - PreCommit-HIVE-Build

> add Type2 SCD merge tests
> -
>
> Key: HIVE-15898
> URL: https://issues.apache.org/jira/browse/HIVE-15898
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-15898.01.patch, HIVE-15898.02.patch, 
> HIVE-15898.03.patch, HIVE-15898.04.patch, HIVE-15898.05.patch, 
> HIVE-15898.06.patch, HIVE-15898.07.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16357) Failed folder creation when creating a new table is reported incorrectly

2017-07-11 Thread Peter Vary (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082334#comment-16082334
 ] 

Peter Vary commented on HIVE-16357:
---

Hi [~zsombor.klara],

Thanks for the patch!
One nit: boolean successs - too much 's' :)

Two more interesting questions:
- Do we need to send out a notification about an unsuccessful event with an empty list of tables?
- The same change might be applied to the other events as well...

[~zsombor.klara], [~mohitsabharwal]: What do you think?

Thanks,
Peter


> Failed folder creation when creating a new table is reported incorrectly
> 
>
> Key: HIVE-16357
> URL: https://issues.apache.org/jira/browse/HIVE-16357
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.3.0, 3.0.0
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HIVE-16357.01.patch
>
>
> If the directory for a Hive table could not be created, then the HMS will 
> throw a MetaException:
> {code}
>  if (tblPath != null) {
>   if (!wh.isDir(tblPath)) {
> if (!wh.mkdirs(tblPath, true)) {
>   throw new MetaException(tblPath
>   + " is not a directory or unable to create one");
> }
> madeDir = true;
>   }
> }
> {code}
> However in the finally block we always try to call the 
> DbNotificationListener, which in turn will also throw an exception because 
> the directory is missing, overwriting the initial exception with a 
> FileNotFoundException.
> Actual stacktrace seen by the caller:
> {code}
> 2017-04-03T05:58:00,128 ERROR [pool-7-thread-2] metastore.RetryingHMSHandler: MetaException(message:java.lang.RuntimeException: java.io.FileNotFoundException: File file:/.../0 does not exist)
>   at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newMetaException(HiveMetaStore.java:6074)
>   at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1496)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
>   at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
>   at com.sun.proxy.$Proxy28.create_table_with_environment_context(Unknown Source)
>   at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_table_with_environment_context.getResult(ThriftHiveMetastore.java:11125)
>   at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_table_with_environment_context.getResult(ThriftHiveMetastore.java:11109)
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>   at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
>   at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
>   at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: File file:/.../0 does not exist
>   at org.apache.hive.hcatalog.listener.DbNotificationListener$FileIterator.<init>(DbNotificationListener.java:203)
>   at org.apache.hive.hcatalog.listener.DbNotificationListener.onCreateTable(DbNotificationListener.java:137)
>   at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:1463)
>   at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1482)
>   ... 20 more
> Caused by: java.io.FileNotFoundException: File file:/.../0 does not exist
>   at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:429)
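The masking behaviour described above is easy to demonstrate in isolation. A minimal, self-contained sketch (illustrative names, not the actual HiveMetaStore code):

```java
// When both a try block and its finally block throw, the finally block's
// exception wins and the original one is silently discarded: the same
// pattern that hides the MetaException behind the listener's
// FileNotFoundException here.
public class ExceptionMasking {

    static void createTable() {
        try {
            // HMS analogue: directory creation failed.
            throw new IllegalStateException("tblPath is not a directory or unable to create one");
        } finally {
            // DbNotificationListener analogue: fails too, because the
            // directory it wants to list is missing.
            throw new RuntimeException("File does not exist");
        }
    }

    public static void main(String[] args) {
        try {
            createTable();
        } catch (RuntimeException e) {
            // Only the listener's exception is visible to the caller.
            System.out.println(e.getMessage());
        }
    }
}
```

(javac warns that the finally block does not complete normally, which is exactly the smell being discussed.)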

[jira] [Updated] (HIVE-15767) Hive On Spark is not working on secure clusters from Oozie

2017-07-11 Thread Peter Cseh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Cseh updated HIVE-15767:
--
Attachment: HIVE-15767-002.patch

Addressing a typo

> Hive On Spark is not working on secure clusters from Oozie
> --
>
> Key: HIVE-15767
> URL: https://issues.apache.org/jira/browse/HIVE-15767
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.2.1, 2.1.1
>Reporter: Peter Cseh
>Assignee: Peter Cseh
> Attachments: HIVE-15767-001.patch, HIVE-15767-002.patch
>
>
> When a HiveAction is launched from Oozie with Hive On Spark enabled, we're 
> getting errors:
> {noformat}
> Caused by: java.io.IOException: Exception reading 
> file:/yarn/nm/usercache/yshi/appcache/application_1485271416004_0022/container_1485271416004_0022_01_02/container_tokens
> at 
> org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:188)
> at 
> org.apache.hadoop.mapreduce.security.TokenCache.mergeBinaryTokens(TokenCache.java:155)
> {noformat}
> This is caused by passing the {{mapreduce.job.credentials.binary}} property 
> to the Spark configuration in RemoteHiveSparkClient.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

