[jira] [Commented] (HIVE-18611) Avoid memory allocation of aggregation buffer during stats computation

2018-02-01 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349915#comment-16349915
 ] 

Gopal V commented on HIVE-18611:


LGTM - +1 tests pending.

> Avoid memory allocation of aggregation buffer during stats computation 
> ---
>
> Key: HIVE-18611
> URL: https://issues.apache.org/jira/browse/HIVE-18611
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer, Statistics
>Affects Versions: 3.0.0, 2.3.2
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
>Priority: Major
>  Labels: performance
> Attachments: HIVE-18611.patch
>
>
> The Bloom filter aggregation buffer may result in allocating an array of up to 
> ~594MB, which is unnecessary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18546) Remove unnecessary code introduced in HIVE-14498

2018-02-01 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349914#comment-16349914
 ] 

Ashutosh Chauhan commented on HIVE-18546:
-

+1

> Remove unnecessary code introduced in HIVE-14498
> 
>
> Key: HIVE-18546
> URL: https://issues.apache.org/jira/browse/HIVE-18546
> Project: Hive
>  Issue Type: Bug
>  Components: Materialized views
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-18546.02.patch, HIVE-18546.03.patch, 
> HIVE-18546.04.patch
>
>
> HIVE-14498 introduced some code to check the invalidation of materialized 
> views that can be simplified, relying instead on existing transaction ids.





[jira] [Updated] (HIVE-18611) Avoid memory allocation of aggregation buffer during stats computation

2018-02-01 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-18611:
---
Affects Version/s: 3.0.0
   2.3.2

> Avoid memory allocation of aggregation buffer during stats computation 
> ---
>
> Key: HIVE-18611
> URL: https://issues.apache.org/jira/browse/HIVE-18611
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer, Statistics
>Affects Versions: 3.0.0, 2.3.2
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
>Priority: Major
>  Labels: performance
> Attachments: HIVE-18611.patch
>
>
> The Bloom filter aggregation buffer may result in allocating an array of up to 
> ~594MB, which is unnecessary.





[jira] [Updated] (HIVE-18611) Avoid memory allocation of aggregation buffer during stats computation

2018-02-01 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-18611:
---
Labels: performance  (was: )

> Avoid memory allocation of aggregation buffer during stats computation 
> ---
>
> Key: HIVE-18611
> URL: https://issues.apache.org/jira/browse/HIVE-18611
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer, Statistics
>Affects Versions: 3.0.0, 2.3.2
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
>Priority: Major
>  Labels: performance
> Attachments: HIVE-18611.patch
>
>
> The Bloom filter aggregation buffer may result in allocating an array of up to 
> ~594MB, which is unnecessary.





[jira] [Updated] (HIVE-18611) Avoid memory allocation of aggregation buffer during stats computation

2018-02-01 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-18611:

Attachment: HIVE-18611.patch

> Avoid memory allocation of aggregation buffer during stats computation 
> ---
>
> Key: HIVE-18611
> URL: https://issues.apache.org/jira/browse/HIVE-18611
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer, Statistics
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
>Priority: Major
> Attachments: HIVE-18611.patch
>
>
> The Bloom filter aggregation buffer may result in allocating an array of up to 
> ~594MB, which is unnecessary.





[jira] [Updated] (HIVE-18611) Avoid memory allocation of aggregation buffer during stats computation

2018-02-01 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-18611:

Status: Patch Available  (was: Open)

> Avoid memory allocation of aggregation buffer during stats computation 
> ---
>
> Key: HIVE-18611
> URL: https://issues.apache.org/jira/browse/HIVE-18611
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer, Statistics
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
>Priority: Major
> Attachments: HIVE-18611.patch
>
>
> The Bloom filter aggregation buffer may result in allocating an array of up to 
> ~594MB, which is unnecessary.





[jira] [Assigned] (HIVE-18611) Avoid memory allocation of aggregation buffer during stats computation

2018-02-01 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan reassigned HIVE-18611:
---


> Avoid memory allocation of aggregation buffer during stats computation 
> ---
>
> Key: HIVE-18611
> URL: https://issues.apache.org/jira/browse/HIVE-18611
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer, Statistics
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
>Priority: Major
>
> The Bloom filter aggregation buffer may result in allocating an array of up to 
> ~594MB, which is unnecessary.





[jira] [Commented] (HIVE-18599) CREATE TEMPORARY TABLE AS SELECT(CTTAS) on Micromanaged table does not write data

2018-02-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349888#comment-16349888
 ] 

Hive QA commented on HIVE-18599:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 44s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 58s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 41s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 50s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  1s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 43s{color} | {color:red} ql: The patch generated 7 new + 632 unchanged - 1 fixed = 639 total (was 633) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 53s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 12s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 13m 33s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus/dev-support/hive-personality.sh |
| git revision | master / 32b8994 |
| Default Java | 1.8.0_111 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-8983/yetus/diff-checkstyle-ql.txt |
| modules | C: ql U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-8983/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> CREATE TEMPORARY TABLE AS SELECT(CTTAS) on Micromanaged table does not write 
> data
> -
>
> Key: HIVE-18599
> URL: https://issues.apache.org/jira/browse/HIVE-18599
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Steve Yeom
>Assignee: Steve Yeom
>Priority: Major
> Attachments: HIVE-18599.01.patch, HIVE-18599.02.patch, 
> HIVE-18599.03.patch
>
>
> CTTAS on temporary micromanaged table does not write data. 
> I.e., "SELECT * FROM ctas0_mm;" does not return any rows from the below 
> script:
>  
> set hive.mapred.mode=nonstrict;
> set hive.explain.user=false;
> set hive.fetch.task.conversion=none;
> set tez.grouping.min-size=1;
> set tez.grouping.max-size=2;
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
>  
> drop table intermediate;
> create table intermediate(key int) partitioned by (p int) stored as orc;
> insert into table intermediate partition(p='455') select distinct key from 
> src where key >= 0 order by key desc limit 2;
> insert into table intermediate partition(p='456') select distinct key from 
> src where key is not null order by key asc limit 2;
> insert into table intermediate partition(p='457') select distinct key from 
> src where key >= 100 order by key asc limit 2;
>   
> drop table ctas0_mm; 
> explain create temporary table ctas0_mm tblproperties 
> ("transactional"="true", "transactional_properties"="insert_only") as select 
> * from intermediate;
> cre

[jira] [Updated] (HIVE-18610) Performance: ListKeyWrapper does not check for hashcode equals, before comparing members

2018-02-01 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-18610:
---
Attachment: HIVE-18610.1.patch

> Performance: ListKeyWrapper does not check for hashcode equals, before 
> comparing members
> 
>
> Key: HIVE-18610
> URL: https://issues.apache.org/jira/browse/HIVE-18610
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Major
> Attachments: HIVE-18610.1.patch
>
>
> ListKeyWrapper::equals() 
> {code}
> @Override
> public boolean equals(Object obj) {
>   if (!(obj instanceof ListKeyWrapper)) {
>     return false;
>   }
>   Object[] copied_in_hashmap = ((ListKeyWrapper) obj).keys;
>   return equalComparer.areEqual(copied_in_hashmap, keys);
> }
> {code}
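
The change the issue title asks for, comparing hash codes before comparing members, can be sketched as follows. This is a minimal, self-contained illustration and not the contents of HIVE-18610.1.patch: the class name `HashCheckedKey`, its cached `hash` field, and the use of `Arrays.equals` in place of `equalComparer.areEqual` are all assumptions made for the example.

```java
import java.util.Arrays;

// Hypothetical stand-in for ListKeyWrapper; all names here are illustrative.
final class HashCheckedKey {
  private final Object[] keys;
  private final int hash; // computed once, reused by hashCode() and equals()

  HashCheckedKey(Object... keys) {
    this.keys = keys.clone();
    this.hash = Arrays.hashCode(this.keys);
  }

  @Override
  public int hashCode() {
    return hash;
  }

  @Override
  public boolean equals(Object obj) {
    if (this == obj) {
      return true;
    }
    if (!(obj instanceof HashCheckedKey)) {
      return false;
    }
    HashCheckedKey other = (HashCheckedKey) obj;
    // Cheap rejection: unequal hash codes imply unequal keys.
    if (hash != other.hash) {
      return false;
    }
    // Hash codes can collide, so the member comparison is still required.
    return Arrays.equals(keys, other.keys);
  }
}
```

The hash check only short-circuits the negative case; two keys with equal hash codes still go through the full member comparison.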





[jira] [Updated] (HIVE-18610) Performance: ListKeyWrapper does not check for hashcode equals, before comparing members

2018-02-01 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-18610:
---
Status: Patch Available  (was: Open)

> Performance: ListKeyWrapper does not check for hashcode equals, before 
> comparing members
> 
>
> Key: HIVE-18610
> URL: https://issues.apache.org/jira/browse/HIVE-18610
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Major
> Attachments: HIVE-18610.1.patch
>
>
> ListKeyWrapper::equals() 
> {code}
> @Override
> public boolean equals(Object obj) {
>   if (!(obj instanceof ListKeyWrapper)) {
>     return false;
>   }
>   Object[] copied_in_hashmap = ((ListKeyWrapper) obj).keys;
>   return equalComparer.areEqual(copied_in_hashmap, keys);
> }
> {code}





[jira] [Assigned] (HIVE-18610) Performance: ListKeyWrapper does not check for hashcode equals, before comparing members

2018-02-01 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V reassigned HIVE-18610:
--

Assignee: Gopal V

> Performance: ListKeyWrapper does not check for hashcode equals, before 
> comparing members
> 
>
> Key: HIVE-18610
> URL: https://issues.apache.org/jira/browse/HIVE-18610
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Major
>
> ListKeyWrapper::equals() 
> {code}
> @Override
> public boolean equals(Object obj) {
>   if (!(obj instanceof ListKeyWrapper)) {
>     return false;
>   }
>   Object[] copied_in_hashmap = ((ListKeyWrapper) obj).keys;
>   return equalComparer.areEqual(copied_in_hashmap, keys);
> }
> {code}





[jira] [Commented] (HIVE-18581) Replication events should use lower case db object names

2018-02-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349878#comment-16349878
 ] 

ASF GitHub Bot commented on HIVE-18581:
---

Github user anishek closed the pull request at:

https://github.com/apache/hive/pull/304


> Replication events should use lower case db object names
> 
>
> Key: HIVE-18581
> URL: https://issues.apache.org/jira/browse/HIVE-18581
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: anishek
>Assignee: anishek
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HIVE-18581.0.patch, HIVE-18581.1.patch
>
>
> Events generated by replication should include database, table, 
> partition, and function names in lower case. This spares other 
> applications from having to match objects by name case-insensitively. 
> In Hive, all of the db object names listed above are explicitly converted to 
> lower case when comparing objects of the same type. 
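
The normalization described above can be sketched as below. This is only an illustration under stated assumptions: the class `DbObjectNames` and its `normalize` method are hypothetical names and are not taken from the Hive code base or from the attached patches.

```java
import java.util.Locale;

// Hypothetical helper; not part of Hive.
final class DbObjectNames {
  private DbObjectNames() {
  }

  // Locale.ROOT keeps the result independent of the JVM's default locale
  // (for example, the Turkish mapping of 'I' to a dotless 'i').
  static String normalize(String name) {
    return name == null ? null : name.toLowerCase(Locale.ROOT);
  }
}
```

A producer that normalizes names once, when the event is created, lets consumers compare them with plain `equals`.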





[jira] [Commented] (HIVE-18606) CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask

2018-02-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349875#comment-16349875
 ] 

Hive QA commented on HIVE-18606:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12908896/HIVE-18606.02.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 26 failed/errored test(s), 12967 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_queries] (batchId=240)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] (batchId=49)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin_hook] (batchId=13)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_join5] (batchId=36)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_move_tbl] (batchId=175)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucket_map_join_tez1] (batchId=172)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=167)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid] (batchId=171)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast] (batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[resourceplan] (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] (batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorization_input_format_excludes] (batchId=163)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[ppd_join5] (batchId=122)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=221)
org.apache.hadoop.hive.metastore.TestMarkPartition.testMarkingPartitionSet (batchId=215)
org.apache.hadoop.hive.metastore.client.TestTablesCreateDropAlterTruncate.testAlterTableNullStorageDescriptorInNew[Embedded] (batchId=206)
org.apache.hadoop.hive.metastore.client.TestTablesList.testListTableNamesByFilterNullDatabase[Embedded] (batchId=206)
org.apache.hadoop.hive.ql.TestTxnNoBuckets.testCtasEmpty (batchId=280)
org.apache.hadoop.hive.ql.TestTxnNoBucketsVectorized.testCtasEmpty (batchId=280)
org.apache.hadoop.hive.ql.exec.TestOperators.testNoConditionalTaskSizeForLlap (batchId=282)
org.apache.hadoop.hive.ql.io.TestDruidRecordWriter.testWrite (batchId=256)
org.apache.hive.beeline.cli.TestHiveCli.testNoErrorDB (batchId=188)
org.apache.hive.jdbc.TestSSL.testConnectionMismatch (batchId=234)
org.apache.hive.jdbc.TestSSL.testConnectionWrongCertCN (batchId=234)
org.apache.hive.jdbc.TestSSL.testMetastoreConnectionWrongCertCN (batchId=234)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/8982/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/8982/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-8982/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 26 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12908896 - PreCommit-HIVE-Build

> CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask
> ---
>
> Key: HIVE-18606
> URL: https://issues.apache.org/jira/browse/HIVE-18606
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18606.01.patch, HIVE-18606.02.patch
>
>
> {noformat}
> @Test
> public void testCtasEmpty() throws Exception {
>   MetastoreConf.setBoolVar(hiveConf, MetastoreConf.ConfVars.CREATE_TABLES_AS_ACID, true);
>   runStatementOnDriver("create table myctas stored as ORC as" +
>       " select a, b from " + Table.NONACIDORCTBL);
>   List rs = runStatementOnDriver("select ROW__ID, a, b, INPUT__FILE__NAME" +
>       " from myctas order by ROW__ID");
> }
> {noformat}
> {noformat}
> 2018-02-01T19:08:52,813 INFO  [HiveServer2-Background-Pool: Thread-463]: 
> metastore.HiveMetaStore (HiveMetaStore.java:logInfo(822)) - 114: Done 
> cleaning up thread local RawStore
> 2018-02-01T19:08:52,813 INFO  [HiveServer2-Background-Pool: Thread-463]: 
> HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(305)) - ugi=hive 
> ip=unknown-ip-addr  cmd=Done cleaning up 

[jira] [Commented] (HIVE-17139) Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.

2018-02-01 Thread Ke Jia (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349863#comment-16349863
 ] 

Ke Jia commented on HIVE-17139:
---

[~mmccline], [~Ferd], [~colinma], I updated the patch to fix the HIVE-18524 issue 
and uploaded it to RB. Please help me review it. Thanks for your help. 

> Conditional expressions optimization: skip the expression evaluation if the 
> condition is not satisfied for vectorization engine.
> 
>
> Key: HIVE-17139
> URL: https://issues.apache.org/jira/browse/HIVE-17139
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ke Jia
>Assignee: Ke Jia
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HIVE-17139.1.patch, HIVE-17139.10.patch, 
> HIVE-17139.11.patch, HIVE-17139.12.patch, HIVE-17139.13.patch, 
> HIVE-17139.13.patch, HIVE-17139.14.patch, HIVE-17139.15.patch, 
> HIVE-17139.16.patch, HIVE-17139.17.patch, HIVE-17139.18.patch, 
> HIVE-17139.18.patch, HIVE-17139.19.patch, HIVE-17139.2.patch, 
> HIVE-17139.20.patch, HIVE-17139.3.patch, HIVE-17139.4.patch, 
> HIVE-17139.5.patch, HIVE-17139.6.patch, HIVE-17139.7.patch, 
> HIVE-17139.8.patch, HIVE-17139.9.patch
>
>
> CASE WHEN and IF statement execution in Hive vectorization is not 
> optimal: the current implementation evaluates all of the conditional and 
> else expressions. The optimized approach is to update the selected array of 
> the batch after the conditional expression is executed, so that the else 
> expression processes only the selected rows instead of all rows.
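
The selected-array idea in the description can be sketched as follows. This is a simplified illustration, not the code in the attached patches: plain `long[]` arrays stand in for a vectorized row batch, and `SelectedIfSketch`, `evaluateIf`, and the `evaluations` counter are hypothetical names introduced for the example.

```java
import java.util.function.LongPredicate;
import java.util.function.LongUnaryOperator;

// Illustrative only: plain arrays stand in for a vectorized row batch.
final class SelectedIfSketch {
  // Counts branch-expression evaluations, purely for demonstration.
  static int evaluations = 0;

  static long[] evaluateIf(long[] input, LongPredicate cond,
      LongUnaryOperator thenExpr, LongUnaryOperator elseExpr) {
    int n = input.length;
    int[] thenSel = new int[n];
    int[] elseSel = new int[n];
    int thenCount = 0;
    int elseCount = 0;
    // Evaluate the condition once and split row indices into two selections.
    for (int i = 0; i < n; i++) {
      if (cond.test(input[i])) {
        thenSel[thenCount++] = i;
      } else {
        elseSel[elseCount++] = i;
      }
    }
    long[] out = new long[n];
    // Each branch visits only the rows in its own selection.
    for (int j = 0; j < thenCount; j++) {
      int i = thenSel[j];
      out[i] = thenExpr.applyAsLong(input[i]);
      evaluations++;
    }
    for (int j = 0; j < elseCount; j++) {
      int i = elseSel[j];
      out[i] = elseExpr.applyAsLong(input[i]);
      evaluations++;
    }
    return out;
  }
}
```

For a batch of four rows, the two branch expressions are evaluated four times in total, where evaluating both branches for every row would take eight.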





[jira] [Updated] (HIVE-18442) HoS: No FileSystem for scheme: nullscan

2018-02-01 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-18442:
--
Attachment: HIVE-18442.2.patch

> HoS: No FileSystem for scheme: nullscan
> ---
>
> Key: HIVE-18442
> URL: https://issues.apache.org/jira/browse/HIVE-18442
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
>Priority: Major
> Attachments: HIVE-18442.1.patch, HIVE-18442.2.patch
>
>
> I hit this issue when running the following query in yarn-cluster mode:
> {code}
> select * from (select key from src where false) a left outer join (select key 
> from srcpart limit 0) b on a.key=b.key;
> {code}
> Stack trace:
> {noformat}
> Job failed with java.io.IOException: No FileSystem for scheme: nullscan
>   at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2799)
>   at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2810)
>   at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100)
>   at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2849)
>   at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2831)
>   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
>   at org.apache.hadoop.fs.Path.getFileSystem(Path.java:356)
>   at org.apache.hadoop.hive.ql.exec.Utilities.isEmptyPath(Utilities.java:2605)
>   at org.apache.hadoop.hive.ql.exec.Utilities.isEmptyPath(Utilities.java:2601)
>   at org.apache.hadoop.hive.ql.exec.Utilities$GetInputPathsCallable.call(Utilities.java:3409)
>   at org.apache.hadoop.hive.ql.exec.Utilities.getInputPaths(Utilities.java:3347)
>   at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.cloneJobConf(SparkPlanGenerator.java:299)
>   at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generate(SparkPlanGenerator.java:222)
>   at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generate(SparkPlanGenerator.java:109)
>   at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient$JobStatusJob.call(RemoteHiveSparkClient.java:354)
>   at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:358)
>   at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:323)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}





[jira] [Updated] (HIVE-18581) Replication events should use lower case db object names

2018-02-01 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek updated HIVE-18581:
---
Status: Open  (was: Patch Available)

> Replication events should use lower case db object names
> 
>
> Key: HIVE-18581
> URL: https://issues.apache.org/jira/browse/HIVE-18581
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: anishek
>Assignee: anishek
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HIVE-18581.0.patch, HIVE-18581.1.patch
>
>
> Events generated by replication should include database, table, 
> partition, and function names in lower case. This spares other 
> applications from having to match objects by name case-insensitively. 
> In Hive, all of the db object names listed above are explicitly converted to 
> lower case when comparing objects of the same type. 





[jira] [Resolved] (HIVE-18581) Replication events should use lower case db object names

2018-02-01 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek resolved HIVE-18581.

Resolution: Won't Fix

> Replication events should use lower case db object names
> 
>
> Key: HIVE-18581
> URL: https://issues.apache.org/jira/browse/HIVE-18581
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: anishek
>Assignee: anishek
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HIVE-18581.0.patch, HIVE-18581.1.patch
>
>
> Events generated by replication should include database, table, 
> partition, and function names in lower case. This spares other 
> applications from having to match objects by name case-insensitively. 
> In Hive, all of the db object names listed above are explicitly converted to 
> lower case when comparing objects of the same type. 





[jira] [Commented] (HIVE-18581) Replication events should use lower case db object names

2018-02-01 Thread anishek (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349844#comment-16349844
 ] 

anishek commented on HIVE-18581:


The replication subsystem is case insensitive and hence does not require these 
changes. This was primarily for external systems using repl events; however, 
there is no requirement or dependency there as of now, so I am pausing this effort. 

> Replication events should use lower case db object names
> 
>
> Key: HIVE-18581
> URL: https://issues.apache.org/jira/browse/HIVE-18581
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: anishek
>Assignee: anishek
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HIVE-18581.0.patch, HIVE-18581.1.patch
>
>
> Events generated by replication should include database, table, 
> partition, and function names in lower case. This spares other 
> applications from having to match objects by name case-insensitively. 
> In Hive, all of the db object names listed above are explicitly converted to 
> lower case when comparing objects of the same type. 





[jira] [Commented] (HIVE-18606) CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask

2018-02-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349825#comment-16349825
 ] 

Hive QA commented on HIVE-18606:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 51s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 57s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 40s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 51s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 59s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 41s{color} | {color:red} ql: The patch generated 1 new + 322 unchanged - 0 fixed = 323 total (was 322) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 51s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 12s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 13m 30s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus/dev-support/hive-personality.sh |
| git revision | master / 32b8994 |
| Default Java | 1.8.0_111 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-8982/yetus/diff-checkstyle-ql.txt |
| modules | C: ql U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-8982/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask
> ---
>
> Key: HIVE-18606
> URL: https://issues.apache.org/jira/browse/HIVE-18606
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18606.01.patch, HIVE-18606.02.patch
>
>
> {noformat}
> @Test
> public void testCtasEmpty() throws Exception {
>   MetastoreConf.setBoolVar(hiveConf, 
> MetastoreConf.ConfVars.CREATE_TABLES_AS_ACID, true);
>   runStatementOnDriver("create table myctas stored as ORC as" +
>   " select a, b from " + Table.NONACIDORCTBL);
>   List rs = runStatementOnDriver("select ROW__ID, a, b, INPUT__FILE__NAME" +
>   " from myctas order by ROW__ID");
> }
> {noformat}
> {noformat}
> 2018-02-01T19:08:52,813 INFO  [HiveServer2-Background-Pool: Thread-463]: 
> metastore.HiveMetaStore (HiveMetaStore.java:logInfo(822)) - 114: Done 
> cleaning up thread local RawStore
> 2018-02-01T19:08:52,813 INFO  [HiveServer2-Background-Pool: Thread-463]: 
> HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(305)) - ugi=hive 
> ip=unknown-ip-addr  cmd=Done cleaning up thread local RawStore
> 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: 
> exec.Task (SessionState.java:printError(1228)) - Failed with exception null
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.moveAcidFiles(Hive.java:3816)
> at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:298)
> at org.apache.h

[jira] [Commented] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive

2018-02-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349813#comment-16349813
 ] 

Hive QA commented on HIVE-18410:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12908852/HIVE-18410_2.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 21 failed/errored test(s), 12967 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_queries]
 (batchId=240)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_gby_empty] 
(batchId=82)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin_hook] 
(batchId=13)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_join5] (batchId=36)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_move_tbl]
 (batchId=175)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucket_map_join_tez1]
 (batchId=172)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=167)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid] 
(batchId=171)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast]
 (batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[resourceplan]
 (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] 
(batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorization_input_format_excludes]
 (batchId=163)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[ppd_join5] 
(batchId=122)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut 
(batchId=221)
org.apache.hadoop.hive.ql.exec.TestOperators.testNoConditionalTaskSizeForLlap 
(batchId=282)
org.apache.hadoop.hive.ql.io.TestDruidRecordWriter.testWrite (batchId=256)
org.apache.hive.beeline.cli.TestHiveCli.testNoErrorDB (batchId=188)
org.apache.hive.jdbc.TestSSL.testConnectionMismatch (batchId=234)
org.apache.hive.jdbc.TestSSL.testConnectionWrongCertCN (batchId=234)
org.apache.hive.jdbc.TestSSL.testMetastoreConnectionWrongCertCN (batchId=234)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/8981/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/8981/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-8981/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 21 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12908852 - PreCommit-HIVE-Build

> [Performance][Avro] Reading flat Avro tables is very expensive in Hive
> --
>
> Key: HIVE-18410
> URL: https://issues.apache.org/jira/browse/HIVE-18410
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 2.3.2
>Reporter: Ratandeep Ratti
>Assignee: Ratandeep Ratti
>Priority: Major
> Fix For: 3.0.0, 2.3.2
>
> Attachments: HIVE-18410.patch, HIVE-18410_1.patch, 
> HIVE-18410_2.patch, profiling_with_patch.nps, profiling_with_patch.png, 
> profiling_without_patch.nps, profiling_without_patch.png
>
>
> There's a performance penalty when reading flat [no nested fields] Avro 
> tables. When reading the same flat dataset in Pig, it takes half the time.  
> On profiling, a lot of time is spent in 
> {{AvroDeserializer.deserializeSingleItemNullableUnion()}}. The bulk of the 
> time is spent in GenericData.get().resolveUnion(), which calls 
> GenericData.getSchemaName(Object datum), which does a lot of instanceof 
> checks.  This could be simplified with performance benefits. An approach is 
> described in this patch which almost halves the runtime.
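
The idea in the description above can be illustrated without Avro on the classpath. The following is a minimal sketch, not the code from the attached patch: for a union with exactly two branches, one of which is null, the branch can be chosen with a single null check instead of a chain of instanceof tests. The class NullableUnionSketch, the enum Kind, and both resolve methods are hypothetical names introduced only for this illustration; they are not Avro or Hive APIs.

```java
import java.util.Arrays;
import java.util.List;

public class NullableUnionSketch {
    /** Hypothetical stand-in for a schema type tag; not an Avro class. */
    enum Kind { NULL, INT, LONG, STRING, DOUBLE }

    /** Generic resolution: walks a chain of instanceof checks for every datum. */
    static int resolveGeneric(List<Kind> union, Object datum) {
        Kind kind;
        if (datum == null) kind = Kind.NULL;
        else if (datum instanceof Integer) kind = Kind.INT;
        else if (datum instanceof Long) kind = Kind.LONG;
        else if (datum instanceof CharSequence) kind = Kind.STRING;
        else if (datum instanceof Double) kind = Kind.DOUBLE;
        else throw new IllegalArgumentException("Unsupported datum: " + datum);
        int index = union.indexOf(kind);
        if (index < 0) throw new IllegalArgumentException("Not in union: " + kind);
        return index;
    }

    /**
     * Fast path, valid only for a two-branch union with one NULL branch:
     * a single null check picks the branch, with no instanceof chain.
     */
    static int resolveNullable(List<Kind> union, Object datum) {
        int nullIndex = union.get(0) == Kind.NULL ? 0 : 1;
        return datum == null ? nullIndex : 1 - nullIndex;
    }

    public static void main(String[] args) {
        List<Kind> union = Arrays.asList(Kind.NULL, Kind.STRING);
        System.out.println(resolveNullable(union, "abc"));
        System.out.println(resolveNullable(union, null));
    }
}
```

Both methods return the same branch index for a nullable union; the fast path simply skips the type inspection, which is where the description says most of the time goes.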





[jira] [Updated] (HIVE-18573) Use proper Calcite operator instead of UDFs

2018-02-01 Thread slim bouguerra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-18573:
--
Attachment: HIVE-18573.6.patch

> Use proper Calcite operator instead of UDFs
> ---
>
> Key: HIVE-18573
> URL: https://issues.apache.org/jira/browse/HIVE-18573
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: slim bouguerra
>Priority: Major
> Attachments: HIVE-18573.2.patch, HIVE-18573.3.patch, 
> HIVE-18573.4.patch, HIVE-18573.5.patch, HIVE-18573.6.patch, HIVE-18573.patch
>
>
> Currently, Hive mostly uses user-defined black-box SQL operators during 
> query planning. It would be more beneficial to use proper Calcite operators.
> Also, use a single name for the Extract operator instead of a different name 
> for every unit; the same applies to the Floor function. This will allow 
> unifying the treatment per operator.
>  
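
As a rough illustration of the "single name for Extract" point above, the sketch below uses one extract function that takes the unit as an argument, rather than one function name per unit. It is written against java.time only and is an assumption about the general shape of the idea, not the Calcite or Hive operator code; ExtractOperatorSketch, Unit, and extract are hypothetical names.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoField;

public class ExtractOperatorSketch {
    /** Hypothetical unit operand; each unit maps to a java.time field. */
    enum Unit {
        YEAR(ChronoField.YEAR),
        MONTH(ChronoField.MONTH_OF_YEAR),
        DAY(ChronoField.DAY_OF_MONTH),
        HOUR(ChronoField.HOUR_OF_DAY);

        final ChronoField field;

        Unit(ChronoField field) {
            this.field = field;
        }
    }

    /** One operator name for all units: the unit is an argument, not part of the name. */
    static int extract(Unit unit, LocalDateTime value) {
        return value.get(unit.field);
    }

    public static void main(String[] args) {
        LocalDateTime ts = LocalDateTime.of(2018, 2, 1, 19, 8, 52);
        for (Unit unit : Unit.values()) {
            System.out.println(unit + " = " + extract(unit, ts));
        }
    }
}
```

With the unit as an operand, any rule that handles extract covers every unit at once, which is the unification the description asks for.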





[jira] [Commented] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive

2018-02-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349791#comment-16349791
 ] 

Hive QA commented on HIVE-18410:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
52s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  4m 
37s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
49s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  5m 
35s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  4m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  4m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  5m 
48s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
12s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 37m  8s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus/dev-support/hive-personality.sh |
| git revision | master / 32b8994 |
| Default Java | 1.8.0_111 |
| modules | C: . U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-8981/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> [Performance][Avro] Reading flat Avro tables is very expensive in Hive
> --
>
> Key: HIVE-18410
> URL: https://issues.apache.org/jira/browse/HIVE-18410
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 2.3.2
>Reporter: Ratandeep Ratti
>Assignee: Ratandeep Ratti
>Priority: Major
> Fix For: 3.0.0, 2.3.2
>
> Attachments: HIVE-18410.patch, HIVE-18410_1.patch, 
> HIVE-18410_2.patch, profiling_with_patch.nps, profiling_with_patch.png, 
> profiling_without_patch.nps, profiling_without_patch.png
>
>
> There's a performance penalty when reading flat [no nested fields] Avro 
> tables. When reading the same flat dataset in Pig, it takes half the time.  
> On profiling, a lot of time is spent in 
> {{AvroDeserializer.deserializeSingleItemNullableUnion()}}. The bulk of the 
> time is spent in GenericData.get().resolveUnion(), which calls 
> GenericData.getSchemaName(Object datum), which does a lot of instanceof 
> checks.  This could be simplified with performance benefits. An approach is 
> described in this patch which almost halves the runtime.





[jira] [Updated] (HIVE-18573) Use proper Calcite operator instead of UDFs

2018-02-01 Thread slim bouguerra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-18573:
--
Attachment: HIVE-18573.5.patch

> Use proper Calcite operator instead of UDFs
> ---
>
> Key: HIVE-18573
> URL: https://issues.apache.org/jira/browse/HIVE-18573
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: slim bouguerra
>Priority: Major
> Attachments: HIVE-18573.2.patch, HIVE-18573.3.patch, 
> HIVE-18573.4.patch, HIVE-18573.5.patch, HIVE-18573.patch
>
>
> Currently, Hive mostly uses user-defined black-box SQL operators during 
> query planning. It would be more beneficial to use proper Calcite operators.
> Also, use a single name for the Extract operator instead of a different name 
> for every unit; the same applies to the Floor function. This will allow 
> unifying the treatment per operator.
>  





[jira] [Reopened] (HIVE-17139) Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.

2018-02-01 Thread Ke Jia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Jia reopened HIVE-17139:
---

> Conditional expressions optimization: skip the expression evaluation if the 
> condition is not satisfied for vectorization engine.
> 
>
> Key: HIVE-17139
> URL: https://issues.apache.org/jira/browse/HIVE-17139
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ke Jia
>Assignee: Ke Jia
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HIVE-17139.1.patch, HIVE-17139.10.patch, 
> HIVE-17139.11.patch, HIVE-17139.12.patch, HIVE-17139.13.patch, 
> HIVE-17139.13.patch, HIVE-17139.14.patch, HIVE-17139.15.patch, 
> HIVE-17139.16.patch, HIVE-17139.17.patch, HIVE-17139.18.patch, 
> HIVE-17139.18.patch, HIVE-17139.19.patch, HIVE-17139.2.patch, 
> HIVE-17139.20.patch, HIVE-17139.3.patch, HIVE-17139.4.patch, 
> HIVE-17139.5.patch, HIVE-17139.6.patch, HIVE-17139.7.patch, 
> HIVE-17139.8.patch, HIVE-17139.9.patch
>
>
> The execution of CASE WHEN and IF statements in Hive vectorization is not 
> optimal: in the current implementation, all the conditional and else 
> expressions are evaluated. The optimized approach is to update the selected 
> array of the batch parameter after the conditional expression is executed, so 
> that the else expression processes only the selected rows instead of all rows.
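
The selected-array approach described above can be sketched in plain Java. This is a simplified illustration under stated assumptions, not Hive's vectorization code: the batch is reduced to a single long[] column, the condition is a fixed "greater than threshold" test, and SelectedRowsSketch, evaluateIf, and the evaluations counter are hypothetical names.

```java
import java.util.Arrays;
import java.util.function.LongUnaryOperator;

public class SelectedRowsSketch {
    /** Counts how many times a branch expression was evaluated. */
    static int evaluations = 0;

    /**
     * Evaluates IF(input[i] > threshold, thenExpr, elseExpr) for every row,
     * running each branch expression only on the rows that need it.
     */
    static long[] evaluateIf(long[] input, long threshold,
                             LongUnaryOperator thenExpr, LongUnaryOperator elseExpr) {
        int n = input.length;
        int[] thenSelected = new int[n];
        int[] elseSelected = new int[n];
        int thenSize = 0;
        int elseSize = 0;

        // Step 1: evaluate the condition once and split the row indices.
        for (int i = 0; i < n; i++) {
            if (input[i] > threshold) {
                thenSelected[thenSize++] = i;
            } else {
                elseSelected[elseSize++] = i;
            }
        }

        long[] output = new long[n];
        // Step 2: the THEN expression sees only rows where the condition held.
        for (int j = 0; j < thenSize; j++) {
            int row = thenSelected[j];
            output[row] = thenExpr.applyAsLong(input[row]);
            evaluations++;
        }
        // Step 3: the ELSE expression sees only the remaining rows.
        for (int j = 0; j < elseSize; j++) {
            int row = elseSelected[j];
            output[row] = elseExpr.applyAsLong(input[row]);
            evaluations++;
        }
        return output;
    }

    public static void main(String[] args) {
        long[] result = evaluateIf(new long[] {1, 5, 10, 20}, 5, v -> v * 2, v -> -v);
        System.out.println(Arrays.toString(result));
        System.out.println("branch evaluations: " + evaluations);
    }
}
```

For four input rows the branch expressions run four times in total, whereas evaluating both branches on every row would take eight evaluations.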





[jira] [Commented] (HIVE-17935) Turn on hive.optimize.sort.dynamic.partition by default

2018-02-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349763#comment-16349763
 ] 

Hive QA commented on HIVE-17935:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12908851/HIVE-17935.8.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 92 failed/errored test(s), 12965 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_queries]
 (batchId=240)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_part] 
(batchId=16)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_1] 
(batchId=22)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_2] 
(batchId=84)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_6] 
(batchId=66)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_8] 
(batchId=14)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[delete_all_partitioned] 
(batchId=28)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[extrapolate_part_stats_partial]
 (batchId=48)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[implicit_cast_during_insert]
 (batchId=51)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_into6] 
(batchId=72)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_acid] (batchId=81)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_acid_fast] 
(batchId=40)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[load_dyn_part10] 
(batchId=21)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[load_dyn_part14] 
(batchId=90)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[load_dyn_part1] 
(batchId=85)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[load_dyn_part3] 
(batchId=12)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[load_dyn_part4] 
(batchId=63)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[load_dyn_part8] 
(batchId=66)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[load_dyn_part9] 
(batchId=40)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin_hook] 
(batchId=13)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[merge3] (batchId=59)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[merge4] (batchId=12)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[merge_dynamic_partition3]
 (batchId=70)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[merge_dynamic_partition4]
 (batchId=34)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[merge_dynamic_partition5]
 (batchId=33)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[orc_int_type_promotion] 
(batchId=42)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[orc_merge2] (batchId=91)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_join5] (batchId=36)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[stats2] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[stats4] (batchId=81)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[stats_empty_dyn_part] 
(batchId=33)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[union_remove_15] 
(batchId=87)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[union_remove_16] 
(batchId=74)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[union_remove_17] 
(batchId=70)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[union_remove_18] 
(batchId=7)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[union_remove_25] 
(batchId=89)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[update_all_partitioned] 
(batchId=52)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_move_tbl]
 (batchId=175)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_stats] 
(batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_merge2] 
(batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[acid_no_buckets]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucket_map_join_tez1]
 (batchId=172)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[delete_all_partitioned]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dp_counter_mm]
 (batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dp_counter_non_mm]
 (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[extrapolate_part_stats_partial_ndv]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=167)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[lineage3] 
(batchId=163)
org.apa

[jira] [Updated] (HIVE-18595) UNIX_TIMESTAMP UDF fails when type is Timestamp with local timezone

2018-02-01 Thread slim bouguerra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-18595:
--
Status: Patch Available  (was: Open)

> UNIX_TIMESTAMP  UDF fails when type is Timestamp with local timezone
> 
>
> Key: HIVE-18595
> URL: https://issues.apache.org/jira/browse/HIVE-18595
> Project: Hive
>  Issue Type: Bug
>Reporter: slim bouguerra
>Priority: Major
> Attachments: HIVE-18595.patch
>
>
> {code}
> 2018-01-31T12:59:45,464 ERROR [10e97c86-7f90-406b-a8fa-38be5d3529cc main] 
> ql.Driver: FAILED: SemanticException [Error 10014]: Line 3:456 Wrong 
> arguments ''-MM-dd HH:mm:ss'': The function UNIX_TIMESTAMP takes only 
> string/date/timestamp types
> org.apache.hadoop.hive.ql.parse.SemanticException: Line 3:456 Wrong arguments 
> ''-MM-dd HH:mm:ss'': The function UNIX_TIMESTAMP takes only 
> string/date/timestamp types
>  at 
> org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1394)
>  at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>  at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
>  at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
>  at 
> org.apache.hadoop.hive.ql.lib.ExpressionWalker.walk(ExpressionWalker.java:76)
>  at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
>  at 
> org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:235)
>  at 
> org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:181)
>  at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:11847)
>  at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:11780)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genGBLogicalPlan(CalcitePlanner.java:3140)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:4330)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1407)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1354)
>  at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:118)
>  at 
> org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:1052)
>  at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:154)
>  at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:111)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1159)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1175)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:422)
>  at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11393)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:304)
>  at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:268)
>  at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:163)
>  at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:268)
>  at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:639)
>  at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1504)
>  at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1632)
>  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1395)
>  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1382)
>  at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:240)
>  at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188)
>  at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:410)
>  at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:343)
>  at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1331)
>  at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:1305)
>  at 
> org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:173)
>  at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:104)
>  at 
> org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver(TestMiniDruidCliDriver.java:59)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMetho

[jira] [Updated] (HIVE-18595) UNIX_TIMESTAMP UDF fails when type is Timestamp with local timezone

2018-02-01 Thread slim bouguerra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-18595:
--
Attachment: HIVE-18595.patch

> UNIX_TIMESTAMP  UDF fails when type is Timestamp with local timezone
> 
>
> Key: HIVE-18595
> URL: https://issues.apache.org/jira/browse/HIVE-18595
> Project: Hive
>  Issue Type: Bug
>Reporter: slim bouguerra
>Priority: Major
> Attachments: HIVE-18595.patch
>
>
> {code}
> 2018-01-31T12:59:45,464 ERROR [10e97c86-7f90-406b-a8fa-38be5d3529cc main] 
> ql.Driver: FAILED: SemanticException [Error 10014]: Line 3:456 Wrong 
> arguments ''-MM-dd HH:mm:ss'': The function UNIX_TIMESTAMP takes only 
> string/date/timestamp types
> org.apache.hadoop.hive.ql.parse.SemanticException: Line 3:456 Wrong arguments 
> ''-MM-dd HH:mm:ss'': The function UNIX_TIMESTAMP takes only 
> string/date/timestamp types
>  at 
> org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1394)
>  at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>  at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
>  at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
>  at 
> org.apache.hadoop.hive.ql.lib.ExpressionWalker.walk(ExpressionWalker.java:76)
>  at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
>  at 
> org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:235)
>  at 
> org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:181)
>  at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:11847)
>  at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:11780)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genGBLogicalPlan(CalcitePlanner.java:3140)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:4330)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1407)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1354)
>  at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:118)
>  at 
> org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:1052)
>  at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:154)
>  at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:111)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1159)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1175)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:422)
>  at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11393)
>  at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:304)
>  at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:268)
>  at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:163)
>  at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:268)
>  at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:639)
>  at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1504)
>  at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1632)
>  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1395)
>  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1382)
>  at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:240)
>  at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188)
>  at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:410)
>  at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:343)
>  at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1331)
>  at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:1305)
>  at 
> org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:173)
>  at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:104)
>  at 
> org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver(TestMiniDruidCliDriver.java:59)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccesso

[jira] [Updated] (HIVE-18573) Use proper Calcite operator instead of UDFs

2018-02-01 Thread slim bouguerra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-18573:
--
Attachment: HIVE-18573.4.patch

> Use proper Calcite operator instead of UDFs
> ---
>
> Key: HIVE-18573
> URL: https://issues.apache.org/jira/browse/HIVE-18573
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: slim bouguerra
>Priority: Major
> Attachments: HIVE-18573.2.patch, HIVE-18573.3.patch, 
> HIVE-18573.4.patch, HIVE-18573.patch
>
>
> Currently, Hive mostly uses user-defined black-box SQL operators during 
> query planning. It would be more beneficial to use proper Calcite operators.
> Also, use a single name for the Extract operator instead of a different name 
> for every unit; the same applies to the Floor function. This will allow 
> unifying the treatment per operator.
>  





[jira] [Commented] (HIVE-17935) Turn on hive.optimize.sort.dynamic.partition by default

2018-02-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349723#comment-16349723
 ] 

Hive QA commented on HIVE-17935:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 31s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 52s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 52s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m  8s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 30s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 21s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m  5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m  6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 24s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 12s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 19m 17s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus/dev-support/hive-personality.sh |
| git revision | master / 32b8994 |
| Default Java | 1.8.0_111 |
| modules | C: common ql itests/hive-unit U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-8980/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Turn on hive.optimize.sort.dynamic.partition by default
> ---
>
> Key: HIVE-17935
> URL: https://issues.apache.org/jira/browse/HIVE-17935
> Project: Hive
>  Issue Type: Bug
>Reporter: Andrew Sherman
>Assignee: Andrew Sherman
>Priority: Major
> Attachments: HIVE-17935.1.patch, HIVE-17935.2.patch, 
> HIVE-17935.3.patch, HIVE-17935.4.patch, HIVE-17935.5.patch, 
> HIVE-17935.6.patch, HIVE-17935.7.patch, HIVE-17935.8.patch
>
>
> The config option hive.optimize.sort.dynamic.partition is an optimization for 
> Hive’s dynamic partitioning feature. It was originally implemented in 
> [HIVE-6455|https://issues.apache.org/jira/browse/HIVE-6455]. With this 
> optimization, the dynamic partition columns and bucketing columns (in case of 
> bucketed tables) are sorted before being fed to the reducers. Since the 
> partitioning and bucketing columns are sorted, each reducer can keep only one 
> record writer open at any time thereby reducing the memory pressure on the 
> reducers. There were some early problems with this optimization and it was 
> disabled by default in HiveConf in 
> [HIVE-8151|https://issues.apache.org/jira/browse/HIVE-8151]. Since then 
> setting hive.optimize.sort.dynamic.partition=true has been used to solve 
> problems where dynamic partitioning produces (1) too many small files on 
> HDFS, which is bad for the cluster and can increase overhead for future Hive 
> queries over those partitions, and (2) O
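The memory argument in the description above (sorted partition columns let a reducer keep a single record writer open) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Hive code: `max_open_writers` and the sample partition values are hypothetical names invented for the illustration, and a "writer" is modelled as nothing more than a set entry.

```python
def max_open_writers(partition_keys, sorted_input):
    """Return the peak number of record writers a reducer holds open.

    partition_keys: the partition value of each row, in arrival order.
    sorted_input: if True, rows are assumed to arrive grouped by partition,
    so the previous writer can be closed as soon as the key changes.
    """
    open_writers = set()
    peak = 0
    previous = None
    for key in partition_keys:
        if sorted_input and previous is not None and key != previous:
            # Sorted input: the previous partition will not be seen again.
            open_writers.discard(previous)
        open_writers.add(key)
        peak = max(peak, len(open_writers))
        previous = key
    return peak


# Hypothetical partition values for five incoming rows.
rows = ["2018-01-03", "2018-01-01", "2018-01-02", "2018-01-01", "2018-01-03"]
unsorted_peak = max_open_writers(rows, sorted_input=False)
sorted_peak = max_open_writers(sorted(rows), sorted_input=True)
print(unsorted_peak, sorted_peak)
```

Without sorting, the simulated reducer must keep one writer per distinct partition it has seen; with sorted input the peak stays at one, which is the effect the description attributes to the optimization.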

[jira] [Commented] (HIVE-18516) load data should rename files consistent with insert statements for ACID Tables

2018-02-01 Thread Deepak Jaiswal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349717#comment-16349717
 ] 

Deepak Jaiswal commented on HIVE-18516:
---

The test failures are unrelated.

> load data should rename files consistent with insert statements for ACID 
> Tables
> ---
>
> Key: HIVE-18516
> URL: https://issues.apache.org/jira/browse/HIVE-18516
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>Priority: Major
> Attachments: HIVE-18516.1.patch, HIVE-18516.10.patch, 
> HIVE-18516.2.patch, HIVE-18516.3.patch, HIVE-18516.4.patch, 
> HIVE-18516.5.patch, HIVE-18516.6.patch, HIVE-18516.7.patch, 
> HIVE-18516.8.patch, HIVE-18516.9.patch
>
>
> h1. load data should rename files consistent with insert statements for ACID 
> Tables.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HIVE-18513) Query results caching

2018-02-01 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349716#comment-16349716
 ] 

Xuefu Zhang edited comment on HIVE-18513 at 2/2/18 3:16 AM:


Thanks, [~jcamachorodriguez] and [~jdere]. I went through the doc and it is 
still unclear which mechanism was chosen to invalidate the cache. Is this doc 
still a work in progress?


was (Author: xuefuz):
Thanks, [~jcamachorodriguez] and [~jdere]. I went thru the doc and is clear 
what mechanism was decided to invalidate cache. Is this doc still work in 
progress?

> Query results caching
> -
>
> Key: HIVE-18513
> URL: https://issues.apache.org/jira/browse/HIVE-18513
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Jason Dere
>Assignee: Jason Dere
>Priority: Major
> Attachments: HIVE-18513.1.patch, HIVE-18513.2.patch, 
> HIVE-18513.3.patch, HIVE-18513.4.patch, HIVE-18513.5.patch
>
>
> Add a query results cache that can save the results of an executed Hive query 
> for reuse on subsequent queries. This may be useful in cases where the same 
> query is issued many times, since Hive can return back the results of a 
> cached query rather than having to execute the full query on the cluster.
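The idea in the description above, reusing the saved results when the same query is issued again, can be sketched as a small in-memory cache. This is a minimal illustration, not the design in the attached patches: the class `QueryResultsCache`, its `run` and `invalidate_all` methods, and the choice to key entries on the exact query text are all assumptions made for the example. In particular, the thread leaves the invalidation mechanism open, so the sketch only offers a manual clear.

```python
import hashlib


class QueryResultsCache:
    """Toy in-memory cache mapping query text to previously computed rows."""

    def __init__(self):
        self._entries = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(query):
        # Assumption: only byte-identical query text (after trimming) matches.
        return hashlib.sha256(query.strip().encode("utf-8")).hexdigest()

    def run(self, query, execute):
        """Return cached rows for query, calling execute(query) on a miss."""
        key = self._key(query)
        if key in self._entries:
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        rows = execute(query)
        self._entries[key] = rows
        return rows

    def invalidate_all(self):
        # Placeholder policy: drop everything, e.g. after underlying data changes.
        self._entries.clear()
```

A caller would pass the function that actually runs the query on the cluster as `execute`; the second call with the same text then returns without invoking it.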





[jira] [Commented] (HIVE-18513) Query results caching

2018-02-01 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349716#comment-16349716
 ] 

Xuefu Zhang commented on HIVE-18513:


Thanks, [~jcamachorodriguez] and [~jdere]. I went thru the doc and is clear 
what mechanism was decided to invalidate cache. Is this doc still work in 
progress?

> Query results caching
> -
>
> Key: HIVE-18513
> URL: https://issues.apache.org/jira/browse/HIVE-18513
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Jason Dere
>Assignee: Jason Dere
>Priority: Major
> Attachments: HIVE-18513.1.patch, HIVE-18513.2.patch, 
> HIVE-18513.3.patch, HIVE-18513.4.patch, HIVE-18513.5.patch
>
>
> Add a query results cache that can save the results of an executed Hive query 
> for reuse on subsequent queries. This may be useful in cases where the same 
> query is issued many times, since Hive can return back the results of a 
> cached query rather than having to execute the full query on the cluster.





[jira] [Commented] (HIVE-18516) load data should rename files consistent with insert statements for ACID Tables

2018-02-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349707#comment-16349707
 ] 

Hive QA commented on HIVE-18516:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12908848/HIVE-18516.10.patch

{color:green}SUCCESS:{color} +1 due to 4 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 23 failed/errored test(s), 12965 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_queries] (batchId=240)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_table_update_status_disable_bitvector] (batchId=80)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin_hook] (batchId=13)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_join5] (batchId=36)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_move_tbl] (batchId=175)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucket_map_join_tez1] (batchId=172)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=167)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid] (batchId=171)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast] (batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[resourceplan] (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] (batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorization_input_format_excludes] (batchId=163)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[ppd_join5] (batchId=122)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=221)
org.apache.hadoop.hive.metastore.client.TestGetTableMeta.testGetTableMetaNullOrEmptyDb[Embedded] (batchId=206)
org.apache.hadoop.hive.metastore.client.TestTablesGetExists.testGetAllTablesCaseInsensitive[Embedded] (batchId=206)
org.apache.hadoop.hive.ql.exec.TestOperators.testNoConditionalTaskSizeForLlap (batchId=282)
org.apache.hadoop.hive.ql.io.TestDruidRecordWriter.testWrite (batchId=256)
org.apache.hive.beeline.cli.TestHiveCli.testNoErrorDB (batchId=188)
org.apache.hive.jdbc.TestSSL.testConnectionMismatch (batchId=234)
org.apache.hive.jdbc.TestSSL.testConnectionWrongCertCN (batchId=234)
org.apache.hive.jdbc.TestSSL.testMetastoreConnectionWrongCertCN (batchId=234)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/8979/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/8979/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-8979/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 23 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12908848 - PreCommit-HIVE-Build

> load data should rename files consistent with insert statements for ACID 
> Tables
> ---
>
> Key: HIVE-18516
> URL: https://issues.apache.org/jira/browse/HIVE-18516
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>Priority: Major
> Attachments: HIVE-18516.1.patch, HIVE-18516.10.patch, 
> HIVE-18516.2.patch, HIVE-18516.3.patch, HIVE-18516.4.patch, 
> HIVE-18516.5.patch, HIVE-18516.6.patch, HIVE-18516.7.patch, 
> HIVE-18516.8.patch, HIVE-18516.9.patch
>
>
> h1. load data should rename files consistent with insert statements for ACID 
> Tables.





[jira] [Commented] (HIVE-18513) Query results caching

2018-02-01 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349704#comment-16349704
 ] 

Jesus Camacho Rodriguez commented on HIVE-18513:


[~xuefuz], [~jdere] already posted the link to the doc above:

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=75963441

> Query results caching
> -
>
> Key: HIVE-18513
> URL: https://issues.apache.org/jira/browse/HIVE-18513
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Jason Dere
>Assignee: Jason Dere
>Priority: Major
> Attachments: HIVE-18513.1.patch, HIVE-18513.2.patch, 
> HIVE-18513.3.patch, HIVE-18513.4.patch, HIVE-18513.5.patch
>
>
> Add a query results cache that can save the results of an executed Hive query 
> for reuse on subsequent queries. This may be useful in cases where the same 
> query is issued many times, since Hive can return back the results of a 
> cached query rather than having to execute the full query on the cluster.





[jira] [Commented] (HIVE-18513) Query results caching

2018-02-01 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349703#comment-16349703
 ] 

Xuefu Zhang commented on HIVE-18513:


Could we have the high-level doc linked here so others interested can do a 
high-level review at least? Thanks.

> Query results caching
> -
>
> Key: HIVE-18513
> URL: https://issues.apache.org/jira/browse/HIVE-18513
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Jason Dere
>Assignee: Jason Dere
>Priority: Major
> Attachments: HIVE-18513.1.patch, HIVE-18513.2.patch, 
> HIVE-18513.3.patch, HIVE-18513.4.patch, HIVE-18513.5.patch
>
>
> Add a query results cache that can save the results of an executed Hive query 
> for reuse on subsequent queries. This may be useful in cases where the same 
> query is issued many times, since Hive can return back the results of a 
> cached query rather than having to execute the full query on the cluster.





[jira] [Commented] (HIVE-18513) Query results caching

2018-02-01 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349685#comment-16349685
 ] 

Jesus Camacho Rodriguez commented on HIVE-18513:


+1 (pending tests)

> Query results caching
> -
>
> Key: HIVE-18513
> URL: https://issues.apache.org/jira/browse/HIVE-18513
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Jason Dere
>Assignee: Jason Dere
>Priority: Major
> Attachments: HIVE-18513.1.patch, HIVE-18513.2.patch, 
> HIVE-18513.3.patch, HIVE-18513.4.patch, HIVE-18513.5.patch
>
>
> Add a query results cache that can save the results of an executed Hive query 
> for reuse on subsequent queries. This may be useful in cases where the same 
> query is issued many times, since Hive can return back the results of a 
> cached query rather than having to execute the full query on the cluster.





[jira] [Updated] (HIVE-18513) Query results caching

2018-02-01 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-18513:
--
Attachment: HIVE-18513.5.patch

> Query results caching
> -
>
> Key: HIVE-18513
> URL: https://issues.apache.org/jira/browse/HIVE-18513
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Jason Dere
>Assignee: Jason Dere
>Priority: Major
> Attachments: HIVE-18513.1.patch, HIVE-18513.2.patch, 
> HIVE-18513.3.patch, HIVE-18513.4.patch, HIVE-18513.5.patch
>
>
> Add a query results cache that can save the results of an executed Hive query 
> for reuse on subsequent queries. This may be useful in cases where the same 
> query is issued many times, since Hive can return back the results of a 
> cached query rather than having to execute the full query on the cluster.





[jira] [Updated] (HIVE-18513) Query results caching

2018-02-01 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-18513:
--
Attachment: (was: HIVE-18513.5.patch)

> Query results caching
> -
>
> Key: HIVE-18513
> URL: https://issues.apache.org/jira/browse/HIVE-18513
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Jason Dere
>Assignee: Jason Dere
>Priority: Major
> Attachments: HIVE-18513.1.patch, HIVE-18513.2.patch, 
> HIVE-18513.3.patch, HIVE-18513.4.patch, HIVE-18513.5.patch
>
>
> Add a query results cache that can save the results of an executed Hive query 
> for reuse on subsequent queries. This may be useful in cases where the same 
> query is issued many times, since Hive can return back the results of a 
> cached query rather than having to execute the full query on the cluster.





[jira] [Commented] (HIVE-18513) Query results caching

2018-02-01 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349677#comment-16349677
 ] 

Jason Dere commented on HIVE-18513:
---

The patch didn't include the additional changes; re-attaching.

> Query results caching
> -
>
> Key: HIVE-18513
> URL: https://issues.apache.org/jira/browse/HIVE-18513
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Jason Dere
>Assignee: Jason Dere
>Priority: Major
> Attachments: HIVE-18513.1.patch, HIVE-18513.2.patch, 
> HIVE-18513.3.patch, HIVE-18513.4.patch, HIVE-18513.5.patch
>
>
> Add a query results cache that can save the results of an executed Hive query 
> for reuse on subsequent queries. This may be useful in cases where the same 
> query is issued many times, since Hive can return back the results of a 
> cached query rather than having to execute the full query on the cluster.





[jira] [Updated] (HIVE-18513) Query results caching

2018-02-01 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-18513:
--
Attachment: HIVE-18513.5.patch

> Query results caching
> -
>
> Key: HIVE-18513
> URL: https://issues.apache.org/jira/browse/HIVE-18513
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Jason Dere
>Assignee: Jason Dere
>Priority: Major
> Attachments: HIVE-18513.1.patch, HIVE-18513.2.patch, 
> HIVE-18513.3.patch, HIVE-18513.4.patch, HIVE-18513.5.patch
>
>
> Add a query results cache that can save the results of an executed Hive query 
> for reuse on subsequent queries. This may be useful in cases where the same 
> query is issued many times, since Hive can return back the results of a 
> cached query rather than having to execute the full query on the cluster.





[jira] [Updated] (HIVE-18513) Query results caching

2018-02-01 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-18513:
--
Attachment: (was: HIVE-18513.5.patch)

> Query results caching
> -
>
> Key: HIVE-18513
> URL: https://issues.apache.org/jira/browse/HIVE-18513
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Jason Dere
>Assignee: Jason Dere
>Priority: Major
> Attachments: HIVE-18513.1.patch, HIVE-18513.2.patch, 
> HIVE-18513.3.patch, HIVE-18513.4.patch
>
>
> Add a query results cache that can save the results of an executed Hive query 
> for reuse on subsequent queries. This may be useful in cases where the same 
> query is issued many times, since Hive can return back the results of a 
> cached query rather than having to execute the full query on the cluster.





[jira] [Commented] (HIVE-18516) load data should rename files consistent with insert statements for ACID Tables

2018-02-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349665#comment-16349665
 ] 

Hive QA commented on HIVE-18516:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 28s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 44s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 59s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 40s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 55s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 23s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  4s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 40s{color} | {color:red} ql: The patch generated 10 new + 340 unchanged - 14 fixed = 350 total (was 354) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 55s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 12s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 14m 39s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus/dev-support/hive-personality.sh |
| git revision | master / 32b8994 |
| Default Java | 1.8.0_111 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-8979/yetus/diff-checkstyle-ql.txt |
| modules | C: ql itests U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-8979/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> load data should rename files consistent with insert statements for ACID 
> Tables
> ---
>
> Key: HIVE-18516
> URL: https://issues.apache.org/jira/browse/HIVE-18516
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>Priority: Major
> Attachments: HIVE-18516.1.patch, HIVE-18516.10.patch, 
> HIVE-18516.2.patch, HIVE-18516.3.patch, HIVE-18516.4.patch, 
> HIVE-18516.5.patch, HIVE-18516.6.patch, HIVE-18516.7.patch, 
> HIVE-18516.8.patch, HIVE-18516.9.patch
>
>
> h1. load data should rename files consistent with insert statements for ACID 
> Tables.





[jira] [Updated] (HIVE-18608) ORC should allow selectively disabling dictionary-encoding on specified columns

2018-02-01 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-18608:

Status: Patch Available  (was: Open)

(Submitting for tests.)

> ORC should allow selectively disabling dictionary-encoding on specified 
> columns
> ---
>
> Key: HIVE-18608
> URL: https://issues.apache.org/jira/browse/HIVE-18608
> Project: Hive
>  Issue Type: New Feature
>  Components: ORC
>Affects Versions: 3.0.0, 2.4.0, 2.2.1
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>Priority: Major
> Attachments: HIVE-18608.1-branch-2.2.patch
>
>
> Just as ORC allows the choice of columns to enable bloom-filters on, it would 
> be nice to have a way to specify which columns {{DICTIONARY_V2}} encoding 
> should be disabled on.
> Currently, the choice of dictionary-encoding depends on the results of 
> sampling the first row-stride within a stripe. If the user knows that a 
> column's cardinality is bound to prevent an effective dictionary, she might 
> choose to simply disable it on just that column, and avoid the cost of 
> sampling in the first row-stride.
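The decision described above (sample the first row-stride, unless the user has opted the column out) might look roughly like the sketch below. It is illustrative only and is not taken from the attached patch: `parse_skip_columns`, `use_dictionary`, the comma-separated property format, and the `threshold` cutoff on the distinct-to-non-null ratio are all assumptions introduced for the example.

```python
def parse_skip_columns(prop_value):
    """Parse a hypothetical comma-separated list of column names."""
    return {name.strip() for name in prop_value.split(",") if name.strip()}


def use_dictionary(column, sample, skip_columns, threshold=0.8):
    """Decide whether to dictionary-encode a column.

    column: column name.
    sample: values from the first row-stride (None stands for NULL).
    skip_columns: columns for which the user disabled dictionary encoding.
    threshold: assumed cutoff on the distinct/non-null ratio.
    """
    if column in skip_columns:
        # User opted out: no sampling cost is paid for this column.
        return False
    non_null = [value for value in sample if value is not None]
    if not non_null:
        # Nothing to measure; keeping the dictionary is an arbitrary choice here.
        return True
    ratio = len(set(non_null)) / len(non_null)
    return ratio <= threshold


skip = parse_skip_columns("email_body, session_id")
print(use_dictionary("email_body", ["a", "a"], skip))
print(use_dictionary("country", ["US", "US", "IN", "US"], skip))
```

The point of the skip set is the early return: a column the user knows to be high-cardinality never reaches the sampling step at all.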





[jira] [Comment Edited] (HIVE-18608) ORC should allow selectively disabling dictionary-encoding on specified columns

2018-02-01 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349660#comment-16349660
 ] 

Mithun Radhakrishnan edited comment on HIVE-18608 at 2/2/18 2:03 AM:
-

I've attached an initial implementation, where dictionary encoding might be 
disabled via a table-property ({{'orc.skip.dictionary.for.columns'}}.

Note: I've only added support for top-level columns. Specifying this on a 
{{STRUCT}} will disable dictionary encoding for the entire sub-tree (i.e. all 
members of the STRUCT), recursively.

It might be good to support selection at an arbitrary depth, at a later date.
E.g. {{myInfoArray.__elem__.emailBody}}.


was (Author: mithun):
I've attached an initial implementation, where dictionary encoding might be 
disabled via a table-property ({{'orc.skip.dictionary.for.columns'}}.

Note: I've only added support for top-level columns. Specifying this on a 
{{STRUCT}} will disable dictionary encoding for the entire sub-tree (i.e. all 
members of the STRUCT), recursively.

It might be good to support selection at an arbitrary depth. E.g. 
{{myInfoArray.__elem__.emailBody}}.

> ORC should allow selectively disabling dictionary-encoding on specified 
> columns
> ---
>
> Key: HIVE-18608
> URL: https://issues.apache.org/jira/browse/HIVE-18608
> Project: Hive
>  Issue Type: New Feature
>  Components: ORC
>Affects Versions: 3.0.0, 2.4.0, 2.2.1
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>Priority: Major
> Attachments: HIVE-18608.1-branch-2.2.patch
>
>
> Just as ORC allows the choice of columns to enable bloom-filters on, it would 
> be nice to have a way to specify which columns {{DICTIONARY_V2}} encoding 
> should be disabled on.
> Currently, the choice of dictionary-encoding depends on the results of 
> sampling the first row-stride within a stripe. If the user knows that a 
> column's cardinality is bound to prevent an effective dictionary, she might 
> choose to simply disable it on just that column, and avoid the cost of 
> sampling in the first row-stride.





[jira] [Comment Edited] (HIVE-18608) ORC should allow selectively disabling dictionary-encoding on specified columns

2018-02-01 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349660#comment-16349660
 ] 

Mithun Radhakrishnan edited comment on HIVE-18608 at 2/2/18 2:03 AM:
-

I've attached an initial implementation, where dictionary encoding might be 
disabled via a table-property ({{'orc.skip.dictionary.for.columns'}}).

Note: I've only added support for top-level columns. Specifying this on a 
{{STRUCT}} will disable dictionary encoding for the entire sub-tree (i.e. all 
members of the STRUCT), recursively.

It might be good to support selection at an arbitrary depth, at a later date.
E.g. {{myInfoArray.__elem__.emailBody}}.


was (Author: mithun):
I've attached an initial implementation, where dictionary encoding might be 
disabled via a table-property ({{'orc.skip.dictionary.for.columns'}}.

Note: I've only added support for top-level columns. Specifying this on a 
{{STRUCT}} will disable dictionary encoding for the entire sub-tree (i.e. all 
members of the STRUCT), recursively.

It might be good to support selection at an arbitrary depth, at a later date.
E.g. {{myInfoArray.__elem__.emailBody}}.

> ORC should allow selectively disabling dictionary-encoding on specified 
> columns
> ---
>
> Key: HIVE-18608
> URL: https://issues.apache.org/jira/browse/HIVE-18608
> Project: Hive
>  Issue Type: New Feature
>  Components: ORC
>Affects Versions: 3.0.0, 2.4.0, 2.2.1
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>Priority: Major
> Attachments: HIVE-18608.1-branch-2.2.patch
>
>
> Just as ORC allows the choice of columns to enable bloom-filters on, it would 
> be nice to have a way to specify which columns {{DICTIONARY_V2}} encoding 
> should be disabled on.
> Currently, the choice of dictionary-encoding depends on the results of 
> sampling the first row-stride within a stripe. If the user knows that a 
> column's cardinality is bound to prevent an effective dictionary, she might 
> choose to simply disable it on just that column, and avoid the cost of 
> sampling in the first row-stride.





[jira] [Comment Edited] (HIVE-18608) ORC should allow selectively disabling dictionary-encoding on specified columns

2018-02-01 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349660#comment-16349660
 ] 

Mithun Radhakrishnan edited comment on HIVE-18608 at 2/2/18 2:03 AM:
-

I've attached an initial implementation, where dictionary encoding might be 
disabled via a table-property ({{'orc.skip.dictionary.for.columns'}}).

Note: I've only added support for top-level columns. Specifying this on a 
{{STRUCT}} will disable dictionary encoding for the entire sub-tree (i.e. all 
members of the {{STRUCT}}, recursively).

It might be good to support selection at an arbitrary depth, at a later date.
E.g. {{myInfoArray.__elem__.emailBody}}.


was (Author: mithun):
I've attached an initial implementation, where dictionary encoding might be 
disabled via a table-property ({{'orc.skip.dictionary.for.columns'}}).

Note: I've only added support for top-level columns. Specifying this on a 
{{STRUCT}} will disable dictionary encoding for the entire sub-tree (i.e. all 
members of the STRUCT), recursively.

It might be good to support selection at an arbitrary depth, at a later date.
E.g. {{myInfoArray.__elem__.emailBody}}.

> ORC should allow selectively disabling dictionary-encoding on specified 
> columns
> ---
>
> Key: HIVE-18608
> URL: https://issues.apache.org/jira/browse/HIVE-18608
> Project: Hive
>  Issue Type: New Feature
>  Components: ORC
>Affects Versions: 3.0.0, 2.4.0, 2.2.1
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>Priority: Major
> Attachments: HIVE-18608.1-branch-2.2.patch
>
>
> Just as ORC allows the choice of columns to enable bloom-filters on, it would 
> be nice to have a way to specify which columns {{DICTIONARY_V2}} encoding 
> should be disabled on.
> Currently, the choice of dictionary-encoding depends on the results of 
> sampling the first row-stride within a stripe. If the user knows that a 
> column's cardinality is bound to prevent an effective dictionary, she might 
> choose to simply disable it on just that column, and avoid the cost of 
> sampling in the first row-stride.





[jira] [Commented] (HIVE-18608) ORC should allow selectively disabling dictionary-encoding on specified columns

2018-02-01 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349660#comment-16349660
 ] 

Mithun Radhakrishnan commented on HIVE-18608:
-

I've attached an initial implementation, where dictionary encoding might be 
disabled via a table-property ({{'orc.skip.dictionary.for.columns'}}.

Note: I've only added support for top-level columns. Specifying this on a 
{{STRUCT}} will disable dictionary encoding for the entire sub-tree (i.e. all 
members of the STRUCT), recursively.

It might be good to support selection at an arbitrary depth. E.g. 
{{myInfoArray.__elem__.emailBody}}.

> ORC should allow selectively disabling dictionary-encoding on specified 
> columns
> ---
>
> Key: HIVE-18608
> URL: https://issues.apache.org/jira/browse/HIVE-18608
> Project: Hive
>  Issue Type: New Feature
>  Components: ORC
>Affects Versions: 3.0.0, 2.4.0, 2.2.1
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>Priority: Major
> Attachments: HIVE-18608.1-branch-2.2.patch
>
>
> Just as ORC allows the choice of columns to enable bloom-filters on, it would 
> be nice to have a way to specify which columns {{DICTIONARY_V2}} encoding 
> should be disabled on.
> Currently, the choice of dictionary-encoding depends on the results of 
> sampling the first row-stride within a stripe. If the user knows that a 
> column's cardinality is bound to prevent an effective dictionary, she might 
> choose to simply disable it on just that column, and avoid the cost of 
> sampling in the first row-stride.





[jira] [Updated] (HIVE-18608) ORC should allow selectively disabling dictionary-encoding on specified columns

2018-02-01 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-18608:

Attachment: (was: HIVE-18608.1-branch-2.2.patch)

> ORC should allow selectively disabling dictionary-encoding on specified 
> columns
> ---
>
> Key: HIVE-18608
> URL: https://issues.apache.org/jira/browse/HIVE-18608
> Project: Hive
>  Issue Type: New Feature
>  Components: ORC
>Affects Versions: 3.0.0, 2.4.0, 2.2.1
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>Priority: Major
> Attachments: HIVE-18608.1-branch-2.2.patch
>
>
> Just as ORC allows the choice of columns to enable bloom-filters on, it would 
> be nice to have a way to specify which columns {{DICTIONARY_V2}} encoding 
> should be disabled on.
> Currently, the choice of dictionary-encoding depends on the results of 
> sampling the first row-stride within a stripe. If the user knows that a 
> column's cardinality is bound to prevent an effective dictionary, she might 
> choose to simply disable it on just that column, and avoid the cost of 
> sampling in the first row-stride.





[jira] [Updated] (HIVE-18608) ORC should allow selectively disabling dictionary-encoding on specified columns

2018-02-01 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-18608:

Attachment: HIVE-18608.1-branch-2.2.patch

> ORC should allow selectively disabling dictionary-encoding on specified 
> columns
> ---
>
> Key: HIVE-18608
> URL: https://issues.apache.org/jira/browse/HIVE-18608
> Project: Hive
>  Issue Type: New Feature
>  Components: ORC
>Affects Versions: 3.0.0, 2.4.0, 2.2.1
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>Priority: Major
> Attachments: HIVE-18608.1-branch-2.2.patch
>
>
> Just as ORC allows the choice of columns to enable bloom-filters on, it would 
> be nice to have a way to specify which columns {{DICTIONARY_V2}} encoding 
> should be disabled on.
> Currently, the choice of dictionary-encoding depends on the results of 
> sampling the first row-stride within a stripe. If the user knows that a 
> column's cardinality is bound to prevent an effective dictionary, she might 
> choose to simply disable it on just that column, and avoid the cost of 
> sampling in the first row-stride.





[jira] [Updated] (HIVE-18608) ORC should allow selectively disabling dictionary-encoding on specified columns

2018-02-01 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-18608:

Attachment: HIVE-18608.1-branch-2.2.patch

> ORC should allow selectively disabling dictionary-encoding on specified 
> columns
> ---
>
> Key: HIVE-18608
> URL: https://issues.apache.org/jira/browse/HIVE-18608
> Project: Hive
>  Issue Type: New Feature
>  Components: ORC
>Affects Versions: 3.0.0, 2.4.0, 2.2.1
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>Priority: Major
> Attachments: HIVE-18608.1-branch-2.2.patch
>
>
> Just as ORC allows the choice of columns to enable bloom-filters on, it would 
> be nice to have a way to specify which columns {{DICTIONARY_V2}} encoding 
> should be disabled on.
> Currently, the choice of dictionary-encoding depends on the results of 
> sampling the first row-stride within a stripe. If the user knows that a 
> column's cardinality is bound to prevent an effective dictionary, she might 
> choose to simply disable it on just that column, and avoid the cost of 
> sampling in the first row-stride.





[jira] [Assigned] (HIVE-18608) ORC should allow selectively disabling dictionary-encoding on specified columns

2018-02-01 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan reassigned HIVE-18608:
---


> ORC should allow selectively disabling dictionary-encoding on specified 
> columns
> ---
>
> Key: HIVE-18608
> URL: https://issues.apache.org/jira/browse/HIVE-18608
> Project: Hive
>  Issue Type: New Feature
>  Components: ORC
>Affects Versions: 3.0.0, 2.4.0, 2.2.1
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>Priority: Major
>
> Just as ORC allows the choice of columns to enable bloom-filters on, it would 
> be nice to have a way to specify which columns {{DICTIONARY_V2}} encoding 
> should be disabled on.
> Currently, the choice of dictionary-encoding depends on the results of 
> sampling the first row-stride within a stripe. If the user knows that a 
> column's cardinality is bound to prevent an effective dictionary, she might 
> choose to simply disable it on just that column, and avoid the cost of 
> sampling in the first row-stride.





[jira] [Updated] (HIVE-18513) Query results caching

2018-02-01 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-18513:
--
Attachment: HIVE-18513.5.patch

> Query results caching
> -
>
> Key: HIVE-18513
> URL: https://issues.apache.org/jira/browse/HIVE-18513
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Jason Dere
>Assignee: Jason Dere
>Priority: Major
> Attachments: HIVE-18513.1.patch, HIVE-18513.2.patch, 
> HIVE-18513.3.patch, HIVE-18513.4.patch, HIVE-18513.5.patch
>
>
> Add a query results cache that can save the results of an executed Hive query 
> for reuse on subsequent queries. This may be useful in cases where the same 
> query is issued many times, since Hive can return back the results of a 
> cached query rather than having to execute the full query on the cluster.
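
The core idea in the description can be sketched in a few lines. This is only an illustration under stated assumptions: the class name is hypothetical, the cache is keyed on the raw query text, and invalidation when the underlying tables change is ignored entirely. It makes no claim about how the attached patches implement the feature.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch; keyed on query text only, with no invalidation.
final class QueryResultsCacheSketch {
  private final Map<String, List<String>> cache = new ConcurrentHashMap<>();
  private final Function<String, List<String>> executor;

  // executor stands in for running the full query on the cluster.
  QueryResultsCacheSketch(Function<String, List<String>> executor) {
    this.executor = executor;
  }

  // Runs the query only on a cache miss; repeated queries reuse the saved rows.
  List<String> run(String queryText) {
    return cache.computeIfAbsent(queryText, executor);
  }
}
```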





[jira] [Commented] (HIVE-18513) Query results caching

2018-02-01 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349642#comment-16349642
 ] 

Jason Dere commented on HIVE-18513:
---

Additional changes per review from [~jcamachorodriguez].

> Query results caching
> -
>
> Key: HIVE-18513
> URL: https://issues.apache.org/jira/browse/HIVE-18513
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Jason Dere
>Assignee: Jason Dere
>Priority: Major
> Attachments: HIVE-18513.1.patch, HIVE-18513.2.patch, 
> HIVE-18513.3.patch, HIVE-18513.4.patch, HIVE-18513.5.patch
>
>
> Add a query results cache that can save the results of an executed Hive query 
> for reuse on subsequent queries. This may be useful in cases where the same 
> query is issued many times, since Hive can return back the results of a 
> cached query rather than having to execute the full query on the cluster.





[jira] [Commented] (HIVE-18575) ACID properties usage in jobconf is ambiguous for MM tables

2018-02-01 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349639#comment-16349639
 ] 

Sergey Shelukhin commented on HIVE-18575:
-

[~ekoifman] can you take a look at this one? :)

> ACID properties usage in jobconf is ambiguous for MM tables
> ---
>
> Key: HIVE-18575
> URL: https://issues.apache.org/jira/browse/HIVE-18575
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18575.patch
>
>
> Vectorization checks for ACID table trigger for MM tables where they don't 
> apply. Other places seem to set the setting for transactional case while most 
> of the code seems to assume it implies full acid.
> Overall, many places in the code use the settings directly or set the ACID 
> flag without setting the ACID properties.





[jira] [Commented] (HIVE-18606) CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask

2018-02-01 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349637#comment-16349637
 ] 

Sergey Shelukhin commented on HIVE-18606:
-

nit: else should be on the same line according to Hive coding standard :)
+1

> CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask
> ---
>
> Key: HIVE-18606
> URL: https://issues.apache.org/jira/browse/HIVE-18606
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18606.01.patch, HIVE-18606.02.patch
>
>
> {noformat}
> @Test
> public void testCtasEmpty() throws Exception {
>   MetastoreConf.setBoolVar(hiveConf, 
> MetastoreConf.ConfVars.CREATE_TABLES_AS_ACID, true);
>   runStatementOnDriver("create table myctas stored as ORC as" +
>   " select a, b from " + Table.NONACIDORCTBL);
>   List rs = runStatementOnDriver("select ROW__ID, a, b, 
> INPUT__FILE__NAME" +
>   " from myctas order by ROW__ID");
> }
> {noformat}
> {noformat}
> 2018-02-01T19:08:52,813 INFO  [HiveServer2-Background-Pool: Thread-463]: 
> metastore.HiveMetaStore (HiveMetaStore.java:logInfo(822)) - 114: Done 
> cleaning up thread local RawStore
> 2018-02-01T19:08:52,813 INFO  [HiveServer2-Background-Pool: Thread-463]: 
> HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(305)) - ugi=hive 
> ip=unknown-ip-addr  cmd=Done cleaning up thread local RawStore
> 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: 
> exec.Task (SessionState.java:printError(1228)) - Failed with exception null
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.moveAcidFiles(Hive.java:3816)
> at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:298)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2267)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1919)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1651)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1395)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1388)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:253)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:92)
> at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:345)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
> at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:358)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: 
> ql.Driver (SessionState.java:printError(1228)) - FAILED: Execution Error, 
> return code 1 from {noformat}
>  





[jira] [Commented] (HIVE-18588) Add 'checkin' profile that runs slower tests in standalone-metastore

2018-02-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349635#comment-16349635
 ] 

Hive QA commented on HIVE-18588:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12908876/HIVE-18588.2.patch

{color:green}SUCCESS:{color} +1 due to 70 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 51 failed/errored test(s), 11450 tests 
executed
*Failed tests:*
{noformat}
TestAddAlterDropIndexes - did not produce a TEST-*.xml file (likely timed out) 
(batchId=206)
TestAddPartitions - did not produce a TEST-*.xml file (likely timed out) 
(batchId=206)
TestAddPartitionsFromPartSpec - did not produce a TEST-*.xml file (likely timed 
out) (batchId=206)
TestAlterPartitions - did not produce a TEST-*.xml file (likely timed out) 
(batchId=213)
TestAppendPartitions - did not produce a TEST-*.xml file (likely timed out) 
(batchId=206)
TestCachedStore - did not produce a TEST-*.xml file (likely timed out) 
(batchId=213)
TestDatabases - did not produce a TEST-*.xml file (likely timed out) 
(batchId=213)
TestDropPartitions - did not produce a TEST-*.xml file (likely timed out) 
(batchId=206)
TestFunctions - did not produce a TEST-*.xml file (likely timed out) 
(batchId=206)
TestGetListIndexes - did not produce a TEST-*.xml file (likely timed out) 
(batchId=206)
TestGetPartitions - did not produce a TEST-*.xml file (likely timed out) 
(batchId=206)
TestGetTableMeta - did not produce a TEST-*.xml file (likely timed out) 
(batchId=206)
TestHiveMetaStorePartitionSpecs - did not produce a TEST-*.xml file (likely 
timed out) (batchId=213)
TestHiveMetaStoreTimeout - did not produce a TEST-*.xml file (likely timed out) 
(batchId=215)
TestListPartitions - did not produce a TEST-*.xml file (likely timed out) 
(batchId=206)
TestMarkPartition - did not produce a TEST-*.xml file (likely timed out) 
(batchId=215)
TestMarkPartitionRemote - did not produce a TEST-*.xml file (likely timed out) 
(batchId=215)
TestMetaStoreInitListener - did not produce a TEST-*.xml file (likely timed 
out) (batchId=215)
TestObjectStoreInitRetry - did not produce a TEST-*.xml file (likely timed out) 
(batchId=213)
TestRawStoreProxy - did not produce a TEST-*.xml file (likely timed out) 
(batchId=206)
TestRemoteHiveMetaStore - did not produce a TEST-*.xml file (likely timed out) 
(batchId=209)
TestRemoteHiveMetaStoreIpAddress - did not produce a TEST-*.xml file (likely 
timed out) (batchId=204)
TestRemoteUGIHiveMetaStoreIpAddress - did not produce a TEST-*.xml file (likely 
timed out) (batchId=212)
TestRetriesInRetryingHMSHandler - did not produce a TEST-*.xml file (likely 
timed out) (batchId=215)
TestRetryingHMSHandler - did not produce a TEST-*.xml file (likely timed out) 
(batchId=213)
TestSetUGIOnBothClientServer - did not produce a TEST-*.xml file (likely timed 
out) (batchId=207)
TestSetUGIOnOnlyClient - did not produce a TEST-*.xml file (likely timed out) 
(batchId=205)
TestSetUGIOnOnlyServer - did not produce a TEST-*.xml file (likely timed out) 
(batchId=214)
TestTablesCreateDropAlterTruncate - did not produce a TEST-*.xml file (likely 
timed out) (batchId=206)
TestTablesGetExists - did not produce a TEST-*.xml file (likely timed out) 
(batchId=206)
TestTablesList - did not produce a TEST-*.xml file (likely timed out) 
(batchId=206)
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_queries]
 (batchId=240)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_join5] (batchId=36)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_move_tbl]
 (batchId=175)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucket_map_join_tez1]
 (batchId=172)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=167)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid] 
(batchId=171)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast]
 (batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[resourceplan]
 (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] 
(batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_bmj_schema_evolution]
 (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorization_input_format_excludes]
 (batchId=163)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[ppd_join5] 
(batchId=122)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut 
(batchId=221)
org.apache.hadoop.hive.ql.exec.TestOperators.testNoConditionalTaskSizeForLlap 
(batchId=282)
org.apache.hadoop.hive.ql.io.TestDruidRecordWriter.testWrite (batchId=256)
org.apache.hi

[jira] [Commented] (HIVE-18606) CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask

2018-02-01 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349633#comment-16349633
 ] 

Eugene Koifman commented on HIVE-18606:
---

yes, you are right.  fixed in patch2

> CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask
> ---
>
> Key: HIVE-18606
> URL: https://issues.apache.org/jira/browse/HIVE-18606
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18606.01.patch, HIVE-18606.02.patch
>
>
> {noformat}
> @Test
> public void testCtasEmpty() throws Exception {
>   MetastoreConf.setBoolVar(hiveConf, 
> MetastoreConf.ConfVars.CREATE_TABLES_AS_ACID, true);
>   runStatementOnDriver("create table myctas stored as ORC as" +
>   " select a, b from " + Table.NONACIDORCTBL);
>   List rs = runStatementOnDriver("select ROW__ID, a, b, 
> INPUT__FILE__NAME" +
>   " from myctas order by ROW__ID");
> }
> {noformat}
> {noformat}
> 2018-02-01T19:08:52,813 INFO  [HiveServer2-Background-Pool: Thread-463]: 
> metastore.HiveMetaStore (HiveMetaStore.java:logInfo(822)) - 114: Done 
> cleaning up thread local RawStore
> 2018-02-01T19:08:52,813 INFO  [HiveServer2-Background-Pool: Thread-463]: 
> HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(305)) - ugi=hive 
> ip=unknown-ip-addr  cmd=Done cleaning up thread local RawStore
> 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: 
> exec.Task (SessionState.java:printError(1228)) - Failed with exception null
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.moveAcidFiles(Hive.java:3816)
> at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:298)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2267)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1919)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1651)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1395)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1388)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:253)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:92)
> at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:345)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
> at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:358)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: 
> ql.Driver (SessionState.java:printError(1228)) - FAILED: Execution Error, 
> return code 1 from {noformat}
>  





[jira] [Updated] (HIVE-18606) CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask

2018-02-01 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18606:
--
Attachment: HIVE-18606.02.patch

> CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask
> ---
>
> Key: HIVE-18606
> URL: https://issues.apache.org/jira/browse/HIVE-18606
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18606.01.patch, HIVE-18606.02.patch
>
>
> {noformat}
> @Test
> public void testCtasEmpty() throws Exception {
>   MetastoreConf.setBoolVar(hiveConf, 
> MetastoreConf.ConfVars.CREATE_TABLES_AS_ACID, true);
>   runStatementOnDriver("create table myctas stored as ORC as" +
>   " select a, b from " + Table.NONACIDORCTBL);
>   List rs = runStatementOnDriver("select ROW__ID, a, b, 
> INPUT__FILE__NAME" +
>   " from myctas order by ROW__ID");
> }
> {noformat}
> {noformat}
> 2018-02-01T19:08:52,813 INFO  [HiveServer2-Background-Pool: Thread-463]: 
> metastore.HiveMetaStore (HiveMetaStore.java:logInfo(822)) - 114: Done 
> cleaning up thread local RawStore
> 2018-02-01T19:08:52,813 INFO  [HiveServer2-Background-Pool: Thread-463]: 
> HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(305)) - ugi=hive 
> ip=unknown-ip-addr  cmd=Done cleaning up thread local RawStore
> 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: 
> exec.Task (SessionState.java:printError(1228)) - Failed with exception null
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.moveAcidFiles(Hive.java:3816)
> at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:298)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2267)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1919)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1651)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1395)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1388)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:253)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:92)
> at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:345)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
> at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:358)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: 
> ql.Driver (SessionState.java:printError(1228)) - FAILED: Execution Error, 
> return code 1 from {noformat}
>  





[jira] [Commented] (HIVE-18606) CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask

2018-02-01 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349629#comment-16349629
 ] 

Sergey Shelukhin commented on HIVE-18606:
-

{noformat}
+if(srcs != null) {
+  LOG.debug("No files found to move from " + sourcePath + " to " + 
targetPath);
{noformat}
Seems like this debug statement should be in the else. Looks good otherwise
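
For illustration, the arrangement suggested in this comment might look like the sketch below. It is not the actual patch: the class and method are hypothetical, a String array stands in for the list of source files (which may be null), and a plain list stands in for LOG.debug so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the patched logic; not the actual Hive code.
final class MoveAcidFilesSketch {

  static List<String> move(String[] srcs, String sourcePath, String targetPath,
                           List<String> debugLog) {
    List<String> moved = new ArrayList<>();
    if (srcs != null) {
      // Files were found, so move each one.
      for (String src : srcs) {
        moved.add(src + " -> " + targetPath);
      }
    } else {
      // Nothing to move: this is the branch where the debug message belongs.
      debugLog.add("No files found to move from " + sourcePath + " to " + targetPath);
    }
    return moved;
  }
}
```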

> CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask
> ---
>
> Key: HIVE-18606
> URL: https://issues.apache.org/jira/browse/HIVE-18606
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18606.01.patch
>
>
> {noformat}
> @Test
> public void testCtasEmpty() throws Exception {
>   MetastoreConf.setBoolVar(hiveConf, 
> MetastoreConf.ConfVars.CREATE_TABLES_AS_ACID, true);
>   runStatementOnDriver("create table myctas stored as ORC as" +
>   " select a, b from " + Table.NONACIDORCTBL);
>   List rs = runStatementOnDriver("select ROW__ID, a, b, 
> INPUT__FILE__NAME" +
>   " from myctas order by ROW__ID");
> }
> {noformat}
> {noformat}
> 2018-02-01T19:08:52,813 INFO  [HiveServer2-Background-Pool: Thread-463]: 
> metastore.HiveMetaStore (HiveMetaStore.java:logInfo(822)) - 114: Done 
> cleaning up thread local RawStore
> 2018-02-01T19:08:52,813 INFO  [HiveServer2-Background-Pool: Thread-463]: 
> HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(305)) - ugi=hive 
> ip=unknown-ip-addr  cmd=Done cleaning up thread local RawStore
> 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: 
> exec.Task (SessionState.java:printError(1228)) - Failed with exception null
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.moveAcidFiles(Hive.java:3816)
> at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:298)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2267)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1919)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1651)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1395)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1388)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:253)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:92)
> at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:345)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
> at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:358)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: 
> ql.Driver (SessionState.java:printError(1228)) - FAILED: Execution Error, 
> return code 1 from {noformat}
>  





[jira] [Commented] (HIVE-18606) CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask

2018-02-01 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349627#comment-16349627
 ] 

Eugene Koifman commented on HIVE-18606:
---

[~sershe] could you review please

> CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask
> ---
>
> Key: HIVE-18606
> URL: https://issues.apache.org/jira/browse/HIVE-18606
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18606.01.patch
>
>
> {noformat}
> @Test
> public void testCtasEmpty() throws Exception {
>   MetastoreConf.setBoolVar(hiveConf, 
> MetastoreConf.ConfVars.CREATE_TABLES_AS_ACID, true);
>   runStatementOnDriver("create table myctas stored as ORC as" +
>   " select a, b from " + Table.NONACIDORCTBL);
>   List rs = runStatementOnDriver("select ROW__ID, a, b, 
> INPUT__FILE__NAME" +
>   " from myctas order by ROW__ID");
> }
> {noformat}
> {noformat}
> 2018-02-01T19:08:52,813 INFO  [HiveServer2-Background-Pool: Thread-463]: 
> metastore.HiveMetaStore (HiveMetaStore.java:logInfo(822)) - 114: Done 
> cleaning up thread local RawStore
> 2018-02-01T19:08:52,813 INFO  [HiveServer2-Background-Pool: Thread-463]: 
> HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(305)) - ugi=hive 
> ip=unknown-ip-addr  cmd=Done cleaning up thread local RawStore
> 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: 
> exec.Task (SessionState.java:printError(1228)) - Failed with exception null
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.moveAcidFiles(Hive.java:3816)
> at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:298)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2267)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1919)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1651)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1395)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1388)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:253)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:92)
> at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:345)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
> at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:358)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: 
> ql.Driver (SessionState.java:printError(1228)) - FAILED: Execution Error, 
> return code 1 from {noformat}
>  





[jira] [Commented] (HIVE-16886) HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently

2018-02-01 Thread Alexander Kolbasov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349622#comment-16349622
 ] 

Alexander Kolbasov commented on HIVE-16886:
---

[~anishek] I am fine with updating branch-3 as well. Do you want to do it 
yourself, or would you like me to do it? I think we need to do the following:
 * Make sure you are OK with the suggested fix
 * Revert the original fix
 * Apply the new fix.

So this will be two commits on branch-3 and one commit on branch-2.

Please let me know what you think about it.

> HMS log notifications may have duplicated event IDs if multiple HMS are 
> running concurrently
> 
>
> Key: HIVE-16886
> URL: https://issues.apache.org/jira/browse/HIVE-16886
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Metastore
>Affects Versions: 3.0.0, 2.3.2, 2.3.3
>Reporter: Sergio Peña
>Assignee: anishek
>Priority: Major
>  Labels: TODOC3.0
> Fix For: 3.0.0
>
> Attachments: HIVE-16886.1.patch, HIVE-16886.2.patch, 
> HIVE-16886.3.patch, HIVE-16886.4.patch, HIVE-16886.5.patch, 
> HIVE-16886.6.patch, HIVE-16886.7.patch, HIVE-16886.8.patch, 
> datastore-identity-holes.diff
>
>
> When running multiple Hive Metastore servers and DB notifications are 
> enabled, I could see that notifications can be persisted with a duplicated 
> event ID. 
> This does not happen when running multiple threads in a single HMS node due 
> to the locking acquired on the DbNotificationsLog class, but multiple HMS 
> could cause conflicts.
> The issue is in the ObjectStore#addNotificationEvent() method. The event ID 
> fetched from the datastore is used for the new notification, incremented in 
> the server itself, then persisted or updated back to the datastore. If 2 
> servers read the same ID, then these 2 servers write a new notification with 
> the same ID.
> The event ID is neither unique nor a primary key.
> Here's a test case using the TestObjectStore class that confirms this issue:
> {noformat}
> @Test
>   public void testConcurrentAddNotifications() throws ExecutionException, 
> InterruptedException {
> final int NUM_THREADS = 2;
> CountDownLatch countIn = new CountDownLatch(NUM_THREADS);
> CountDownLatch countOut = new CountDownLatch(1);
> HiveConf conf = new HiveConf();
> conf.setVar(HiveConf.ConfVars.METASTORE_EXPRESSION_PROXY_CLASS, 
> MockPartitionExpressionProxy.class.getName());
> ExecutorService executorService = 
> Executors.newFixedThreadPool(NUM_THREADS);
> FutureTask tasks[] = new FutureTask[NUM_THREADS];
> for (int i=0; i<NUM_THREADS; ++i) {
>   final int n = i;
>   tasks[i] = new FutureTask(new Callable() {
> @Override
> public Void call() throws Exception {
>   ObjectStore store = new ObjectStore();
>   store.setConf(conf);
>   NotificationEvent dbEvent =
>   new NotificationEvent(0, 0, 
> EventMessage.EventType.CREATE_DATABASE.toString(), "CREATE DATABASE DB" + n);
>   System.out.println("ADDING NOTIFICATION");
>   countIn.countDown();
>   countOut.await();
>   store.addNotificationEvent(dbEvent);
>   System.out.println("FINISH NOTIFICATION");
>   return null;
> }
>   });
>   executorService.execute(tasks[i]);
> }
> countIn.await();
> countOut.countDown();
> for (int i = 0; i < NUM_THREADS; ++i) {
>   tasks[i].get();
> }
> NotificationEventResponse eventResponse = 
> objectStore.getNextNotification(new NotificationEventRequest());
> Assert.assertEquals(2, eventResponse.getEventsSize());
> Assert.assertEquals(1, eventResponse.getEvents().get(0).getEventId());
> // This fails because the next notification has an event ID = 1
> Assert.assertEquals(2, eventResponse.getEvents().get(1).getEventId());
>   }
> {noformat}
> The last assertion fails because the second event has ID 1 instead of the 
> expected 2. 
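
The read-then-write pattern described above can be illustrated with a minimal, self-contained sketch. Everything here is hypothetical: the class EventIdSketch and its methods are not Hive code, and an AtomicLong merely stands in for the datastore row that holds the next event ID. The first method replays the interleaving from the report deterministically (both "servers" read before either writes); the second shows that allocation stays unique when the read and the increment are a single atomic step. This is not a description of the fix in the attached patches.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical illustration only; not part of Hive. */
public class EventIdSketch {

    /**
     * Replays the problematic interleaving: two servers both read the
     * stored ID before either one writes the incremented value back.
     * Returns the IDs each server would assign to its notification.
     */
    static long[] interleavedReadThenWrite() {
        AtomicLong stored = new AtomicLong(1); // stand-in for the datastore row
        long readByA = stored.get();           // server A reads 1
        long readByB = stored.get();           // server B also reads 1
        stored.set(readByA + 1);               // A writes back 2
        stored.set(readByB + 1);               // B writes back 2 again
        return new long[] {readByA, readByB};
    }

    /**
     * Allocates one ID per thread using a single atomic read-and-increment
     * and returns the set of distinct IDs that were handed out.
     */
    static Set<Long> allocateConcurrently(int threads) {
        AtomicLong stored = new AtomicLong(1);
        Set<Long> ids = ConcurrentHashMap.newKeySet();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await(); // release all workers at once
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                ids.add(stored.getAndIncrement());
            });
            workers[i].start();
        }
        start.countDown();
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        long[] dup = interleavedReadThenWrite();
        System.out.println("read-then-write IDs: " + dup[0] + ", " + dup[1]);
        System.out.println("distinct atomic IDs: " + allocateConcurrently(8).size());
    }
}
```

Across separate metastore processes an in-memory atomic is of course not available; the sketch only shows why the read and the increment need to happen as one indivisible step in whatever store is shared.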





[jira] [Commented] (HIVE-18588) Add 'checkin' profile that runs slower tests in standalone-metastore

2018-02-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349600#comment-16349600
 ] 

Hive QA commented on HIVE-18588:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
58s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
36s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
18s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
47s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
20s{color} | {color:red} standalone-metastore: The patch generated 36 new + 437 
unchanged - 32 fixed = 473 total (was 469) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
12s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 11m 34s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  xml  compile  findbugs  
checkstyle  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus/dev-support/hive-personality.sh |
| git revision | master / 32b8994 |
| Default Java | 1.8.0_111 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-8978/yetus/diff-checkstyle-standalone-metastore.txt
 |
| modules | C: standalone-metastore U: standalone-metastore |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-8978/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Add 'checkin' profile that runs slower tests in standalone-metastore
> 
>
> Key: HIVE-18588
> URL: https://issues.apache.org/jira/browse/HIVE-18588
> Project: Hive
>  Issue Type: Test
>  Components: Standalone Metastore
>Affects Versions: 3.0.0
>Reporter: Alan Gates
>Assignee: Alan Gates
>Priority: Major
> Attachments: HIVE-18588.2.patch, HIVE-18588.patch
>
>
> Runtime for unit tests in standalone-metastore is now exceeding 25 minutes.  
> Ideally unit tests should finish within 2-3 minutes so users will run them 
> frequently.  To solve this I propose to carve off many of the slower tests to 
> run in a new 'checkin' profile.  This profile should be run before checkin 
> and by the ptest infrastructure.





[jira] [Updated] (HIVE-18350) load data should rename files consistent with insert statements

2018-02-01 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-18350:
--
Attachment: HIVE-18350.10.patch

> load data should rename files consistent with insert statements
> ---
>
> Key: HIVE-18350
> URL: https://issues.apache.org/jira/browse/HIVE-18350
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>Priority: Major
> Attachments: HIVE-18350.1.patch, HIVE-18350.10.patch, 
> HIVE-18350.2.patch, HIVE-18350.3.patch, HIVE-18350.4.patch, 
> HIVE-18350.5.patch, HIVE-18350.6.patch, HIVE-18350.7.patch, 
> HIVE-18350.8.patch, HIVE-18350.9.patch
>
>
> Insert statements create files with names ending in _0, 0001_0, etc. 
> However, load data uses the input file name. This results in an inconsistent 
> naming convention, which makes SMB joins difficult in some scenarios and may 
> cause trouble for other types of queries in the future.
> We need a consistent naming convention.
> For a non-bucketed table, Hive renames all the files regardless of how they 
> were named by the user.
> For a bucketed table, Hive relies on the user to name the files to match the 
> bucket in non-strict mode. Hive assumes that all the data in a file belongs to 
> the same bucket. In strict mode, loading into a bucketed table is disabled.
> This will likely affect most of the tests that load data, which is a pretty 
> significant change, so the work is further divided into two subtasks for a 
> smoother merge.
> For existing tables in a customer database, it is recommended to reload 
> bucketed tables; otherwise, if the customer tries to run an SMB join and there 
> is a bucket for which there is no split, there is a possibility of getting 
> incorrect results. However, this is not a regression, as it would happen even 
> without the patch.
> With this patch and reloaded data, however, the results should be correct.
> For non-bucketed tables and external tables, there is no difference in 
> behavior and reloading data is not needed.
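
As a rough illustration of the non-bucketed case in the description, where every loaded file is renamed regardless of its original name, here is a minimal sketch. The class LoadRenameSketch and its methods are hypothetical and not taken from the patch, and the six-digit zero-padded index followed by "_0" is an assumed pattern chosen only to resemble insert-style names; the description does not spell out the exact format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical illustration only; not part of Hive. */
public class LoadRenameSketch {

    /**
     * Builds a name in the assumed insert-style pattern:
     * a zero-padded index followed by "_0".
     */
    static String insertStyleName(int index) {
        return String.format(Locale.ROOT, "%06d_0", index);
    }

    /**
     * Assigns each loaded file, in input order, an insert-style name.
     * The original names are deliberately ignored.
     */
    static List<String> renameAll(List<String> inputNames) {
        List<String> renamed = new ArrayList<>();
        for (int i = 0; i < inputNames.size(); i++) {
            renamed.add(insertStyleName(i));
        }
        return renamed;
    }

    public static void main(String[] args) {
        List<String> inputs = new ArrayList<>();
        inputs.add("sales.csv");
        inputs.add("part-7");
        System.out.println(renameAll(inputs));
    }
}
```

The bucketed case would differ, since per the description the user-supplied name is what identifies the bucket in non-strict mode; the sketch does not attempt to model that.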





[jira] [Updated] (HIVE-18350) load data should rename files consistent with insert statements

2018-02-01 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-18350:
--
Attachment: (was: HIVE-18350.10.patch)

> load data should rename files consistent with insert statements
> ---
>
> Key: HIVE-18350
> URL: https://issues.apache.org/jira/browse/HIVE-18350
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>Priority: Major
> Attachments: HIVE-18350.1.patch, HIVE-18350.2.patch, 
> HIVE-18350.3.patch, HIVE-18350.4.patch, HIVE-18350.5.patch, 
> HIVE-18350.6.patch, HIVE-18350.7.patch, HIVE-18350.8.patch, HIVE-18350.9.patch
>
>
> Insert statements create files with names ending in _0, 0001_0, etc. 
> However, load data uses the input file name. This results in an inconsistent 
> naming convention, which makes SMB joins difficult in some scenarios and may 
> cause trouble for other types of queries in the future.
> We need a consistent naming convention.
> For a non-bucketed table, Hive renames all the files regardless of how they 
> were named by the user.
> For a bucketed table, Hive relies on the user to name the files to match the 
> bucket in non-strict mode. Hive assumes that all the data in a file belongs to 
> the same bucket. In strict mode, loading into a bucketed table is disabled.
> This will likely affect most of the tests that load data, which is a pretty 
> significant change, so the work is further divided into two subtasks for a 
> smoother merge.
> For existing tables in a customer database, it is recommended to reload 
> bucketed tables; otherwise, if the customer tries to run an SMB join and there 
> is a bucket for which there is no split, there is a possibility of getting 
> incorrect results. However, this is not a regression, as it would happen even 
> without the patch.
> With this patch and reloaded data, however, the results should be correct.
> For non-bucketed tables and external tables, there is no difference in 
> behavior and reloading data is not needed.





[jira] [Commented] (HIVE-18600) Vectorization: Top-Level Vector Expression Scratch Column Deallocation

2018-02-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349579#comment-16349579
 ] 

Hive QA commented on HIVE-18600:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12908816/HIVE-18600.01.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 44 failed/errored test(s), 12965 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_queries]
 (batchId=240)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin_hook] 
(batchId=13)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_vectorization_10]
 (batchId=23)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_vectorization_11]
 (batchId=38)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_vectorization_13]
 (batchId=52)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_vectorization_14]
 (batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_vectorization_17]
 (batchId=29)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_vectorization_2] 
(batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_vectorization_3] 
(batchId=77)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_vectorization_7] 
(batchId=85)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_vectorization_8] 
(batchId=14)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_vectorization_div0]
 (batchId=77)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_join5] (batchId=36)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_move_tbl]
 (batchId=175)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucket_map_join_tez1]
 (batchId=172)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=167)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid] 
(batchId=171)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast]
 (batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[resourceplan]
 (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] 
(batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorization_part_project]
 (batchId=157)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=179)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_inner_join]
 (batchId=181)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_outer_join0]
 (batchId=181)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_outer_join1]
 (batchId=180)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_outer_join2]
 (batchId=179)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_outer_join3]
 (batchId=180)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_outer_join4]
 (batchId=182)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_outer_join5]
 (batchId=182)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[ppd_join5] 
(batchId=122)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_case] 
(batchId=132)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut 
(batchId=221)
org.apache.hadoop.hive.ql.exec.TestOperators.testNoConditionalTaskSizeForLlap 
(batchId=282)
org.apache.hadoop.hive.ql.exec.vector.TestVectorizationContext.testArithmeticExpressionVectorization
 (batchId=283)
org.apache.hadoop.hive.ql.io.TestDruidRecordWriter.testWrite (batchId=256)
org.apache.hive.beeline.cli.TestHiveCli.testNoErrorDB (batchId=188)
org.apache.hive.hcatalog.common.TestHiveClientCache.testCacheExpiry 
(batchId=200)
org.apache.hive.hcatalog.common.TestHiveClientCache.testCacheHit (batchId=200)
org.apache.hive.hcatalog.common.TestHiveClientCache.testCacheMiss (batchId=200)
org.apache.hive.hcatalog.common.TestHiveClientCache.testCloseAllClients 
(batchId=200)
org.apache.hive.jdbc.TestSSL.testConnectionMismatch (batchId=234)
org.apache.hive.jdbc.TestSSL.testConnectionWrongCertCN (batchId=234)
org.apache.hive.jdbc.TestSSL.testMetastoreConnectionWrongCertCN (batchId=234)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/8977/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/8977/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-8977/

Messages:
{noformat}
Executing org.apache.hive.ptest.execut

[jira] [Updated] (HIVE-18607) HBase HFile write does strange things

2018-02-01 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18607:

Attachment: HIVE-18607.patch

> HBase HFile write does strange things
> -
>
> Key: HIVE-18607
> URL: https://issues.apache.org/jira/browse/HIVE-18607
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18607.patch
>
>
> I cannot get the HBaseCliDriver to run locally, so first I'll use HiveQA to 
> check something.
> There's some strange code in the output handler that changes output directory 
> into a file because Hive supposedly wants that. 





[jira] [Updated] (HIVE-18350) load data should rename files consistent with insert statements

2018-02-01 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-18350:
--
Attachment: HIVE-18350.10.patch

> load data should rename files consistent with insert statements
> ---
>
> Key: HIVE-18350
> URL: https://issues.apache.org/jira/browse/HIVE-18350
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>Priority: Major
> Attachments: HIVE-18350.1.patch, HIVE-18350.10.patch, 
> HIVE-18350.2.patch, HIVE-18350.3.patch, HIVE-18350.4.patch, 
> HIVE-18350.5.patch, HIVE-18350.6.patch, HIVE-18350.7.patch, 
> HIVE-18350.8.patch, HIVE-18350.9.patch
>
>
> Insert statements create files with names ending in _0, 0001_0, etc. 
> However, load data uses the input file name. This results in an inconsistent 
> naming convention, which makes SMB joins difficult in some scenarios and may 
> cause trouble for other types of queries in the future.
> We need a consistent naming convention.
> For a non-bucketed table, Hive renames all the files regardless of how they 
> were named by the user.
> For a bucketed table, Hive relies on the user to name the files to match the 
> bucket in non-strict mode. Hive assumes that all the data in a file belongs to 
> the same bucket. In strict mode, loading into a bucketed table is disabled.
> This will likely affect most of the tests that load data, which is a pretty 
> significant change, so the work is further divided into two subtasks for a 
> smoother merge.
> For existing tables in a customer database, it is recommended to reload 
> bucketed tables; otherwise, if the customer tries to run an SMB join and there 
> is a bucket for which there is no split, there is a possibility of getting 
> incorrect results. However, this is not a regression, as it would happen even 
> without the patch.
> With this patch and reloaded data, however, the results should be correct.
> For non-bucketed tables and external tables, there is no difference in 
> behavior and reloading data is not needed.





[jira] [Updated] (HIVE-18607) HBase HFile write does strange things

2018-02-01 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18607:

Status: Patch Available  (was: Open)

> HBase HFile write does strange things
> -
>
> Key: HIVE-18607
> URL: https://issues.apache.org/jira/browse/HIVE-18607
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18607.patch
>
>
> I cannot get the HBaseCliDriver to run locally, so first I'll use HiveQA to 
> check something.
> There's some strange code in the output handler that changes output directory 
> into a file because Hive supposedly wants that. 





[jira] [Updated] (HIVE-18350) load data should rename files consistent with insert statements

2018-02-01 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-18350:
--
Attachment: (was: HIVE-18350.10.patch)

> load data should rename files consistent with insert statements
> ---
>
> Key: HIVE-18350
> URL: https://issues.apache.org/jira/browse/HIVE-18350
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>Priority: Major
> Attachments: HIVE-18350.1.patch, HIVE-18350.10.patch, 
> HIVE-18350.2.patch, HIVE-18350.3.patch, HIVE-18350.4.patch, 
> HIVE-18350.5.patch, HIVE-18350.6.patch, HIVE-18350.7.patch, 
> HIVE-18350.8.patch, HIVE-18350.9.patch
>
>
> Insert statements create files with names ending in _0, 0001_0, etc. 
> However, load data uses the input file name. This results in an inconsistent 
> naming convention, which makes SMB joins difficult in some scenarios and may 
> cause trouble for other types of queries in the future.
> We need a consistent naming convention.
> For a non-bucketed table, Hive renames all the files regardless of how they 
> were named by the user.
> For a bucketed table, Hive relies on the user to name the files to match the 
> bucket in non-strict mode. Hive assumes that all the data in a file belongs to 
> the same bucket. In strict mode, loading into a bucketed table is disabled.
> This will likely affect most of the tests that load data, which is a pretty 
> significant change, so the work is further divided into two subtasks for a 
> smoother merge.
> For existing tables in a customer database, it is recommended to reload 
> bucketed tables; otherwise, if the customer tries to run an SMB join and there 
> is a bucket for which there is no split, there is a possibility of getting 
> incorrect results. However, this is not a regression, as it would happen even 
> without the patch.
> With this patch and reloaded data, however, the results should be correct.
> For non-bucketed tables and external tables, there is no difference in 
> behavior and reloading data is not needed.





[jira] [Assigned] (HIVE-18607) HBase HFile write does strange things

2018-02-01 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-18607:
---


> HBase HFile write does strange things
> -
>
> Key: HIVE-18607
> URL: https://issues.apache.org/jira/browse/HIVE-18607
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
>
> I cannot get the HBaseCliDriver to run locally, so first I'll use HiveQA to 
> check something.
> There's some strange code in the output handler that changes output directory 
> into a file because Hive supposedly wants that. 





[jira] [Commented] (HIVE-18350) load data should rename files consistent with insert statements

2018-02-01 Thread Deepak Jaiswal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349540#comment-16349540
 ] 

Deepak Jaiswal commented on HIVE-18350:
---

Updated the patch based on Jason's comments.

Adding [~thejas] to review.

> load data should rename files consistent with insert statements
> ---
>
> Key: HIVE-18350
> URL: https://issues.apache.org/jira/browse/HIVE-18350
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>Priority: Major
> Attachments: HIVE-18350.1.patch, HIVE-18350.10.patch, 
> HIVE-18350.2.patch, HIVE-18350.3.patch, HIVE-18350.4.patch, 
> HIVE-18350.5.patch, HIVE-18350.6.patch, HIVE-18350.7.patch, 
> HIVE-18350.8.patch, HIVE-18350.9.patch
>
>
> Insert statements create files with names ending in _0, 0001_0, etc. 
> However, load data uses the input file name. This results in an inconsistent 
> naming convention, which makes SMB joins difficult in some scenarios and may 
> cause trouble for other types of queries in the future.
> We need a consistent naming convention.
> For a non-bucketed table, Hive renames all the files regardless of how they 
> were named by the user.
> For a bucketed table, Hive relies on the user to name the files to match the 
> bucket in non-strict mode. Hive assumes that all the data in a file belongs to 
> the same bucket. In strict mode, loading into a bucketed table is disabled.
> This will likely affect most of the tests that load data, which is a pretty 
> significant change, so the work is further divided into two subtasks for a 
> smoother merge.
> For existing tables in a customer database, it is recommended to reload 
> bucketed tables; otherwise, if the customer tries to run an SMB join and there 
> is a bucket for which there is no split, there is a possibility of getting 
> incorrect results. However, this is not a regression, as it would happen even 
> without the patch.
> With this patch and reloaded data, however, the results should be correct.
> For non-bucketed tables and external tables, there is no difference in 
> behavior and reloading data is not needed.





[jira] [Updated] (HIVE-18350) load data should rename files consistent with insert statements

2018-02-01 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-18350:
--
Attachment: HIVE-18350.10.patch

> load data should rename files consistent with insert statements
> ---
>
> Key: HIVE-18350
> URL: https://issues.apache.org/jira/browse/HIVE-18350
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>Priority: Major
> Attachments: HIVE-18350.1.patch, HIVE-18350.10.patch, 
> HIVE-18350.2.patch, HIVE-18350.3.patch, HIVE-18350.4.patch, 
> HIVE-18350.5.patch, HIVE-18350.6.patch, HIVE-18350.7.patch, 
> HIVE-18350.8.patch, HIVE-18350.9.patch
>
>
> Insert statements create files with names ending in _0, 0001_0, etc. 
> However, load data uses the input file name. This results in an inconsistent 
> naming convention, which makes SMB joins difficult in some scenarios and may 
> cause trouble for other types of queries in the future.
> We need a consistent naming convention.
> For a non-bucketed table, Hive renames all the files regardless of how they 
> were named by the user.
> For a bucketed table, Hive relies on the user to name the files to match the 
> bucket in non-strict mode. Hive assumes that all the data in a file belongs to 
> the same bucket. In strict mode, loading into a bucketed table is disabled.
> This will likely affect most of the tests that load data, which is a pretty 
> significant change, so the work is further divided into two subtasks for a 
> smoother merge.
> For existing tables in a customer database, it is recommended to reload 
> bucketed tables; otherwise, if the customer tries to run an SMB join and there 
> is a bucket for which there is no split, there is a possibility of getting 
> incorrect results. However, this is not a regression, as it would happen even 
> without the patch.
> With this patch and reloaded data, however, the results should be correct.
> For non-bucketed tables and external tables, there is no difference in 
> behavior and reloading data is not needed.





[jira] [Commented] (HIVE-17751) Separate HMS Client and HMS server into separate sub-modules

2018-02-01 Thread Alexander Kolbasov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349526#comment-16349526
 ] 

Alexander Kolbasov commented on HIVE-17751:
---

[~alangates] [~thejas] Would you be able to review the changes?

> Separate HMS Client and HMS server into separate sub-modules
> 
>
> Key: HIVE-17751
> URL: https://issues.apache.org/jira/browse/HIVE-17751
> Project: Hive
>  Issue Type: Sub-task
>  Components: Standalone Metastore
>Reporter: Vihang Karajgaonkar
>Assignee: Alexander Kolbasov
>Priority: Major
> Attachments: HIVE-17751.06-standalone-metastore.patch
>
>
> External applications which are interfacing with HMS should ideally only 
> include HMSClient library instead of one big library containing server as 
> well. We should ideally have a thin client library so that cross version 
> support for external applications is easier. We should sub-divide the 
> standalone module into possibly 3 modules (one for common classes, one for 
> client classes and one for server) or 2 sub-modules (one for client and one 
> for server) so that we can generate separate jars for HMS client and server.





[jira] [Commented] (HIVE-18600) Vectorization: Top-Level Vector Expression Scratch Column Deallocation

2018-02-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349497#comment-16349497
 ] 

Hive QA commented on HIVE-18600:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 43s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 58s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 45s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 52s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  3s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 44s{color} | {color:red} ql: The patch generated 1 new + 878 unchanged - 1 fixed = 879 total (was 879) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 49s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 12s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 13m 52s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus/dev-support/hive-personality.sh |
| git revision | master / 32b8994 |
| Default Java | 1.8.0_111 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-8977/yetus/diff-checkstyle-ql.txt |
| modules | C: ql U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-8977/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Vectorization: Top-Level Vector Expression Scratch Column Deallocation
> --
>
> Key: HIVE-18600
> URL: https://issues.apache.org/jira/browse/HIVE-18600
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HIVE-18600.01.patch
>
>
> The operators create various vector expression *arrays* for predicates, 
> SELECT clauses, key expressions, etc.  We could mark those as special 
> "top level" vector expressions and then defer deallocation until the 
> top level expression is complete.  This could be a simple solution that 
> avoids trying to fix our current eager deallocation, which tries to reuse 
> scratch columns as soon as possible.  It *isn't optimal*, but it *shouldn't 
> be too bad*. This solution is much better than not deallocating at all, 
> especially for queries that SELECT a large number of columns or have a lot 
> of expressions in the operator tree.
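
The deferral idea in the description can be sketched roughly as follows. This is 
an illustration only, not the code in the attached patch: the class 
ScratchColumnPool and its methods allocate, release and finishTopLevel are 
hypothetical names, and scratch columns are modelled as plain integer indexes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical allocator: released columns become reusable only after the top-level expression is done. */
class ScratchColumnPool {
    private final Deque<Integer> free = new ArrayDeque<>();      // columns that may be handed out again
    private final List<Integer> pendingFree = new ArrayList<>(); // released, but held back until finishTopLevel()
    private int nextColumn;

    ScratchColumnPool(int firstScratchColumn) {
        this.nextColumn = firstScratchColumn;
    }

    /** Reuse a free column if one exists, otherwise grow the batch by one column. */
    int allocate() {
        Integer reused = free.pollFirst();
        return reused != null ? reused : nextColumn++;
    }

    /** A child expression no longer needs the column; reuse is deferred, not immediate. */
    void release(int column) {
        pendingFree.add(column);
    }

    /** Called once a whole top-level expression has been processed. */
    void finishTopLevel() {
        free.addAll(pendingFree);
        pendingFree.clear();
    }

    public static void main(String[] args) {
        ScratchColumnPool pool = new ScratchColumnPool(10);
        int first = pool.allocate();
        pool.release(first);
        int second = pool.allocate();   // a new column, because 'first' is still pending
        pool.release(second);
        pool.finishTopLevel();
        int third = pool.allocate();    // now an earlier column can be reused
        System.out.println(first + " " + second + " " + third);
    }
}
```

The trade-off matches the one named in the description: within a single top-level 
expression more scratch columns are used than with eager reuse, but columns are 
still recycled between top-level expressions.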





[jira] [Commented] (HIVE-18567) ObjectStore.getPartitionNamesNoTxn doesn't handle max param properly

2018-02-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349473#comment-16349473
 ] 

Hive QA commented on HIVE-18567:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12908804/HIVE-18567.0.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 23 failed/errored test(s), 12967 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_queries] (batchId=240)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] (batchId=49)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin_hook] (batchId=13)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_join5] (batchId=36)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_move_tbl] (batchId=175)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucket_map_join_tez1] (batchId=172)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=167)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid] (batchId=171)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast] (batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[resourceplan] (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] (batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorization_input_format_excludes] (batchId=163)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[ppd_join5] (batchId=122)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=221)
org.apache.hadoop.hive.metastore.client.TestAddPartitions.testAddPartitionsNullColTypeInSd[Embedded] (batchId=206)
org.apache.hadoop.hive.metastore.client.TestDatabases.testGetAllDatabases[Embedded] (batchId=213)
org.apache.hadoop.hive.ql.exec.TestOperators.testNoConditionalTaskSizeForLlap (batchId=282)
org.apache.hadoop.hive.ql.io.TestDruidRecordWriter.testWrite (batchId=256)
org.apache.hive.beeline.cli.TestHiveCli.testNoErrorDB (batchId=188)
org.apache.hive.hcatalog.listener.TestDbNotificationListener.dropDatabase (batchId=242)
org.apache.hive.jdbc.TestSSL.testConnectionMismatch (batchId=234)
org.apache.hive.jdbc.TestSSL.testConnectionWrongCertCN (batchId=234)
org.apache.hive.jdbc.TestSSL.testMetastoreConnectionWrongCertCN (batchId=234)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/8976/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/8976/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-8976/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 23 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12908804 - PreCommit-HIVE-Build

> ObjectStore.getPartitionNamesNoTxn doesn't handle max param properly
> 
>
> Key: HIVE-18567
> URL: https://issues.apache.org/jira/browse/HIVE-18567
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Adam Szita
>Assignee: Adam Szita
>Priority: Major
> Attachments: HIVE-18567.0.patch
>
>
> As per [this HMS API test 
> case|https://github.com/apache/hive/commit/fa0a8d27d4149cc5cc2dbb49d8eb6b03f46bc279#diff-25c67d898000b53e623a6df9221aad5dR1044]
> , listing partition names doesn't check the max param against 
> MetaStoreConf.LIMIT_PARTITION_REQUEST (as other methods do via 
> checkLimitNumberOfPartitionsByFilter), and also behaves differently for 
> max=0 compared to other methods.
> We should make this consistent.
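
The kind of check described above could look roughly like the sketch below. It is 
not the code from the attached patch: the class PartitionLimitCheck and its 
methods are hypothetical, and the conventions used here (a negative max means 
"return everything", max=0 returns nothing, and a negative configured limit means 
"no limit") are assumptions made for illustration, not something the issue text 
confirms.

```java
/** Hypothetical helper showing a max-vs-configured-limit check for partition listings. */
class PartitionLimitCheck {

    /** Assumed convention: a negative max means "all partitions"; otherwise cap at max. */
    static int effectiveCount(int max, int totalPartitions) {
        return max < 0 ? totalPartitions : Math.min(max, totalPartitions);
    }

    /**
     * Returns how many partition names would be listed, or throws if that number
     * exceeds the configured limit. Assumed convention: a negative limit disables the check.
     */
    static int checkedCount(int max, int totalPartitions, int configuredLimit) {
        int count = effectiveCount(max, totalPartitions);
        if (configuredLimit >= 0 && count > configuredLimit) {
            throw new IllegalArgumentException("Number of partitions requested (" + count
                + ") exceeds the configured limit (" + configuredLimit + ")");
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(checkedCount(2, 5, 3));   // within the limit
        System.out.println(checkedCount(0, 5, 3));   // max=0 under the assumed convention
        try {
            checkedCount(-1, 5, 3);                  // "all partitions" exceeds the limit
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Routing every listing method through one helper of this kind is one way to keep 
the max handling identical across methods.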





[jira] [Commented] (HIVE-18596) Synchronize value of hive.spark.client.connect.timeout across unit tests

2018-02-01 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349442#comment-16349442
 ] 

Sahil Takiar commented on HIVE-18596:
-

[~pvary] could you take a look?

> Synchronize value of hive.spark.client.connect.timeout across unit tests
> 
>
> Key: HIVE-18596
> URL: https://issues.apache.org/jira/browse/HIVE-18596
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HIVE-18596.1.patch
>
>
> {{hive.spark.client.connect.timeout}} is set to 30 seconds for 
> {{TestMiniSparkOnYarnCliDriver}} but is left at the default value for all 
> other tests. We should use the same value (30 seconds) for all other tests.
> We have seen flaky tests due to failure to establish the remote connection 
> within the allotted timeout. This could be because we run our tests in the 
> cloud, where occasional network delays may be more common.





[jira] [Commented] (HIVE-18599) CREATE TEMPORARY TABLE AS SELECT(CTTAS) on Micromanaged table does not write data

2018-02-01 Thread Steve Yeom (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349390#comment-16349390
 ] 

Steve Yeom commented on HIVE-18599:
---

Hi [~gopalv], can you take a look at patch 3?

The test failures with patch 1 were related to a null value, which is fixed in 
patch 2.

Patch 3 adds a q file test.

> CREATE TEMPORARY TABLE AS SELECT(CTTAS) on Micromanaged table does not write 
> data
> -
>
> Key: HIVE-18599
> URL: https://issues.apache.org/jira/browse/HIVE-18599
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Steve Yeom
>Assignee: Steve Yeom
>Priority: Major
> Attachments: HIVE-18599.01.patch, HIVE-18599.02.patch, 
> HIVE-18599.03.patch
>
>
> CTTAS on temporary micromanaged table does not write data. 
> I.e., "SELECT * FROM ctas0_mm;" does not return any rows from the below 
> script:
>  
> set hive.mapred.mode=nonstrict;
> set hive.explain.user=false;
> set hive.fetch.task.conversion=none;
> set tez.grouping.min-size=1;
> set tez.grouping.max-size=2;
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
>  
> drop table intermediate;
> create table intermediate(key int) partitioned by (p int) stored as orc;
> insert into table intermediate partition(p='455') select distinct key from 
> src where key >= 0 order by key desc limit 2;
> insert into table intermediate partition(p='456') select distinct key from 
> src where key is not null order by key asc limit 2;
> insert into table intermediate partition(p='457') select distinct key from 
> src where key >= 100 order by key asc limit 2;
>   
> drop table ctas0_mm; 
> explain create temporary table ctas0_mm tblproperties 
> ("transactional"="true", "transactional_properties"="insert_only") as select 
> * from intermediate;
> create temporary table ctas0_mm tblproperties ("transactional"="true", 
> "transactional_properties"="insert_only") as select * from intermediate;
>  
> select * from ctas0_mm;
> drop table ctas0_mm;
> drop table intermediate;





[jira] [Updated] (HIVE-18599) CREATE TEMPORARY TABLE AS SELECT(CTTAS) on Micromanaged table does not write data

2018-02-01 Thread Steve Yeom (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Yeom updated HIVE-18599:
--
Attachment: HIVE-18599.03.patch

> CREATE TEMPORARY TABLE AS SELECT(CTTAS) on Micromanaged table does not write 
> data
> -
>
> Key: HIVE-18599
> URL: https://issues.apache.org/jira/browse/HIVE-18599
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Steve Yeom
>Assignee: Steve Yeom
>Priority: Major
> Attachments: HIVE-18599.01.patch, HIVE-18599.02.patch, 
> HIVE-18599.03.patch
>
>
> CTTAS on temporary micromanaged table does not write data. 
> I.e., "SELECT * FROM ctas0_mm;" does not return any rows from the below 
> script:
>  
> set hive.mapred.mode=nonstrict;
> set hive.explain.user=false;
> set hive.fetch.task.conversion=none;
> set tez.grouping.min-size=1;
> set tez.grouping.max-size=2;
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
>  
> drop table intermediate;
> create table intermediate(key int) partitioned by (p int) stored as orc;
> insert into table intermediate partition(p='455') select distinct key from 
> src where key >= 0 order by key desc limit 2;
> insert into table intermediate partition(p='456') select distinct key from 
> src where key is not null order by key asc limit 2;
> insert into table intermediate partition(p='457') select distinct key from 
> src where key >= 100 order by key asc limit 2;
>   
> drop table ctas0_mm; 
> explain create temporary table ctas0_mm tblproperties 
> ("transactional"="true", "transactional_properties"="insert_only") as select 
> * from intermediate;
> create temporary table ctas0_mm tblproperties ("transactional"="true", 
> "transactional_properties"="insert_only") as select * from intermediate;
>  
> select * from ctas0_mm;
> drop table ctas0_mm;
> drop table intermediate;





[jira] [Commented] (HIVE-14792) AvroSerde reads the remote schema-file at least once per mapper, per table reference.

2018-02-01 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349380#comment-16349380
 ] 

Aihua Xu commented on HIVE-14792:
-

Thanks, [~mithun]. That's what I guessed as well. :)

> AvroSerde reads the remote schema-file at least once per mapper, per table 
> reference.
> -
>
> Key: HIVE-14792
> URL: https://issues.apache.org/jira/browse/HIVE-14792
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1, 2.1.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>Priority: Major
>  Labels: TODOC2.2, TODOC2.4
> Fix For: 3.0.0, 2.4.0, 2.2.1
>
> Attachments: HIVE-14792.1.patch, HIVE-14792.3.patch
>
>
> Avro tables that use "external" schema files stored on HDFS can cause 
> excessive calls to {{FileSystem::open()}}, especially for queries that spawn 
> large numbers of mappers.
> This is because of the following code in {{AvroSerDe::initialize()}}:
> {code:title=AvroSerDe.java|borderStyle=solid}
> public void initialize(Configuration configuration, Properties properties)
>     throws SerDeException {
>   // ...
>   if (hasExternalSchema(properties)
>       || columnNameProperty == null || columnNameProperty.isEmpty()
>       || columnTypeProperty == null || columnTypeProperty.isEmpty()) {
>     schema = determineSchemaOrReturnErrorSchema(configuration, properties);
>   } else {
>     // Get column names and sort order
>     columnNames = Arrays.asList(columnNameProperty.split(","));
>     columnTypes = TypeInfoUtils.getTypeInfosFromTypeString(columnTypeProperty);
>     schema = getSchemaFromCols(properties, columnNames, columnTypes,
>         columnCommentProperty);
>     properties.setProperty(
>         AvroSerdeUtils.AvroTableProperties.SCHEMA_LITERAL.getPropName(),
>         schema.toString());
>   }
>   // ...
> }
> {code}
> For tables using {{avro.schema.url}}, every time the SerDe is initialized 
> (i.e. at least once per mapper), the schema file is read remotely. For 
> queries with thousands of mappers, this leads to a stampede to the handful 
> (3?) of datanodes that host the schema-file. In the best case, this causes 
> slowdowns.
> It would be preferable to distribute the Avro-schema to all mappers as part 
> of the job-conf. The alternatives aren't exactly appealing:
> # One can't rely solely on the {{column.list.types}} stored in the Hive 
> metastore. (HIVE-14789).
> # {{avro.schema.literal}} might not always be usable, because of the 
> size-limit on table-parameters. The typical size of the Avro-schema file is 
> between 0.5 and 3 MB, in my limited experience. Bumping the max 
> table-parameter size isn't a great solution.
> If the schema file referenced by {{avro.schema.url}} were read during 
> query-planning, and made available as part of table-properties (but not 
> serialized into the metastore), the downstream logic would remain largely 
> intact. I have a patch that does this.
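
The proposal in the last paragraph can be sketched as below. This is a simplified 
illustration, not the attached patch: SchemaPropagationSketch, resolveAtPlanning 
and schemaForMapper are hypothetical names, the remote read is modelled as a 
plain Function from URL to schema text, and only the two property names quoted in 
the description ({{avro.schema.url}} and {{avro.schema.literal}}) are taken from 
the source.

```java
import java.util.Properties;
import java.util.function.Function;

/** Hypothetical sketch: read the schema once at planning time, ship it with the job properties. */
class SchemaPropagationSketch {
    static final String SCHEMA_URL = "avro.schema.url";
    static final String SCHEMA_LITERAL = "avro.schema.literal";

    /** Planning-time step: one remote read, result kept only in the in-memory properties. */
    static void resolveAtPlanning(Properties tableProps, Function<String, String> remoteReader) {
        String url = tableProps.getProperty(SCHEMA_URL);
        if (url != null && tableProps.getProperty(SCHEMA_LITERAL) == null) {
            tableProps.setProperty(SCHEMA_LITERAL, remoteReader.apply(url));
        }
    }

    /** Per-mapper step: prefer the literal shipped with the job, fall back to a remote read. */
    static String schemaForMapper(Properties jobProps, Function<String, String> remoteReader) {
        String literal = jobProps.getProperty(SCHEMA_LITERAL);
        return literal != null ? literal : remoteReader.apply(jobProps.getProperty(SCHEMA_URL));
    }

    public static void main(String[] args) {
        int[] remoteReads = {0};
        // Stand-in for opening the schema file on HDFS; counts how often it is called.
        Function<String, String> reader = url -> {
            remoteReads[0]++;
            return "{\"type\":\"string\"}";
        };

        Properties props = new Properties();
        props.setProperty(SCHEMA_URL, "hdfs:///schemas/example.avsc"); // made-up path
        resolveAtPlanning(props, reader);
        for (int mapper = 0; mapper < 1000; mapper++) {
            schemaForMapper(props, reader);
        }
        System.out.println("remote reads: " + remoteReads[0]);
    }
}
```

The point of the design is that the number of remote reads no longer grows with 
the number of mappers, while the metastore copy of the table parameters is left 
untouched.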





[jira] [Commented] (HIVE-14792) AvroSerde reads the remote schema-file at least once per mapper, per table reference.

2018-02-01 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349374#comment-16349374
 ] 

Mithun Radhakrishnan commented on HIVE-14792:
-

Terribly sorry for the delay, [~aihuaxu]. Yes, of course. The default should 
remain false. I'd changed it to true to see if the tests were affected by it.

This patch seems to cause test failures on trunk. I'll sort that out and post 
an update.

Thanks for reviewing, [~aihuaxu].

> AvroSerde reads the remote schema-file at least once per mapper, per table 
> reference.
> -
>
> Key: HIVE-14792
> URL: https://issues.apache.org/jira/browse/HIVE-14792
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1, 2.1.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>Priority: Major
>  Labels: TODOC2.2, TODOC2.4
> Fix For: 3.0.0, 2.4.0, 2.2.1
>
> Attachments: HIVE-14792.1.patch, HIVE-14792.3.patch
>
>
> Avro tables that use "external" schema files stored on HDFS can cause 
> excessive calls to {{FileSystem::open()}}, especially for queries that spawn 
> large numbers of mappers.
> This is because of the following code in {{AvroSerDe::initialize()}}:
> {code:title=AvroSerDe.java|borderStyle=solid}
> public void initialize(Configuration configuration, Properties properties)
>     throws SerDeException {
>   // ...
>   if (hasExternalSchema(properties)
>       || columnNameProperty == null || columnNameProperty.isEmpty()
>       || columnTypeProperty == null || columnTypeProperty.isEmpty()) {
>     schema = determineSchemaOrReturnErrorSchema(configuration, properties);
>   } else {
>     // Get column names and sort order
>     columnNames = Arrays.asList(columnNameProperty.split(","));
>     columnTypes = TypeInfoUtils.getTypeInfosFromTypeString(columnTypeProperty);
>     schema = getSchemaFromCols(properties, columnNames, columnTypes,
>         columnCommentProperty);
>     properties.setProperty(
>         AvroSerdeUtils.AvroTableProperties.SCHEMA_LITERAL.getPropName(),
>         schema.toString());
>   }
>   // ...
> }
> {code}
> For tables using {{avro.schema.url}}, every time the SerDe is initialized 
> (i.e. at least once per mapper), the schema file is read remotely. For 
> queries with thousands of mappers, this leads to a stampede to the handful 
> (3?) of datanodes that host the schema-file. In the best case, this causes 
> slowdowns.
> It would be preferable to distribute the Avro-schema to all mappers as part 
> of the job-conf. The alternatives aren't exactly appealing:
> # One can't rely solely on the {{column.list.types}} stored in the Hive 
> metastore. (HIVE-14789).
> # {{avro.schema.literal}} might not always be usable, because of the 
> size-limit on table-parameters. The typical size of the Avro-schema file is 
> between 0.5 and 3 MB, in my limited experience. Bumping the max 
> table-parameter size isn't a great solution.
> If the schema file referenced by {{avro.schema.url}} were read during 
> query-planning, and made available as part of table-properties (but not 
> serialized into the metastore), the downstream logic would remain largely 
> intact. I have a patch that does this.





[jira] [Commented] (HIVE-18567) ObjectStore.getPartitionNamesNoTxn doesn't handle max param properly

2018-02-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349365#comment-16349365
 ] 

Hive QA commented on HIVE-18567:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  1s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 17s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 36s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 22s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 47s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 35s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 23s{color} | {color:red} standalone-metastore: The patch generated 1 new + 761 unchanged - 2 fixed = 762 total (was 763) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 47s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 14s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 11m 55s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus/dev-support/hive-personality.sh |
| git revision | master / 32b8994 |
| Default Java | 1.8.0_111 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-8976/yetus/diff-checkstyle-standalone-metastore.txt |
| modules | C: standalone-metastore U: standalone-metastore |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-8976/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> ObjectStore.getPartitionNamesNoTxn doesn't handle max param properly
> 
>
> Key: HIVE-18567
> URL: https://issues.apache.org/jira/browse/HIVE-18567
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Adam Szita
>Assignee: Adam Szita
>Priority: Major
> Attachments: HIVE-18567.0.patch
>
>
> As per [this HMS API test 
> case|https://github.com/apache/hive/commit/fa0a8d27d4149cc5cc2dbb49d8eb6b03f46bc279#diff-25c67d898000b53e623a6df9221aad5dR1044]
> , listing partition names doesn't check the max param against 
> MetaStoreConf.LIMIT_PARTITION_REQUEST (as other methods do via 
> checkLimitNumberOfPartitionsByFilter), and also behaves differently for 
> max=0 compared to other methods.
> We should make this consistent.





[jira] [Commented] (HIVE-18588) Add 'checkin' profile that runs slower tests in standalone-metastore

2018-02-01 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349359#comment-16349359
 ] 

Alan Gates commented on HIVE-18588:
---

Attached a new version of the patch that removes the commented out profile in 
the pom and sets the surefire version to 2.20.1.

> Add 'checkin' profile that runs slower tests in standalone-metastore
> 
>
> Key: HIVE-18588
> URL: https://issues.apache.org/jira/browse/HIVE-18588
> Project: Hive
>  Issue Type: Test
>  Components: Standalone Metastore
>Affects Versions: 3.0.0
>Reporter: Alan Gates
>Assignee: Alan Gates
>Priority: Major
> Attachments: HIVE-18588.2.patch, HIVE-18588.patch
>
>
> Runtime for unit tests in standalone-metastore now exceeds 25 minutes.  
> Ideally unit tests should finish within 2-3 minutes so users will run them 
> frequently.  To solve this I propose to carve off many of the slower tests to 
> run in a new 'checkin' profile.  This profile should be run before checkin 
> and by the ptest infrastructure.





[jira] [Updated] (HIVE-18588) Add 'checkin' profile that runs slower tests in standalone-metastore

2018-02-01 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-18588:
--
Attachment: HIVE-18588.2.patch

> Add 'checkin' profile that runs slower tests in standalone-metastore
> 
>
> Key: HIVE-18588
> URL: https://issues.apache.org/jira/browse/HIVE-18588
> Project: Hive
>  Issue Type: Test
>  Components: Standalone Metastore
>Affects Versions: 3.0.0
>Reporter: Alan Gates
>Assignee: Alan Gates
>Priority: Major
> Attachments: HIVE-18588.2.patch, HIVE-18588.patch
>
>
> Runtime for unit tests in standalone-metastore now exceeds 25 minutes.  
> Ideally unit tests should finish within 2-3 minutes so users will run them 
> frequently.  To solve this I propose to carve off many of the slower tests to 
> run in a new 'checkin' profile.  This profile should be run before checkin 
> and by the ptest infrastructure.





[jira] [Commented] (HIVE-14792) AvroSerde reads the remote schema-file at least once per mapper, per table reference.

2018-02-01 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349351#comment-16349351
 ] 

Aihua Xu commented on HIVE-14792:
-

[~mithun] Ping. Can we still keep the default as false? Otherwise the change 
looks good. Basically, your patch reads the properties from the SerDe in 
addition to the table-level properties, right?

> AvroSerde reads the remote schema-file at least once per mapper, per table 
> reference.
> -
>
> Key: HIVE-14792
> URL: https://issues.apache.org/jira/browse/HIVE-14792
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1, 2.1.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>Priority: Major
>  Labels: TODOC2.2, TODOC2.4
> Fix For: 3.0.0, 2.4.0, 2.2.1
>
> Attachments: HIVE-14792.1.patch, HIVE-14792.3.patch
>
>
> Avro tables that use "external" schema files stored on HDFS can cause 
> excessive calls to {{FileSystem::open()}}, especially for queries that spawn 
> large numbers of mappers.
> This is because of the following code in {{AvroSerDe::initialize()}}:
> {code:title=AvroSerDe.java|borderStyle=solid}
> public void initialize(Configuration configuration, Properties properties)
>     throws SerDeException {
>   // ...
>   if (hasExternalSchema(properties)
>       || columnNameProperty == null || columnNameProperty.isEmpty()
>       || columnTypeProperty == null || columnTypeProperty.isEmpty()) {
>     schema = determineSchemaOrReturnErrorSchema(configuration, properties);
>   } else {
>     // Get column names and sort order
>     columnNames = Arrays.asList(columnNameProperty.split(","));
>     columnTypes = TypeInfoUtils.getTypeInfosFromTypeString(columnTypeProperty);
>     schema = getSchemaFromCols(properties, columnNames, columnTypes,
>         columnCommentProperty);
>     properties.setProperty(
>         AvroSerdeUtils.AvroTableProperties.SCHEMA_LITERAL.getPropName(),
>         schema.toString());
>   }
>   // ...
> }
> {code}
> For tables using {{avro.schema.url}}, every time the SerDe is initialized 
> (i.e. at least once per mapper), the schema file is read remotely. For 
> queries with thousands of mappers, this leads to a stampede to the handful 
> (3?) of datanodes that host the schema-file. In the best case, this causes 
> slowdowns.
> It would be preferable to distribute the Avro-schema to all mappers as part 
> of the job-conf. The alternatives aren't exactly appealing:
> # One can't rely solely on the {{column.list.types}} stored in the Hive 
> metastore. (HIVE-14789).
> # {{avro.schema.literal}} might not always be usable, because of the 
> size-limit on table-parameters. The typical size of the Avro-schema file is 
> between 0.5 and 3 MB, in my limited experience. Bumping the max 
> table-parameter size isn't a great solution.
> If the schema file referenced by {{avro.schema.url}} were read during 
> query-planning, and made available as part of table-properties (but not 
> serialized into the metastore), the downstream logic would remain largely 
> intact. I have a patch that does this.





[jira] [Commented] (HIVE-18581) Replication events should use lower case db object names

2018-02-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349335#comment-16349335
 ] 

Hive QA commented on HIVE-18581:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12908768/HIVE-18581.1.patch

{color:green}SUCCESS:{color} +1 due to 13 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 24 failed/errored test(s), 12965 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_queries] (batchId=240)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin_hook] (batchId=13)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_join5] (batchId=36)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_move_tbl] (batchId=175)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucket_map_join_tez1] (batchId=172)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=167)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid] (batchId=171)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast] (batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[resourceplan] (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] (batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorization_input_format_excludes] (batchId=163)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[ppd_join5] (batchId=122)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut (batchId=221)
org.apache.hadoop.hive.metastore.client.TestTablesCreateDropAlterTruncate.testAlterTableNullStorageDescriptorInNew[Embedded] (batchId=206)
org.apache.hadoop.hive.ql.exec.TestOperators.testNoConditionalTaskSizeForLlap (batchId=282)
org.apache.hadoop.hive.ql.io.TestDruidRecordWriter.testWrite (batchId=256)
org.apache.hive.beeline.cli.TestHiveCli.testNoErrorDB (batchId=188)
org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testMapNullKey[3] (batchId=193)
org.apache.hive.hcatalog.pig.TestSequenceFileHCatStorer.testWriteDate3 (batchId=193)
org.apache.hive.hcatalog.pig.TestSequenceFileHCatStorer.testWriteSmallint (batchId=193)
org.apache.hive.jdbc.TestSSL.testConnectionMismatch (batchId=234)
org.apache.hive.jdbc.TestSSL.testConnectionWrongCertCN (batchId=234)
org.apache.hive.jdbc.TestSSL.testMetastoreConnectionWrongCertCN (batchId=234)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/8974/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/8974/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-8974/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 24 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12908768 - PreCommit-HIVE-Build

> Replication events should use lower case db object names
> 
>
> Key: HIVE-18581
> URL: https://issues.apache.org/jira/browse/HIVE-18581
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: anishek
>Assignee: anishek
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HIVE-18581.0.patch, HIVE-18581.1.patch
>
>
> Events generated by replication should include database, table, partition, 
> and function names in lower case. This spares other applications from having 
> to do explicit case-insensitive matching of objects by name. In Hive, all of 
> the db object names listed above are explicitly converted to lower case when 
> objects of the same type are compared. 
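
The normalization described above can be sketched in Java as follows. This is only an illustration and is not taken from the HIVE-18581 patch: the class name ReplEventNames and the method normalize are hypothetical, and the use of Locale.ROOT is an assumption made so that the result does not depend on the JVM default locale.

```java
import java.util.Locale;

// Hypothetical helper, not part of Hive: lower-cases db object names
// before they are written into an event.
final class ReplEventNames {
    private ReplEventNames() {}

    // Returns the name in lower case, using a fixed locale so that the
    // output is the same on every JVM. A null name is passed through.
    static String normalize(String name) {
        return name == null ? null : name.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Two spellings of the same object compare equal after normalization.
        System.out.println(normalize("Sales_DB"));
        System.out.println(normalize("Sales_DB").equals(normalize("SALES_db")));
    }
}
```

A consumer that receives names normalized this way can compare them with plain equals() rather than equalsIgnoreCase().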





[jira] [Commented] (HIVE-18581) Replication events should use lower case db object names

2018-02-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349267#comment-16349267
 ] 

Hive QA commented on HIVE-18581:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
27s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
37s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
35s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
55s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
12s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
24s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
34s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
37s{color} | {color:red} ql: The patch generated 36 new + 12 unchanged - 0 
fixed = 48 total (was 12) {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
19s{color} | {color:green} itests/hive-unit: The patch generated 0 new + 643 
unchanged - 2 fixed = 643 total (was 645) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
12s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
13s{color} | {color:red} The patch generated 11 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 17m 24s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus/dev-support/hive-personality.sh |
| git revision | master / 32b8994 |
| Default Java | 1.8.0_111 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-8974/yetus/diff-checkstyle-ql.txt
 |
| asflicense | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-8974/yetus/patch-asflicense-problems.txt
 |
| modules | C: ql itests/hive-unit U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-8974/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Replication events should use lower case db object names
> 
>
> Key: HIVE-18581
> URL: https://issues.apache.org/jira/browse/HIVE-18581
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: anishek
>Assignee: anishek
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HIVE-18581.0.patch, HIVE-18581.1.patch
>
>
> Events generated by replication should include database, table, partition, 
> and function names in lower case. This spares other applications from having 
> to do explicit case-insensitive matching of objects by name. In Hive, all of 
> the db object names listed above are explicitly converted to lower case when 
> objects of the same type are compared. 





[jira] [Updated] (HIVE-13981) Operation.toSQLException eats full exception stack

2018-02-01 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-13981:
--
Attachment: HIVE-13981.2.patch

> Operation.toSQLException eats full exception stack
> --
>
> Key: HIVE-13981
> URL: https://issues.apache.org/jira/browse/HIVE-13981
> Project: Hive
>  Issue Type: Bug
>Reporter: Daniel Dai
>Assignee: Daniel Dai
>Priority: Major
> Attachments: HIVE-13981.1.patch, HIVE-13981.2.patch
>
>
> Operation.toSQLException eats half of the exception stack and makes debugging 
> hard. For example, we saw an exception:
> {code}
> org.apache.hive.service.cli.HiveSQLException: Error while compiling 
> statement: FAILED: NullPointerException null
> at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:336)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:113)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:182)
> at org.apache.hive.service.cli.operation.Operation.run(Operation.java:278)
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:421)
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:408)
> at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:276)
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:505)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1317)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1302)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:562)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
> {code}
> The real stack causing the NPE is lost.
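
As a general illustration of the problem (not the actual change in HIVE-13981.2.patch), the Java sketch below contrasts a wrapper that only keeps the message of the original exception with one that passes the original exception along as the cause. The class CausePreservingWrap and its methods are hypothetical names, and plain java.sql.SQLException stands in for HiveSQLException.

```java
import java.sql.SQLException;

// Hypothetical demo class, not part of Hive.
final class CausePreservingWrap {
    private CausePreservingWrap() {}

    // Stand-in for a compile step that fails with an NPE.
    static void failingCompile() {
        throw new NullPointerException();
    }

    // Keeps only the text of the original exception; its stack is lost.
    static SQLException wrapLossy(Exception e) {
        return new SQLException("Error while compiling statement: " + e);
    }

    // Chains the original exception, so its stack shows up under "Caused by:".
    static SQLException wrapWithCause(Exception e) {
        return new SQLException("Error while compiling statement: " + e, e);
    }

    public static void main(String[] args) {
        try {
            failingCompile();
        } catch (NullPointerException e) {
            // The lossy wrapper has no cause; the chained one keeps the NPE.
            System.out.println(wrapLossy(e).getCause() == null);
            System.out.println(wrapWithCause(e).getCause() == e);
            // Prints the wrapper's stack followed by the frames of the NPE.
            wrapWithCause(e).printStackTrace();
        }
    }
}
```

With the chained form, printStackTrace() includes the frames of the original NullPointerException, including failingCompile() where it was thrown.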





[jira] [Commented] (HIVE-13981) Operation.toSQLException eats full exception stack

2018-02-01 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349263#comment-16349263
 ] 

Daniel Dai commented on HIVE-13981:
---

Trying to get this in; rerunning ptest.

> Operation.toSQLException eats full exception stack
> --
>
> Key: HIVE-13981
> URL: https://issues.apache.org/jira/browse/HIVE-13981
> Project: Hive
>  Issue Type: Bug
>Reporter: Daniel Dai
>Assignee: Daniel Dai
>Priority: Major
> Attachments: HIVE-13981.1.patch, HIVE-13981.2.patch
>
>
> Operation.toSQLException eats half of the exception stack and makes debugging 
> hard. For example, we saw an exception:
> {code}
> org.apache.hive.service.cli.HiveSQLException: Error while compiling 
> statement: FAILED: NullPointerException null
> at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:336)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:113)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:182)
> at org.apache.hive.service.cli.operation.Operation.run(Operation.java:278)
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:421)
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:408)
> at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:276)
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:505)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1317)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1302)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:562)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
> {code}
> The real stack causing the NPE is lost.





[jira] [Commented] (HIVE-18350) load data should rename files consistent with insert statements

2018-02-01 Thread Deepak Jaiswal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349250#comment-16349250
 ] 

Deepak Jaiswal commented on HIVE-18350:
---

[~jdere] [~gopalv] [~ekoifman] can you please review?

> load data should rename files consistent with insert statements
> ---
>
> Key: HIVE-18350
> URL: https://issues.apache.org/jira/browse/HIVE-18350
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>Priority: Major
> Attachments: HIVE-18350.1.patch, HIVE-18350.2.patch, 
> HIVE-18350.3.patch, HIVE-18350.4.patch, HIVE-18350.5.patch, 
> HIVE-18350.6.patch, HIVE-18350.7.patch, HIVE-18350.8.patch, HIVE-18350.9.patch
>
>
> Insert statements create files with names ending in _0, 0001_0, etc. 
> LOAD DATA, however, keeps the input file name. The resulting inconsistent 
> naming convention makes SMB joins difficult in some scenarios and may 
> cause trouble for other types of queries in the future.
> We need a consistent naming convention.
> For a non-bucketed table, Hive renames all the files regardless of how they 
> were named by the user.
> For a bucketed table in non-strict mode, Hive relies on the user to name the 
> files to match the bucket, and assumes that all the data in a file belongs to 
> the same bucket. In strict mode, loading into a bucketed table is disabled.
> This change will likely affect most of the tests that load data, which is a 
> significant number, so the work is further divided into two subtasks for a 
> smoother merge.
> For existing tables in a customer database, reloading bucketed tables is 
> recommended. Otherwise, if the customer runs an SMB join and there is a 
> bucket for which there is no split, incorrect results are possible. This is 
> not a regression, however, as it would happen even without the patch.
> With this patch and reloaded data, the results should be correct.
> For non-bucketed tables and external tables, there is no difference in 
> behavior and reloading data is not needed.
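
The naming convention discussed above can be sketched in Java as follows. This is a hedged illustration only and is not code from the HIVE-18350 patches: the class BucketFileNames and the method forBucket are hypothetical, and the exact pattern (a six-digit zero-padded id followed by _0) is an assumption about the file names produced by insert statements.

```java
import java.util.Locale;

// Hypothetical helper, not part of Hive: builds an insert-style file name
// for a given bucket id.
final class BucketFileNames {
    private BucketFileNames() {}

    // Assumed pattern: six-digit zero-padded id plus the suffix "_0".
    static String forBucket(int bucketId) {
        if (bucketId < 0) {
            throw new IllegalArgumentException("bucketId must be >= 0: " + bucketId);
        }
        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        return String.format(Locale.ROOT, "%06d_0", bucketId);
    }

    public static void main(String[] args) {
        // A loaded file such as "part-a.csv" would be renamed to one of these.
        System.out.println(forBucket(0));
        System.out.println(forBucket(1));
    }
}
```

Renaming loaded files to the same pattern that inserts use lets later steps identify the bucket from the file name alone, independent of what the user called the input file.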





[jira] [Commented] (HIVE-18516) load data should rename files consistent with insert statements for ACID Tables

2018-02-01 Thread Deepak Jaiswal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349249#comment-16349249
 ] 

Deepak Jaiswal commented on HIVE-18516:
---

[~jdere] [~ekoifman] can you please review?

> load data should rename files consistent with insert statements for ACID 
> Tables
> ---
>
> Key: HIVE-18516
> URL: https://issues.apache.org/jira/browse/HIVE-18516
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>Priority: Major
> Attachments: HIVE-18516.1.patch, HIVE-18516.10.patch, 
> HIVE-18516.2.patch, HIVE-18516.3.patch, HIVE-18516.4.patch, 
> HIVE-18516.5.patch, HIVE-18516.6.patch, HIVE-18516.7.patch, 
> HIVE-18516.8.patch, HIVE-18516.9.patch
>
>
> h1. load data should rename files consistent with insert statements for ACID 
> Tables.





[jira] [Comment Edited] (HIVE-18350) load data should rename files consistent with insert statements

2018-02-01 Thread Deepak Jaiswal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349133#comment-16349133
 ] 

Deepak Jaiswal edited comment on HIVE-18350 at 2/1/18 8:48 PM:
---

Please ignore the failure of test smb_mapjoin_7. Its fix is coming in 
HIVE-18516.

 

Updated the results for bucket_mapjoin_mismatch1 test


was (Author: djaiswal):
Please ignore the failure of test smb_mapjoin_7. its fix is coming in 
HIVE-18516.

> load data should rename files consistent with insert statements
> ---
>
> Key: HIVE-18350
> URL: https://issues.apache.org/jira/browse/HIVE-18350
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>Priority: Major
> Attachments: HIVE-18350.1.patch, HIVE-18350.2.patch, 
> HIVE-18350.3.patch, HIVE-18350.4.patch, HIVE-18350.5.patch, 
> HIVE-18350.6.patch, HIVE-18350.7.patch, HIVE-18350.8.patch, HIVE-18350.9.patch
>
>
> Insert statements create files with names ending in _0, 0001_0, etc. 
> LOAD DATA, however, keeps the input file name. The resulting inconsistent 
> naming convention makes SMB joins difficult in some scenarios and may 
> cause trouble for other types of queries in the future.
> We need a consistent naming convention.
> For a non-bucketed table, Hive renames all the files regardless of how they 
> were named by the user.
> For a bucketed table in non-strict mode, Hive relies on the user to name the 
> files to match the bucket, and assumes that all the data in a file belongs to 
> the same bucket. In strict mode, loading into a bucketed table is disabled.
> This change will likely affect most of the tests that load data, which is a 
> significant number, so the work is further divided into two subtasks for a 
> smoother merge.
> For existing tables in a customer database, reloading bucketed tables is 
> recommended. Otherwise, if the customer runs an SMB join and there is a 
> bucket for which there is no split, incorrect results are possible. This is 
> not a regression, however, as it would happen even without the patch.
> With this patch and reloaded data, the results should be correct.
> For non-bucketed tables and external tables, there is no difference in 
> behavior and reloading data is not needed.





[jira] [Updated] (HIVE-18350) load data should rename files consistent with insert statements

2018-02-01 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-18350:
--
Attachment: HIVE-18350.9.patch

> load data should rename files consistent with insert statements
> ---
>
> Key: HIVE-18350
> URL: https://issues.apache.org/jira/browse/HIVE-18350
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>Priority: Major
> Attachments: HIVE-18350.1.patch, HIVE-18350.2.patch, 
> HIVE-18350.3.patch, HIVE-18350.4.patch, HIVE-18350.5.patch, 
> HIVE-18350.6.patch, HIVE-18350.7.patch, HIVE-18350.8.patch, HIVE-18350.9.patch
>
>
> Insert statements create files with names ending in _0, 0001_0, etc. 
> LOAD DATA, however, keeps the input file name. The resulting inconsistent 
> naming convention makes SMB joins difficult in some scenarios and may 
> cause trouble for other types of queries in the future.
> We need a consistent naming convention.
> For a non-bucketed table, Hive renames all the files regardless of how they 
> were named by the user.
> For a bucketed table in non-strict mode, Hive relies on the user to name the 
> files to match the bucket, and assumes that all the data in a file belongs to 
> the same bucket. In strict mode, loading into a bucketed table is disabled.
> This change will likely affect most of the tests that load data, which is a 
> significant number, so the work is further divided into two subtasks for a 
> smoother merge.
> For existing tables in a customer database, reloading bucketed tables is 
> recommended. Otherwise, if the customer runs an SMB join and there is a 
> bucket for which there is no split, incorrect results are possible. This is 
> not a regression, however, as it would happen even without the patch.
> With this patch and reloaded data, the results should be correct.
> For non-bucketed tables and external tables, there is no difference in 
> behavior and reloading data is not needed.





[jira] [Commented] (HIVE-18601) Support Power platform by updating protoc-jar-maven-plugin version

2018-02-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349214#comment-16349214
 ] 

Hive QA commented on HIVE-18601:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12908759/HIVE-18601.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 22 failed/errored test(s), 12965 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_queries]
 (batchId=240)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_join5] (batchId=36)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_move_tbl]
 (batchId=175)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucket_map_join_tez1]
 (batchId=172)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=167)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid] 
(batchId=171)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast]
 (batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[resourceplan]
 (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] 
(batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorization_input_format_excludes]
 (batchId=163)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[bucketizedhiveinputformat]
 (batchId=180)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[ppd_join5] 
(batchId=122)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut 
(batchId=221)
org.apache.hadoop.hive.metastore.client.TestGetPartitions.testGetPartitionWithAuthInfoNoDbName[Embedded]
 (batchId=206)
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testFailureOnAlteringTransactionalProperties
 (batchId=290)
org.apache.hadoop.hive.ql.exec.TestOperators.testNoConditionalTaskSizeForLlap 
(batchId=282)
org.apache.hadoop.hive.ql.io.TestDruidRecordWriter.testWrite (batchId=256)
org.apache.hive.beeline.cli.TestHiveCli.testNoErrorDB (batchId=188)
org.apache.hive.jdbc.TestSSL.testConnectionMismatch (batchId=234)
org.apache.hive.jdbc.TestSSL.testConnectionWrongCertCN (batchId=234)
org.apache.hive.jdbc.TestSSL.testMetastoreConnectionWrongCertCN (batchId=234)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/8973/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/8973/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-8973/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 22 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12908759 - PreCommit-HIVE-Build

> Support Power platform by updating protoc-jar-maven-plugin version
> --
>
> Key: HIVE-18601
> URL: https://issues.apache.org/jira/browse/HIVE-18601
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: 3.0.0
> Environment: # uname -a
> Linux pts00607-vm16 4.4.0-31-generic #50-Ubuntu SMP Wed Jul 13 00:05:18 UTC 
> 2016 ppc64le ppc64le ppc64le GNU/Linux
>  # # cat /etc/lsb-release
> DISTRIB_ID=Ubuntu
> DISTRIB_RELEASE=16.04
> DISTRIB_CODENAME=xenial
> DISTRIB_DESCRIPTION="Ubuntu 16.04.3 LTS"
>Reporter: Pravin Dsilva
>Assignee: Pravin Dsilva
>Priority: Major
> Attachments: HIVE-18601.patch
>
>
> The error below is seen while building the standalone-metastore project:
> {code:java}
> [INFO] --- protoc-jar-maven-plugin:3.0.0-a3:run 
> (default) @ hive-standalone-metastore ---
> [INFO] Protoc version: 2.5.0
> [INFO] Input directories:
> [INFO] 
> /var/lib/jenkins/workspace/hive/standalone-metastore/src/main/protobuf/org/apache/hadoop/hive/metastore
> [INFO] Output targets:
> [INFO] java: 
> /var/lib/jenkins/workspace/hive/standalone-metastore/target/generated-sources 
> (add: none, clean: false)
> [INFO] 
> /var/lib/jenkins/workspace/hive/standalone-metastore/target/generated-sources 
> does not exist. Creating...
> [INFO] Processing (java): metastore.proto
> protoc-jar: protoc version: 250, detected platform: linux/ppc64

[jira] [Updated] (HIVE-18599) CREATE TEMPORARY TABLE AS SELECT(CTTAS) on Micromanaged table does not write data

2018-02-01 Thread Steve Yeom (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Yeom updated HIVE-18599:
--
Attachment: HIVE-18599.02.patch

> CREATE TEMPORARY TABLE AS SELECT(CTTAS) on Micromanaged table does not write 
> data
> -
>
> Key: HIVE-18599
> URL: https://issues.apache.org/jira/browse/HIVE-18599
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Steve Yeom
>Assignee: Steve Yeom
>Priority: Major
> Attachments: HIVE-18599.01.patch, HIVE-18599.02.patch
>
>
> CTTAS on temporary micromanaged table does not write data. 
> I.e., "SELECT * FROM ctas0_mm;" does not return any rows from the below 
> script:
>  
> set hive.mapred.mode=nonstrict;
> set hive.explain.user=false;
> set hive.fetch.task.conversion=none;
> set tez.grouping.min-size=1;
> set tez.grouping.max-size=2;
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
>  
> drop table intermediate;
> create table intermediate(key int) partitioned by (p int) stored as orc;
> insert into table intermediate partition(p='455') select distinct key from 
> src where key >= 0 order by key desc limit 2;
> insert into table intermediate partition(p='456') select distinct key from 
> src where key is not null order by key asc limit 2;
> insert into table intermediate partition(p='457') select distinct key from 
> src where key >= 100 order by key asc limit 2;
>   
> drop table ctas0_mm; 
> explain create temporary table ctas0_mm tblproperties 
> ("transactional"="true", "transactional_properties"="insert_only") as select 
> * from intermediate;
> create temporary table ctas0_mm tblproperties ("transactional"="true", 
> "transactional_properties"="insert_only") as select * from intermediate;
>  
> select * from ctas0_mm;
> drop table ctas0_mm;
> drop table intermediate;





[jira] [Updated] (HIVE-18606) CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask

2018-02-01 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18606:
--
Status: Patch Available  (was: Open)

> CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask
> ---
>
> Key: HIVE-18606
> URL: https://issues.apache.org/jira/browse/HIVE-18606
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18606.01.patch
>
>
> {noformat}
> @Test
> public void testCtasEmpty() throws Exception {
>   MetastoreConf.setBoolVar(hiveConf, MetastoreConf.ConfVars.CREATE_TABLES_AS_ACID, true);
>   runStatementOnDriver("create table myctas stored as ORC as" +
>   " select a, b from " + Table.NONACIDORCTBL);
>   List<String> rs = runStatementOnDriver("select ROW__ID, a, b, INPUT__FILE__NAME" +
>   " from myctas order by ROW__ID");
> }
> {noformat}
> {noformat}
> 2018-02-01T19:08:52,813 INFO  [HiveServer2-Background-Pool: Thread-463]: 
> metastore.HiveMetaStore (HiveMetaStore.java:logInfo(822)) - 114: Done 
> cleaning up thread local RawStore
> 2018-02-01T19:08:52,813 INFO  [HiveServer2-Background-Pool: Thread-463]: 
> HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(305)) - ugi=hive 
> ip=unknown-ip-addr  cmd=Done cleaning up thread local RawStore
> 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: 
> exec.Task (SessionState.java:printError(1228)) - Failed with exception null
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.metadata.Hive.moveAcidFiles(Hive.java:3816)
> at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:298)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2267)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1919)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1651)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1395)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1388)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:253)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:92)
> at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:345)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
> at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:358)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: 
> ql.Driver (SessionState.java:printError(1228)) - FAILED: Execution Error, 
> return code 1 from {noformat}
>  





[jira] [Updated] (HIVE-18606) CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask

2018-02-01 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18606:
--
Attachment: HIVE-18606.01.patch

> CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask
> ---
>
> Key: HIVE-18606
> URL: https://issues.apache.org/jira/browse/HIVE-18606
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18606.01.patch
>
>
> {noformat}
> @Test
> public void testCtasEmpty() throws Exception {
>   MetastoreConf.setBoolVar(hiveConf, MetastoreConf.ConfVars.CREATE_TABLES_AS_ACID, true);
>   runStatementOnDriver("create table myctas stored as ORC as" +
>   " select a, b from " + Table.NONACIDORCTBL);
>   List<String> rs = runStatementOnDriver("select ROW__ID, a, b, INPUT__FILE__NAME" +
>   " from myctas order by ROW__ID");
> }
> {noformat}
> {noformat}
> 2018-02-01T19:08:52,813 INFO  [HiveServer2-Background-Pool: Thread-463]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(822)) - 114: Done cleaning up thread local RawStore
> 2018-02-01T19:08:52,813 INFO  [HiveServer2-Background-Pool: Thread-463]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(305)) - ugi=hive ip=unknown-ip-addr  cmd=Done cleaning up thread local RawStore
> 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: exec.Task (SessionState.java:printError(1228)) - Failed with exception null
> java.lang.NullPointerException
> at org.apache.hadoop.hive.ql.metadata.Hive.moveAcidFiles(Hive.java:3816)
> at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:298)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205)
> at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2267)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1919)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1651)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1395)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1388)
> at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:253)
> at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:92)
> at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:345)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
> at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:358)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: ql.Driver (SessionState.java:printError(1228)) - FAILED: Execution Error, return code 1 from
> {noformat}
>  





[jira] [Updated] (HIVE-18528) Stats: In the bitvector codepath, when extrapolating column stats for String type column, StringColumnStatsAggregator uses the min value instead of max

2018-02-01 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-18528:

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed to master. Thanks [~ashutoshc]

> Stats: In the bitvector codepath, when extrapolating column stats for String 
> type column, StringColumnStatsAggregator uses the min value instead of max
> -
>
> Key: HIVE-18528
> URL: https://issues.apache.org/jira/browse/HIVE-18528
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>Priority: Major
> Attachments: HIVE-18528.1.patch, HIVE-18528.1.patch
>
>
> This line: 
> [https://github.com/apache/hive/blob/456a65180dcb84f69f26b4c9b9265165ad16dfe4/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/columnstats/aggr/StringColumnStatsAggregator.java#L181]
> Should be: 
> aggregateData.setAvgColLen(Math.max(aggregateData.getAvgColLen(), 
> newData.getAvgColLen()));
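
A minimal sketch of the proposed line in isolation, assuming a stand-in `StringStats` class rather than the real metastore stats object (the class name and the `merge` helper are hypothetical; only `getAvgColLen`, `setAvgColLen` and the `Math.max` expression come from the description above). It shows the aggregate keeping the larger of the two average column lengths as each new value is merged in:

```java
public class AvgColLenMergeSketch {

    // Hypothetical stand-in for the string column stats data; it carries only
    // the one field that the quoted line touches.
    static final class StringStats {
        private double avgColLen;

        StringStats(double avgColLen) {
            this.avgColLen = avgColLen;
        }

        double getAvgColLen() {
            return avgColLen;
        }

        void setAvgColLen(double avgColLen) {
            this.avgColLen = avgColLen;
        }
    }

    // Merge step using the expression proposed in the issue: take the max,
    // not the min, of the running aggregate and the incoming value.
    static void merge(StringStats aggregateData, StringStats newData) {
        aggregateData.setAvgColLen(
            Math.max(aggregateData.getAvgColLen(), newData.getAvgColLen()));
    }

    public static void main(String[] args) {
        StringStats aggregate = new StringStats(4.0);
        merge(aggregate, new StringStats(9.5));
        merge(aggregate, new StringStats(2.0));
        // The aggregate holds the largest value seen across all merges.
        System.out.println(aggregate.getAvgColLen());
    }
}
```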





[jira] [Updated] (HIVE-18606) CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask

2018-02-01 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18606:
--
Component/s: Transactions

> CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask
> ---
>
> Key: HIVE-18606
> URL: https://issues.apache.org/jira/browse/HIVE-18606
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Priority: Major
>
> {noformat}
> @Test
> public void testCtasEmpty() throws Exception {
>   MetastoreConf.setBoolVar(hiveConf, MetastoreConf.ConfVars.CREATE_TABLES_AS_ACID, true);
>   runStatementOnDriver("create table myctas stored as ORC as" +
>       " select a, b from " + Table.NONACIDORCTBL);
>   List<String> rs = runStatementOnDriver("select ROW__ID, a, b, INPUT__FILE__NAME" +
>       " from myctas order by ROW__ID");
> }
> {noformat}
> {noformat}
> 2018-02-01T19:08:52,813 INFO  [HiveServer2-Background-Pool: Thread-463]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(822)) - 114: Done cleaning up thread local RawStore
> 2018-02-01T19:08:52,813 INFO  [HiveServer2-Background-Pool: Thread-463]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(305)) - ugi=hive ip=unknown-ip-addr  cmd=Done cleaning up thread local RawStore
> 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: exec.Task (SessionState.java:printError(1228)) - Failed with exception null
> java.lang.NullPointerException
> at org.apache.hadoop.hive.ql.metadata.Hive.moveAcidFiles(Hive.java:3816)
> at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:298)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205)
> at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2267)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1919)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1651)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1395)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1388)
> at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:253)
> at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:92)
> at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:345)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
> at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:358)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: ql.Driver (SessionState.java:printError(1228)) - FAILED: Execution Error, return code 1 from
> {noformat}
>  





[jira] [Assigned] (HIVE-18606) CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask

2018-02-01 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-18606:
-

Assignee: Eugene Koifman

> CTAS on empty table throws NPE from org.apache.hadoop.hive.ql.exec.MoveTask
> ---
>
> Key: HIVE-18606
> URL: https://issues.apache.org/jira/browse/HIVE-18606
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
>
> {noformat}
> @Test
> public void testCtasEmpty() throws Exception {
>   MetastoreConf.setBoolVar(hiveConf, MetastoreConf.ConfVars.CREATE_TABLES_AS_ACID, true);
>   runStatementOnDriver("create table myctas stored as ORC as" +
>       " select a, b from " + Table.NONACIDORCTBL);
>   List<String> rs = runStatementOnDriver("select ROW__ID, a, b, INPUT__FILE__NAME" +
>       " from myctas order by ROW__ID");
> }
> {noformat}
> {noformat}
> 2018-02-01T19:08:52,813 INFO  [HiveServer2-Background-Pool: Thread-463]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(822)) - 114: Done cleaning up thread local RawStore
> 2018-02-01T19:08:52,813 INFO  [HiveServer2-Background-Pool: Thread-463]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(305)) - ugi=hive ip=unknown-ip-addr  cmd=Done cleaning up thread local RawStore
> 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: exec.Task (SessionState.java:printError(1228)) - Failed with exception null
> java.lang.NullPointerException
> at org.apache.hadoop.hive.ql.metadata.Hive.moveAcidFiles(Hive.java:3816)
> at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:298)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205)
> at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2267)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1919)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1651)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1395)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1388)
> at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:253)
> at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:92)
> at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:345)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
> at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:358)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2018-02-01T19:08:52,815 ERROR [HiveServer2-Background-Pool: Thread-463]: ql.Driver (SessionState.java:printError(1228)) - FAILED: Execution Error, return code 1 from
> {noformat}
>  





[jira] [Updated] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive

2018-02-01 Thread Ratandeep Ratti (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ratandeep Ratti updated HIVE-18410:
---
Attachment: HIVE-18410_2.patch

> [Performance][Avro] Reading flat Avro tables is very expensive in Hive
> --
>
> Key: HIVE-18410
> URL: https://issues.apache.org/jira/browse/HIVE-18410
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 2.3.2
>Reporter: Ratandeep Ratti
>Assignee: Ratandeep Ratti
>Priority: Major
> Fix For: 3.0.0, 2.3.2
>
> Attachments: HIVE-18410.patch, HIVE-18410_1.patch, 
> HIVE-18410_2.patch, profiling_with_patch.nps, profiling_with_patch.png, 
> profiling_without_patch.nps, profiling_without_patch.png
>
>
> There's a performance penalty when reading flat [no nested fields] Avro 
> tables. When reading the same flat dataset in Pig, it takes half the time.  
> On profiling, a lot of time is spent in 
> {{AvroDeserializer.deserializeSingleItemNullableUnion()}}. The bulk of the 
> time is spent in GenericData.get().resolveUnion(), which calls 
> GenericData.getSchemaName(Object datum), which does a lot of instanceof 
> checks.  This could be simplified with performance benefits. An approach is 
> described in this patch which almost halves the runtime.
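
The description does not spell out what the patch changes, so the following is only one possible simplification, sketched under stated assumptions and without the Avro library: for a union with exactly two branches, one of them null, the branch can be chosen with a single null check on the datum instead of a chain of instanceof tests. `NullableUnionSketch`, `BranchType`, `nonNullBranchIndex` and `resolveBranch` are hypothetical names introduced for illustration, not Avro or Hive APIs.

```java
import java.util.Arrays;
import java.util.List;

public class NullableUnionSketch {

    // Hypothetical stand-in for schema types; not the Avro Schema.Type enum.
    enum BranchType { NULL, STRING, LONG }

    // Returns the index of the single non-null branch of a two-branch
    // nullable union. This depends only on the schema, so a caller could
    // compute it once per column rather than once per row.
    static int nonNullBranchIndex(List<BranchType> union) {
        if (union.size() != 2) {
            throw new IllegalArgumentException("expected exactly two branches");
        }
        boolean firstIsNull = union.get(0) == BranchType.NULL;
        boolean secondIsNull = union.get(1) == BranchType.NULL;
        if (firstIsNull == secondIsNull) {
            throw new IllegalArgumentException("exactly one branch must be NULL");
        }
        return firstIsNull ? 1 : 0;
    }

    // Chooses the branch for a datum with one null check and no type
    // inspection: null maps to the NULL branch, anything else to the other.
    static int resolveBranch(List<BranchType> union, Object datum) {
        int nonNull = nonNullBranchIndex(union);
        return datum == null ? 1 - nonNull : nonNull;
    }

    public static void main(String[] args) {
        List<BranchType> union = Arrays.asList(BranchType.NULL, BranchType.STRING);
        System.out.println(resolveBranch(union, "abc"));
        System.out.println(resolveBranch(union, null));
    }
}
```

The shortcut is only valid for the single-item nullable case; a union with several non-null branches still needs the datum's type to pick a branch.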




