date:20190326

[jira] [Commented] (HIVE-21496) Automatic sizing of unordered buffer can overflow

2019-03-26 Thread Hive QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-21496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801463#comment-16801463
 ] 

Hive QA commented on HIVE-21496:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  1s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 17s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 13s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 45s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 28s{color} | {color:blue} ql in master has 2255 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  5s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  2s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 15s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 26m 51s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-16683/dev-support/hive-personality.sh |
| git revision | master / 80998ad |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| modules | C: ql U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-16683/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.




[jira] [Commented] (HIVE-21496) Automatic sizing of unordered buffer can overflow

2019-03-26 Thread Hive QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-21496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801473#comment-16801473
 ] 

Hive QA commented on HIVE-21496:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12963683/HIVE-21496.02.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 15840 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/16683/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16683/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16683/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12963683 - PreCommit-HIVE-Build

> Automatic sizing of unordered buffer can overflow
> -
>
> Key: HIVE-21496
> URL: https://issues.apache.org/jira/browse/HIVE-21496
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer
>Affects Versions: 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-21496.01.patch, HIVE-21496.02.patch, hive.log
>
>
> HIVE-21329 added automatic sizing of the Tez unordered partitioned KV buffer 
> based on group by statistics. However, in some corner cases the group by 
> statistics set the data size to Long.MAX_VALUE, which ends up setting the 
> unordered KV buffer size to Integer.MAX_VALUE. This buffer size is expected 
> to be in MB. Converting Integer.MAX_VALUE from MB to bytes overflows, and the 
> following exception is thrown.
> {code:java}
> 2019-03-23T01:35:17,760 INFO [Dispatcher thread {Central}] 
> HistoryEventHandler.criticalEvents: 
> [HISTORY][DAG:dag_1553330105749_0001_1][Event:TASK_ATTEMPT_FINISHED]: 
> vertexName=Map 1, taskAttemptId=attempt_1553330105749_0001_1_00_00_0, 
> creationTime=1553330117468, allocationTime=1553330117524, 
> startTime=1553330117562, finishTime=1553330117755, timeTaken=193, 
> status=FAILED, taskFailureType=NON_FATAL, errorEnum=FRAMEWORK_ERROR, 
> diagnostics=Error: Error while running task ( failure ) : 
> attempt_1553330105749_0001_1_00_00_0:java.lang.IllegalArgumentException
> at com.google.common.base.Preconditions.checkArgument(Preconditions.java:108)
> at 
> org.apache.tez.runtime.common.resources.MemoryDistributor.registerRequest(MemoryDistributor.java:177)
> at 
> org.apache.tez.runtime.common.resources.MemoryDistributor.requestMemory(MemoryDistributor.java:110)
> at 
> org.apache.tez.runtime.api.impl.TezTaskContextImpl.requestInitialMemory(TezTaskContextImpl.java:214)
> at 
> org.apache.tez.runtime.library.output.UnorderedPartitionedKVOutput.initialize(UnorderedPartitionedKVOutput.java:76)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable._callInternal(LogicalIOProcessorRuntimeTask.java:537)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:520)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:505)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745){code}
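The overflow described in this issue can be reproduced with plain 32-bit arithmetic. The following is a minimal sketch, not the code from the HIVE-21496 patch: the class and method names (`UnorderedBufferSizing`, `naiveMbToBytes`, `clampBufferSizeMb`) are hypothetical, and it assumes that the MB value is multiplied by 1024 * 1024 in `int` arithmetic somewhere downstream.

```java
/** Hypothetical helper; names are illustrative and not taken from Hive or Tez. */
final class UnorderedBufferSizing {
    static final long BYTES_PER_MB = 1024L * 1024L;

    /** Largest MB value whose byte count still fits in a signed 32-bit int. */
    static final int MAX_SAFE_MB = (int) (Integer.MAX_VALUE / BYTES_PER_MB);

    /** Assumed faulty conversion: int multiplication silently wraps around. */
    static int naiveMbToBytes(int sizeMb) {
        return sizeMb * 1024 * 1024;
    }

    /** Derives a buffer size in MB from a data size estimate, clamped to a safe range. */
    static int clampBufferSizeMb(long dataSizeBytes) {
        long sizeMb = Math.max(0L, dataSizeBytes) / BYTES_PER_MB;
        return (int) Math.min(sizeMb, MAX_SAFE_MB);
    }

    public static void main(String[] args) {
        // Integer.MAX_VALUE MB wraps to a negative byte count.
        System.out.println(naiveMbToBytes(Integer.MAX_VALUE));
        // A Long.MAX_VALUE data size estimate is clamped instead of overflowing.
        int safeMb = clampBufferSizeMb(Long.MAX_VALUE);
        System.out.println(safeMb + " MB = " + naiveMbToBytes(safeMb) + " bytes");
    }
}
```

A negative request size of this kind would plausibly fail a non-negative argument check, which is consistent with the IllegalArgumentException in the stack trace, although the exact check in MemoryDistributor is not shown in this thread.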
>  
> Stats for the GBY operator are getting Long.MAX_VALUE, as seen below
> {code:java}
> 2019-03-23T01:35:16,466 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main] 
> annotation.StatsRulesProcFactory: [0] STATS-TS[0] (logs): numRows: 1795 
> dataSize: 4443078 basicStatsState: PARTIAL colStatsState: NONE colStats: 
> {severity= colName: severity colType: string countDistincts: 359 numNulls: 89 
> avgColLen: 100.0 numTrues: 0 numFalses: 0 isPrimaryKey: false isEstimated: 
> true}
> 2019-03-23T01:35:16,466 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main] 
> annotation.StatsRulesProcFactory: Estimating row count for 
> GenericUDFOPEqual(Column[severity], Const string ERROR) Original num rows: 
> 1795 New num rows: 5
> 2019-03-23T01:35:16,467 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main] 
> annotation.StatsRulesProcFactory: [1] STATS-FIL[8]: numRows: 5 dataSize: 
> 12376

[jira] [Updated] (HIVE-21500) Replicate conversion of managed table to external at source.

2019-03-26 Thread Sankar Hariappan (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-21500:

Summary: Replicate conversion of managed table to external at source.  
(was: Support converting managed ACID table to external if the corresponding 
non-ACID table is converted to external at source.)

> Replicate conversion of managed table to external at source.
> 
>
> Key: HIVE-21500
> URL: https://issues.apache.org/jira/browse/HIVE-21500
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, Replication
>
> For the below scenario of Hive2  to Hive3 replication (with strict 
> managed=true), the managed ACID table at target should be converted to 
> external table.
> 1. Create non-ACID ORC format table.
> 2. Insert some rows
> 3. Replicate this create event which creates ACID table at target (due to 
> migration rule). Each insert event adds metadata in HMS corresponding to the 
> current table.
> 4. Convert table to external table using ALTER command.
> 5. Replicating this alter event should convert ACID table to external table 
> and make sure corresponding metadata are removed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HIVE-21500) Replicate conversion of managed table to external at source.

2019-03-26 Thread Sankar Hariappan (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-21500:

Description: 
A couple of scenarios for Hive2 to Hive3 (strict managed tables enabled) 
replication where a managed table is converted to external at source. 
*Scenario-1: (ACID/MM table converted to external at target)*
1. Create non-ACID ORC format table.
2. Insert some rows
3. Replicate this create event which creates ACID table at target (due to 
migration rule). Each insert event adds transactional metadata in HMS 
corresponding to the current table.
4. Convert table to external table using ALTER command at source.
5. Replicating this alter event should convert ACID table to external table and 
make sure corresponding metadata are removed.

*Scenario-2: (External table at target changes table location)*
1. Create non-ACID avro format table.
2. Insert some rows
3. Replicate this create event which creates external table at target (due to 
migration rule). The data path is chosen under default external warehouse 
directory.
4. Convert table to external table using ALTER command at source.
5. Replicating this alter event should update the table/partitions location as 
data moved under external tables base directory.

  was:
For the below scenario of Hive2  to Hive3 replication (with strict 
managed=true), the managed ACID table at target should be converted to external 
table.
1. Create non-ACID ORC format table.
2. Insert some rows
3. Replicate this create event which creates ACID table at target (due to 
migration rule). Each insert event adds metadata in HMS corresponding to the 
current table.
4. Convert table to external table using ALTER command.
5. Replicating this alter event should convert ACID table to external table and 
make sure corresponding metadata are removed.






[jira] [Updated] (HIVE-21507) Hive swallows NPE if no delegation token found

2019-03-26 Thread Denes Bodo (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denes Bodo updated HIVE-21507:
--
Status: Patch Available  (was: Open)

[~daijy] [~kgyrtkirk] please review. Thanks.

> Hive swallows NPE if no delegation token found
> --
>
> Key: HIVE-21507
> URL: https://issues.apache.org/jira/browse/HIVE-21507
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 3.1.1
>Reporter: Denes Bodo
>Assignee: Denes Bodo
>Priority: Critical
>  Labels: usability
> Attachments: HIVE-21507-001.patch
>
>
> If no delegation token has been put into the token file, this 
> [line|https://github.com/apache/hive/blob/master/jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java#L777]
>  will cause a NullPointerException which is not handled, and the user is not 
> notified in any way.
> The use case that triggers the NPE is an Oozie Sqoop import to Hive in a 
> kerberized cluster. Oozie puts the delegation token into the token file with 
> id: *HIVE_DELEGATION_TOKEN_hiveserver2ClientToken*, so the lookup with id 
> *hive* does not work. However, the fallback code uses the key which Oozie provides 
> [this|https://github.com/apache/hive/blob/master/jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java#L784]
>  way.
> I suggest showing a warning message to the user that the key with id *hive* 
> cannot be used and that the code is falling back to getting the delegation 
> token from the session.
> I am creating the patch.
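
The suggested behavior (check for a missing token, warn, then fall back) can be sketched as follows. This is only an illustration under stated assumptions, not the code in HiveConnection or in the attached patch: a plain `Map` stands in for the token file, and the class name `DelegationTokenLookup` and method `findToken` are hypothetical. Only the two key names come from the description above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Logger;

/** Hypothetical sketch; a plain Map stands in for the contents of the token file. */
final class DelegationTokenLookup {
    private static final Logger LOG =
        Logger.getLogger(DelegationTokenLookup.class.getName());

    static final String PRIMARY_KEY = "hive";
    static final String FALLBACK_KEY = "HIVE_DELEGATION_TOKEN_hiveserver2ClientToken";

    /** Returns the token for the primary key, or warns and falls back; null if neither exists. */
    static String findToken(Map<String, String> tokenFile) {
        String token = tokenFile.get(PRIMARY_KEY);
        if (token != null) {
            return token;
        }
        // Tell the user explicitly instead of failing silently on a null token.
        LOG.warning("No delegation token found for key '" + PRIMARY_KEY
            + "', falling back to '" + FALLBACK_KEY + "'");
        return tokenFile.get(FALLBACK_KEY);
    }

    public static void main(String[] args) {
        Map<String, String> tokenFile = new HashMap<>();
        tokenFile.put(FALLBACK_KEY, "token-from-oozie");
        System.out.println(findToken(tokenFile));
    }
}
```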




[jira] [Assigned] (HIVE-21509) LLAP may cache corrupted column vectors and return wrong query result

2019-03-26 Thread Adam Szita (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita reassigned HIVE-21509:
-


> LLAP may cache corrupted column vectors and return wrong query result
> -
>
> Key: HIVE-21509
> URL: https://issues.apache.org/jira/browse/HIVE-21509
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Adam Szita
>Assignee: Adam Szita
>Priority: Major
>





[jira] [Updated] (HIVE-21507) Hive swallows NPE if no delegation token found

2019-03-26 Thread Denes Bodo (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denes Bodo updated HIVE-21507:
--
Attachment: HIVE-21507-002.patch





[jira] [Commented] (HIVE-21000) Upgrade thrift to at least 0.10.0

2019-03-26 Thread Ivan Suller (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-21000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801428#comment-16801428
 ] 

Ivan Suller commented on HIVE-21000:


It seems Accumulo works only with version 0.9.3 of Thrift, so my remaining 
option is to try shading it so that two different versions can be used at the 
same time. I'm not sure that will work, but it's my last idea.

> Upgrade thrift to at least 0.10.0
> -
>
> Key: HIVE-21000
> URL: https://issues.apache.org/jira/browse/HIVE-21000
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Ivan Suller
>Priority: Major
> Attachments: HIVE-21000.01.patch, HIVE-21000.02.patch, 
> HIVE-21000.03.patch, HIVE-21000.04.patch, HIVE-21000.05.patch, 
> HIVE-21000.06.patch, HIVE-21000.07.patch, HIVE-21000.08.patch, 
> sampler_before.png
>
>
> I was looking into some compile profiles for tables with lots of columns; and 
> it turned out that [thrift 0.9.3 is allocating a 
> List|https://github.com/apache/hive/blob/8e30b5e029570407d8a1db67d322a95db705750e/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/FieldSchema.java#L348]
>  during every hashcode calculation; but luckily THRIFT-2877 is improving on 
> that - so I propose to upgrade to at least 0.10.0 
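
To illustrate why a per-call List allocation in hashCode matters, here is a minimal sketch contrasting the two styles. It is not the generated FieldSchema code and does not reproduce the exact change made in THRIFT-2877; the class `FieldLike` and both method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical two-field record; not the Thrift-generated FieldSchema class. */
final class FieldLike {
    private final String name;
    private final String type;

    FieldLike(String name, String type) {
        this.name = name;
        this.type = type;
    }

    /** Allocation-heavy style: builds a temporary List on every call. */
    int hashCodeViaList() {
        List<Object> parts = new ArrayList<>();
        parts.add(name);
        parts.add(type);
        return parts.hashCode();
    }

    /** Allocation-free style: folds each field into a running int. */
    int hashCodeIncremental() {
        int h = 1;
        h = 31 * h + (name == null ? 0 : name.hashCode());
        h = 31 * h + (type == null ? 0 : type.hashCode());
        return h;
    }
}
```

Both methods return the same value here, because `List.hashCode()` is specified as exactly this 31-based fold. The difference is that the second one creates no garbage, which adds up when hashCode is called once per column on tables with many columns.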




[jira] [Assigned] (HIVE-21507) Hive swallows NPE if no delegation token found

2019-03-26 Thread Denes Bodo (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denes Bodo reassigned HIVE-21507:
-






[jira] [Commented] (HIVE-21503) Vectorization: query with regex gives incorrect results with vectorization

2019-03-26 Thread Laszlo Bodor (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-21503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801572#comment-16801572
 ] 

Laszlo Bodor commented on HIVE-21503:
-

[~rajesh.balamohan]: I didn't manage to reproduce it on current master. Could 
you please take a look at [^HIVE-21503.01.WIP.patch]? Am I missing something?

> Vectorization: query with regex gives incorrect results with vectorization
> --
>
> Key: HIVE-21503
> URL: https://issues.apache.org/jira/browse/HIVE-21503
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Rajesh Balamohan
>Assignee: Laszlo Bodor
>Priority: Major
> Attachments: HIVE-21503.01.WIP.patch
>
>
> I see wrong results with vectorization. Without vectorization, it works fine. 
> {noformat}
> e.g 
> WHEN x like '%radio%' THEN 'radio' 
> WHEN x like '%tv%' THEN 'tv'
> {noformat}




[jira] [Commented] (HIVE-21506) Memory based TxnHandler implementation

2019-03-26 Thread Peter Vary (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-21506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801422#comment-16801422
 ] 

Peter Vary commented on HIVE-21506:
---

What do you think?

CC: [~gopalv], [~tlipcon], [~vgumashta]

> Memory based TxnHandler implementation
> --
>
> Key: HIVE-21506
> URL: https://issues.apache.org/jira/browse/HIVE-21506
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Peter Vary
>Priority: Major
>
> The current TxnHandler implementations use the backend RDBMS to store 
> all Hive lock and transaction data, so multiple TxnHandler instances can 
> run simultaneously and serve requests. The continuous 
> communication/locking done on the RDBMS side puts serious load on the backend 
> databases and also restricts the possible throughput.
> If it is possible to have only a single active TxnHandler (with the current 
> design HMS) instance then we can provide much better (using only java based 
> locking) performance. We still have to store the committed write transactions 
> to the RDBMS (or later some other persistent storage), but other lock and 
> transaction operations could remain memory only.
> The most important drawbacks of this solution are that we definitely lose 
> scalability when one instance of TxnHandler is no longer able to serve the 
> requests (see NameNode), and fault tolerance in the sense that the ongoing 
> transactions should be terminated when the TxnHandler fails. If these 
> drawbacks are acceptable in certain situations, then we can provide better 
> throughput for the users.
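
The idea of keeping locks in memory while persisting only committed write transactions could look roughly like the sketch below. It is a toy under stated assumptions, not Hive's TxnHandler API: the class `InMemoryTxnSketch` and its methods are hypothetical, locks are exclusive only, and an in-memory list stands in for the RDBMS that would store committed writes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch; not Hive's TxnHandler API. */
final class InMemoryTxnSketch {
    private final AtomicLong nextTxnId = new AtomicLong(1);
    // Resource name -> id of the transaction holding its exclusive lock (memory only).
    private final Map<String, Long> locks = new ConcurrentHashMap<>();
    // Stand-in for the persistent store that records committed write transactions.
    private final List<Long> committedWrites = new ArrayList<>();

    long openTxn() {
        return nextTxnId.getAndIncrement();
    }

    /** Tries to take an exclusive lock; returns false if another transaction holds it. */
    boolean lock(long txnId, String resource) {
        Long holder = locks.putIfAbsent(resource, txnId);
        return holder == null || holder == txnId;
    }

    /** Records the commit in the stand-in store, then releases the transaction's locks. */
    synchronized void commit(long txnId) {
        committedWrites.add(txnId);
        locks.values().removeIf(holder -> holder == txnId);
    }

    synchronized List<Long> committedWrites() {
        return new ArrayList<>(committedWrites);
    }
}
```

Everything except `committedWrites` is lost if the process dies, which is the fault tolerance drawback mentioned in the description.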




[jira] [Commented] (HIVE-21497) Direct SQL exception thrown by PartitionManagementTask

2019-03-26 Thread Hive QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-21497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801421#comment-16801421
 ] 

Hive QA commented on HIVE-21497:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 14s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 30s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 21s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  1m 17s{color} | {color:blue} standalone-metastore/metastore-server in master has 179 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 21s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 20s{color} | {color:green} standalone-metastore/metastore-server: The patch generated 0 new + 283 unchanged - 1 fixed = 283 total (was 284) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 14s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 15m 40s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-16682/dev-support/hive-personality.sh |
| git revision | master / 80998ad |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| modules | C: standalone-metastore/metastore-server U: standalone-metastore/metastore-server |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-16682/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Direct SQL exception thrown by PartitionManagementTask
> --
>
> Key: HIVE-21497
> URL: https://issues.apache.org/jira/browse/HIVE-21497
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-21497.3.patch
>
>
> The metastore runs several background threads, one of which is partition 
> discovery. While removing expired partitions, the following exception is thrown:
> {code:java}
> 2019-03-24 04:24:59.583 WARN [PartitionDiscoveryTask-0] 
> metastore.MetaStoreDirectSql: Failed to execute [select 
> "PARTITIONS"."PART_ID" from "PARTITIONS" inner join "TBLS" on 
> "PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID" and "TBLS"."TBL_NAME" = ? inner join 
> "DBS" on "TBLS"."DB_ID" = "DBS"."DB_ID" and "DBS"."NAME" = ? inner join 
> "PARTITION_KEY_VALS" "FILTER0" on "FILTER0"."PART_ID" = 
> "PARTITIONS"."PART_ID" and "FILTER0"."INTEGER_IDX" = 0 inner join 
> "PARTITION_KEY_VALS" "FILTER1" on "FILTER1"."PART_ID" = 
> "PARTITIONS"."PART_ID" and "FILTER1"."INTEGER_IDX" = 1 inner join 
> "PARTITION_KEY_VALS" "FILTER2" on "FILTER2"."PART_ID" = 
> "PARTITIONS"."PART_ID" and "FILTER2"."INTEGER_IDX" = 2 where 
> "DBS"."CTLG_NAME" = ? and ( ( (((case when "FILTER0"."PART_KEY_VAL" <> ? and 
> "TBLS"."TBL_NAME" = ? and "DBS"."NAME" = ? and

[jira] [Work started] (HIVE-21509) LLAP may cache corrupted column vectors and return wrong query result

2019-03-26 Thread Adam Szita (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-21509 started by Adam Szita.
-
> LLAP may cache corrupted column vectors and return wrong query result
> -
>
> Key: HIVE-21509
> URL: https://issues.apache.org/jira/browse/HIVE-21509
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Adam Szita
>Assignee: Adam Szita
>Priority: Major
>
> In some scenarios, LLAP might store column vectors in cache that are getting 
> reused and reset just before their original content would be written.
> The issue is a concurrency issue and is therefore flaky. It is not easy to 
> reproduce, but the odds of surfacing this issue can be improved by setting 
> LLAP executor and IO thread counts this way:
>  * set hive.llap.daemon.num.executors=32;
>  * set hive.llap.io.threadpool.size=1;
>  * using TPCDS input data of store_sales table, which is in text format:
> {code:java}
> ROW FORMAT SERDE    'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'  
> WITH SERDEPROPERTIES (    'field.delim'='|',    'serialization.format'='|')  
> STORED AS INPUTFORMAT    'org.apache.hadoop.mapred.TextInputFormat'  
> OUTPUTFORMAT    
> 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'{code}
>  * run query on this table: select min(ss_sold_date_sk) from store_sales;
> The first query result is correct (2450816 in my case). Repeating the query 
> will trigger reading from LLAP cache and produce a wrong result: 0.
> If one wants to make sure of running into this issue, place a 
> Thread.sleep(250) at the beginning of VectorDeserializeOrcWriter#run().
>  




[jira] [Updated] (HIVE-21509) LLAP may cache corrupted column vectors and return wrong query result

2019-03-26 Thread Adam Szita (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated HIVE-21509:
--
Description: 
In some scenarios, LLAP might store column vectors in cache that are getting 
reused and reset just before their original content would be written.

The issue is a concurrency issue and is therefore flaky. It is not easy to 
reproduce, but the odds of surfacing this issue can be improved by setting LLAP 
executor and IO thread counts this way:
 * set hive.llap.daemon.num.executors=32;
 * set hive.llap.io.threadpool.size=1;
 * using TPCDS input data of store_sales table, which is in text format:

{code:java}
ROW FORMAT SERDE    'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'  WITH 
SERDEPROPERTIES (    'field.delim'='|',    'serialization.format'='|')  STORED 
AS INPUTFORMAT    'org.apache.hadoop.mapred.TextInputFormat'  OUTPUTFORMAT    
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'{code}

 * run a query on this table: select min(ss_sold_date_sk) from store_sales;

The first query result is correct (2450816 in my case). Repeating the query 
will trigger reading from LLAP cache and produce a wrong result: 0.

If one wants to make sure of running into this issue, place a Thread.sleep(250) 
at the beginning of VectorDeserializeOrcWriter#run().

 

> LLAP may cache corrupted column vectors and return wrong query result
> -
>
> Key: HIVE-21509
> URL: https://issues.apache.org/jira/browse/HIVE-21509
> Project: Hive
> Issue Type: Bug
> Components: llap
> Reporter: Adam Szita
> Assignee: Adam Szita
> Priority: Major

[jira] [Updated] (HIVE-21290) Restore historical way of handling timestamps in Parquet while keeping the new semantics at the same time

2019-03-26 Thread Karen Coppage (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage updated HIVE-21290:
-
Status: Open  (was: Patch Available)

> Restore historical way of handling timestamps in Parquet while keeping the 
> new semantics at the same time
> -
>
> Key: HIVE-21290
> URL: https://issues.apache.org/jira/browse/HIVE-21290
> Project: Hive
> Issue Type: Sub-task
> Reporter: Zoltan Ivanfi
> Assignee: Karen Coppage
> Priority: Major
> Attachments: HIVE-21290.1.patch, HIVE-21290.2.patch, 
> HIVE-21290.2.patch, HIVE-21290.3.patch, HIVE-21290.4.patch, HIVE-21290.4.patch
>
>
> This sub-task is for implementing the Parquet-specific parts of the following 
> plan:
> h1. Problem
> Historically, the semantics of the TIMESTAMP type in Hive depended on the 
> file format. Timestamps in Avro, Parquet and RCFiles with a binary SerDe had 
> _Instant_ semantics, while timestamps in ORC, textfiles and RCFiles with a 
> text SerDe had _LocalDateTime_ semantics.
> The Hive community wanted to get rid of this inconsistency and have 
> _LocalDateTime_ semantics in Avro, Parquet and RCFiles with a binary SerDe as 
> well. *Hive 3.1 turned off normalization to UTC* to achieve this. While this 
> leads to the desired new semantics, it also leads to incorrect results when 
> new Hive versions read timestamps written by old Hive versions or when old 
> Hive versions or any other component not aware of this change (including 
> legacy Impala and Spark versions) read timestamps written by new Hive 
> versions.
> h1. Solution
> To work around this issue, Hive *should restore the practice of normalizing 
> to UTC* when writing timestamps to Avro, Parquet and RCFiles with a binary 
> SerDe. In itself, this would restore the historical _Instant_ semantics, 
> which is undesirable. In order to achieve the desired _LocalDateTime_ 
> semantics in spite of normalizing to UTC, newer Hive versions should record 
> the session-local time zone in the file metadata fields that serve as 
> arbitrary key-value storage.
> When reading back files with this time zone metadata, newer Hive versions (or 
> any other new component aware of this extra metadata) can achieve 
> _LocalDateTime_ semantics by *converting from UTC to the saved time zone 
> (instead of to the local time zone)*. Legacy components that are unaware of 
> the new metadata can read the files without any problem and the timestamps 
> will show the historical Instant behaviour to them.
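
The write and read paths described in the Solution can be sketched with java.time. This is only an illustration under assumptions: the class and method names are hypothetical, fixed UTC offsets stand in for real time zones, and the writer zone is passed as a plain argument rather than read from any actual file metadata key.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical sketch, not Hive code.
final class TimestampRoundTripSketch {

  /** Writer side: normalize a wall-clock value to UTC using the writer's zone. */
  static Instant write(LocalDateTime wallClock, ZoneId writerZone) {
    return wallClock.atZone(writerZone).toInstant();
  }

  /** Reader side: convert the stored UTC instant into the given zone. */
  static LocalDateTime read(Instant stored, ZoneId zone) {
    return LocalDateTime.ofInstant(stored, zone);
  }

  public static void main(String[] args) {
    ZoneId writerZone = ZoneOffset.ofHours(1);       // assumed zone of the writer
    ZoneId legacyReaderZone = ZoneOffset.ofHours(-7); // assumed local zone of a legacy reader
    LocalDateTime original = LocalDateTime.of(2019, 3, 26, 10, 30);

    Instant stored = write(original, writerZone);

    // A reader aware of the saved writer zone recovers the original wall-clock value.
    System.out.println(read(stored, writerZone));       // prints 2019-03-26T10:30

    // A legacy reader converts to its own local zone and sees Instant semantics.
    System.out.println(read(stored, legacyReaderZone)); // prints 2019-03-26T02:30
  }
}
```

The first read corresponds to the _LocalDateTime_ semantics a metadata-aware reader would get; the second corresponds to the historical _Instant_ behaviour seen by a reader that ignores the saved zone.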




[jira] [Commented] (HIVE-21509) LLAP may cache corrupted column vectors and return wrong query result

2019-03-26 Thread Adam Szita (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-21509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801561#comment-16801561
 ] 

Adam Szita commented on HIVE-21509:
---

This looks like quite a serious issue. The root cause is as follows:
 # On the first execution of the query, the LLAP IO thread has to read the 
input text file and produce (Long)ColumnVectors (CVs) from the data, wrapped 
into VectorizedRowBatches (VRBs).
 # These VRBs are
 ## passed by VectorDeserializeOrcWriter to a newly created async ORC writer 
thread for ORC encoding and cache persistence;
 ## also propagated back to the consumers, namely to OrcEncodedDataConsumer and 
then finally to LLAPRecordReader.
 # The ORC writer thread may get to writing out the VRB (and therefore the CV) 
only after the following has happened:
 ## The IO thread created a CVB (ColumnVectorBatch) in 
OrcEncodedDataConsumer#decodeBatch to wrap the CV coming from the VRB, and 
passed the batch to LLAPRecordReader.
 ## LLAPRecordReader used this batch and is receiving a new one. At this point 
(on the Tez thread) it returns the previous CVB and offers it back to an object 
pool so that the next decodeBatch call may reuse it.
 ## The next decodeBatch call polls this reused CVB from the pool, calls 
CV.reset() on the CVs wrapped inside, and finally overwrites the existing data 
in them.
 ## Only now does the ORC writer thread get to writing the VRB, and therefore 
the very same CVs, into the cache, although they have been modified in the 
meantime by the reuse logic of LLAPRecordReader and OrcEncodedDataConsumer.
 ## (I guess this is why a high executor count versus a low IO thread count 
helps surface this issue: the 32 Tez threads return the used CVBs very quickly, 
while the single IO thread and its single ORC writer thread are outnumbered 
when trying to write the data out in time, before it gets corrupted.)
 # Because of this, the correct query result is displayed at first 
(LLAPRecordReader does get all the correct CVs), but the content written to the 
cache is corrupted.
 # The second run of the query goes directly to the cache and uses the corrupted 
data there, producing a wrong result this time.
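
The sequence above can be illustrated with a minimal, single-threaded Java sketch. It is not Hive code: a plain long[] stands in for the LongColumnVector data, a Deque stands in for the async ORC writer's input, and the class and method names (BufferReuseSketch, cacheAfterReuse) are hypothetical. Copying before the hand-off is shown only as one conceivable way to break the sharing, not as the fix chosen for this ticket.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Hypothetical sketch, not Hive code.
final class BufferReuseSketch {

  /**
   * Replays the hazardous ordering deterministically: the "writer" only
   * persists its batch after the shared buffer has been recycled and reset.
   */
  static long[] cacheAfterReuse(boolean copyBeforeHandoff) {
    long[] vector = {2450816L, 2450817L, 2450818L};  // stand-in for CV data
    Deque<long[]> writerQueue = new ArrayDeque<>();   // stand-in for the async writer's input

    // Hand the batch to the writer, optionally as a private copy.
    writerQueue.add(copyBeforeHandoff ? vector.clone() : vector);

    // The consumer is done with the batch; it goes back to the pool and the
    // next decode call resets it before refilling.
    Arrays.fill(vector, 0L);

    // Only now does the writer get to persist what it was handed.
    return writerQueue.remove();
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(cacheAfterReuse(false))); // prints [0, 0, 0]
    System.out.println(Arrays.toString(cacheAfterReuse(true)));  // prints [2450816, 2450817, 2450818]
  }
}
```

The all-zero array in the shared case is consistent with the wrong result of 0 reported for the second run of the query.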

 

> LLAP may cache corrupted column vectors and return wrong query result
> -
>
> Key: HIVE-21509
> URL: https://issues.apache.org/jira/browse/HIVE-21509
> Project: Hive
> Issue Type: Bug
> Components: llap
> Reporter: Adam Szita
> Assignee: Adam Szita
> Priority: Major

[jira] [Updated] (HIVE-21290) Restore historical way of handling timestamps in Parquet while keeping the new semantics at the same time

2019-03-26 Thread Karen Coppage (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage updated HIVE-21290:
-
Attachment: HIVE-21290.5.patch
Status: Patch Available  (was: Open)

> Restore historical way of handling timestamps in Parquet while keeping the 
> new semantics at the same time
> -
>
> Key: HIVE-21290
> URL: https://issues.apache.org/jira/browse/HIVE-21290
> Project: Hive
> Issue Type: Sub-task
> Reporter: Zoltan Ivanfi
> Assignee: Karen Coppage
> Priority: Major
> Attachments: HIVE-21290.1.patch, HIVE-21290.2.patch, 
> HIVE-21290.2.patch, HIVE-21290.3.patch, HIVE-21290.4.patch, 
> HIVE-21290.4.patch, HIVE-21290.5.patch

[jira] [Assigned] (HIVE-19034) hadoop fs test can check srcipt ok, but beeline -f report no such file

2019-03-26 Thread Bruno Pusztahazi (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-19034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Pusztahazi reassigned HIVE-19034:
---

Assignee: Bruno Pusztahazi

> hadoop fs test can check srcipt ok, but beeline -f report no such file
> --
>
> Key: HIVE-19034
> URL: https://issues.apache.org/jira/browse/HIVE-19034
> Project: Hive
> Issue Type: Bug
> Components: Beeline
> Affects Versions: 1.3.0, beeline-cli-branch
> Environment: java version: 1.8.0_112-b15
> hadoop version: 2.7.2
> hive version: 1.3.0
> hive JDBC version: 1.3.0
> beeline version: 1.3.0
> Reporter: fengxianghui
> Assignee: Bruno Pusztahazi
> Priority: Blocker
> Labels: patch
> Fix For: 1.3.0
>
> Original Estimate: 0.05h
> Remaining Estimate: 0.05h
>
> I tested like this:
> HQL=hdfs://hacluster/tmp/ff.hql
> if hadoop fs -test -f ${HQL}
> then
>    beeline -f ${HQL}
> fi
> The hadoop fs -test check on ${HQL} passes, but beeline reports that ${HQL} 
> is no such file or directory.




[jira] [Commented] (HIVE-21509) LLAP may cache corrupted column vectors and return wrong query result

2019-03-26 Thread Zoltan Haindrich (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-21509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801542#comment-16801542
 ] 

Zoltan Haindrich commented on HIVE-21509:
-

cc: [~isuller] could this be the same issue you have been running into?

> LLAP may cache corrupted column vectors and return wrong query result
> -
>
> Key: HIVE-21509
> URL: https://issues.apache.org/jira/browse/HIVE-21509
> Project: Hive
> Issue Type: Bug
> Components: llap
> Reporter: Adam Szita
> Assignee: Adam Szita
> Priority: Major

[jira] [Commented] (HIVE-21509) LLAP may cache corrupted column vectors and return wrong query result

2019-03-26 Thread Ivan Suller (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-21509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801560#comment-16801560
 ] 

Ivan Suller commented on HIVE-21509:


[~kgyrtkirk] It is possible. I already closed the ticket tracking that issue 
because I could no longer reproduce it. But if it is a cache issue, that is to 
be expected.

> LLAP may cache corrupted column vectors and return wrong query result
> -
>
> Key: HIVE-21509
> URL: https://issues.apache.org/jira/browse/HIVE-21509
> Project: Hive
> Issue Type: Bug
> Components: llap
> Reporter: Adam Szita
> Assignee: Adam Szita
> Priority: Major

[jira] [Commented] (HIVE-21497) Direct SQL exception thrown by PartitionManagementTask

2019-03-26 Thread Hive QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-21497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801446#comment-16801446
 ] 

Hive QA commented on HIVE-21497:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12963680/HIVE-21497.3.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 15840 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/16682/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16682/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16682/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12963680 - PreCommit-HIVE-Build

> Direct SQL exception thrown by PartitionManagementTask
> --
>
> Key: HIVE-21497
> URL: https://issues.apache.org/jira/browse/HIVE-21497
> Project: Hive
> Issue Type: Bug
> Components: Standalone Metastore
> Affects Versions: 4.0.0
> Reporter: Prasanth Jayachandran
> Assignee: Jesus Camacho Rodriguez
> Priority: Major
> Attachments: HIVE-21497.3.patch
>
>
> The metastore runs background threads, one of which is partition discovery. 
> While removing expired partitions, the following exception is thrown:
> {code:java}
> 2019-03-24 04:24:59.583 WARN [PartitionDiscoveryTask-0] 
> metastore.MetaStoreDirectSql: Failed to execute [select 
> "PARTITIONS"."PART_ID" from "PARTITIONS" inner join "TBLS" on 
> "PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID" and "TBLS"."TBL_NAME" = ? inner join 
> "DBS" on "TBLS"."DB_ID" = "DBS"."DB_ID" and "DBS"."NAME" = ? inner join 
> "PARTITION_KEY_VALS" "FILTER0" on "FILTER0"."PART_ID" = 
> "PARTITIONS"."PART_ID" and "FILTER0"."INTEGER_IDX" = 0 inner join 
> "PARTITION_KEY_VALS" "FILTER1" on "FILTER1"."PART_ID" = 
> "PARTITIONS"."PART_ID" and "FILTER1"."INTEGER_IDX" = 1 inner join 
> "PARTITION_KEY_VALS" "FILTER2" on "FILTER2"."PART_ID" = 
> "PARTITIONS"."PART_ID" and "FILTER2"."INTEGER_IDX" = 2 where 
> "DBS"."CTLG_NAME" = ? and ( ( (((case when "FILTER0"."PART_KEY_VAL" <> ? and 
> "TBLS"."TBL_NAME" = ? and "DBS"."NAME" = ? and "DBS"."CTLG_NAME" = ? and 
> "FILTER0"."PART_ID" = "PARTITIONS"."PART_ID" and "FILTER0"."INTEGER_IDX" = 0 
> then cast("FILTER0"."PART_KEY_VAL" as date) else null end) = ?) and 
> ("FILTER1"."PART_KEY_VAL" = ?)) and ("FILTER2"."PART_KEY_VAL" = ?)) )] with 
> parameters [logs, sys, hive, __HIVE_DEFAULT_PARTITION__, logs, sys, hive, 
> 2019-03-23, warehouse-1553300821-692w, metastore-db-create-job]
> javax.jdo.JDODataStoreException: Error executing SQL query "select 
> "PARTITIONS"."PART_ID" from "PARTITIONS" inner join "TBLS" on 
> "PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID" and "TBLS"."TBL_NAME" = ? inner join 
> "DBS" on "TBLS"."DB_ID" = "DBS"."DB_ID" and "DBS"."NAME" = ? inner join 
> "PARTITION_KEY_VALS" "FILTER0" on "FILTER0"."PART_ID" = 
> "PARTITIONS"."PART_ID" and "FILTER0"."INTEGER_IDX" = 0 inner join 
> "PARTITION_KEY_VALS" "FILTER1" on "FILTER1"."PART_ID" = 
> "PARTITIONS"."PART_ID" and "FILTER1"."INTEGER_IDX" = 1 inner join 
> "PARTITION_KEY_VALS" "FILTER2" on "FILTER2"."PART_ID" = 
> "PARTITIONS"."PART_ID" and "FILTER2"."INTEGER_IDX" = 2 where 
> "DBS"."CTLG_NAME" = ? and ( ( (((case when "FILTER0"."PART_KEY_VAL" <> ? and 
> "TBLS"."TBL_NAME" = ? and "DBS"."NAME" = ? and "DBS"."CTLG_NAME" = ? and 
> "FILTER0"."PART_ID" = "PARTITIONS"."PART_ID" and "FILTER0"."INTEGER_IDX" = 0 
> then cast("FILTER0"."PART_KEY_VAL" as date) else null end) = ?) and 
> ("FILTER1"."PART_KEY_VAL" = ?)) and ("FILTER2"."PART_KEY_VAL" = ?)) )".
> at 
> org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:543)
> at org.datanucleus.api.jdo.JDOQuery.executeInternal(JDOQuery.java:391)
> at org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:267)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.executeWithArray(MetaStoreDirectSql.java:2042)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionIdsViaSqlFilter(MetaStoreDirectSql.java:621)
> at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:487)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore$9.getSqlResult(ObjectStore.java:3426)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore$9.getSqlResult(ObjectStore.java:3418)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:3702)
> at 
>

[jira] [Commented] (HIVE-21508) ClassCastException when initializing HiveMetaStoreClient on JDK10 or newer

2019-03-26 Thread Zoltan Haindrich (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-21508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801488#comment-16801488
 ] 

Zoltan Haindrich commented on HIVE-21508:
-

I think running Hive with Java >= 9 is not yet supported or recommended. 
IIRC that effort stalled because we first need Tez to be able to run with 
JDK 9:
HIVE-17632 => HIVE-17909 => TEZ-3860

> ClassCastException when initializing HiveMetaStoreClient on JDK10 or newer
> --
>
> Key: HIVE-21508
> URL: https://issues.apache.org/jira/browse/HIVE-21508
> Project: Hive
> Issue Type: Bug
> Components: Clients
> Affects Versions: 3.2.0, 2.3.4
> Reporter: Adar Dembo
> Priority: Major
>
> There's this block of code in {{HiveMetaStoreClient:resolveUris}} (called 
> from the constructor) on master:
> {noformat}
>   private URI metastoreUris[];
>   ...
>   if (MetastoreConf.getVar(conf, ConfVars.THRIFT_URI_SELECTION).equalsIgnoreCase("RANDOM")) {
>     List uriList = Arrays.asList(metastoreUris);
>     Collections.shuffle(uriList);
>     metastoreUris = (URI[]) uriList.toArray();
>   }
> {noformat}
> The cast to {{URI[]}} throws a {{ClassCastException}} beginning with JDK 10, 
> possibly with JDK 9 as well. Note that {{THRIFT_URI_SELECTION}} defaults to 
> {{RANDOM}} so this should affect anyone who creates a 
> {{HiveMetaStoreClient}}. On master this can be overridden with {{SEQUENTIAL}} 
> to avoid the broken case; I'm working against 2.3.4 where there's no such 
> workaround.
> [Here's|https://stackoverflow.com/questions/51372788/array-cast-java-8-vs-java-9]
>  a StackOverflow post that explains the issue in more detail. Interestingly, 
> the author described the issue in the context of the HMS; not sure why there 
> was no follow up with a Hive bug report.
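
A minimal sketch of one way to avoid the cast, assuming the goal is simply a shuffled URI[]: the typed toArray(T[]) overload returns an array whose runtime type is URI[] regardless of the JDK version. The class and method names are hypothetical, the host names are placeholders, and this is not necessarily how the issue will be fixed in Hive.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch, not the HiveMetaStoreClient code.
final class UriShuffleSketch {

  /** Returns a shuffled copy of the input without relying on an Object[] to URI[] cast. */
  static URI[] shuffled(URI[] uris) {
    // Arrays.asList is a view over the cloned array, so the input stays untouched.
    List<URI> uriList = Arrays.asList(uris.clone());
    Collections.shuffle(uriList);
    // The typed overload guarantees a URI[] at runtime; no cast is needed.
    return uriList.toArray(new URI[0]);
  }

  public static void main(String[] args) {
    URI[] in = {
        URI.create("thrift://hms1.example.com:9083"),
        URI.create("thrift://hms2.example.com:9083")
    };
    URI[] out = shuffled(in);
    System.out.println(out.getClass().getSimpleName() + " with " + out.length + " entries");
  }
}
```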




[jira] [Updated] (HIVE-21507) Hive swallows NPE if no delegation token found

2019-03-26 Thread Denes Bodo (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denes Bodo updated HIVE-21507:
--
Attachment: HIVE-21507-001.patch

> Hive swallows NPE if no delegation token found
> --
>
> Key: HIVE-21507
> URL: https://issues.apache.org/jira/browse/HIVE-21507
> Project: Hive
> Issue Type: Bug
> Components: JDBC
> Affects Versions: 3.1.1
> Reporter: Denes Bodo
> Assignee: Denes Bodo
> Priority: Critical
> Labels: usability
> Attachments: HIVE-21507-001.patch
>
>
> If no delegation token has been put into the token file, this 
> [line|https://github.com/apache/hive/blob/master/jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java#L777]
>  causes a NullPointerException that is not handled, and the user is not 
> notified in any way.
> A use case that triggers the NPE is an Oozie Sqoop import to Hive in a 
> kerberized cluster. Oozie puts the delegation token into the token file with 
> id *HIVE_DELEGATION_TOKEN_hiveserver2ClientToken*, so a lookup with id *hive* 
> does not work. However, the fallback code uses the key that Oozie provides 
> [this|https://github.com/apache/hive/blob/master/jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java#L784]
>  way.
> I suggest showing a warning message to the user that the key with id *hive* 
> cannot be used and that Hive is falling back to getting the delegation token 
> from the session.
> I am creating the patch.
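
The suggestion above could look roughly like the following sketch. It is not the HiveConnection code: the token file is modelled as a plain Map, the session lookup as a Supplier, and all names (TokenLookupSketch, resolveToken) are hypothetical.

```java
import java.util.Collections;
import java.util.Map;
import java.util.function.Supplier;
import java.util.logging.Logger;

// Hypothetical sketch, not the HiveConnection code.
final class TokenLookupSketch {
  private static final Logger LOG = Logger.getLogger(TokenLookupSketch.class.getName());

  /**
   * Looks up a token by key. When it is missing, logs a warning and uses the
   * fallback instead of dereferencing null.
   */
  static String resolveToken(Map<String, String> tokenFile, String key, Supplier<String> fallback) {
    String token = tokenFile.get(key);
    if (token != null) {
      return token;
    }
    LOG.warning("No delegation token found for id '" + key + "'; falling back to the session token");
    return fallback.get();
  }

  public static void main(String[] args) {
    String oozieKey = "HIVE_DELEGATION_TOKEN_hiveserver2ClientToken";
    Map<String, String> tokenFile = Collections.singletonMap(oozieKey, "token-from-oozie");
    // The lookup with id "hive" misses, so a warning is logged and the fallback is used.
    System.out.println(resolveToken(tokenFile, "hive", () -> tokenFile.get(oozieKey)));
  }
}
```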




[jira] [Commented] (HIVE-21485) Hive desc operation takes more than 100 seconds after upgrading from Hive 1.2.1 to 2.3.4

2019-03-26 Thread Qingxin Wu (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-21485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801458#comment-16801458
 ] 

Qingxin Wu commented on HIVE-21485:
---

Yes, that's what I implemented in this patch: adding the following parameter 
to skip this expensive step.
{code:java}
hive.display.partitioned.table.stats=true/false
{code}


> Hive desc operation takes more than 100 seconds after upgrading from Hive 
> 1.2.1 to 2.3.4
> 
>
> Key: HIVE-21485
> URL: https://issues.apache.org/jira/browse/HIVE-21485
> Project: Hive
> Issue Type: Bug
> Components: CLI, Hive
> Affects Versions: 2.3.4
> Reporter: Qingxin Wu
> Assignee: Qingxin Wu
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-21485.patch
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> The Hive desc [formatted|extended] operation takes more than 100 seconds after 
> upgrading from Hive 1.2.1 to 2.3.4. This is mainly caused by showing stats 
> for partitioned tables, which was introduced by HIVE-16098, when the 
> partitioned tables have a large number of partitions. In our case, the number 
> of partitions is 187221.
> {code:java}
> hive> desc bus.kafka_data;
> OK
> id string
> ...
> d map
> stat_date string
> log_id string
> # Partition Information
> # col_name data_type comment
> stat_date string
> log_id string
> Time taken: 115.342 seconds, Fetched: 42 row(s)
> {code}
> The same operation executed in hive-1.2.1 takes only 2 seconds.
> {code:java}
> hive> desc bus.kafka_data;
> OK
> id string
> ...
> d map
> stat_date string
> log_id string
> # Partition Information
> # col_name data_type comment
> stat_date string
> log_id string
> Time taken: 2.037 seconds, Fetched: 42 row(s)
> {code}




[jira] [Comment Edited] (HIVE-21290) Restore historical way of handling timestamps in Parquet while keeping the new semantics at the same time

2019-03-26 Thread Karen Coppage (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-21290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797158#comment-16797158
 ] 

Karen Coppage edited comment on HIVE-21290 at 3/26/19 10:30 AM:


Patch 1 notes:
* Timestamps are converted from the JVM time zone, not the session ("set time 
zone...") time zone; this is for backwards compatibility reasons.
* The writer time zone has to be passed through all the vectorized readers so 
that 
org.apache.hadoop.hive.ql.io.parquet.vector.ParquetDataColumnReaderFactory.TypesFromInt96PageReader#convert
 can correctly convert int96 to Timestamp. edit: this is for schema evolution
* ^ It might be a better idea to pass the entire reader metadata (Map with ~5 
elements) instead of extracting skipConversion (boolean) and 
writerTimezone (ZoneId) and passing these through all those constructors. Any 
input is welcome.



was (Author: klcopp):
Patch 1 notes:
* Timestamps are converted from the JVM time zone, not the session ("set time 
zone...") time zone; this is for backwards compatibility reasons.
* The writer time zone has to be passed through all the vectorized readers so 
that 
org.apache.hadoop.hive.ql.io.parquet.vector.ParquetDataColumnReaderFactory.TypesFromInt96PageReader#convert
 can correctly convert int96 to Timestamp.
* ^ It might be a better idea to pass the entire reader metadata (Map with ~5 
elements) instead of extracting skipConversion (boolean) and 
writerTimezone (ZoneId) and passing these through all those constructors. Any 
input is welcome.


> Restore historical way of handling timestamps in Parquet while keeping the 
> new semantics at the same time
> -
>
> Key: HIVE-21290
> URL: https://issues.apache.org/jira/browse/HIVE-21290
> Project: Hive
> Issue Type: Sub-task
> Reporter: Zoltan Ivanfi
> Assignee: Karen Coppage
> Priority: Major
> Attachments: HIVE-21290.1.patch, HIVE-21290.2.patch, 
> HIVE-21290.2.patch, HIVE-21290.3.patch, HIVE-21290.4.patch, 
> HIVE-21290.4.patch, HIVE-21290.5.patch

[jira] [Updated] (HIVE-21503) Vectorization: query with regex gives incorrect results with vectorization

2019-03-26 Thread Laszlo Bodor (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laszlo Bodor updated HIVE-21503:

Attachment: HIVE-21503.01.WIP.patch

> Vectorization: query with regex gives incorrect results with vectorization
> --
>
> Key: HIVE-21503
> URL: https://issues.apache.org/jira/browse/HIVE-21503
> Project: Hive
> Issue Type: Bug
> Components: Vectorization
> Reporter: Rajesh Balamohan
> Assignee: Laszlo Bodor
> Priority: Major
> Attachments: HIVE-21503.01.WIP.patch
>
>
> I see wrong results with vectorization enabled. Without vectorization, the query works fine. 
> {noformat}
> e.g 
> WHEN x like '%radio%' THEN 'radio' 
> WHEN x like '%tv%' THEN 'tv'
> {noformat}




[jira] [Commented] (HIVE-21290) Restore historical way of handling timestamps in Parquet while keeping the new semantics at the same time

2019-03-26 Thread Hive QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-21290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801623#comment-16801623
 ] 

Hive QA commented on HIVE-21290:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 20s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 57s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 22s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m  4s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 38s{color} | {color:blue} common in master has 63 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 28s{color} | {color:blue} ql in master has 2255 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  8m 52s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 28s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m  5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  8m 23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 13s{color} | {color:green} common: The patch generated 0 new + 3 unchanged - 2 fixed = 3 total (was 5) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 45s{color} | {color:red} ql: The patch generated 22 new + 195 unchanged - 20 fixed = 217 total (was 215) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  2m 10s{color} | {color:red} root: The patch generated 22 new + 198 unchanged - 22 fixed = 220 total (was 220) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  8m 57s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 13s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 71m 14s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-16685/dev-support/hive-personality.sh |
| git revision | master / 80998ad |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-16685/yetus/diff-checkstyle-ql.txt |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-16685/yetus/diff-checkstyle-root.txt |
| modules | C: common ql . U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-16685/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Restore historical way of handling timestamps in Parquet while keeping the 
> new semantics at the same time
> -
>
> Key: HIVE-21290
> URL: https://issues.apache.org/jira/browse/HIVE-21290
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Ivanfi
>Assignee: Karen Coppage
>Priority: Major
> Attachments: HIVE-21290.1.patch, HIVE-21290.2.patch, 
> HIVE-21290.2.patch, HIVE-21290.3.patch,

[jira] [Commented] (HIVE-21290) Restore historical way of handling timestamps in Parquet while keeping the new semantics at the same time

2019-03-26 Thread Hive QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-21290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801622#comment-16801622
 ] 

Hive QA commented on HIVE-21290:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12963705/HIVE-21290.5.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 15842 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/16685/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16685/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16685/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12963705 - PreCommit-HIVE-Build

> Restore historical way of handling timestamps in Parquet while keeping the 
> new semantics at the same time
> -
>
> Key: HIVE-21290
> URL: https://issues.apache.org/jira/browse/HIVE-21290
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Ivanfi
>Assignee: Karen Coppage
>Priority: Major
> Attachments: HIVE-21290.1.patch, HIVE-21290.2.patch, 
> HIVE-21290.2.patch, HIVE-21290.3.patch, HIVE-21290.4.patch, 
> HIVE-21290.4.patch, HIVE-21290.5.patch
>
>
> This sub-task is for implementing the Parquet-specific parts of the following 
> plan:
> h1. Problem
> Historically, the semantics of the TIMESTAMP type in Hive depended on the 
> file format. Timestamps in Avro, Parquet and RCFiles with a binary SerDe had 
> _Instant_ semantics, while timestamps in ORC, textfiles and RCFiles with a 
> text SerDe had _LocalDateTime_ semantics.
> The Hive community wanted to get rid of this inconsistency and have 
> _LocalDateTime_ semantics in Avro, Parquet and RCFiles with a binary SerDe as 
> well. *Hive 3.1 turned off normalization to UTC* to achieve this. While this 
> leads to the desired new semantics, it also leads to incorrect results when 
> new Hive versions read timestamps written by old Hive versions or when old 
> Hive versions or any other component not aware of this change (including 
> legacy Impala and Spark versions) read timestamps written by new Hive 
> versions.
> h1. Solution
> To work around this issue, Hive *should restore the practice of normalizing 
> to UTC* when writing timestamps to Avro, Parquet and RCFiles with a binary 
> SerDe. In itself, this would restore the historical _Instant_ semantics, 
> which is undesirable. In order to achieve the desired _LocalDateTime_ 
> semantics in spite of normalizing to UTC, newer Hive versions should record 
> the session-local time zone in the file metadata fields that serve as 
> arbitrary key-value storage.
> When reading back files with this time zone metadata, newer Hive versions (or 
> any other new component aware of this extra metadata) can achieve 
> _LocalDateTime_ semantics by *converting from UTC to the saved time zone 
> (instead of to the local time zone)*. Legacy components that are unaware of 
> the new metadata can read the files without any problem and the timestamps 
> will show the historical Instant behaviour to them.
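The round trip described in the plan above can be sketched with plain java.time. This is only an illustration, not Hive's actual reader or writer code: the class and method names are hypothetical, and the two zone IDs are arbitrary examples.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

/** Illustrative sketch only; none of these names come from Hive. */
class TimestampRoundTripSketch {

    /** Writer side: interpret the wall-clock value in the writer's session zone and normalize to UTC. */
    static Instant normalizeToUtc(LocalDateTime wallClock, ZoneId writerZone) {
        return wallClock.atZone(writerZone).toInstant();
    }

    /** Metadata-aware reader: convert from UTC back to the zone saved in the file metadata. */
    static LocalDateTime readWithSavedZone(Instant stored, ZoneId savedWriterZone) {
        return LocalDateTime.ofInstant(stored, savedWriterZone);
    }

    /** Legacy reader: unaware of the metadata, it converts to its own local zone. */
    static LocalDateTime readLegacy(Instant stored, ZoneId readerZone) {
        return LocalDateTime.ofInstant(stored, readerZone);
    }

    public static void main(String[] args) {
        ZoneId writerZone = ZoneId.of("Europe/Budapest");      // assumed writer session zone
        ZoneId readerZone = ZoneId.of("America/Los_Angeles");  // assumed legacy reader zone
        LocalDateTime written = LocalDateTime.of(2019, 3, 26, 12, 0);

        Instant stored = normalizeToUtc(written, writerZone);
        System.out.println("stored (UTC):          " + stored);
        System.out.println("metadata-aware reader: " + readWithSavedZone(stored, writerZone));
        System.out.println("legacy reader:         " + readLegacy(stored, readerZone));
    }
}
```

The metadata-aware reader gets back the wall-clock value that was written (LocalDateTime semantics), while the legacy reader sees the same instant rendered in its own zone (Instant semantics).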




[jira] [Updated] (HIVE-21231) HiveJoinAddNotNullRule support for range predicates

2019-03-26 Thread Miklos Gergely (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely updated HIVE-21231:
--
Status: Open  (was: Patch Available)

> HiveJoinAddNotNullRule support for range predicates
> ---
>
> Key: HIVE-21231
> URL: https://issues.apache.org/jira/browse/HIVE-21231
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: newbie
> Attachments: HIVE-21231.01.patch, HIVE-21231.02.patch
>
>
> For instance, given the following query:
> {code:sql}
> SELECT t0.col0, t0.col1
> FROM
>   (
> SELECT col0, col1 FROM tab
>   ) AS t0
>   INNER JOIN
>   (
> SELECT col0, col1 FROM tab
>   ) AS t1
> ON t0.col0 < t1.col0 AND t0.col1 > t1.col1
> {code}
> we could still infer that col0 and col1 cannot be null for any of the inputs. 
> Currently we do not.
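Why the inference is safe can be illustrated with a small sketch of SQL three-valued comparison. It is not the Calcite or Hive rule itself: the class and method names are hypothetical, and a Java null stands in for SQL NULL / UNKNOWN.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

/** Illustrative sketch only; not the HiveJoinAddNotNullRule implementation. */
class RangeJoinNullSketch {

    /** SQL-style '<': the result is UNKNOWN (modelled as null) if either operand is NULL. */
    static Boolean lessThan(Integer a, Integer b) {
        if (a == null || b == null) {
            return null;
        }
        return a < b;
    }

    /** Counts the pairs an inner join on "l < r" keeps: only pairs where the condition is TRUE. */
    static long matches(List<Integer> left, List<Integer> right) {
        long count = 0;
        for (Integer l : left) {
            for (Integer r : right) {
                if (Boolean.TRUE.equals(lessThan(l, r))) {
                    count++;
                }
            }
        }
        return count;
    }

    /** Equivalent of adding an IS NOT NULL filter below the join. */
    static List<Integer> dropNulls(List<Integer> values) {
        return values.stream().filter(Objects::nonNull).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> left = Arrays.asList(1, null, 3);
        List<Integer> right = Arrays.asList(2, null, 4);
        System.out.println("without filter: " + matches(left, right));
        System.out.println("with filter:    " + matches(dropNulls(left), dropNulls(right)));
    }
}
```

Because a range comparison involving NULL is never TRUE, filtering NULLs out of both inputs before the join does not change the result, just as it would not for an equality predicate.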




[jira] [Updated] (HIVE-21231) HiveJoinAddNotNullRule support for range predicates

2019-03-26 Thread Miklos Gergely (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely updated HIVE-21231:
--
Attachment: HIVE-21231.02.patch

> HiveJoinAddNotNullRule support for range predicates
> ---
>
> Key: HIVE-21231
> URL: https://issues.apache.org/jira/browse/HIVE-21231
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: newbie
> Attachments: HIVE-21231.01.patch, HIVE-21231.02.patch
>
>
> For instance, given the following query:
> {code:sql}
> SELECT t0.col0, t0.col1
> FROM
>   (
> SELECT col0, col1 FROM tab
>   ) AS t0
>   INNER JOIN
>   (
> SELECT col0, col1 FROM tab
>   ) AS t1
> ON t0.col0 < t1.col0 AND t0.col1 > t1.col1
> {code}
> we could still infer that col0 and col1 cannot be null for any of the inputs. 
> Currently we do not.




[jira] [Updated] (HIVE-21231) HiveJoinAddNotNullRule support for range predicates

2019-03-26 Thread Miklos Gergely (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely updated HIVE-21231:
--
Status: Patch Available  (was: Open)

> HiveJoinAddNotNullRule support for range predicates
> ---
>
> Key: HIVE-21231
> URL: https://issues.apache.org/jira/browse/HIVE-21231
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: newbie
> Attachments: HIVE-21231.01.patch, HIVE-21231.02.patch
>
>
> For instance, given the following query:
> {code:sql}
> SELECT t0.col0, t0.col1
> FROM
>   (
> SELECT col0, col1 FROM tab
>   ) AS t0
>   INNER JOIN
>   (
> SELECT col0, col1 FROM tab
>   ) AS t1
> ON t0.col0 < t1.col0 AND t0.col1 > t1.col1
> {code}
> we could still infer that col0 and col1 cannot be null for any of the inputs. 
> Currently we do not.




[jira] [Commented] (HIVE-21001) Upgrade to calcite-1.19

2019-03-26 Thread Hive QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-21001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801738#comment-16801738
 ] 

Hive QA commented on HIVE-21001:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12963723/HIVE-21001.45.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/16687/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16687/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16687/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2019-03-26 13:51:38.289
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-16687/source-prep.txt
+ [[ true == \t\r\u\e ]]
+ rm -rf ivy maven
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2019-03-26 13:51:39.210
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 80998ad HIVE-21493: BuddyAllocator - Metrics count for allocated 
arenas wrong if preallocation is done (Olli Draese via Slim Bouguerra)
+ git clean -f -d
Removing standalone-metastore/metastore-server/src/gen/
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 80998ad HIVE-21493: BuddyAllocator - Metrics count for allocated 
arenas wrong if preallocation is done (Olli Draese via Slim Bouguerra)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2019-03-26 13:51:40.340
+ rm -rf ../yetus_PreCommit-HIVE-Build-16687
+ mkdir ../yetus_PreCommit-HIVE-Build-16687
+ git gc
+ cp -R . ../yetus_PreCommit-HIVE-Build-16687
+ mkdir /data/hiveptest/logs/PreCommit-HIVE-Build-16687/yetus
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
Going to apply patch with: git apply -p0
/data/hiveptest/working/scratch/build.patch:569: trailing whitespace.
explain cbo select * from part_null where 
/data/hiveptest/working/scratch/build.patch:1072: trailing whitespace.
Map 1 
/data/hiveptest/working/scratch/build.patch:1093: trailing whitespace.
Reducer 2 
/data/hiveptest/working/scratch/build.patch:1152: trailing whitespace.
Map 1 
/data/hiveptest/working/scratch/build.patch:1173: trailing whitespace.
Reducer 2 
warning: squelched 60 whitespace errors
warning: 65 lines add whitespace errors.
+ [[ maven == \m\a\v\e\n ]]
+ rm -rf /data/hiveptest/working/maven/org/apache/hive
+ mvn -B clean install -DskipTests -T 4 -q 
-Dmaven.repo.local=/data/hiveptest/working/maven
protoc-jar: executing: [/tmp/protoc2169992906636269887.exe, --version]
libprotoc 2.5.0
protoc-jar: executing: [/tmp/protoc2169992906636269887.exe, 
-I/data/hiveptest/working/apache-github-source-source/standalone-metastore/metastore-common/src/main/protobuf/org/apache/hadoop/hive/metastore,
 
--java_out=/data/hiveptest/working/apache-github-source-source/standalone-metastore/metastore-common/target/generated-sources,
 
/data/hiveptest/working/apache-github-source-source/standalone-metastore/metastore-common/src/main/protobuf/org/apache/hadoop/hive/metastore/metastore.proto]
ANTLR Parser Generator  Version 3.5.2
[ERROR] Failed to execute goal on project hive-service-rpc: Could not resolve 
dependencies for project org.apache.hive:hive-service-rpc:jar:4.0.0-SNAPSHOT: 
The following artifacts could not be resolved: 
org.apache.thrift:libfb303:jar:0.9.3, org.apache.thrift:libthrift:jar:0.9.3: 
Could not find artifact org.apache.thrift:libfb303:jar:0.9.3 in datanucleus 
(http://www.datanucleus.org/downloads/maven2) -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run

[jira] [Commented] (HIVE-19034) hadoop fs test can check srcipt ok, but beeline -f report no such file

2019-03-26 Thread Zoltan Haindrich (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-19034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801804#comment-16801804
 ] 

Zoltan Haindrich commented on HIVE-19034:
-

This feature worked in hive-cli, so since we recommend using beeline instead 
of hive-cli, people might see this as a regression.
HIVE-7136 added this feature for hive-cli.

+1 on the patch.
Please also document it (search for -f) in
https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
The hive-cli documentation contains some examples (search for hdfs):
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli

> hadoop fs test can check srcipt ok, but beeline -f report no such file
> --
>
> Key: HIVE-19034
> URL: https://issues.apache.org/jira/browse/HIVE-19034
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 1.3.0, beeline-cli-branch
> Environment: java version: 1.8.0_112-b15
> hadoop version: 2.7.2
> hive version:1.3.0
> hive JDBC version: 1.3.0
> beeline version: 1.3.0
>Reporter: fengxianghui
>Assignee: Bruno Pusztahazi
>Priority: Blocker
>  Labels: patch
> Fix For: 1.3.0
>
> Attachments: HIVE-19034.1.patch
>
>   Original Estimate: 0.05h
>  Remaining Estimate: 0.05h
>
> I tested it like this:
> HQL=hdfs://hacluster/tmp/ff.hql
> if hadoop fs -test -f ${HQL}
> then
>    beeline -f ${HQL}
> fi
> The test on ${HQL} succeeds, but beeline reports "no such file or directory" for ${HQL}.




[jira] [Updated] (HIVE-21507) Hive swallows NPE if no delegation token found

2019-03-26 Thread Denes Bodo (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denes Bodo updated HIVE-21507:
--
Attachment: (was: HIVE-21507-002.patch)

> Hive swallows NPE if no delegation token found
> --
>
> Key: HIVE-21507
> URL: https://issues.apache.org/jira/browse/HIVE-21507
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 3.1.1
>Reporter: Denes Bodo
>Assignee: Denes Bodo
>Priority: Critical
>  Labels: usability
> Attachments: HIVE-21507.001.patch
>
>
> If no delegation token has been put into the token file, this 
> [line|https://github.com/apache/hive/blob/master/jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java#L777]
>  causes a NullPointerException that is not handled, and the user is not 
> notified in any way.
> The NPE can be reproduced with an Oozie Sqoop import to Hive in a 
> kerberized cluster. Oozie puts the delegation token into the token file with 
> id *HIVE_DELEGATION_TOKEN_hiveserver2ClientToken*, so a lookup with id *hive* 
> does not work. However, the fallback code uses the key that Oozie provides, 
> [this|https://github.com/apache/hive/blob/master/jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java#L784]
>  way.
> I suggest showing the user a warning that the key with id *hive* cannot be 
> used and that Hive is falling back to getting the delegation token from the 
> session.
> I am creating the patch.
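The suggested behaviour could look roughly like the sketch below. It is not the HiveConnection code or the attached patch: the class, the method and the map-based "token file" are hypothetical stand-ins, and only the two key names are taken from the description above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Logger;

/** Illustrative sketch only; a plain map stands in for the real token file. */
class TokenLookupSketch {
    private static final Logger LOG = Logger.getLogger(TokenLookupSketch.class.getName());

    static final String PRIMARY_KEY = "hive";
    static final String FALLBACK_KEY = "HIVE_DELEGATION_TOKEN_hiveserver2ClientToken";

    /** Returns the token string, or null if neither key is present; never throws on a missing key. */
    static String findToken(Map<String, String> tokenFile) {
        String token = tokenFile.get(PRIMARY_KEY);
        if (token == null) {
            // Tell the user what happened instead of failing silently with an NPE.
            LOG.warning("No delegation token found for key '" + PRIMARY_KEY
                + "', falling back to key '" + FALLBACK_KEY + "'");
            token = tokenFile.get(FALLBACK_KEY);
        }
        return token;
    }

    public static void main(String[] args) {
        Map<String, String> tokenFile = new HashMap<>();
        tokenFile.put(FALLBACK_KEY, "token-from-oozie"); // hypothetical token value
        System.out.println(findToken(tokenFile));
    }
}
```

The point of the sketch is the explicit null check followed by a warning, so that a missing *hive* key is reported rather than dereferenced.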




[jira] [Updated] (HIVE-21507) Hive swallows NPE if no delegation token found

2019-03-26 Thread Denes Bodo (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denes Bodo updated HIVE-21507:
--
Attachment: HIVE-21507.001.patch

> Hive swallows NPE if no delegation token found
> --
>
> Key: HIVE-21507
> URL: https://issues.apache.org/jira/browse/HIVE-21507
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 3.1.1
>Reporter: Denes Bodo
>Assignee: Denes Bodo
>Priority: Critical
>  Labels: usability
> Attachments: HIVE-21507.001.patch
>
>
> If no delegation token has been put into the token file, this 
> [line|https://github.com/apache/hive/blob/master/jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java#L777]
>  causes a NullPointerException that is not handled, and the user is not 
> notified in any way.
> The NPE can be reproduced with an Oozie Sqoop import to Hive in a 
> kerberized cluster. Oozie puts the delegation token into the token file with 
> id *HIVE_DELEGATION_TOKEN_hiveserver2ClientToken*, so a lookup with id *hive* 
> does not work. However, the fallback code uses the key that Oozie provides, 
> [this|https://github.com/apache/hive/blob/master/jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java#L784]
>  way.
> I suggest showing the user a warning that the key with id *hive* cannot be 
> used and that Hive is falling back to getting the delegation token from the 
> session.
> I am creating the patch.




[jira] [Updated] (HIVE-21507) Hive swallows NPE if no delegation token found

2019-03-26 Thread Denes Bodo (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denes Bodo updated HIVE-21507:
--
Attachment: (was: HIVE-21507-001.patch)

> Hive swallows NPE if no delegation token found
> --
>
> Key: HIVE-21507
> URL: https://issues.apache.org/jira/browse/HIVE-21507
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 3.1.1
>Reporter: Denes Bodo
>Assignee: Denes Bodo
>Priority: Critical
>  Labels: usability
> Attachments: HIVE-21507.001.patch
>
>
> If no delegation token has been put into the token file, this 
> [line|https://github.com/apache/hive/blob/master/jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java#L777]
>  causes a NullPointerException that is not handled, and the user is not 
> notified in any way.
> The NPE can be reproduced with an Oozie Sqoop import to Hive in a 
> kerberized cluster. Oozie puts the delegation token into the token file with 
> id *HIVE_DELEGATION_TOKEN_hiveserver2ClientToken*, so a lookup with id *hive* 
> does not work. However, the fallback code uses the key that Oozie provides, 
> [this|https://github.com/apache/hive/blob/master/jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java#L784]
>  way.
> I suggest showing the user a warning that the key with id *hive* cannot be 
> used and that Hive is falling back to getting the delegation token from the 
> session.
> I am creating the patch.




[jira] [Assigned] (HIVE-21510) Vectorization: add support for and/or for (constant,column) cases

2019-03-26 Thread Laszlo Bodor (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laszlo Bodor reassigned HIVE-21510:
---

Assignee: Laszlo Bodor

> Vectorization: add support for and/or for (constant,column) cases
> -
>
> Key: HIVE-21510
> URL: https://issues.apache.org/jira/browse/HIVE-21510
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Laszlo Bodor
>Priority: Major
>
> After HIVE-21001, some selectExpressions will start using VectorUDFAdaptor for 
> "null and x" expressions. Because there are currently 2-3 places that 
> rewrite expressions into the "null and/or x" form, it would be better 
> to support this form.
> {code}
> [...]
> selectExpressions: VectorUDFAdaptor((null and dt1 is null))
> [...]
> usesVectorUDFAdaptor: true
> [...]
> {code}
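For reference, what a (constant, column) evaluation of "null and x" and "null or x" has to compute can be sketched under SQL three-valued logic. This is only an assumption-laden illustration, not a Hive vectorized expression class: the names are hypothetical, and a Java null stands in for SQL NULL.

```java
import java.util.Arrays;

/** Illustrative sketch only; not part of Hive's vectorization code. */
class NullConstantLogicSketch {

    /** Evaluates "NULL AND col[i]" per row: FALSE when the column is FALSE, otherwise NULL. */
    static Boolean[] nullAnd(Boolean[] col) {
        Boolean[] out = new Boolean[col.length];
        for (int i = 0; i < col.length; i++) {
            out[i] = Boolean.FALSE.equals(col[i]) ? Boolean.FALSE : null;
        }
        return out;
    }

    /** Evaluates "NULL OR col[i]" per row: TRUE when the column is TRUE, otherwise NULL. */
    static Boolean[] nullOr(Boolean[] col) {
        Boolean[] out = new Boolean[col.length];
        for (int i = 0; i < col.length; i++) {
            out[i] = Boolean.TRUE.equals(col[i]) ? Boolean.TRUE : null;
        }
        return out;
    }

    public static void main(String[] args) {
        Boolean[] x = {true, false, null};
        System.out.println("null and x: " + Arrays.toString(nullAnd(x)));
        System.out.println("null or x:  " + Arrays.toString(nullOr(x)));
    }
}
```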




[jira] [Commented] (HIVE-21478) Metastore cache update shall capture exception

2019-03-26 Thread Zoltan Haindrich (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-21478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801709#comment-16801709
 ] 

Zoltan Haindrich commented on HIVE-21478:
-

+1

> Metastore cache update shall capture exception
> --
>
> Key: HIVE-21478
> URL: https://issues.apache.org/jira/browse/HIVE-21478
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Reporter: Daniel Dai
>Assignee: Daniel Dai
>Priority: Major
> Attachments: HIVE-21478.1.patch
>
>
> We definitely need to catch any exception thrown during 
> CacheUpdateMasterWork.update(); otherwise, Java would refuse to schedule 
> future update() calls.
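The behaviour referred to here matches the documented contract of java.util.concurrent.ScheduledExecutorService: if any execution of a periodic task throws, subsequent executions are suppressed. A minimal sketch follows, assuming the update is scheduled that way; the class and method names are hypothetical and the failing task merely simulates update().

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative sketch only; the failing task stands in for a cache update. */
class ScheduledUpdateSketch {

    /** Schedules an always-failing task for the given time and returns how often it actually ran. */
    static int runFor(long millis, boolean catchInsideTask) throws InterruptedException {
        AtomicInteger runs = new AtomicInteger();
        Runnable failing = () -> {
            runs.incrementAndGet();
            throw new IllegalStateException("simulated update failure");
        };
        Runnable guarded = () -> {
            try {
                failing.run();
            } catch (RuntimeException e) {
                // A real task would log the failure here; swallowing it keeps the schedule alive.
            }
        };
        ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
        pool.scheduleAtFixedRate(catchInsideTask ? guarded : failing, 0, 20, TimeUnit.MILLISECONDS);
        Thread.sleep(millis);
        pool.shutdownNow();
        return runs.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("runs without catch: " + runFor(300, false));
        System.out.println("runs with catch:    " + runFor(300, true));
    }
}
```

Without the catch the task runs once and is never rescheduled; with the catch inside the task body it keeps running at the fixed rate.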




[jira] [Updated] (HIVE-21001) Upgrade to calcite-1.19

2019-03-26 Thread Zoltan Haindrich (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-21001:

Attachment: HIVE-21001.45.patch

> Upgrade to calcite-1.19
> ---
>
> Key: HIVE-21001
> URL: https://issues.apache.org/jira/browse/HIVE-21001
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-21001.01.patch, HIVE-21001.01.patch, 
> HIVE-21001.02.patch, HIVE-21001.03.patch, HIVE-21001.04.patch, 
> HIVE-21001.05.patch, HIVE-21001.06.patch, HIVE-21001.06.patch, 
> HIVE-21001.07.patch, HIVE-21001.08.patch, HIVE-21001.08.patch, 
> HIVE-21001.08.patch, HIVE-21001.09.patch, HIVE-21001.09.patch, 
> HIVE-21001.09.patch, HIVE-21001.10.patch, HIVE-21001.11.patch, 
> HIVE-21001.12.patch, HIVE-21001.13.patch, HIVE-21001.15.patch, 
> HIVE-21001.16.patch, HIVE-21001.17.patch, HIVE-21001.18.patch, 
> HIVE-21001.18.patch, HIVE-21001.19.patch, HIVE-21001.20.patch, 
> HIVE-21001.21.patch, HIVE-21001.22.patch, HIVE-21001.22.patch, 
> HIVE-21001.22.patch, HIVE-21001.23.patch, HIVE-21001.24.patch, 
> HIVE-21001.26.patch, HIVE-21001.26.patch, HIVE-21001.26.patch, 
> HIVE-21001.26.patch, HIVE-21001.26.patch, HIVE-21001.27.patch, 
> HIVE-21001.28.patch, HIVE-21001.29.patch, HIVE-21001.29.patch, 
> HIVE-21001.30.patch, HIVE-21001.31.patch, HIVE-21001.32.patch, 
> HIVE-21001.34.patch, HIVE-21001.35.patch, HIVE-21001.36.patch, 
> HIVE-21001.37.patch, HIVE-21001.38.patch, HIVE-21001.39.patch, 
> HIVE-21001.40.patch, HIVE-21001.41.patch, HIVE-21001.42.patch, 
> HIVE-21001.43.patch, HIVE-21001.44.patch, HIVE-21001.45.patch, 
> HIVE-21001.45.patch
>
>
> XLEAR LIBRARY CACHE 




[jira] [Commented] (HIVE-21231) HiveJoinAddNotNullRule support for range predicates

2019-03-26 Thread Hive QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-21231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801752#comment-16801752
 ] 

Hive QA commented on HIVE-21231:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12963731/HIVE-21231.02.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/16689/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16689/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16689/

Messages:
{noformat}
 This message was trimmed, see log for full details 
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-16689/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2019-03-26 14:01:13.298
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 80998ad HIVE-21493: BuddyAllocator - Metrics count for allocated 
arenas wrong if preallocation is done (Olli Draese via Slim Bouguerra)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 80998ad HIVE-21493: BuddyAllocator - Metrics count for allocated 
arenas wrong if preallocation is done (Olli Draese via Slim Bouguerra)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2019-03-26 14:01:14.589
+ rm -rf ../yetus_PreCommit-HIVE-Build-16689
+ mkdir ../yetus_PreCommit-HIVE-Build-16689
+ git gc
+ cp -R . ../yetus_PreCommit-HIVE-Build-16689
+ mkdir /data/hiveptest/logs/PreCommit-HIVE-Build-16689/yetus
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: 
a/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveCalciteUtil.java: 
does not exist in index
error: 
a/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinAddNotNullRule.java:
 does not exist in index
error: 
a/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRulesRegistry.java:
 does not exist in index
error: a/ql/src/test/results/clientnegative/subquery_scalar_multi_rows.q.out: 
does not exist in index
error: a/ql/src/test/results/clientpositive/interval_3.q.out: does not exist in 
index
error: a/ql/src/test/results/clientpositive/join43.q.out: does not exist in 
index
error: a/ql/src/test/results/clientpositive/join_merging.q.out: does not exist 
in index
error: a/ql/src/test/results/clientpositive/llap/cross_prod_1.q.out: does not 
exist in index
error: a/ql/src/test/results/clientpositive/llap/groupby_groupingset_bug.q.out: 
does not exist in index
error: a/ql/src/test/results/clientpositive/llap/semijoin.q.out: does not exist 
in index
error: a/ql/src/test/results/clientpositive/llap/subquery_corr.q.out: does not 
exist in index
error: a/ql/src/test/results/clientpositive/llap/subquery_in.q.out: does not 
exist in index
error: a/ql/src/test/results/clientpositive/llap/subquery_notin.q.out: does not 
exist in index
error: a/ql/src/test/results/clientpositive/llap/subquery_scalar.q.out: does 
not exist in index
error: a/ql/src/test/results/clientpositive/llap/subquery_select.q.out: does 
not exist in index
error: a/ql/src/test/results/clientpositive/perf/spark/query1.q.out: does not 
exist in index
error: a/ql/src/test/results/clientpositive/perf/spark/query23.q.out: does not 
exist in index
error: a/ql/src/test/results/clientpositive/perf/spark/query24.q.out: does not 
exist in index
error: a/ql/src/test/results/clientpositive/perf/spark/query30.q.out: does not 
exist in index
error: a/ql/src/test/results/clientpositive/perf/spark/query32.q.out: does not 
exist in index
error: a/ql/src/test/results/clientpositive/perf/spark/query44.q.out: does not 
exist in index
error: a/ql/src/test/results/clientpositive/perf/spark/query54.q.out: does not 
exist in index
error:

[jira] [Commented] (HIVE-19034) hadoop fs test can check srcipt ok, but beeline -f report no such file

2019-03-26 Thread Hive QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-19034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801790#comment-16801790
 ] 

Hive QA commented on HIVE-19034:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
59s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
21s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
12s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
34s{color} | {color:blue} beeline in master has 44 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
13s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
15s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 13m  3s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-16690/dev-support/hive-personality.sh
 |
| git revision | master / 383c70c |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| modules | C: beeline U: beeline |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16690/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> hadoop fs test can check srcipt ok, but beeline -f report no such file
> --
>
> Key: HIVE-19034
> URL: https://issues.apache.org/jira/browse/HIVE-19034
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 1.3.0, beeline-cli-branch
> Environment: java version: 1.8.0_112-b15
> hadoop version: 2.7.2
> hive version:1.3.0
> hive JDBC version: 1.3.0
> beeline version: 1.3.0
>Reporter: fengxianghui
>Assignee: Bruno Pusztahazi
>Priority: Blocker
>  Labels: patch
> Fix For: 1.3.0
>
> Attachments: HIVE-19034.1.patch
>
>   Original Estimate: 0.05h
>  Remaining Estimate: 0.05h
>
> I test like this
> HQL=hdfs://hacluster/tmp/ff.hql
> if hadoop fs -test -f ${HQL}
> then
>    beeline -f ${HQL}
> fi
> test ${HQL} ok, but beeline report ${HQL} no such file or directory



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HIVE-21316) Comparision of varchar column and string literal should happen in varchar

2019-03-26 Thread Zoltan Haindrich (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-21316:

Attachment: HIVE-21316.04.patch

> Comparision of varchar column and string literal should happen in varchar
> -
>
> Key: HIVE-21316
> URL: https://issues.apache.org/jira/browse/HIVE-21316
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-21316.01.patch, HIVE-21316.02.patch, 
> HIVE-21316.03.patch, HIVE-21316.04.patch
>
>
> this is most probably the root cause behind HIVE-21310 as well




[jira] [Updated] (HIVE-21001) Upgrade to calcite-1.19

2019-03-26 Thread Zoltan Haindrich (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-21001:

Attachment: HIVE-21001.45.patch

> Upgrade to calcite-1.19
> ---
>
> Key: HIVE-21001
> URL: https://issues.apache.org/jira/browse/HIVE-21001
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-21001.01.patch, HIVE-21001.01.patch, 
> HIVE-21001.02.patch, HIVE-21001.03.patch, HIVE-21001.04.patch, 
> HIVE-21001.05.patch, HIVE-21001.06.patch, HIVE-21001.06.patch, 
> HIVE-21001.07.patch, HIVE-21001.08.patch, HIVE-21001.08.patch, 
> HIVE-21001.08.patch, HIVE-21001.09.patch, HIVE-21001.09.patch, 
> HIVE-21001.09.patch, HIVE-21001.10.patch, HIVE-21001.11.patch, 
> HIVE-21001.12.patch, HIVE-21001.13.patch, HIVE-21001.15.patch, 
> HIVE-21001.16.patch, HIVE-21001.17.patch, HIVE-21001.18.patch, 
> HIVE-21001.18.patch, HIVE-21001.19.patch, HIVE-21001.20.patch, 
> HIVE-21001.21.patch, HIVE-21001.22.patch, HIVE-21001.22.patch, 
> HIVE-21001.22.patch, HIVE-21001.23.patch, HIVE-21001.24.patch, 
> HIVE-21001.26.patch, HIVE-21001.26.patch, HIVE-21001.26.patch, 
> HIVE-21001.26.patch, HIVE-21001.26.patch, HIVE-21001.27.patch, 
> HIVE-21001.28.patch, HIVE-21001.29.patch, HIVE-21001.29.patch, 
> HIVE-21001.30.patch, HIVE-21001.31.patch, HIVE-21001.32.patch, 
> HIVE-21001.34.patch, HIVE-21001.35.patch, HIVE-21001.36.patch, 
> HIVE-21001.37.patch, HIVE-21001.38.patch, HIVE-21001.39.patch, 
> HIVE-21001.40.patch, HIVE-21001.41.patch, HIVE-21001.42.patch, 
> HIVE-21001.43.patch, HIVE-21001.44.patch, HIVE-21001.45.patch
>
>
> XLEAR LIBRARY CACHE 




[jira] [Commented] (HIVE-17395) HiveServer2 parsing a command with a lot of "("

2019-03-26 Thread Zoltan Haindrich (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-17395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801679#comment-16801679
 ] 

Zoltan Haindrich commented on HIVE-17395:
-

[~julianhyde]: I'm not sure whether it makes your problem go away, but 
HIVE-18624 made this issue less severe (intervals and subqueries both need a '(' or 
an explicit keyword, so in most cases the recursion spiral was started by 
"function").
Unfortunately, that patch is currently only on branch-3/master, and no 
Hive release contains it yet.

About antlr4: it would be great, but it doesn't seem possible.

> HiveServer2 parsing a command with a lot of "("
> ---
>
> Key: HIVE-17395
> URL: https://issues.apache.org/jira/browse/HIVE-17395
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline, HiveServer2
>Affects Versions: 2.3.0
>Reporter: dan young
>Priority: Major
>
> Hello,
> We're seeing what appears to be the same issue that was outlined in 
> HIVE-15388 where the query parser spends a lot of time (never returns and I 
> need to kill the beeline process) parsing a command with a lot of "(" .   I 
> tried this in both 2.2 and now 2.3.
> Here's an example query (this is auto generated SQL BTW) in beeline that 
> never completes/parses, I end up just killing the beeline process.
> It looks like something similar was addressed as part of HIVE-15388.   Any 
> ideas on how to address this?  write better SQL? patch?
> Regards,
> Dano
> {noformat}
> Connected to: Apache Hive (version 2.3.0)
> Driver: Hive JDBC (version 2.3.0)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> Beeline version 2.3.0 by Apache Hive
> 0: jdbc:hive2://localhost:1/test_db> SELECT 
> ((UNIX_TIMESTAMP(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP(CONCAT(ADD_MONTHS(CAST(CONCAT(CAST(YEAR(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 AS STRING), '-', 
> LPAD(CAST(((CAST(CEIL(MONTH(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 / 3) AS INT) - 1) * 3) + 1 AS STRING), 
> 2, '0'), '-01 00:00:00') AS TIMESTAMP), 
> 1),SUBSTRING(CAST(CONCAT(CAST(YEAR(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 AS STRING), '-', 
> LPAD(CAST(((CAST(CEIL(MONTH(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 / 3) AS INT) - 1) * 3) + 1 AS STRING), 
> 2, '0'), '-01 00:00:00') AS TIMESTAMP),11))), 'MM'))), 
> -3),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP(CONCAT(ADD_MONTHS(CAST(CONCAT(CAST(YEAR(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 AS STRING), '-', 
> LPAD(CAST(((CAST(CEIL(MONTH(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 / 3) AS INT) - 1) * 3) + 1 AS STRING), 
> 2, '0'), '-01 00:00:00') AS TIMESTAMP), 
> 1),SUBSTRING(CAST(CONCAT(CAST(YEAR(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 AS STRING), '-', 
> LPAD(CAST(((CAST(CEIL(MONTH(TIMESTAMP(CONCAT(ADD_MONTHS(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20
>  00:00:00.0'), 'MM'))), 
> -1),SUBSTRING(TIMESTAMP(DATE(TRUNC(TIMESTAMP('2012-04-20 
> 00:00:00.0'), 'MM'))),11 / 3) AS INT) - 1) * 3) + 1 AS STRING), 
> 2, '0'), '-01 00:00:00') AS TIMESTAMP),11))), 'MM'))),11));
> When I did a jstack on the HiveServer2, it appears to be stuck/running in 
> the HiveParser/antlr.
> "e62658bd-5ea9-43c4-898f-3048d913f192 HiveServer2-Handler-Pool: Thread-96" 
> #96 prio=5 os_prio=0 tid=0x7fb78c366000 nid=0x4476 runnable 
> [0x7fb77d7bb000]
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser$DFA36.specialStateTransition(HiveParser_IdentifiersParser.java:31502)
>   at org.antlr.runtime.DFA.predict(DFA.java:80)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.atomExpression(HiveParser_IdentifiersParser.java:6746)

[jira] [Commented] (HIVE-21479) NPE during metastore cache update

2019-03-26 Thread Zoltan Haindrich (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-21479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801713#comment-16801713
 ] 

Zoltan Haindrich commented on HIVE-21479:
-

+1

> NPE during metastore cache update
> -
>
> Key: HIVE-21479
> URL: https://issues.apache.org/jira/browse/HIVE-21479
> Project: Hive
>  Issue Type: Bug
>Reporter: Daniel Dai
>Assignee: Daniel Dai
>Priority: Major
> Attachments: HIVE-21479.1.patch
>
>
> Saw the following stack during a long periodical update:
> {code}
> 2019-03-12T10:01:43,015 ERROR [CachedStore-CacheUpdateService: Thread-36] 
> cache.CachedStore: Update failure:java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.metastore.cache.CachedStore$CacheUpdateMasterWork.updateTableColStats(CachedStore.java:508)
>   at 
> org.apache.hadoop.hive.metastore.cache.CachedStore$CacheUpdateMasterWork.update(CachedStore.java:461)
>   at 
> org.apache.hadoop.hive.metastore.cache.CachedStore$CacheUpdateMasterWork.run(CachedStore.java:396)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> The reason is that we get the table list at a very early stage and then refresh 
> the tables one by one. A table may well be removed in the interim. We need 
> to deal with this case during cache update.
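
The race described above can be illustrated with a minimal Java sketch. It is not the attached patch and does not use Hive's CachedStore API: the class CacheUpdateSketch, the refresh method, and the plain string maps standing in for the backing store and the cache are all hypothetical. The sketch only shows the general idea of tolerating a table that disappears between listing and refreshing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CacheUpdateSketch {

    /**
     * Refreshes cached stats for each listed table (hypothetical helper).
     * A table that was dropped after the list was taken has no entry in the
     * store; it is evicted from the cache and skipped instead of dereferenced.
     */
    static List<String> refresh(List<String> tableNames,
                                Map<String, String> store,
                                Map<String, String> cache) {
        List<String> skipped = new ArrayList<>();
        for (String name : tableNames) {
            String stats = store.get(name);
            if (stats == null) {
                // Table vanished in the interim: drop the stale entry, move on.
                cache.remove(name);
                skipped.add(name);
                continue;
            }
            cache.put(name, stats);
        }
        return skipped;
    }

    public static void main(String[] args) {
        // Table list captured early; "t2" is dropped before its refresh.
        List<String> listed = Arrays.asList("t1", "t2");
        Map<String, String> store = new HashMap<>();
        store.put("t1", "stats-for-t1");
        Map<String, String> cache = new HashMap<>();
        cache.put("t2", "stale-stats");

        List<String> skipped = refresh(listed, store, cache);
        System.out.println("skipped=" + skipped + " cache=" + cache);
    }
}
```

Whether the real fix evicts the stale entry or merely skips it is not stated in this thread; the eviction here is an assumption made for the illustration.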




[jira] [Commented] (HIVE-21509) LLAP may cache corrupted column vectors and return wrong query result

2019-03-26 Thread Adam Szita (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-21509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801716#comment-16801716
 ] 

Adam Szita commented on HIVE-21509:
---

I've attached a proposed solution (work in progress) that would fix this 
problem, see [^HIVE-21509.0.wip.patch].

It adds a reference counter to ColumnVectors which is increased before 
beginning to write and decreased once the CV has been passed to the actual 
writer. EncodedDataConsumer will check this counter and will not return CVs 
into the object pool whose ref count is greater than zero.
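
A minimal Java sketch of the reference-counting idea described above, under stated assumptions: PooledVector and VectorPool are hypothetical stand-ins, not Hive's ColumnVector or EncodedDataConsumer, and the WIP patch may differ in every detail. It only shows a pool refusing to recycle an object while a pending write still holds a reference.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.atomic.AtomicInteger;

public class RefCountedPoolSketch {

    /** Hypothetical stand-in for a pooled column vector. */
    static final class PooledVector {
        private final AtomicInteger refCount = new AtomicInteger();

        void incRef() { refCount.incrementAndGet(); }
        void decRef() { refCount.decrementAndGet(); }
        int refs() { return refCount.get(); }
    }

    /** Hypothetical pool that rejects vectors still referenced by a writer. */
    static final class VectorPool {
        private final Deque<PooledVector> free = new ArrayDeque<>();

        synchronized boolean offer(PooledVector v) {
            if (v.refs() > 0) {
                return false; // content not yet handed to the writer; do not reuse
            }
            free.push(v);
            return true;
        }

        synchronized int size() { return free.size(); }
    }

    public static void main(String[] args) {
        VectorPool pool = new VectorPool();
        PooledVector cv = new PooledVector();

        cv.incRef();                        // before beginning to write
        System.out.println(pool.offer(cv)); // rejected while the write is pending
        cv.decRef();                        // after the vector reached the writer
        System.out.println(pool.offer(cv)); // now accepted into the pool
    }
}
```

In this sketch a rejected vector is simply not pooled, so a fresh one would be allocated later; whether the patch does the same is not stated in the comment.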

> LLAP may cache corrupted column vectors and return wrong query result
> -
>
> Key: HIVE-21509
> URL: https://issues.apache.org/jira/browse/HIVE-21509
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Adam Szita
>Assignee: Adam Szita
>Priority: Major
> Attachments: HIVE-21509.0.wip.patch
>
>
> In some scenarios, LLAP might store column vectors in cache that are getting 
> reused and reset just before their original content would be written.
> The issue is a concurrency issue and is therefore flaky. It is not easy to 
> reproduce, but the odds of surfacing this issue can be improved by setting 
> LLAP executor and IO thread counts this way:
>  * set hive.llap.daemon.num.executors=32;
>  * set hive.llap.io.threadpool.size=1;
>  * using TPCDS input data of store_sales table, have at least a couple of 
> 100k's of rows, and use text format:
> {code:java}
> ROW FORMAT SERDE    'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'  
> WITH SERDEPROPERTIES (    'field.delim'='|',    'serialization.format'='|')  
> STORED AS INPUTFORMAT    'org.apache.hadoop.mapred.TextInputFormat'  
> OUTPUTFORMAT    
> 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'{code}
>  * having more splits increases the chance of the issue showing itself, so it is worth 
> setting _set tez.grouping.min-size=1024; set tez.grouping.max-size=1024;_
>  * run a query on this table: select min(ss_sold_date_sk) from store_sales;
> The first query result is correct (2450816 in my case). Repeating the query 
> will trigger reading from LLAP cache and produce a wrong result: 0.
> If one wants to make sure of running into this issue, place a 
> Thread.sleep(250) at the beginning of VectorDeserializeOrcWriter#run().
>  




[jira] [Updated] (HIVE-21511) beeline -f report no such file if file is not on local fs

2019-03-26 Thread Bruno Pusztahazi (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Pusztahazi updated HIVE-21511:

Status: Patch Available  (was: Open)

> beeline -f report no such file if file is not on local fs
> -
>
> Key: HIVE-21511
> URL: https://issues.apache.org/jira/browse/HIVE-21511
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 1.3.0, beeline-cli-branch
>Reporter: Bruno Pusztahazi
>Assignee: Bruno Pusztahazi
>Priority: Blocker
>  Labels: patch
> Fix For: 4.0.0
>
> Attachments: HIVE-21511.1.patch
>
>   Original Estimate: 0.05h
>  Remaining Estimate: 0.05h
>
> I test like this
> HQL=hdfs://hacluster/tmp/ff.hql
> if hadoop fs -test -f ${HQL}
> then
>    beeline -f ${HQL}
> fi
> test ${HQL} ok, but beeline report ${HQL} no such file or directory




[jira] [Updated] (HIVE-21511) beeline -f report no such file if file is not on local fs

2019-03-26 Thread Bruno Pusztahazi (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Pusztahazi updated HIVE-21511:

Attachment: HIVE-21511.1.patch

> beeline -f report no such file if file is not on local fs
> -
>
> Key: HIVE-21511
> URL: https://issues.apache.org/jira/browse/HIVE-21511
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 1.3.0, beeline-cli-branch
>Reporter: Bruno Pusztahazi
>Assignee: Bruno Pusztahazi
>Priority: Blocker
>  Labels: patch
> Fix For: 4.0.0
>
> Attachments: HIVE-21511.1.patch
>
>   Original Estimate: 0.05h
>  Remaining Estimate: 0.05h
>
> I test like this
> HQL=hdfs://hacluster/tmp/ff.hql
> if hadoop fs -test -f ${HQL}
> then
>    beeline -f ${HQL}
> fi
> test ${HQL} ok, but beeline report ${HQL} no such file or directory




[jira] [Updated] (HIVE-21001) Upgrade to calcite-1.19

2019-03-26 Thread Zoltan Haindrich (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-21001:

Attachment: HIVE-21001.46.patch

> Upgrade to calcite-1.19
> ---
>
> Key: HIVE-21001
> URL: https://issues.apache.org/jira/browse/HIVE-21001
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: HIVE-21001.01.patch, HIVE-21001.01.patch, 
> HIVE-21001.02.patch, HIVE-21001.03.patch, HIVE-21001.04.patch, 
> HIVE-21001.05.patch, HIVE-21001.06.patch, HIVE-21001.06.patch, 
> HIVE-21001.07.patch, HIVE-21001.08.patch, HIVE-21001.08.patch, 
> HIVE-21001.08.patch, HIVE-21001.09.patch, HIVE-21001.09.patch, 
> HIVE-21001.09.patch, HIVE-21001.10.patch, HIVE-21001.11.patch, 
> HIVE-21001.12.patch, HIVE-21001.13.patch, HIVE-21001.15.patch, 
> HIVE-21001.16.patch, HIVE-21001.17.patch, HIVE-21001.18.patch, 
> HIVE-21001.18.patch, HIVE-21001.19.patch, HIVE-21001.20.patch, 
> HIVE-21001.21.patch, HIVE-21001.22.patch, HIVE-21001.22.patch, 
> HIVE-21001.22.patch, HIVE-21001.23.patch, HIVE-21001.24.patch, 
> HIVE-21001.26.patch, HIVE-21001.26.patch, HIVE-21001.26.patch, 
> HIVE-21001.26.patch, HIVE-21001.26.patch, HIVE-21001.27.patch, 
> HIVE-21001.28.patch, HIVE-21001.29.patch, HIVE-21001.29.patch, 
> HIVE-21001.30.patch, HIVE-21001.31.patch, HIVE-21001.32.patch, 
> HIVE-21001.34.patch, HIVE-21001.35.patch, HIVE-21001.36.patch, 
> HIVE-21001.37.patch, HIVE-21001.38.patch, HIVE-21001.39.patch, 
> HIVE-21001.40.patch, HIVE-21001.41.patch, HIVE-21001.42.patch, 
> HIVE-21001.43.patch, HIVE-21001.44.patch, HIVE-21001.45.patch, 
> HIVE-21001.45.patch, HIVE-21001.46.patch
>
>
> XLEAR LIBRARY CACHE 




[jira] [Updated] (HIVE-19034) hadoop fs test can check srcipt ok, but beeline -f report no such file

2019-03-26 Thread Bruno Pusztahazi (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-19034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Pusztahazi updated HIVE-19034:

Status: Patch Available  (was: Open)

> hadoop fs test can check srcipt ok, but beeline -f report no such file
> --
>
> Key: HIVE-19034
> URL: https://issues.apache.org/jira/browse/HIVE-19034
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 1.3.0, beeline-cli-branch
> Environment: java version: 1.8.0_112-b15
> hadoop version: 2.7.2
> hive version:1.3.0
> hive JDBC version: 1.3.0
> beeline version: 1.3.0
>Reporter: fengxianghui
>Assignee: Bruno Pusztahazi
>Priority: Blocker
>  Labels: patch
> Fix For: 1.3.0
>
> Attachments: HIVE-19034.1.patch
>
>   Original Estimate: 0.05h
>  Remaining Estimate: 0.05h
>
> I test like this
> HQL=hdfs://hacluster/tmp/ff.hql
> if hadoop fs -test -f ${HQL}
> then
>    beeline -f ${HQL}
> fi
> test ${HQL} ok, but beeline report ${HQL} no such file or directory




[jira] [Updated] (HIVE-19034) hadoop fs test can check srcipt ok, but beeline -f report no such file

2019-03-26 Thread Bruno Pusztahazi (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-19034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Pusztahazi updated HIVE-19034:

Attachment: HIVE-19034.1.patch

> hadoop fs test can check srcipt ok, but beeline -f report no such file
> --
>
> Key: HIVE-19034
> URL: https://issues.apache.org/jira/browse/HIVE-19034
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 1.3.0, beeline-cli-branch
> Environment: java version: 1.8.0_112-b15
> hadoop version: 2.7.2
> hive version:1.3.0
> hive JDBC version: 1.3.0
> beeline version: 1.3.0
>Reporter: fengxianghui
>Assignee: Bruno Pusztahazi
>Priority: Blocker
>  Labels: patch
> Fix For: 1.3.0
>
> Attachments: HIVE-19034.1.patch
>
>   Original Estimate: 0.05h
>  Remaining Estimate: 0.05h
>
> I test like this
> HQL=hdfs://hacluster/tmp/ff.hql
> if hadoop fs -test -f ${HQL}
> then
>    beeline -f ${HQL}
> fi
> test ${HQL} ok, but beeline report ${HQL} no such file or directory




[jira] [Updated] (HIVE-21511) beeline -f report no such file if file is not on local fs

2019-03-26 Thread Bruno Pusztahazi (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Pusztahazi updated HIVE-21511:

Attachment: (was: HIVE-19034.1.patch)

> beeline -f report no such file if file is not on local fs
> -
>
> Key: HIVE-21511
> URL: https://issues.apache.org/jira/browse/HIVE-21511
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 1.3.0, beeline-cli-branch
>Reporter: Bruno Pusztahazi
>Assignee: Bruno Pusztahazi
>Priority: Blocker
>  Labels: patch
> Fix For: 4.0.0
>
>   Original Estimate: 0.05h
>  Remaining Estimate: 0.05h
>
> I test like this
> HQL=hdfs://hacluster/tmp/ff.hql
> if hadoop fs -test -f ${HQL}
> then
>    beeline -f ${HQL}
> fi
> test ${HQL} ok, but beeline report ${HQL} no such file or directory




[jira] [Commented] (HIVE-21511) beeline -f report no such file if file is not on local fs

2019-03-26 Thread Bruno Pusztahazi (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-21511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801775#comment-16801775
 ] 

Bruno Pusztahazi commented on HIVE-21511:
-

The root cause of HIVE-19034 still exists in the latest versions. Tracking it 
here.

> beeline -f report no such file if file is not on local fs
> -
>
> Key: HIVE-21511
> URL: https://issues.apache.org/jira/browse/HIVE-21511
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 1.3.0, beeline-cli-branch
>Reporter: Bruno Pusztahazi
>Assignee: Bruno Pusztahazi
>Priority: Blocker
>  Labels: patch
> Fix For: 4.0.0
>
>   Original Estimate: 0.05h
>  Remaining Estimate: 0.05h
>
> I test like this
> HQL=hdfs://hacluster/tmp/ff.hql
> if hadoop fs -test -f ${HQL}
> then
>    beeline -f ${HQL}
> fi
> test ${HQL} ok, but beeline report ${HQL} no such file or directory




[jira] [Updated] (HIVE-21511) beeline -f report no such file if file is not on local fs

2019-03-26 Thread Bruno Pusztahazi (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Pusztahazi updated HIVE-21511:

Attachment: HIVE-19034.1.patch

> beeline -f report no such file if file is not on local fs
> -
>
> Key: HIVE-21511
> URL: https://issues.apache.org/jira/browse/HIVE-21511
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 1.3.0, beeline-cli-branch
>Reporter: Bruno Pusztahazi
>Assignee: Bruno Pusztahazi
>Priority: Blocker
>  Labels: patch
> Fix For: 4.0.0
>
>   Original Estimate: 0.05h
>  Remaining Estimate: 0.05h
>
> I test like this
> HQL=hdfs://hacluster/tmp/ff.hql
> if hadoop fs -test -f ${HQL}
> then
>    beeline -f ${HQL}
> fi
> test ${HQL} ok, but beeline report ${HQL} no such file or directory




[jira] [Updated] (HIVE-21511) beeline -f report no such file if file is not on local fs

2019-03-26 Thread Bruno Pusztahazi (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Pusztahazi updated HIVE-21511:

Attachment: (was: HIVE-21511.1.patch)

> beeline -f report no such file if file is not on local fs
> -
>
> Key: HIVE-21511
> URL: https://issues.apache.org/jira/browse/HIVE-21511
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Reporter: Bruno Pusztahazi
>Assignee: Bruno Pusztahazi
>Priority: Blocker
>  Labels: patch
> Attachments: HIVE-21511.1.patch
>
>   Original Estimate: 0.05h
>  Remaining Estimate: 0.05h
>
> I test like this
> HQL=hdfs://hacluster/tmp/ff.hql
> if hadoop fs -test -f ${HQL}
> then
>    beeline -f ${HQL}
> fi
> test ${HQL} ok, but beeline report ${HQL} no such file or directory




[jira] [Updated] (HIVE-21511) beeline -f report no such file if file is not on local fs

2019-03-26 Thread Bruno Pusztahazi (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Pusztahazi updated HIVE-21511:

Status: Open  (was: Patch Available)

> beeline -f report no such file if file is not on local fs
> -
>
> Key: HIVE-21511
> URL: https://issues.apache.org/jira/browse/HIVE-21511
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Reporter: Bruno Pusztahazi
>Assignee: Bruno Pusztahazi
>Priority: Blocker
>  Labels: patch
> Attachments: HIVE-21511.1.patch
>
>   Original Estimate: 0.05h
>  Remaining Estimate: 0.05h
>
> I test like this
> HQL=hdfs://hacluster/tmp/ff.hql
> if hadoop fs -test -f ${HQL}
> then
>    beeline -f ${HQL}
> fi
> test ${HQL} ok, but beeline report ${HQL} no such file or directory




[jira] [Updated] (HIVE-21511) beeline -f report no such file if file is not on local fs

2019-03-26 Thread Bruno Pusztahazi (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Pusztahazi updated HIVE-21511:

Status: Patch Available  (was: Open)

> beeline -f report no such file if file is not on local fs
> -
>
> Key: HIVE-21511
> URL: https://issues.apache.org/jira/browse/HIVE-21511
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Reporter: Bruno Pusztahazi
>Assignee: Bruno Pusztahazi
>Priority: Blocker
>  Labels: patch
> Attachments: HIVE-21511.1.patch
>
>   Original Estimate: 0.05h
>  Remaining Estimate: 0.05h
>
> I test like this
> HQL=hdfs://hacluster/tmp/ff.hql
> if hadoop fs -test -f ${HQL}
> then
>    beeline -f ${HQL}
> fi
> test ${HQL} ok, but beeline report ${HQL} no such file or directory




[jira] [Commented] (HIVE-19034) hadoop fs test can check srcipt ok, but beeline -f report no such file

2019-03-26 Thread Zoltan Haindrich (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-19034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801782#comment-16801782
 ] 

Zoltan Haindrich commented on HIVE-19034:
-

I don't think this worked before; or is it documented somewhere that -f 
accepts HDFS paths?
Note that I'm not against it, but it seems to me more like a "feature addition" 
than a "blocker bug".

> hadoop fs test can check srcipt ok, but beeline -f report no such file
> --
>
> Key: HIVE-19034
> URL: https://issues.apache.org/jira/browse/HIVE-19034
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 1.3.0, beeline-cli-branch
> Environment: java version: 1.8.0_112-b15
> hadoop version: 2.7.2
> hive version:1.3.0
> hive JDBC version: 1.3.0
> beeline version: 1.3.0
>Reporter: fengxianghui
>Assignee: Bruno Pusztahazi
>Priority: Blocker
>  Labels: patch
> Fix For: 1.3.0
>
> Attachments: HIVE-19034.1.patch
>
>   Original Estimate: 0.05h
>  Remaining Estimate: 0.05h
>
> I test like this
> HQL=hdfs://hacluster/tmp/ff.hql
> if hadoop fs -test -f ${HQL}
> then
>    beeline -f ${HQL}
> fi
> test ${HQL} ok, but beeline report ${HQL} no such file or directory




[jira] [Updated] (HIVE-21511) beeline -f report no such file if file is not on local fs

2019-03-26 Thread Bruno Pusztahazi (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Pusztahazi updated HIVE-21511:

Attachment: HIVE-21511.1.patch

> beeline -f report no such file if file is not on local fs
> -
>
> Key: HIVE-21511
> URL: https://issues.apache.org/jira/browse/HIVE-21511
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Reporter: Bruno Pusztahazi
>Assignee: Bruno Pusztahazi
>Priority: Blocker
>  Labels: patch
> Attachments: HIVE-21511.1.patch
>
>   Original Estimate: 0.05h
>  Remaining Estimate: 0.05h
>
> I test like this
> HQL=hdfs://hacluster/tmp/ff.hql
> if hadoop fs -test -f ${HQL}
> then
>    beeline -f ${HQL}
> fi
> test ${HQL} ok, but beeline report ${HQL} no such file or directory




[jira] [Updated] (HIVE-21048) Remove needless org.mortbay.jetty from hadoop exclusions

2019-03-26 Thread Zoltan Haindrich (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-21048:

   Resolution: Fixed
Fix Version/s: 4.0.0
   Status: Resolved  (was: Patch Available)

pushed to master. Thank you [~abstractdog]!

> Remove needless org.mortbay.jetty from hadoop exclusions
> 
>
> Key: HIVE-21048
> URL: https://issues.apache.org/jira/browse/HIVE-21048
> Project: Hive
>  Issue Type: Bug
>Reporter: Laszlo Bodor
>Assignee: Laszlo Bodor
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21048.01.patch, HIVE-21048.02.patch, 
> HIVE-21048.03.patch, HIVE-21048.04.patch, HIVE-21048.05.patch, 
> HIVE-21048.06.patch, HIVE-21048.07.patch, HIVE-21048.08.patch, 
> HIVE-21048.08.patch, HIVE-21048.09.patch, HIVE-21048.10.patch, 
> HIVE-21048.11.patch, dep.out
>
>
> While working on HIVE-20638 I found that org.mortbay.jetty exclusions (e.g. from hadoop) 
> don't take effect, because the actual groupId of jetty is org.eclipse.jetty in 
> most current projects. Please see the attachment (an example for the hive 
> commons project).
> https://en.wikipedia.org/wiki/Jetty_(web_server)#History




[jira] [Updated] (HIVE-21423) Do not check for whitespace issues in generated code

2019-03-26 Thread Zoltan Haindrich (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-21423:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

pushed to master. Thank you [~mgergely]!

> Do not check for whitespace issues in generated code
> 
>
> Key: HIVE-21423
> URL: https://issues.apache.org/jira/browse/HIVE-21423
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 3.1.1
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21423.01.patch, HIVE-21423.02.patch, 
> HIVE-21423.03.patch, HIVE-21423.04.patch
>
>





[jira] [Commented] (HIVE-15406) Consider vectorizing the new 'trunc' function

2019-03-26 Thread Zoltan Haindrich (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801761#comment-16801761
 ] 

Zoltan Haindrich commented on HIVE-15406:
-

thank you for the explanation [~abstractdog]
+1

> Consider vectorizing the new 'trunc' function
> -
>
> Key: HIVE-15406
> URL: https://issues.apache.org/jira/browse/HIVE-15406
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 4.0.0
>Reporter: Matt McCline
>Assignee: Laszlo Bodor
>Priority: Critical
> Attachments: HIVE-15406.01.patch, HIVE-15406.02.patch, 
> HIVE-15406.03.patch, HIVE-15406.04.patch, HIVE-15406.05.patch, 
> HIVE-15406.06.patch
>
>
> Rounding function 'trunc' added by HIVE-14582.




[jira] [Updated] (HIVE-21511) beeline -f report no such file if file is not on local fs

2019-03-26 Thread Bruno Pusztahazi (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Pusztahazi updated HIVE-21511:

Affects Version/s: (was: beeline-cli-branch)
   (was: 1.3.0)

> beeline -f report no such file if file is not on local fs
> -
>
> Key: HIVE-21511
> URL: https://issues.apache.org/jira/browse/HIVE-21511
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Reporter: Bruno Pusztahazi
>Assignee: Bruno Pusztahazi
>Priority: Blocker
>  Labels: patch
> Attachments: HIVE-21511.1.patch
>
>   Original Estimate: 0.05h
>  Remaining Estimate: 0.05h
>
> I tested it like this:
> HQL=hdfs://hacluster/tmp/ff.hql
> if hadoop fs -test -f "${HQL}"
> then
>    beeline -f "${HQL}"
> fi
> The hadoop fs -test check on ${HQL} succeeds, but beeline reports "${HQL}: no such file or directory".




[jira] [Updated] (HIVE-21511) beeline -f report no such file if file is not on local fs

2019-03-26 Thread Bruno Pusztahazi (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Pusztahazi updated HIVE-21511:

Target Version/s:   (was: 4.0.0)

> beeline -f report no such file if file is not on local fs
> -
>
> Key: HIVE-21511
> URL: https://issues.apache.org/jira/browse/HIVE-21511
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 1.3.0, beeline-cli-branch
>Reporter: Bruno Pusztahazi
>Assignee: Bruno Pusztahazi
>Priority: Blocker
>  Labels: patch
> Fix For: 4.0.0
>
> Attachments: HIVE-21511.1.patch
>
>   Original Estimate: 0.05h
>  Remaining Estimate: 0.05h
>
> I tested it like this:
> HQL=hdfs://hacluster/tmp/ff.hql
> if hadoop fs -test -f "${HQL}"
> then
>    beeline -f "${HQL}"
> fi
> The hadoop fs -test check on ${HQL} succeeds, but beeline reports "${HQL}: no such file or directory".




[jira] [Updated] (HIVE-21511) beeline -f report no such file if file is not on local fs

2019-03-26 Thread Bruno Pusztahazi (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Pusztahazi updated HIVE-21511:

Fix Version/s: (was: 4.0.0)

> beeline -f report no such file if file is not on local fs
> -
>
> Key: HIVE-21511
> URL: https://issues.apache.org/jira/browse/HIVE-21511
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 1.3.0, beeline-cli-branch
>Reporter: Bruno Pusztahazi
>Assignee: Bruno Pusztahazi
>Priority: Blocker
>  Labels: patch
> Attachments: HIVE-21511.1.patch
>
>   Original Estimate: 0.05h
>  Remaining Estimate: 0.05h
>
> I tested it like this:
> HQL=hdfs://hacluster/tmp/ff.hql
> if hadoop fs -test -f "${HQL}"
> then
>    beeline -f "${HQL}"
> fi
> The hadoop fs -test check on ${HQL} succeeds, but beeline reports "${HQL}: no such file or directory".




[jira] [Updated] (HIVE-19034) hadoop fs test can check script ok, but beeline -f report no such file

2019-03-26 Thread Zoltan Haindrich (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-19034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-19034:

Labels: patch todoc4.0  (was: patch)

> hadoop fs test can check script ok, but beeline -f report no such file
> --
>
> Key: HIVE-19034
> URL: https://issues.apache.org/jira/browse/HIVE-19034
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 1.3.0, beeline-cli-branch
> Environment: java version: 1.8.0_112-b15
> hadoop version: 2.7.2
> hive version:1.3.0
> hive JDBC version: 1.3.0
> beeline version: 1.3.0
>Reporter: fengxianghui
>Assignee: Bruno Pusztahazi
>Priority: Blocker
>  Labels: patch, todoc4.0
> Fix For: 1.3.0
>
> Attachments: HIVE-19034.1.patch
>
>   Original Estimate: 0.05h
>  Remaining Estimate: 0.05h
>
> I tested it like this:
> HQL=hdfs://hacluster/tmp/ff.hql
> if hadoop fs -test -f "${HQL}"
> then
>    beeline -f "${HQL}"
> fi
> The hadoop fs -test check on ${HQL} succeeds, but beeline reports "${HQL}: no such file or directory".




[jira] [Commented] (HIVE-21507) Hive swallows NPE if no delegation token found

2019-03-26 Thread Hive QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-21507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801656#comment-16801656
 ] 

Hive QA commented on HIVE-21507:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 
13s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
20s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
11s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
32s{color} | {color:blue} jdbc in master has 16 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
13s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
11s{color} | {color:red} jdbc: The patch generated 27 new + 34 unchanged - 0 
fixed = 61 total (was 34) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
14s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 13m 21s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-16686/dev-support/hive-personality.sh
 |
| git revision | master / 80998ad |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16686/yetus/diff-checkstyle-jdbc.txt
 |
| asflicense | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16686/yetus/patch-asflicense-problems.txt
 |
| modules | C: jdbc U: jdbc |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16686/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Hive swallows NPE if no delegation token found
> --
>
> Key: HIVE-21507
> URL: https://issues.apache.org/jira/browse/HIVE-21507
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 3.1.1
>Reporter: Denes Bodo
>Assignee: Denes Bodo
>Priority: Critical
>  Labels: usability
> Attachments: HIVE-21507.001.patch
>
>
> If no delegation token has been put into the token file, this 
> [line|https://github.com/apache/hive/blob/master/jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java#L777]
>  will cause a NullPointerException, which is not handled, and the user is not 
> notified in any way.
> The use case that triggers the NPE is an Oozie Sqoop import to Hive in a 
> kerberized cluster. Oozie puts the delegation token into the token file with 
> id *HIVE_DELEGATION_TOKEN_hiveserver2ClientToken*, so a lookup with id *hive* 
> does not work. However, the fallback code uses the key which Oozie provides 
> [this|https://github.com/apache/hive/blob/master/jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java#L784]
>  way.
> I suggest showing the user a warning message that the key with id *hive* cannot be 
> used and that the code is falling back to getting the delegation token from the session.
> I am creating the patch.
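
The suggestion in the description can be sketched as follows. This is only an illustration under stated assumptions: the token file is modelled as a plain Map, the session lookup as a Supplier, and the class, method and parameter names are hypothetical. It is not the HiveConnection code and not the attached patch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;
import java.util.logging.Logger;

final class TokenLookupSketch {
    private static final Logger LOG = Logger.getLogger(TokenLookupSketch.class.getName());

    /** Key used for the first lookup, as named in the issue description. */
    static final String PRIMARY_KEY = "hive";

    /**
     * Looks up the token under PRIMARY_KEY. If it is missing, warns the user
     * instead of failing with a NullPointerException, then asks the fallback.
     * Both parameters are hypothetical stand-ins for the real token sources.
     */
    static String findToken(Map<String, String> tokenFile, Supplier<String> sessionFallback) {
        String token = tokenFile.get(PRIMARY_KEY);
        if (token != null) {
            return token;
        }
        LOG.warning("No delegation token with id '" + PRIMARY_KEY
            + "' found; falling back to the token from the session");
        return sessionFallback.get();
    }

    public static void main(String[] args) {
        Map<String, String> tokenFile = new HashMap<>();
        // Oozie stores the token under a different id, so the primary lookup misses.
        tokenFile.put("HIVE_DELEGATION_TOKEN_hiveserver2ClientToken", "oozie-token");
        System.out.println(findToken(tokenFile, () -> "session-token"));
    }
}
```

The point of the sketch is the explicit null check followed by a visible warning, so the user learns why the fallback path was taken.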




[jira] [Updated] (HIVE-21507) Hive swallows NPE if no delegation token found

2019-03-26 Thread Denes Bodo (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denes Bodo updated HIVE-21507:
--
Attachment: HIVE-21507.002.patch

> Hive swallows NPE if no delegation token found
> --
>
> Key: HIVE-21507
> URL: https://issues.apache.org/jira/browse/HIVE-21507
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 3.1.1
>Reporter: Denes Bodo
>Assignee: Denes Bodo
>Priority: Critical
>  Labels: usability
> Attachments: HIVE-21507.001.patch, HIVE-21507.002.patch
>
>
> If no delegation token has been put into the token file, this 
> [line|https://github.com/apache/hive/blob/master/jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java#L777]
>  will cause a NullPointerException, which is not handled, and the user is not 
> notified in any way.
> The use case that triggers the NPE is an Oozie Sqoop import to Hive in a 
> kerberized cluster. Oozie puts the delegation token into the token file with 
> id *HIVE_DELEGATION_TOKEN_hiveserver2ClientToken*, so a lookup with id *hive* 
> does not work. However, the fallback code uses the key which Oozie provides 
> [this|https://github.com/apache/hive/blob/master/jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java#L784]
>  way.
> I suggest showing the user a warning message that the key with id *hive* cannot be 
> used and that the code is falling back to getting the delegation token from the session.
> I am creating the patch.




[jira] [Commented] (HIVE-21509) LLAP may cache corrupted column vectors and return wrong query result

2019-03-26 Thread Zoltan Haindrich (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-21509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801734#comment-16801734
 ] 

Zoltan Haindrich commented on HIVE-21509:
-

{code}
@@ -258,5 +279,6 @@ public void shallowCopyTo(ColumnVector otherCv) {
 otherCv.isRepeating = isRepeating;
 otherCv.preFlattenIsRepeating = preFlattenIsRepeating;
 otherCv.preFlattenNoNulls = preFlattenNoNulls;
+otherCv.refCount.set(refCount.get());
{code}
this seems to "shallow copy" the "refCount" as well - I feel that something is 
not right with this... I would instead expect that "other" either refers to 
the same reference counter or retains its own counter...
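
To make the concern concrete, here is a minimal sketch of the difference between copying a counter's current value and sharing the counter object. The Vec class is a hypothetical stand-in and not Hive's ColumnVector; only the refCount field name is taken from the diff above.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class RefCountCopySketch {
    /** Hypothetical stand-in for a reference-counted vector. */
    static final class Vec {
        AtomicInteger refCount = new AtomicInteger(0);
    }

    /** Mirrors the diff: the other vector keeps its own counter but takes the current value. */
    static void copyValue(Vec from, Vec to) {
        to.refCount.set(from.refCount.get());
    }

    /** Alternative: both vectors refer to the same counter object. */
    static void shareCounter(Vec from, Vec to) {
        to.refCount = from.refCount;
    }

    public static void main(String[] args) {
        Vec a = new Vec();
        a.refCount.set(2);

        Vec b = new Vec();
        copyValue(a, b);
        a.refCount.decrementAndGet();
        // The two counters have drifted apart after a single release on a.
        System.out.println(a.refCount.get() + " vs " + b.refCount.get());

        Vec c = new Vec();
        shareCounter(a, c);
        a.refCount.decrementAndGet();
        // A shared counter always reports the same value through both vectors.
        System.out.println(a.refCount.get() + " vs " + c.refCount.get());
    }
}
```

With the value copy, later increments or decrements on one vector are invisible to the other, which is the ambiguity the comment points at.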


> LLAP may cache corrupted column vectors and return wrong query result
> -
>
> Key: HIVE-21509
> URL: https://issues.apache.org/jira/browse/HIVE-21509
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Adam Szita
>Assignee: Adam Szita
>Priority: Major
> Attachments: HIVE-21509.0.wip.patch
>
>
> In some scenarios, LLAP might store column vectors in cache that are getting 
> reused and reset just before their original content would be written.
> The issue is a concurrency issue and is therefore flaky. It is not easy to 
> reproduce, but the odds of surfacing it can be improved by setting the 
> LLAP executor and IO thread counts this way:
>  * set hive.llap.daemon.num.executors=32;
>  * set hive.llap.io.threadpool.size=1;
>  * using TPCDS input data of store_sales table, have at least a couple of 
> 100k's of rows, and use text format:
> {code:java}
> ROW FORMAT SERDE    'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'  
> WITH SERDEPROPERTIES (    'field.delim'='|',    'serialization.format'='|')  
> STORED AS INPUTFORMAT    'org.apache.hadoop.mapred.TextInputFormat'  
> OUTPUTFORMAT    
> 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'{code}
>  * having more splits increases the chance of the issue showing itself, so it is worth running 
> _set tez.grouping.min-size=1024; set tez.grouping.max-size=1024;_
>  * run this query on the table: select min(ss_sold_date_sk) from store_sales;
> The first query result is correct (2450816 in my case). Repeating the query 
> will trigger reading from LLAP cache and produce a wrong result: 0.
> If one wants to make sure of running into this issue, place a 
> Thread.sleep(250) at the beginning of VectorDeserializeOrcWriter#run().
>  
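
The failure mode described above (a cached vector being reset and reused before its content is persisted) can be illustrated with a deliberately simplified, single-threaded sketch. It is not LLAP code: plain long arrays stand in for column vectors, and the class name is hypothetical. The value 2450816 is the correct result quoted in the report; the second value is illustrative.

```java
import java.util.Arrays;

final class BufferReuseSketch {
    /** Minimum of a non-empty array, analogous to min(ss_sold_date_sk). */
    static long min(long[] values) {
        return Arrays.stream(values).min().getAsLong();
    }

    public static void main(String[] args) {
        long[] buffer = {2450816L, 2450817L};

        // Caching the reference keeps pointing at the reusable buffer.
        long[] cachedByReference = buffer;
        // Caching a copy takes a snapshot of the content at this moment.
        long[] cachedByCopy = Arrays.copyOf(buffer, buffer.length);

        // The producer resets the buffer for reuse before the cache is read.
        Arrays.fill(buffer, 0L);

        System.out.println(min(cachedByReference)); // sees the reset content
        System.out.println(min(cachedByCopy));      // still sees the original content
    }
}
```

In the real system the reset happens on another thread, which is why the problem only shows up intermittently.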




[jira] [Commented] (HIVE-21507) Hive swallows NPE if no delegation token found

2019-03-26 Thread Hive QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-21507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801741#comment-16801741
 ] 

Hive QA commented on HIVE-21507:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12963724/HIVE-21507.002.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/16688/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16688/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16688/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2019-03-26 13:53:33.229
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-16688/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2019-03-26 13:53:33.232
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 80998ad HIVE-21493: BuddyAllocator - Metrics count for allocated 
arenas wrong if preallocation is done (Olli Draese via Slim Bouguerra)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 80998ad HIVE-21493: BuddyAllocator - Metrics count for allocated 
arenas wrong if preallocation is done (Olli Draese via Slim Bouguerra)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2019-03-26 13:53:34.215
+ rm -rf ../yetus_PreCommit-HIVE-Build-16688
+ mkdir ../yetus_PreCommit-HIVE-Build-16688
+ git gc
+ cp -R . ../yetus_PreCommit-HIVE-Build-16688
+ mkdir /data/hiveptest/logs/PreCommit-HIVE-Build-16688/yetus
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: a/jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java: does not exist 
in index
Going to apply patch with: git apply -p1
+ [[ maven == \m\a\v\e\n ]]
+ rm -rf /data/hiveptest/working/maven/org/apache/hive
+ mvn -B clean install -DskipTests -T 4 -q 
-Dmaven.repo.local=/data/hiveptest/working/maven
protoc-jar: executing: [/tmp/protoc4958219599846059871.exe, --version]
libprotoc 2.5.0
protoc-jar: executing: [/tmp/protoc4958219599846059871.exe, 
-I/data/hiveptest/working/apache-github-source-source/standalone-metastore/metastore-common/src/main/protobuf/org/apache/hadoop/hive/metastore,
 
--java_out=/data/hiveptest/working/apache-github-source-source/standalone-metastore/metastore-common/target/generated-sources,
 
/data/hiveptest/working/apache-github-source-source/standalone-metastore/metastore-common/src/main/protobuf/org/apache/hadoop/hive/metastore/metastore.proto]
ANTLR Parser Generator  Version 3.5.2
[ERROR] Failed to execute goal on project hive-shims-0.23: Could not resolve 
dependencies for project 
org.apache.hive.shims:hive-shims-0.23:jar:4.0.0-SNAPSHOT: The following 
artifacts could not be resolved: 
org.eclipse.jetty:jetty-server:jar:9.3.25.v20180904, 
org.eclipse.jetty:jetty-http:jar:9.3.25.v20180904, 
org.eclipse.jetty:jetty-io:jar:9.3.25.v20180904, 
org.apache.hadoop:hadoop-yarn-server-resourcemanager:jar:3.1.0, 
org.apache.hadoop:hadoop-yarn-server-common:jar:3.1.0, 
org.apache.hadoop:hadoop-yarn-registry:jar:3.1.0, dnsjava:dnsjava:jar:2.1.7, 
org.apache.geronimo.specs:geronimo-jcache_1.0_spec:jar:1.0-alpha-1, 
org.ehcache:ehcache:jar:3.3.1, com.zaxxer:HikariCP-java7:jar:2.4.12, 
com.microsoft.sqlserver:mssql-jdbc:jar:6.2.1.jre7, 
org.apache.hadoop:hadoop-yarn-server-applicationhistoryservice:jar:3.1.0, 
de.ruedigermoeller:fst:jar:2.50, com.cedarsoftware:java-util:jar:1.9.0, 
com.cedarsoftware:json-io:jar:2.5.1, 
org.apache.hadoop:hadoop-yarn-server-web-proxy:jar:3.1.0,

[jira] [Commented] (HIVE-16255) Support percentile_cont / percentile_disc

2019-03-26 Thread Zoltan Haindrich (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-16255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801750#comment-16801750
 ] 

Zoltan Haindrich commented on HIVE-16255:
-

+1

> Support percentile_cont / percentile_disc
> -
>
> Key: HIVE-16255
> URL: https://issues.apache.org/jira/browse/HIVE-16255
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Carter Shanklin
>Assignee: Laszlo Bodor
>Priority: Major
> Attachments: HIVE-16255.01.patch, HIVE-16255.02.patch, 
> HIVE-16255.03.patch, HIVE-16255.04.patch, HIVE-16255.05.patch, 
> HIVE-16255.06.patch
>
>
> Way back in HIVE-259, a percentile function was added that provides a subset 
> of the standard percentile_cont aggregate function.
> The SQL standard provides some additional options and also a percentile_disc 
> aggregate function with different rules. In the standard you specify an 
> ordering with an arbitrary value expression, and the results are drawn from this 
> value expression. These aggregate functions should also be usable as analytic 
> functions (i.e. support the over clause). The current percentile 
> function can already be used with an over clause.
> The rough outline of how this works is:
> percentile_cont(number) within group (order by expression) [ over(window 
> spec) ]
> percentile_disc(number) within group (order by expression) [ over(window 
> spec) ]
> The value of number should be between 0 and 1. The value expression is 
> evaluated for each row of the group, nulls are discarded, and the remaining 
> rows are ordered. The result is then determined as follows:
> - If PERCENTILE_CONT is specified, by considering the pair of consecutive 
> rows that are indicated by the argument, treated as a fraction of the total 
> number of rows in the group, and interpolating the value of the value 
> expression evaluated for these rows.
> - If PERCENTILE_DISC is specified, by treating the group as a window 
> partition of the CUME_DIST window function, using the specified ordering of 
> the value expression as the window ordering, and returning the first value 
> expression whose cumulative distribution value is greater than or equal to 
> the argument.
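
The two rules above can be sketched for a single group of non-null numeric values. This is an illustration of the semantics as described in the text, not Hive's implementation. It assumes an ascending ordering, that nulls were already discarded, and that the interpolation position is p * (N - 1) counted from zero; the class and method names are hypothetical.

```java
import java.util.Arrays;

final class PercentileSketch {
    /** Validates the inputs and returns the values in ascending order. */
    private static double[] sortedCopy(double[] values, double p) {
        if (values.length == 0) throw new IllegalArgumentException("empty group");
        if (p < 0.0 || p > 1.0) throw new IllegalArgumentException("p must be in [0, 1]");
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        return sorted;
    }

    /** Interpolates between the two consecutive rows that surround position p * (N - 1). */
    static double percentileCont(double[] values, double p) {
        double[] s = sortedCopy(values, p);
        double pos = p * (s.length - 1);
        int lo = (int) Math.floor(pos);
        int hi = (int) Math.ceil(pos);
        if (lo == hi) return s[lo];
        return (hi - pos) * s[lo] + (pos - lo) * s[hi];
    }

    /** Returns the first value whose cumulative distribution (i / N) is at least p. */
    static double percentileDisc(double[] values, double p) {
        double[] s = sortedCopy(values, p);
        for (int i = 1; i <= s.length; i++) {
            if ((double) i / s.length >= p) return s[i - 1];
        }
        return s[s.length - 1]; // only reached if p is NaN
    }

    public static void main(String[] args) {
        double[] group = {4, 1, 3, 2};
        System.out.println(percentileCont(group, 0.5)); // interpolated between 2 and 3
        System.out.println(percentileDisc(group, 0.5)); // an actual value from the group
    }
}
```

Note the difference in kind: percentile_cont may return a value that does not occur in the group, while percentile_disc always returns one of the ordered values.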




[jira] [Assigned] (HIVE-21511) beeline -f report no such file if file is not on local fs

2019-03-26 Thread Bruno Pusztahazi (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Pusztahazi reassigned HIVE-21511:
---


> beeline -f report no such file if file is not on local fs
> -
>
> Key: HIVE-21511
> URL: https://issues.apache.org/jira/browse/HIVE-21511
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 1.3.0, beeline-cli-branch
> Environment: java version: 1.8.0_112-b15
> hadoop version: 2.7.2
> hive version:1.3.0
> hive JDBC version: 1.3.0
> beeline version: 1.3.0
>Reporter: Bruno Pusztahazi
>Assignee: Bruno Pusztahazi
>Priority: Blocker
>  Labels: patch
> Fix For: 1.3.0
>
>   Original Estimate: 0.05h
>  Remaining Estimate: 0.05h
>
> I tested it like this:
> HQL=hdfs://hacluster/tmp/ff.hql
> if hadoop fs -test -f "${HQL}"
> then
>    beeline -f "${HQL}"
> fi
> The hadoop fs -test check on ${HQL} succeeds, but beeline reports "${HQL}: no such file or directory".




[jira] [Updated] (HIVE-21509) LLAP may cache corrupted column vectors and return wrong query result

2019-03-26 Thread Adam Szita (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated HIVE-21509:
--
Description: 
In some scenarios, LLAP might store column vectors in cache that are getting 
reused and reset just before their original content would be written.

The issue is a concurrency issue and is therefore flaky. It is not easy to 
reproduce, but the odds of surfacing it can be improved by setting the LLAP 
executor and IO thread counts this way:
 * set hive.llap.daemon.num.executors=32;
 * set hive.llap.io.threadpool.size=1;
 * using TPCDS input data of store_sales table, have at least a couple of 
100k's of rows, and use text format:

{code:java}
ROW FORMAT SERDE    'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'  WITH 
SERDEPROPERTIES (    'field.delim'='|',    'serialization.format'='|')  STORED 
AS INPUTFORMAT    'org.apache.hadoop.mapred.TextInputFormat'  OUTPUTFORMAT    
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'{code}
 * having more splits increases the chance of the issue showing itself, so it is worth running 
_set tez.grouping.min-size=1024; set tez.grouping.max-size=1024;_
 * run this query on the table: select min(ss_sold_date_sk) from store_sales;

The first query result is correct (2450816 in my case). Repeating the query 
will trigger reading from LLAP cache and produce a wrong result: 0.

If one wants to make sure of running into this issue, place a Thread.sleep(250) 
at the beginning of VectorDeserializeOrcWriter#run().

 

  was:
In some scenarios, LLAP might store column vectors in cache that are getting 
reused and reset just before their original content would be written.

The issue is a concurrency issue and is therefore flaky. It is not easy to 
reproduce, but the odds of surfacing it can be improved by setting the LLAP 
executor and IO thread counts this way:
 * set hive.llap.daemon.num.executors=32;
 * set hive.llap.io.threadpool.size=1;
 * using TPCDS input data of store_sales table, which is in text format:

{code:java}
ROW FORMAT SERDE    'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'  WITH 
SERDEPROPERTIES (    'field.delim'='|',    'serialization.format'='|')  STORED 
AS INPUTFORMAT    'org.apache.hadoop.mapred.TextInputFormat'  OUTPUTFORMAT    
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'{code}

 * run this query on the table: select min(ss_sold_date_sk) from store_sales;

The first query result is correct (2450816 in my case). Repeating the query 
will trigger reading from LLAP cache and produce a wrong result: 0.

If one wants to make sure of running into this issue, place a Thread.sleep(250) 
at the beginning of VectorDeserializeOrcWriter#run().

 


> LLAP may cache corrupted column vectors and return wrong query result
> -
>
> Key: HIVE-21509
> URL: https://issues.apache.org/jira/browse/HIVE-21509
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Adam Szita
>Assignee: Adam Szita
>Priority: Major
>
> In some scenarios, LLAP might store column vectors in cache that are getting 
> reused and reset just before their original content would be written.
> The issue is a concurrency issue and is therefore flaky. It is not easy to 
> reproduce, but the odds of surfacing it can be improved by setting the 
> LLAP executor and IO thread counts this way:
>  * set hive.llap.daemon.num.executors=32;
>  * set hive.llap.io.threadpool.size=1;
>  * using TPCDS input data of store_sales table, have at least a couple of 
> 100k's of rows, and use text format:
> {code:java}
> ROW FORMAT SERDE    'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'  
> WITH SERDEPROPERTIES (    'field.delim'='|',    'serialization.format'='|')  
> STORED AS INPUTFORMAT    'org.apache.hadoop.mapred.TextInputFormat'  
> OUTPUTFORMAT    
> 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'{code}
>  * having more splits increases the chance of the issue showing itself, so it is worth running 
> _set tez.grouping.min-size=1024; set tez.grouping.max-size=1024;_
>  * run this query on the table: select min(ss_sold_date_sk) from store_sales;
> The first query result is correct (2450816 in my case). Repeating the query 
> will trigger reading from LLAP cache and produce a wrong result: 0.
> If one wants to make sure of running into this issue, place a 
> Thread.sleep(250) at the beginning of VectorDeserializeOrcWriter#run().
>  




[jira] [Commented] (HIVE-21510) Vectorization: add support for and/or for (constant,column) cases

2019-03-26 Thread Laszlo Bodor (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-21510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801670#comment-16801670
 ] 

Laszlo Bodor commented on HIVE-21510:
-

note, an example from 
https://issues.apache.org/jira/secure/attachment/12963723/HIVE-21001.45.patch:

vector_date_1.q.out
LongColEqualLongColumn -> VectorUDFAdaptor due to the rewrite

before patch:
{code}
selectExpressions: LongColEqualLongColumn(col 0:date, col 0:date) -> 3:boolean, 
LongColNotEqualLongColumn(col 0:date, col 1:date) -> 4:boolean, 
LongColLessEqualLongColumn(col 0:date, col 0:date) -> 5:boolean, 
LongColLessEqualLongColumn(col 0:date, col 1:date) -> 6:boolean, 
LongColLessLongColumn(col 0:date, col 1:date) -> 7:boolean, 
LongColGreaterEqualLongColumn(col 1:date, col 1:date) -> 8:boolean, 
LongColGreaterEqualLongColumn(col 1:date, col 0:date) -> 9:boolean, 
LongColGreaterLongColumn(col 1:date, col 0:date) -> 10:boolean
{code}

after patch:
{code}
selectExpressions: VectorUDFAdaptor((null or dt1 is not null))(children: 
IsNotNull(col 0:date) -> 3:boolean) -> 4:boolean, LongColNotEqualLongColumn(col 
0:date, col 1:date) -> 5:boolean, LongColLessEqualLongColumn(col 0:date, col 
1:date) -> 6:boolean, LongColLessLongColumn(col 0:date, col 1:date) -> 
7:boolean, VectorUDFAdaptor((null or dt2 is not null))(children: IsNotNull(col 
1:date) -> 8:boolean) -> 9:boolean, LongColGreaterEqualLongColumn(col 1:date, 
col 0:date) -> 10:boolean, LongColGreaterLongColumn(col 1:date, col 0:date) -> 
11:boolean
{code}


> Vectorization: add support for and/or for (constant,column) cases
> -
>
> Key: HIVE-21510
> URL: https://issues.apache.org/jira/browse/HIVE-21510
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Laszlo Bodor
>Priority: Major
>
> After HIVE-21001 some selectExpressions will start using VectorUDFAdaptor for 
> "null and x" expressions. Because there are currently 2-3 places that 
> rewrite expressions into the "null and/or x" form, it would be better 
> to support it.
> {code}
> [...]
> selectExpressions: VectorUDFAdaptor((null and dt1 is null))
> [...]
> usesVectorUDFAdaptor: true
> [...]
> {code}
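
For reference, the semantics that a dedicated "null and/or x" expression would have to produce follow from SQL three-valued logic. The sketch below models SQL NULL as a Java null Boolean; the class and method names are hypothetical, and this is not Hive's VectorExpression API.

```java
final class NullConstantLogicSketch {
    /** NULL OR x: TRUE when x is TRUE, otherwise NULL. */
    static Boolean nullOr(Boolean x) {
        return Boolean.TRUE.equals(x) ? Boolean.TRUE : null;
    }

    /** NULL AND x: FALSE when x is FALSE, otherwise NULL. */
    static Boolean nullAnd(Boolean x) {
        return Boolean.FALSE.equals(x) ? Boolean.FALSE : null;
    }

    /** Column-at-a-time variant of nullOr, processing a whole batch in one loop. */
    static Boolean[] nullOrColumn(Boolean[] column) {
        Boolean[] out = new Boolean[column.length];
        for (int i = 0; i < column.length; i++) {
            out[i] = nullOr(column[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        // Models "null or dt1 is not null": TRUE where dt1 is present, NULL where it is missing.
        Boolean[] dt1IsNotNull = {true, false, true};
        System.out.println(java.util.Arrays.toString(nullOrColumn(dt1IsNotNull)));
    }
}
```

Because the constant side is always NULL, each output row depends on a single input column, which is what makes a specialised (constant, column) expression straightforward.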




[jira] [Commented] (HIVE-21507) Hive swallows NPE if no delegation token found

2019-03-26 Thread Zoltan Haindrich (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-21507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801669#comment-16801669
 ] 

Zoltan Haindrich commented on HIVE-21507:
-

+1 pending tests

> Hive swallows NPE if no delegation token found
> --
>
> Key: HIVE-21507
> URL: https://issues.apache.org/jira/browse/HIVE-21507
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 3.1.1
>Reporter: Denes Bodo
>Assignee: Denes Bodo
>Priority: Critical
>  Labels: usability
> Attachments: HIVE-21507.001.patch, HIVE-21507.002.patch
>
>
> If no delegation token has been put into the token file, this 
> [line|https://github.com/apache/hive/blob/master/jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java#L777]
>  causes a NullPointerException that is not handled, and the user is not 
> notified in any way.
> The NPE can be triggered by an Oozie Sqoop import to Hive in a 
> kerberized cluster. Oozie puts the delegation token into the token file under the 
> id *HIVE_DELEGATION_TOKEN_hiveserver2ClientToken*, so a lookup with the id *hive* 
> finds nothing. However, the fallback code uses the key that Oozie provides, 
> [this|https://github.com/apache/hive/blob/master/jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java#L784]
>  way.
> I suggest showing the user a warning message that the key with id *hive* cannot be 
> used and that the code is falling back to getting the delegation token from the session.
> I am creating the patch.
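
The suggested behaviour (try the *hive* key, warn, then fall back) could look roughly like the sketch below. This is not the HiveConnection code or the attached patch: the Map is a hypothetical stand-in for the token file and session credentials, the class and method names are invented for illustration, and throwing when both lookups fail is an assumption about what a caller might want.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Logger;

/** Illustrative only: token lookup with a logged fallback instead of a silent NPE. */
final class DelegationTokenLookup {
    private static final Logger LOG = Logger.getLogger(DelegationTokenLookup.class.getName());

    // Key names are taken from the issue text.
    static final String PRIMARY_KEY = "hive";
    static final String FALLBACK_KEY = "HIVE_DELEGATION_TOKEN_hiveserver2ClientToken";

    private DelegationTokenLookup() {}

    /** Returns the token under PRIMARY_KEY, else warns and tries FALLBACK_KEY. */
    static String findToken(Map<String, String> tokens) {
        String token = tokens.get(PRIMARY_KEY);
        if (token == null) {
            LOG.warning("No delegation token found under key '" + PRIMARY_KEY
                + "'; falling back to key '" + FALLBACK_KEY + "'.");
            token = tokens.get(FALLBACK_KEY);
        }
        if (token == null) {
            // Fail loudly rather than letting a null propagate unnoticed.
            throw new IllegalStateException("No delegation token available under either key");
        }
        return token;
    }

    public static void main(String[] args) {
        Map<String, String> tokens = new HashMap<>();
        tokens.put(FALLBACK_KEY, "token-from-oozie");
        System.out.println(findToken(tokens));
    }
}
```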




[jira] [Updated] (HIVE-21509) LLAP may cache corrupted column vectors and return wrong query result

2019-03-26 Thread Adam Szita (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated HIVE-21509:
--
Attachment: HIVE-21509.0.wip.patch

> LLAP may cache corrupted column vectors and return wrong query result
> -
>
> Key: HIVE-21509
> URL: https://issues.apache.org/jira/browse/HIVE-21509
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Adam Szita
>Assignee: Adam Szita
>Priority: Major
> Attachments: HIVE-21509.0.wip.patch
>
>
> In some scenarios, LLAP may store column vectors in the cache that are 
> reused and reset just before their original content is written.
> This is a concurrency issue and is therefore flaky. It is not easy to 
> reproduce, but the odds of hitting it can be improved by setting the 
> LLAP executor and IO thread counts this way:
>  * set hive.llap.daemon.num.executors=32;
>  * set hive.llap.io.threadpool.size=1;
>  * use the TPCDS store_sales table as input data, with at least a few 
> hundred thousand rows, in text format:
> {code:sql}
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> WITH SERDEPROPERTIES ('field.delim'='|', 'serialization.format'='|')
> STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
> OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> {code}
>  * having more splits makes the issue more likely to show up, so it is worth 
> running _set tez.grouping.min-size=1024; set tez.grouping.max-size=1024;_
>  * run this query on the table: select min(ss_sold_date_sk) from store_sales;
> The first query result is correct (2450816 in my case). Repeating the query 
> triggers a read from the LLAP cache and produces a wrong result: 0.
> To make sure of running into this issue, place a 
> Thread.sleep(250) at the beginning of VectorDeserializeOrcWriter#run().
>  
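
The symptom described above (a correct first result, then 0 from the cache) can be illustrated without threads. The sketch below is a simplified model and not LLAP code: a plain long[] stands in for a column vector, the class and method names are hypothetical, and copying before caching is shown only as one possible way to avoid sharing, not as the fix in the attached patch.

```java
import java.util.Arrays;

/** Illustrative only: caching a reference to a buffer that is later reset for reuse. */
final class ReusedBufferDemo {
    private ReusedBufferDemo() {}

    static long min(long[] values) {
        long m = Long.MAX_VALUE;
        for (long v : values) {
            m = Math.min(m, v);
        }
        return m;
    }

    /** Caches a batch, resets the batch for reuse, then reads the minimum from the cache. */
    static long minAfterReset(boolean copyBeforeCaching) {
        long[] batch = {2450816L, 2450817L, 2450900L};
        // Without a copy, the "cache" shares storage with the reusable batch.
        long[] cached = copyBeforeCaching ? Arrays.copyOf(batch, batch.length) : batch;
        Arrays.fill(batch, 0L); // the producer resets the batch before the content is consumed
        return min(cached);
    }

    public static void main(String[] args) {
        System.out.println(minAfterReset(false)); // prints 0
        System.out.println(minAfterReset(true));  // prints 2450816
    }
}
```

In the real system the reset races with the cache write, which is why the failure is flaky and why the Thread.sleep(250) mentioned above makes it reliable.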




[jira] [Commented] (HIVE-21507) Hive swallows NPE if no delegation token found

2019-03-26 Thread Hive QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-21507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16801735#comment-16801735
 ] 

Hive QA commented on HIVE-21507:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12963720/HIVE-21507.001.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 15842 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/16686/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16686/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16686/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12963720 - PreCommit-HIVE-Build

> Hive swallows NPE if no delegation token found
> --
>
> Key: HIVE-21507
> URL: https://issues.apache.org/jira/browse/HIVE-21507
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 3.1.1
>Reporter: Denes Bodo
>Assignee: Denes Bodo
>Priority: Critical
>  Labels: usability
> Attachments: HIVE-21507.001.patch, HIVE-21507.002.patch
>
>
> If no delegation token has been put into the token file, this 
> [line|https://github.com/apache/hive/blob/master/jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java#L777]
>  causes a NullPointerException that is not handled, and the user is not 
> notified in any way.
> The NPE can be triggered by an Oozie Sqoop import to Hive in a 
> kerberized cluster. Oozie puts the delegation token into the token file under the 
> id *HIVE_DELEGATION_TOKEN_hiveserver2ClientToken*, so a lookup with the id *hive* 
> finds nothing. However, the fallback code uses the key that Oozie provides, 
> [this|https://github.com/apache/hive/blob/master/jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java#L784]
>  way.
> I suggest showing the user a warning message that the key with id *hive* cannot be 
> used and that the code is falling back to getting the delegation token from the session.
> I am creating the patch.




[jira] [Updated] (HIVE-21511) beeline -f report no such file if file is not on local fs

2019-03-26 Thread Bruno Pusztahazi (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Pusztahazi updated HIVE-21511:

Target Version/s: 4.0.0  (was: 1.3.0)

> beeline -f report no such file if file is not on local fs
> -
>
> Key: HIVE-21511
> URL: https://issues.apache.org/jira/browse/HIVE-21511
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 1.3.0, beeline-cli-branch
>Reporter: Bruno Pusztahazi
>Assignee: Bruno Pusztahazi
>Priority: Blocker
>  Labels: patch
> Fix For: 1.3.0
>
>   Original Estimate: 0.05h
>  Remaining Estimate: 0.05h
>
> I tested like this:
> HQL=hdfs://hacluster/tmp/ff.hql
> if hadoop fs -test -f ${HQL}
> then
>    beeline -f ${HQL}
> fi
> The hadoop fs -test check on ${HQL} succeeds, but beeline reports "no such file or directory" for ${HQL}.
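
One way to tell a local script path from a remote one is to look at the URI scheme before opening the file. The sketch below only shows that check; it is not Beeline's code, the class and method names are hypothetical, and it assumes the argument is valid URI syntax (a path containing spaces or backslashes would make URI.create throw).

```java
import java.net.URI;

/** Illustrative only: decide whether a -f argument refers to the local file system. */
final class ScriptLocation {
    private ScriptLocation() {}

    /** A path with no scheme, or with the "file" scheme, is treated as local. */
    static boolean isLocal(String path) {
        String scheme = URI.create(path).getScheme();
        return scheme == null || "file".equalsIgnoreCase(scheme);
    }

    public static void main(String[] args) {
        System.out.println(isLocal("/tmp/ff.hql"));                 // prints true
        System.out.println(isLocal("hdfs://hacluster/tmp/ff.hql")); // prints false
    }
}
```

A caller could use such a check to choose between a local file read and a file-system client for the remote case.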




[jira] [Updated] (HIVE-21511) beeline -f report no such file if file is not on local fs

2019-03-26 Thread Bruno Pusztahazi (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Pusztahazi updated HIVE-21511:

Environment: (was: java version: 1.8.0_112-b15

hadoop version: 2.7.2

hive version:1.3.0

hive JDBC version: 1.3.0

beeline version: 1.3.0)

> beeline -f report no such file if file is not on local fs
> -
>
> Key: HIVE-21511
> URL: https://issues.apache.org/jira/browse/HIVE-21511
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 1.3.0, beeline-cli-branch
>Reporter: Bruno Pusztahazi
>Assignee: Bruno Pusztahazi
>Priority: Blocker
>  Labels: patch
> Fix For: 1.3.0
>
>   Original Estimate: 0.05h
>  Remaining Estimate: 0.05h
>
> I tested like this:
> HQL=hdfs://hacluster/tmp/ff.hql
> if hadoop fs -test -f ${HQL}
> then
>    beeline -f ${HQL}
> fi
> The hadoop fs -test check on ${HQL} succeeds, but beeline reports "no such file or directory" for ${HQL}.




[jira] [Updated] (HIVE-21511) beeline -f report no such file if file is not on local fs

2019-03-26 Thread Bruno Pusztahazi (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Pusztahazi updated HIVE-21511:

Fix Version/s: (was: 1.3.0)
   4.0.0

> beeline -f report no such file if file is not on local fs
> -
>
> Key: HIVE-21511
> URL: https://issues.apache.org/jira/browse/HIVE-21511
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 1.3.0, beeline-cli-branch
>Reporter: Bruno Pusztahazi
>Assignee: Bruno Pusztahazi
>Priority: Blocker
>  Labels: patch
> Fix For: 4.0.0
>
>   Original Estimate: 0.05h
>  Remaining Estimate: 0.05h
>
> I tested like this:
> HQL=hdfs://hacluster/tmp/ff.hql
> if hadoop fs -test -f ${HQL}
> then
>    beeline -f ${HQL}
> fi
> The hadoop fs -test check on ${HQL} succeeds, but beeline reports "no such file or directory" for ${HQL}.




[jira] [Updated] (HIVE-21507) Hive swallows NPE if no delegation token found

2019-03-26 Thread Denes Bodo (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denes Bodo updated HIVE-21507:
--
Attachment: HIVE-21507.003.patch

> Hive swallows NPE if no delegation token found
> --
>
> Key: HIVE-21507
> URL: https://issues.apache.org/jira/browse/HIVE-21507
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 3.1.1
>Reporter: Denes Bodo
>Assignee: Denes Bodo
>Priority: Critical
>  Labels: usability
> Attachments: HIVE-21507.001.patch, HIVE-21507.002.patch, 
> HIVE-21507.003.patch
>
>
> If no delegation token has been put into the token file, this 
> [line|https://github.com/apache/hive/blob/master/jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java#L777]
>  causes a NullPointerException that is not handled, and the user is not 
> notified in any way.
> The NPE can be triggered by an Oozie Sqoop import to Hive in a 
> kerberized cluster. Oozie puts the delegation token into the token file under the 
> id *HIVE_DELEGATION_TOKEN_hiveserver2ClientToken*, so a lookup with the id *hive* 
> finds nothing. However, the fallback code uses the key that Oozie provides, 
> [this|https://github.com/apache/hive/blob/master/jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java#L784]
>  way.
> I suggest showing the user a warning message that the key with id *hive* cannot be 
> used and that the code is falling back to getting the delegation token from the session.
> I am creating the patch.




[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-26 Thread ASF GitHub Bot (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=218857&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218857
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 26/Mar/19 18:58
Start Date: 26/Mar/19 18:58
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269103325
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/events/filesystem/FSTableEvent.java
 ##
 @@ -199,12 +199,15 @@ private AddPartitionDesc partitionDesc(Path fromPath,
   // Right now, we do not have a way of associating a writeId with statistics for a table
   // converted to a transactional table if it was non-transactional on the source. So, do not
 Review comment:
   Comment needs to be corrected.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 218857)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.




[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-26 Thread ASF GitHub Bot (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=218859&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218859
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 26/Mar/19 18:58
Start Date: 26/Mar/19 18:58
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269154738
 
 

 ##
 File path: 
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnCommonUtils.java
 ##
 @@ -84,6 +86,73 @@ public static ValidTxnList createValidReadTxnList(GetOpenTxnsResponse txns, long
     return new ValidReadTxnList(exceptions, outAbortedBits, highWaterMark, minOpenTxnId);
   }
 
+  /**
+   * Transform a {@link org.apache.hadoop.hive.metastore.api.GetOpenTxnsResponse} to a
+   * {@link org.apache.hadoop.hive.common.ValidTxnList}.  This assumes that the caller intends to
+   * read the files, and thus treats both open and aborted transactions as invalid.
+   *
+   * This API is used by Hive replication which may have multiple transactions open at a time.
+   *
+   * @param txns open txn list from the metastore
+   * @param currentTxns Current transactions that the replication has opened.  If any of the
+   *                    transactions is greater than 0 it will be removed from the exceptions
+   *                    list so that the replication sees its own transaction as valid.
+   * @return a valid txn list.
+   */
+  public static ValidTxnList createValidReadTxnList(GetOpenTxnsResponse txns,
 
 Review comment:
   Treating all txns opened in a batch by an open txn event as current txns is 
incorrect. 
   The repl task opens multiple txns only when replicating the Hive Streaming 
case, where we allocate a batch of txns but use one at a time. Also, we don't update 
stats in that case. Even if we did update stats, it should refer to one txn as the 
current txn, with the rest of the txns left open. 
   The replTxnIds cache in TxnManager should be removed as well. All callers should 
create a hardcoded ValidWriteIdList using the writeId received from the event msg.
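
To make the "hardcoded ValidWriteIdList" idea concrete, here is a simplified model of a snapshot built from a single writeId: everything up to and including that writeId is visible, with no open or aborted exceptions. The class below is a hypothetical stand-in written for illustration and is not Hive's ValidWriteIdList API.

```java
/** Illustrative only: a snapshot whose high watermark is the replicated writeId. */
final class SingleWriteIdSnapshot {
    private final long highWatermark;

    SingleWriteIdSnapshot(long writeId) {
        if (writeId <= 0) {
            throw new IllegalArgumentException("writeId must be positive: " + writeId);
        }
        this.highWatermark = writeId;
    }

    /** A write is visible when it is at or below the high watermark (no exceptions list). */
    boolean isWriteIdValid(long writeId) {
        return writeId > 0 && writeId <= highWatermark;
    }

    public static void main(String[] args) {
        SingleWriteIdSnapshot snapshot = new SingleWriteIdSnapshot(7L);
        System.out.println(snapshot.isWriteIdValid(7L)); // prints true
        System.out.println(snapshot.isWriteIdValid(8L)); // prints false
    }
}
```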
 



Issue Time Tracking
---

Worklog Id: (was: 218859)
Time Spent: 1h 20m  (was: 1h 10m)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.




[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-26 Thread ASF GitHub Bot (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=218862&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218862
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 26/Mar/19 18:58
Start Date: 26/Mar/19 18:58
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269172695
 
 

 ##
 File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
 ##
 @@ -3539,10 +3573,19 @@ public boolean equals(Object obj) {
 }
 
 // Update partition column statistics if available
-for (Partition newPart : newParts) {
-  if (newPart.isSetColStats()) {
-    updatePartitonColStatsInternal(tbl, newPart.getColStats(), null, newPart.getWriteId());
+int cnt = 0;
+for (ColumnStatistics partColStats: partsColStats) {
+  long writeId = partsWriteIds.get(cnt++);
+  // On replica craft a valid snapshot out of the writeId in the partition
+  String validWriteIds = null;
+  if (writeId > 0) {
+    ValidWriteIdList vwil =
 
 Review comment:
   Same as above.
 



Issue Time Tracking
---

Worklog Id: (was: 218862)
Time Spent: 1h 40m  (was: 1.5h)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.




[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-26 Thread ASF GitHub Bot (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=218853&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218853
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 26/Mar/19 18:58
Start Date: 26/Mar/19 18:58
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269098036
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java
 ##
 @@ -2689,7 +2689,19 @@ private int alterTable(Hive db, AlterTableDesc alterTbl) throws HiveException {
   } else {
     // Note: this is necessary for UPDATE_STATISTICS command, that operates via ADDPROPS (why?).
     //   For any other updates, we don't want to do txn check on partitions when altering table.
-    boolean isTxn = alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS;
+    boolean isTxn = false;
+    if (alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS) {
+      // ADDPROPS is used to add repl.last.id during replication. That's not a transactional
+      // change.
+      Map props = alterTbl.getProps();
+      if (props.size() <= 1 && props.get(ReplicationSpec.KEY.CURR_STATE_ID.toString()) != null) {
+        isTxn = false;
+      } else {
+        isTxn = true;
+      }
+    }
+    // TODO: Somehow we have to signal alterPartitions that it's part of replication and
+    //  should use replication's valid writeid list instead of creating one.
 
 Review comment:
   What do you mean by replication's valid writeid list in this comment? Even 
in the repl flow, we get the validWriteIdList from HMS based on the incoming writeId in the 
event msg. Are you suggesting that we cache this ValidWriteIdList somewhere and use 
it instead of invoking the HMS API?
 



Issue Time Tracking
---

Worklog Id: (was: 218853)
Time Spent: 0.5h  (was: 20m)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.




[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-26 Thread ASF GitHub Bot (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=218864&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218864
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 26/Mar/19 18:58
Start Date: 26/Mar/19 18:58
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269223302
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
 ##
 @@ -2950,21 +2956,33 @@ public Partition createPartition(Table tbl, Map partSpec) throws
     int size = addPartitionDesc.getPartitionCount();
     List in =
         new ArrayList(size);
-    AcidUtils.TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl, true);
     long writeId;
     String validWriteIdList;
-    if (tableSnapshot != null && tableSnapshot.getWriteId() > 0) {
-      writeId = tableSnapshot.getWriteId();
-      validWriteIdList = tableSnapshot.getValidWriteIdList();
+
+    // In case of replication, get the writeId from the source and use valid write Id list
+    // for replication.
+    if (addPartitionDesc.getReplicationSpec() != null &&
 
 Review comment:
   addPartitionDesc.getReplicationSpec() is never null, so this check can be 
removed.
 



Issue Time Tracking
---

Worklog Id: (was: 218864)
Time Spent: 2h  (was: 1h 50m)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.




[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-26 Thread ASF GitHub Bot (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=218852&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218852
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 26/Mar/19 18:58
Start Date: 26/Mar/19 18:58
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269081532
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java
 ##
 @@ -2689,7 +2689,19 @@ private int alterTable(Hive db, AlterTableDesc alterTbl) throws HiveException {
   } else {
     // Note: this is necessary for UPDATE_STATISTICS command, that operates via ADDPROPS (why?).
     //   For any other updates, we don't want to do txn check on partitions when altering table.
-    boolean isTxn = alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS;
+    boolean isTxn = false;
+    if (alterTbl.getPartSpec() != null && alterTbl.getOp() == AlterTableTypes.ADDPROPS) {
+      // ADDPROPS is used to add repl.last.id during replication. That's not a transactional
+      // change.
+      Map props = alterTbl.getProps();
+      if (props.size() <= 1 && props.get(ReplicationSpec.KEY.CURR_STATE_ID.toString()) != null) {
 
 Review comment:
   ReplUtils.REPL_CHECKPOINT_KEY is another prop we set in the repl flow that 
is not transactional. This check doesn't seem clean, since in future we 
might add more such alters in the repl flow. Can we check 
replicationSpec.isReplicationScope instead, or add another flag in AlterTableDesc to 
skip this?
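
The difference between the two approaches can be sketched as follows. This is a hedged illustration of the reviewer's suggestion, not code from the pull request: the boolean parameters are hypothetical stand-ins for alterTbl.getPartSpec() != null, alterTbl.getOp() == AlterTableTypes.ADDPROPS, and a replication-scope flag.

```java
/** Illustrative only: decide the txn check from a replication flag, not from property keys. */
final class AlterTableTxnCheck {
    private AlterTableTxnCheck() {}

    /**
     * A partition-level ADDPROPS needs the txn check unless it comes from the
     * replication flow, regardless of which properties replication happens to set.
     */
    static boolean needsTxnCheck(boolean hasPartSpec, boolean isAddProps, boolean isReplicationScope) {
        return hasPartSpec && isAddProps && !isReplicationScope;
    }

    public static void main(String[] args) {
        System.out.println(needsTxnCheck(true, true, false)); // prints true
        System.out.println(needsTxnCheck(true, true, true));  // prints false
    }
}
```

With a single flag, adding another replication-only property later would not require touching this condition.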
   
 



Issue Time Tracking
---

Worklog Id: (was: 218852)
Time Spent: 20m  (was: 10m)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.




[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-26 Thread ASF GitHub Bot (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=218861&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218861
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 26/Mar/19 18:58
Start Date: 26/Mar/19 18:58
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269161871
 
 

 ##
 File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
 ##
 @@ -2130,11 +2144,18 @@ private void create_table_core(final RawStore ms, final Table tbl,
 
   // If the table has column statistics, update it into the metastore. This feature is used
   // by replication to replicate table level statistics.
-  if (tbl.isSetColStats()) {
-    // We do not replicate statistics for a transactional table right now and hence we do not
-    // expect a transactional table to have column statistics here. So passing null
-    // validWriteIds is fine for now.
-    updateTableColumnStatsInternal(tbl.getColStats(), null, tbl.getWriteId());
+  if (colStats != null) {
+    // On replica craft a valid snapshot out of the writeId in the table.
+    long writeId = tbl.getWriteId();
+    String validWriteIds = null;
+    if (writeId > 0) {
+      ValidWriteIdList vwil =
 
 Review comment:
   Please use a meaningful name instead of "vwil".
 



Issue Time Tracking
---

Worklog Id: (was: 218861)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.




[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-26 Thread ASF GitHub Bot (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=218866&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218866
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 26/Mar/19 18:58
Start Date: 26/Mar/19 18:58
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269257547
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
 ##
 @@ -987,10 +989,14 @@ public void createTable(Table tbl, boolean ifNotExists,
       tTbl.setPrivileges(principalPrivs);
     }
   }
-  // Set table snapshot to api.Table to make it persistent.
-  TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, tbl, true);
-  if (tableSnapshot != null) {
-    tbl.getTTable().setWriteId(tableSnapshot.getWriteId());
+  // Set table snapshot to api.Table to make it persistent. A transactional table being
+  // replicated may have a valid write Id copied from the source. Use that instead of
+  // crafting one on the replica.
+  if (tTbl.getWriteId() <= 0) {
 
 Review comment:
   The DO_NOT_UPDATE_STATS flag should be set in createTableFlow as well. 
Otherwise, in autogather mode on the target, the stats will be updated automatically.
 



Issue Time Tracking
---

Worklog Id: (was: 218866)
Time Spent: 2h 20m  (was: 2h 10m)

> Stats replication for ACID tables.
> --
>
> Key: HIVE-21109
> URL: https://issues.apache.org/jira/browse/HIVE-21109
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ashutosh Bapat
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, 
> HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, 
> HIVE-21109.06.patch
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Transactional tables require a writeID associated with the stats update. This 
> writeId needs to be in sync with the writeId on the source and hence needs to 
> be replicated from the source.




[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-26 Thread ASF GitHub Bot (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=218855&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218855
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 26/Mar/19 18:58
Start Date: 26/Mar/19 18:58
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269156935
 
 

 ##
 File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
 ##
 @@ -1894,6 +1898,16 @@ private void create_table_core(final RawStore ms, final 
Table tbl,
List checkConstraints)
 throws AlreadyExistsException, MetaException,
 InvalidObjectException, NoSuchObjectException, InvalidInputException {
+
+  ColumnStatistics colStats = null;
+  // If the given table has column statistics, save it here. We will 
update it later.
+  // We don't want it to be part of the Table object being created, lest 
the create table
 
 Review comment:
   Please simplify the comment, for example: "Column stats are not expected to 
be part of the create table event and shouldn't be persisted, so remove them 
from the Table object."
   
 



Issue Time Tracking
---

Worklog Id: (was: 218855)
Time Spent: 50m  (was: 40m)


[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-26 Thread ASF GitHub Bot (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=218867=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218867
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 26/Mar/19 18:58
Start Date: 26/Mar/19 18:58
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269247183
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestStatsReplicationScenarios.java
 ##
 @@ -359,17 +383,20 @@ private void testStatsReplicationCommon(boolean 
parallelBootstrap, boolean metad
   }
 
   @Test
-  public void testForNonAcidTables() throws Throwable {
+  public void testNonParallelBootstrapLoad() throws Throwable {
+LOG.info("Testing " + testName.getClass().getName() + "." + 
testName.getMethodName());
 testStatsReplicationCommon(false, false);
   }
 
   @Test
-  public void testForNonAcidTablesParallelBootstrapLoad() throws Throwable {
-testStatsReplicationCommon(true, false);
+  public void testForParallelBootstrapLoad() throws Throwable {
+LOG.info("Testing " + testName.getClass().getName() + "." + 
testName.getMethodName());
+testStatsReplicationCommon(true, false );
   }
 
   @Test
-  public void testNonAcidMetadataOnlyDump() throws Throwable {
+  public void testMetadataOnlyDump() throws Throwable {
 
 Review comment:
   Add more tests for the following scenarios:
   1. REPL LOAD fails after replicating table or partition objects with stats 
but before setting the last replId. The retry then takes the alter 
table/partition replace flows, and the stats should be valid after successful 
replication. This is needed for the non-transactional, transactional, and 
migration cases.
   2. Parallel inserts with autogather enabled. This produces events in which 
multiple txns are open at the time of the update stats event. Also try to 
simulate a case where one stats update succeeds and another invalidates it 
because of concurrent writes.
 



Issue Time Tracking
---

Worklog Id: (was: 218867)
Time Spent: 2.5h  (was: 2h 20m)


[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-26 Thread ASF GitHub Bot (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=218865=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218865
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 26/Mar/19 18:58
Start Date: 26/Mar/19 18:58
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269262756
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/plan/ImportTableDesc.java
 ##
 @@ -381,4 +382,11 @@ public void setOwnerName(String ownerName) {
 throw new RuntimeException("Invalid table type : " + getDescType());
 }
   }
+
+  public Long getReplWriteId() {
+if (this.createTblDesc != null) {
+  return this.createTblDesc.getReplWriteId();
 
 Review comment:
   This replWriteId is just a placeholder for the writeId from the event 
message. It need not be in CreateTableDesc. It can be kept in a local 
variable and passed around.
 



Issue Time Tracking
---

Worklog Id: (was: 218865)
Time Spent: 2h 10m  (was: 2h)


[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-26 Thread ASF GitHub Bot (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=218860=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218860
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 26/Mar/19 18:58
Start Date: 26/Mar/19 18:58
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269220469
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
 ##
 @@ -2950,21 +2956,33 @@ public Partition createPartition(Table tbl, 
Map partSpec) throws
 int size = addPartitionDesc.getPartitionCount();
 List in =
 new ArrayList(size);
-AcidUtils.TableSnapshot tableSnapshot = AcidUtils.getTableSnapshot(conf, 
tbl, true);
 long writeId;
 String validWriteIdList;
-if (tableSnapshot != null && tableSnapshot.getWriteId() > 0) {
-  writeId = tableSnapshot.getWriteId();
-  validWriteIdList = tableSnapshot.getValidWriteIdList();
+
+// In case of replication, get the writeId from the source and use valid 
write Id list
+// for replication.
+if (addPartitionDesc.getReplicationSpec() != null &&
+addPartitionDesc.getReplicationSpec().isInReplicationScope() &&
+addPartitionDesc.getPartition(0).getWriteId() > 0) {
+  writeId = addPartitionDesc.getPartition(0).getWriteId();
+  validWriteIdList =
 
 Review comment:
   In the replication flow, it is fine to use a hardcoded ValidWriteIdList 
because we want to forcefully set this writeId into the table or partition 
objects. Getting it from the current state might be wrong, as we don't update 
ValidTxnList in conf for repl-created txns. 
   ValidWriteIdList is only used to check whether the writeId in metastore 
objects was updated by any concurrent inserts. In the repl load flow that is 
not possible: we replicate one event at a time, and in bootstrap no two 
threads write into the same table.
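
To make the reasoning above concrete, here is a minimal, self-contained Java sketch. It deliberately does not use Hive's own ValidWriteIdList classes: `WriteIdSnapshot` and `forReplicatedWriteId` are hypothetical names introduced only for illustration. The assumption is that a snapshot can be modelled as a high-water mark plus a set of open or aborted write IDs, and that on the replica the replicated writeId can act as its own high-water mark with no exceptions, because no concurrent writer can have touched the object.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical, simplified model of a write-ID snapshot. This is NOT Hive's
// ValidWriteIdList; it only illustrates the idea discussed in the review.
final class WriteIdSnapshot {
  private final long highWatermark;   // largest write ID visible in this snapshot
  private final Set<Long> exceptions; // open or aborted write IDs at or below the mark

  WriteIdSnapshot(long highWatermark, Set<Long> exceptions) {
    this.highWatermark = highWatermark;
    this.exceptions = exceptions;
  }

  // A write ID is visible if it is at or below the mark and not open/aborted.
  boolean isWriteIdValid(long writeId) {
    return writeId <= highWatermark && !exceptions.contains(writeId);
  }

  // Replication case: craft a snapshot directly from the writeId carried by the
  // event instead of reading the current transaction state on the replica.
  static WriteIdSnapshot forReplicatedWriteId(long writeId) {
    if (writeId <= 0) {
      throw new IllegalArgumentException("writeId must be positive: " + writeId);
    }
    return new WriteIdSnapshot(writeId, Collections.<Long>emptySet());
  }
}
```

With such a crafted snapshot, the replicated writeId itself is always considered valid, which is the property the replication flow relies on when it forcefully sets the writeId on the table or partition object.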
 



Issue Time Tracking
---

Worklog Id: (was: 218860)
Time Spent: 1.5h  (was: 1h 20m)


[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-26 Thread ASF GitHub Bot (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=218863=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218863
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 26/Mar/19 18:58
Start Date: 26/Mar/19 18:58
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269169210
 
 

 ##
 File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
 ##
 @@ -2130,11 +2144,18 @@ private void create_table_core(final RawStore ms, 
final Table tbl,
 
   // If the table has column statistics, update it into the metastore. 
This feature is used
   // by replication to replicate table level statistics.
-  if (tbl.isSetColStats()) {
-// We do not replicate statistics for a transactional table right now 
and hence we do not
-// expect a transactional table to have column statistics here. So 
passing null
-// validWriteIds is fine for now.
-updateTableColumnStatsInternal(tbl.getColStats(), null, 
tbl.getWriteId());
+  if (colStats != null) {
+// On replica craft a valid snapshot out of the writeId in the table.
+long writeId = tbl.getWriteId();
+String validWriteIds = null;
+if (writeId > 0) {
+  ValidWriteIdList vwil =
+  new 
ValidReaderWriteIdList(TableName.getDbTable(tbl.getDbName(),
 
 Review comment:
   Please add a comment explaining why the hardcoded validWriteList is used in 
this flow instead of the current state of txns.
 



Issue Time Tracking
---

Worklog Id: (was: 218863)
Time Spent: 1h 50m  (was: 1h 40m)


[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-26 Thread ASF GitHub Bot (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=218856=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218856
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 26/Mar/19 18:58
Start Date: 26/Mar/19 18:58
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269110947
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
 ##
 @@ -828,6 +828,8 @@ public void alterPartitions(String tblName, 
List newParts,
   new ArrayList();
 try {
   AcidUtils.TableSnapshot tableSnapshot = null;
+  // TODO: In case of replication use the writeId and valid write id list 
constructed for
 
 Review comment:
   Is it done or still TODO?
 



Issue Time Tracking
---

Worklog Id: (was: 218856)
Time Spent: 1h  (was: 50m)


[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-26 Thread ASF GitHub Bot (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=218854=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218854
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 26/Mar/19 18:58
Start Date: 26/Mar/19 18:58
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269060256
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/ddl/table/CreateTableDesc.java
 ##
 @@ -118,7 +118,8 @@
   List notNullConstraints;
   List defaultConstraints;
   List checkConstraints;
-  private ColumnStatistics colStats;
+  private ColumnStatistics colStats;  // For the sake of replication
+  private long writeId = -1; // For the sake of replication
 
 Review comment:
   Can we re-use the replWriteId variable that we already have?
 



Issue Time Tracking
---

Worklog Id: (was: 218854)
Time Spent: 40m  (was: 0.5h)


[jira] [Updated] (HIVE-21109) Stats replication for ACID tables.

2019-03-26 Thread ASF GitHub Bot (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-21109:
--
Labels: pull-request-available  (was: )


[jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.

2019-03-26 Thread ASF GitHub Bot (JIRA)



 [ 
https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=218858=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-218858
 ]

ASF GitHub Bot logged work on HIVE-21109:
-

Author: ASF GitHub Bot
Created on: 26/Mar/19 18:58
Start Date: 26/Mar/19 18:58
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #579: HIVE-21109 : 
Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r269136269
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java
 ##
 @@ -1247,17 +1244,37 @@ private static void createReplImportTasks(
   } else if (!replicationSpec.isMetadataOnly()
   && !shouldSkipDataCopyInReplScope(tblDesc, replicationSpec)) {
 x.getLOG().debug("adding dependent CopyWork/MoveWork for table");
-t.addDependentTask(loadTable(fromURI, table, 
replicationSpec.isReplace(),
-new Path(tblDesc.getLocation()), replicationSpec, x, writeId, 
stmtId));
+dependentTasks = new ArrayList<>(1);
+dependentTasks.add(loadTable(fromURI, table, 
replicationSpec.isReplace(),
+  new Path(tblDesc.getLocation()), 
replicationSpec,
+  x, writeId, stmtId));
   }
 
-  if (dropTblTask != null) {
-// Drop first and then create
-dropTblTask.addDependentTask(t);
-x.getTasks().add(dropTblTask);
+  // During replication, by the time we reply a commit transaction event, 
the table should
+  // have been already created when replaying previous events. So no need 
to create table
+  // again. For some reason we need create table task for partitioned 
table though.
 
 Review comment:
   The comment says a create table task is needed for partitioned tables, but 
the code always skips it for the commit txn event. Which one is correct?
 



Issue Time Tracking
---

Worklog Id: (was: 218858)
Time Spent: 1h 10m  (was: 1h)


[jira] [Commented] (HIVE-21507) Hive swallows NPE if no delegation token found

2019-03-26 Thread Hive QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-21507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16802096#comment-16802096
 ] 

Hive QA commented on HIVE-21507:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12963744/HIVE-21507.003.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 15842 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/16693/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16693/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16693/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12963744 - PreCommit-HIVE-Build

> Hive swallows NPE if no delegation token found
> --
>
> Key: HIVE-21507
> URL: https://issues.apache.org/jira/browse/HIVE-21507
> Project: Hive
> Issue Type: Bug
> Components: JDBC
> Affects Versions: 3.1.1
> Reporter: Denes Bodo
> Assignee: Denes Bodo
> Priority: Critical
> Labels: usability
> Attachments: HIVE-21507.001.patch, HIVE-21507.002.patch, 
> HIVE-21507.003.patch
>
>
> If no delegation token has been put into the token file, this 
> [line|https://github.com/apache/hive/blob/master/jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java#L777]
>  will cause a NullPointerException that is not handled, and the user is not 
> notified in any way.
> The NPE can be reproduced with an Oozie Sqoop import to Hive in a 
> kerberized cluster. Oozie puts the delegation token into the token file with 
> id: *HIVE_DELEGATION_TOKEN_hiveserver2ClientToken*, so a lookup with id *hive* 
> does not work. However, the fallback code uses the key that Oozie provides 
> [this|https://github.com/apache/hive/blob/master/jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java#L784]
>  way.
> I suggest showing a warning message to the user that the key with id *hive* 
> cannot be used and that the code is falling back to getting the delegation 
> token from the session.
> I am creating the patch.
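
The behaviour proposed in the description can be sketched roughly as follows. This is not the code in HiveConnection.java and not the attached patch: `DelegationTokenLookup` and `findToken` are hypothetical names, and a plain `Map<String, String>` stands in for the token file as an assumption, so that the example stays self-contained. It only illustrates a null-safe lookup that warns before falling back to a second key.

```java
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical helper; the Map is an assumed stand-in for the token file.
final class DelegationTokenLookup {
  private static final Logger LOG =
      Logger.getLogger(DelegationTokenLookup.class.getName());

  // Returns the token stored under primaryKey, or warns and tries fallbackKey.
  // The result may still be null if neither key is present; callers must check.
  static String findToken(Map<String, String> tokens,
                          String primaryKey,
                          String fallbackKey) {
    String token = tokens.get(primaryKey);
    if (token != null) {
      return token;
    }
    // Tell the user what happened instead of failing later with a silent NPE.
    LOG.warning("No delegation token found for key '" + primaryKey
        + "'; falling back to key '" + fallbackKey + "'");
    return tokens.get(fallbackKey);
  }
}
```

The point of the sketch is only that the missing-token case is checked explicitly and reported, so the user sees a warning rather than nothing at all.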




[jira] [Commented] (HIVE-21455) Too verbose logging in AvroGenericRecordReader

2019-03-26 Thread Simon poortman (JIRA)



[ 
https://issues.apache.org/jira/browse/HIVE-21455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16802014#comment-16802014
 ] 

Simon poortman commented on HIVE-21455:
---

Pleas

> Too verbose logging in AvroGenericRecordReader
> --
>
> Key: HIVE-21455
> URL: https://issues.apache.org/jira/browse/HIVE-21455
> Project: Hive
> Issue Type: Improvement
> Components: HiveServer2
> Affects Versions: 1.2.0, 1.1.0, 2.0.0, 2.1.0, 3.0.0, 3.1.0
> Reporter: Miklos Szurap
> Assignee: Simon poortman
> Priority: Minor
> Attachments: HIVE-21455.2.patch, HIVE-21455.patch
>
>
> {{AvroGenericRecordReader}} logs the Avro schema for each datafile. This is too 
> verbose; we likely don't need to log it at INFO level.
> For example a table:
> {noformat}
> create table avro_tbl (c1 string, c2 int, c3 float) stored as avro;
> {noformat}
> and querying it with a select star - with 3 datafiles HiveServer2 logs the 
> following:
> {noformat}
> 2019-03-15 09:18:35,999 INFO  org.apache.hadoop.mapred.FileInputFormat: 
> [HiveServer2-Handler-Pool: Thread-64]: Total input paths to process : 3
> 2019-03-15 09:18:35,999 INFO  
> org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader: 
> [HiveServer2-Handler-Pool: Thread-64]: Found the avro schema in the job: 
> {"type":"record","name":"avro_tbl","namespace":"test","fields":[{"name":"c1","type":["null","string"],"default":null},{"name":"c2","type":["null","int"],"default":null},{"name":"c3","type":["null","float"],"default":null}]}
> 2019-03-15 09:18:36,004 INFO  
> org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader: 
> [HiveServer2-Handler-Pool: Thread-64]: Found the avro schema in the job: 
> {"type":"record","name":"avro_tbl","namespace":"test","fields":[{"name":"c1","type":["null","string"],"default":null},{"name":"c2","type":["null","int"],"default":null},{"name":"c3","type":["null","float"],"default":null}]}
> 2019-03-15 09:18:36,010 INFO  
> org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader: 
> [HiveServer2-Handler-Pool: Thread-64]: Found the avro schema in the job: 
> {"type":"record","name":"avro_tbl","namespace":"test","fields":[{"name":"c1","type":["null","string"],"default":null},{"name":"c2","type":["null","int"],"default":null},{"name":"c3","type":["null","float"],"default":null}]}
> {noformat}
> This has a huge performance and storage penalty on a table with big schema 
> and thousands of datafiles.
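
One way to reduce this cost is to move the message to a debug-style level and guard it, so the schema string is only formatted when that level is enabled. The sketch below is an illustration under stated assumptions, not the attached patch: it uses `java.util.logging` from the JDK rather than the logging facade Hive actually uses, with `Level.FINE` standing in for DEBUG, and `SchemaLogging` and `logSchemaIfDebug` are hypothetical names.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper showing a guarded, debug-level schema message.
final class SchemaLogging {

  // Logs the schema only when FINE (used here as a stand-in for DEBUG) is
  // enabled. Returns true if the logger accepted the message at that level.
  static boolean logSchemaIfDebug(Logger log, String schemaJson) {
    if (!log.isLoggable(Level.FINE)) {
      // Skip building the message; at INFO level nothing is emitted per datafile.
      return false;
    }
    log.fine("Found the avro schema in the job: " + schemaJson);
    return true;
  }
}
```

With the logger at INFO, a query over thousands of datafiles would no longer emit one schema line per file, while the information remains available when debugging is switched on.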



