[jira] [Commented] (HIVE-21573) Binary transport shall ignore principal if auth is set to delegationToken

2019-04-04 Thread Denes Bodo (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16810536#comment-16810536
 ] 

Denes Bodo commented on HIVE-21573:
---

Thanks [~daijy] for noticing. Indeed, at first I did not manage to swap the 
if/else blocks properly, so the result was not functionally equivalent. Please 
check the fixed .4 version. Thanks.

> Binary transport shall ignore principal if auth is set to delegationToken
> -
>
> Key: HIVE-21573
> URL: https://issues.apache.org/jira/browse/HIVE-21573
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 3.0.0
>Reporter: Denes Bodo
>Assignee: Denes Bodo
>Priority: Critical
> Attachments: HIVE-21573.3.patch, HIVE-21573.4.patch
>
>
> When Beeline is used by Sqoop from an Oozie sqoop action in a kerberized 
> cluster, Sqoop passes the Hive delegation token to Beeline when it invokes the 
> *beeline* command. Unfortunately, Beeline puts a principal=XY parameter into 
> the JDBC URL, so when binary transport is needed it uses principal-based 
> authentication instead of token-based.
> Related code: 
> [https://github.com/apache/hive/blob/master/jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java#L688L705]
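The intended precedence can be sketched as follows. This is a toy stand-in, not the actual HiveConnection code (the class and method names here are illustrative only): the point is that the delegation-token check must come before the principal check, otherwise a principal= URL parameter shadows the token.

```java
import java.util.HashMap;
import java.util.Map;

public class AuthSelector {
    // Simplified model of the transport-auth decision: when
    // auth=delegationToken is set, the token wins even if the
    // JDBC URL also carries a principal= parameter.
    public static String chooseAuth(Map<String, String> jdbcParams) {
        if ("delegationToken".equalsIgnoreCase(jdbcParams.get("auth"))) {
            return "token";     // token-based SASL transport
        } else if (jdbcParams.containsKey("principal")) {
            return "kerberos";  // principal-based Kerberos transport
        }
        return "plain";         // plain SASL (user/password)
    }
}
```

With both parameters present, the token path is taken; without auth=delegationToken, the principal path is taken as before.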



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21573) Binary transport shall ignore principal if auth is set to delegationToken

2019-04-04 Thread Denes Bodo (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denes Bodo updated HIVE-21573:
--
Attachment: HIVE-21573.4.patch

> Binary transport shall ignore principal if auth is set to delegationToken
> -
>
> Key: HIVE-21573
> URL: https://issues.apache.org/jira/browse/HIVE-21573
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 3.0.0
>Reporter: Denes Bodo
>Assignee: Denes Bodo
>Priority: Critical
> Attachments: HIVE-21573.3.patch, HIVE-21573.4.patch
>
>
> When Beeline is used by Sqoop from an Oozie sqoop action in a kerberized 
> cluster, Sqoop passes the Hive delegation token to Beeline when it invokes the 
> *beeline* command. Unfortunately, Beeline puts a principal=XY parameter into 
> the JDBC URL, so when binary transport is needed it uses principal-based 
> authentication instead of token-based.
> Related code: 
> [https://github.com/apache/hive/blob/master/jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java#L688L705]





[jira] [Commented] (HIVE-20822) Improvements to push computation to JDBC from Calcite

2019-04-04 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16810509#comment-16810509
 ] 

Hive QA commented on HIVE-20822:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12964895/HIVE-20822.07.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 15894 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join34] (batchId=77)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join35] (batchId=71)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[pcr] (batchId=65)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[rand_partitionpruner3] 
(batchId=87)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[external_jdbc_table4]
 (batchId=177)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[external_jdbc_table_perf]
 (batchId=182)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sharedwork] 
(batchId=179)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join34] 
(batchId=143)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join35] 
(batchId=141)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[pcr] (batchId=139)
org.apache.hive.spark.client.rpc.TestRpc.testClientTimeout (batchId=331)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/16854/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16854/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16854/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12964895 - PreCommit-HIVE-Build

> Improvements to push computation to JDBC from Calcite
> -
>
> Key: HIVE-20822
> URL: https://issues.apache.org/jira/browse/HIVE-20822
> Project: Hive
>  Issue Type: Improvement
>  Components: StorageHandler
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-20822.01.patch, HIVE-20822.02.patch, 
> HIVE-20822.02.patch, HIVE-20822.03.patch, HIVE-20822.04.patch, 
> HIVE-20822.05.patch, HIVE-20822.06.patch, HIVE-20822.07.patch, 
> HIVE-20822.patch
>
>






[jira] [Updated] (HIVE-21583) KillTriggerActionHandler should use "hive" credential

2019-04-04 Thread Daniel Dai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-21583:
--
Description: Currently SessionState.username is set to null, which is 
invalid as KillQueryImplementation will validate the user privilege.  (was: 
Currently SessionState.username is set to null, which is invalid as 
KillQueryImplementation will valid the user privilege.)

> KillTriggerActionHandler should use "hive" credential
> -
>
> Key: HIVE-21583
> URL: https://issues.apache.org/jira/browse/HIVE-21583
> Project: Hive
>  Issue Type: Bug
>Reporter: Daniel Dai
>Assignee: Daniel Dai
>Priority: Major
> Attachments: HIVE-21583.1.patch, HIVE-21583.2.patch
>
>
> Currently SessionState.username is set to null, which is invalid as 
> KillQueryImplementation will validate the user privilege.
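The bug and the fix can be illustrated with a toy model (all names here are hypothetical, not the actual Hive classes): a kill-query privilege check rejects a session whose username is null, so the trigger handler must build its session with the "hive" service credential instead.

```java
public class KillTriggerSketch {
    // Toy model of the session that issues the kill-query call.
    static class Session {
        final String username;
        Session(String username) { this.username = username; }
    }

    // Hypothetical privilege check mirroring the description: only the
    // "hive" service user may kill other sessions' queries; a null
    // username is always rejected.
    static boolean mayKill(Session s) {
        return s.username != null && s.username.equals("hive");
    }

    // The fix in miniature: use the service credential, not null.
    static Session buildTriggerSession() {
        return new Session("hive");
    }
}
```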





[jira] [Commented] (HIVE-21583) KillTriggerActionHandler should use "hive" credential

2019-04-04 Thread Daniel Dai (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16810500#comment-16810500
 ] 

Daniel Dai commented on HIVE-21583:
---

TestTriggersWorkloadManager is currently disabled. It will test the patch once 
enabled (HIVE-20075).

[~prasanth_j], can you review?

> KillTriggerActionHandler should use "hive" credential
> -
>
> Key: HIVE-21583
> URL: https://issues.apache.org/jira/browse/HIVE-21583
> Project: Hive
>  Issue Type: Bug
>Reporter: Daniel Dai
>Assignee: Daniel Dai
>Priority: Major
> Attachments: HIVE-21583.1.patch, HIVE-21583.2.patch
>
>
> Currently SessionState.username is set to null, which is invalid as 
> KillQueryImplementation will validate the user privilege.





[jira] [Updated] (HIVE-21231) HiveJoinAddNotNullRule support for range predicates

2019-04-04 Thread Miklos Gergely (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely updated HIVE-21231:
--
Status: Patch Available  (was: Open)

> HiveJoinAddNotNullRule support for range predicates
> ---
>
> Key: HIVE-21231
> URL: https://issues.apache.org/jira/browse/HIVE-21231
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: newbie, pull-request-available
> Attachments: HIVE-21231.01.patch, HIVE-21231.02.patch, 
> HIVE-21231.03.patch, HIVE-21231.04.patch, HIVE-21231.05.patch, 
> HIVE-21231.06.patch, HIVE-21231.07.patch, HIVE-21231.08.patch, 
> HIVE-21231.09.patch, HIVE-21231.10.patch, HIVE-21231.11.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> For instance, given the following query:
> {code:sql}
> SELECT t0.col0, t0.col1
> FROM
>   (
> SELECT col0, col1 FROM tab
>   ) AS t0
>   INNER JOIN
>   (
> SELECT col0, col1 FROM tab
>   ) AS t1
> ON t0.col0 < t1.col0 AND t0.col1 > t1.col1
> {code}
> we could still infer that col0 and col1 cannot be null for any of the inputs. 
> Currently we do not.
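The inference itself is simple: any column compared with a range operator in a join condition cannot be NULL in any matching row, so an IS NOT NULL filter may be pushed to both inputs. A minimal sketch of that idea over a string-level toy representation (nothing like the actual Calcite rule, which works on RexNode trees):

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class NotNullInference {
    // Collect every column reference that appears on either side of a
    // <, >, <=, >= or = comparison in a conjunctive join condition.
    // Each such column can safely get an IS NOT NULL filter.
    public static Set<String> notNullColumns(String joinCondition) {
        Set<String> cols = new LinkedHashSet<>();
        for (String conjunct : joinCondition.split("\\s+AND\\s+")) {
            for (String side : conjunct.split("\\s*(<=|>=|<|>|=)\\s*")) {
                cols.add(side.trim());
            }
        }
        return cols;
    }
}
```

For the condition above this yields t0.col0, t1.col0, t0.col1 and t1.col1, i.e. IS NOT NULL can be inferred on both columns of both inputs.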





[jira] [Updated] (HIVE-21567) Break up DDLTask - extract Function related operations

2019-04-04 Thread Miklos Gergely (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely updated HIVE-21567:
--
Attachment: HIVE-21567.03.patch

> Break up DDLTask - extract Function related operations
> --
>
> Key: HIVE-21567
> URL: https://issues.apache.org/jira/browse/HIVE-21567
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 3.1.1
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: refactor-ddl
> Fix For: 4.0.0
>
> Attachments: HIVE-21567.01.patch, HIVE-21567.02.patch, 
> HIVE-21567.03.patch
>
>
> DDLTask is a huge class, more than 5000 lines long. The related DDLWork is 
> also a huge class, which has a field for each DDL operation it supports. The 
> goal is to refactor these so that everything is split into more manageable 
> classes under the package org.apache.hadoop.hive.ql.exec.ddl:
>  * have a separate class for each operation
>  * have a package for each operation group (database ddl, table ddl, etc), so 
> the amount of classes under a package is more manageable
>  * make all the requests (DDLDesc subclasses) immutable
>  * DDLTask should be agnostic to the actual operations
>  * for now, ignore the issue of having some operations handled by DDLTask 
> which are not actual DDL operations (lock, unlock, desc...)
> In the interim, while there are two DDLTask and DDLWork classes in the code 
> base, the new ones in the new package are called DDLTask2 and DDLWork2, 
> avoiding fully qualified class names where both the old and the new classes 
> are in use.
> Step #4: extract all the function related operations from the old DDLTask and 
> move them under the new package.
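The target shape of the refactor can be sketched like this (class names are made up for illustration, not the actual Hive classes): one immutable *Desc per request and one operation class per DDL statement, so the dispatching task stays agnostic of what each operation does.

```java
public class DdlSketch {
    // Common contract: the dispatcher only knows this interface.
    interface DdlOperation {
        String execute();
    }

    // Immutable request object (a DDLDesc subclass in the real code base).
    static final class DropFunctionDesc {
        final String functionName;
        DropFunctionDesc(String functionName) { this.functionName = functionName; }
    }

    // One small class per operation instead of one branch in a 5000-line task.
    static final class DropFunctionOperation implements DdlOperation {
        private final DropFunctionDesc desc;
        DropFunctionOperation(DropFunctionDesc desc) { this.desc = desc; }
        @Override public String execute() {
            return "dropped function " + desc.functionName;
        }
    }
}
```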





[jira] [Updated] (HIVE-21231) HiveJoinAddNotNullRule support for range predicates

2019-04-04 Thread Miklos Gergely (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely updated HIVE-21231:
--
Status: Open  (was: Patch Available)

> HiveJoinAddNotNullRule support for range predicates
> ---
>
> Key: HIVE-21231
> URL: https://issues.apache.org/jira/browse/HIVE-21231
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: newbie, pull-request-available
> Attachments: HIVE-21231.01.patch, HIVE-21231.02.patch, 
> HIVE-21231.03.patch, HIVE-21231.04.patch, HIVE-21231.05.patch, 
> HIVE-21231.06.patch, HIVE-21231.07.patch, HIVE-21231.08.patch, 
> HIVE-21231.09.patch, HIVE-21231.10.patch, HIVE-21231.11.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> For instance, given the following query:
> {code:sql}
> SELECT t0.col0, t0.col1
> FROM
>   (
> SELECT col0, col1 FROM tab
>   ) AS t0
>   INNER JOIN
>   (
> SELECT col0, col1 FROM tab
>   ) AS t1
> ON t0.col0 < t1.col0 AND t0.col1 > t1.col1
> {code}
> we could still infer that col0 and col1 cannot be null for any of the inputs. 
> Currently we do not.





[jira] [Updated] (HIVE-21231) HiveJoinAddNotNullRule support for range predicates

2019-04-04 Thread Miklos Gergely (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely updated HIVE-21231:
--
Attachment: HIVE-21231.11.patch

> HiveJoinAddNotNullRule support for range predicates
> ---
>
> Key: HIVE-21231
> URL: https://issues.apache.org/jira/browse/HIVE-21231
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: newbie, pull-request-available
> Attachments: HIVE-21231.01.patch, HIVE-21231.02.patch, 
> HIVE-21231.03.patch, HIVE-21231.04.patch, HIVE-21231.05.patch, 
> HIVE-21231.06.patch, HIVE-21231.07.patch, HIVE-21231.08.patch, 
> HIVE-21231.09.patch, HIVE-21231.10.patch, HIVE-21231.11.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> For instance, given the following query:
> {code:sql}
> SELECT t0.col0, t0.col1
> FROM
>   (
> SELECT col0, col1 FROM tab
>   ) AS t0
>   INNER JOIN
>   (
> SELECT col0, col1 FROM tab
>   ) AS t1
> ON t0.col0 < t1.col0 AND t0.col1 > t1.col1
> {code}
> we could still infer that col0 and col1 cannot be null for any of the inputs. 
> Currently we do not.





[jira] [Updated] (HIVE-21567) Break up DDLTask - extract Function related operations

2019-04-04 Thread Miklos Gergely (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely updated HIVE-21567:
--
Status: Patch Available  (was: Open)

> Break up DDLTask - extract Function related operations
> --
>
> Key: HIVE-21567
> URL: https://issues.apache.org/jira/browse/HIVE-21567
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 3.1.1
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: refactor-ddl
> Fix For: 4.0.0
>
> Attachments: HIVE-21567.01.patch, HIVE-21567.02.patch, 
> HIVE-21567.03.patch
>
>
> DDLTask is a huge class, more than 5000 lines long. The related DDLWork is 
> also a huge class, which has a field for each DDL operation it supports. The 
> goal is to refactor these so that everything is split into more manageable 
> classes under the package org.apache.hadoop.hive.ql.exec.ddl:
>  * have a separate class for each operation
>  * have a package for each operation group (database ddl, table ddl, etc), so 
> the amount of classes under a package is more manageable
>  * make all the requests (DDLDesc subclasses) immutable
>  * DDLTask should be agnostic to the actual operations
>  * for now, ignore the issue of having some operations handled by DDLTask 
> which are not actual DDL operations (lock, unlock, desc...)
> In the interim, while there are two DDLTask and DDLWork classes in the code 
> base, the new ones in the new package are called DDLTask2 and DDLWork2, 
> avoiding fully qualified class names where both the old and the new classes 
> are in use.
> Step #4: extract all the function related operations from the old DDLTask and 
> move them under the new package.





[jira] [Updated] (HIVE-21567) Break up DDLTask - extract Function related operations

2019-04-04 Thread Miklos Gergely (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely updated HIVE-21567:
--
Status: Open  (was: Patch Available)

> Break up DDLTask - extract Function related operations
> --
>
> Key: HIVE-21567
> URL: https://issues.apache.org/jira/browse/HIVE-21567
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 3.1.1
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: refactor-ddl
> Fix For: 4.0.0
>
> Attachments: HIVE-21567.01.patch, HIVE-21567.02.patch
>
>
> DDLTask is a huge class, more than 5000 lines long. The related DDLWork is 
> also a huge class, which has a field for each DDL operation it supports. The 
> goal is to refactor these so that everything is split into more manageable 
> classes under the package org.apache.hadoop.hive.ql.exec.ddl:
>  * have a separate class for each operation
>  * have a package for each operation group (database ddl, table ddl, etc), so 
> the amount of classes under a package is more manageable
>  * make all the requests (DDLDesc subclasses) immutable
>  * DDLTask should be agnostic to the actual operations
>  * for now, ignore the issue of having some operations handled by DDLTask 
> which are not actual DDL operations (lock, unlock, desc...)
> In the interim, while there are two DDLTask and DDLWork classes in the code 
> base, the new ones in the new package are called DDLTask2 and DDLWork2, 
> avoiding fully qualified class names where both the old and the new classes 
> are in use.
> Step #4: extract all the function related operations from the old DDLTask and 
> move them under the new package.





[jira] [Updated] (HIVE-21583) KillTriggerActionHandler should use "hive" credential

2019-04-04 Thread Daniel Dai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-21583:
--
Attachment: HIVE-21583.2.patch

> KillTriggerActionHandler should use "hive" credential
> -
>
> Key: HIVE-21583
> URL: https://issues.apache.org/jira/browse/HIVE-21583
> Project: Hive
>  Issue Type: Bug
>Reporter: Daniel Dai
>Assignee: Daniel Dai
>Priority: Major
> Attachments: HIVE-21583.1.patch, HIVE-21583.2.patch
>
>
> Currently SessionState.username is set to null, which is invalid as 
> KillQueryImplementation will validate the user privilege.





[jira] [Commented] (HIVE-20822) Improvements to push computation to JDBC from Calcite

2019-04-04 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16810497#comment-16810497
 ] 

Hive QA commented on HIVE-20822:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m  
5s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
41s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
11s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
41s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m  
4s{color} | {color:blue} ql in master has 2258 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
2s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
28s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
8s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
41s{color} | {color:red} ql: The patch generated 22 new + 153 unchanged - 0 
fixed = 175 total (was 153) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 481 line(s) that end in whitespace. Use 
git apply --whitespace=fix <>. Refer 
https://git-scm.com/docs/git-apply {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
7s{color} | {color:red} The patch has 142 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
16s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 25m 32s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-16854/dev-support/hive-personality.sh
 |
| git revision | master / 5a95b0a |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16854/yetus/diff-checkstyle-ql.txt
 |
| whitespace | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16854/yetus/whitespace-eol.txt
 |
| whitespace | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16854/yetus/whitespace-tabs.txt
 |
| modules | C: ql itests U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16854/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Improvements to push computation to JDBC from Calcite
> -
>
> Key: HIVE-20822
> URL: https://issues.apache.org/jira/browse/HIVE-20822
> Project: Hive
>  Issue Type: Improvement
>  Components: StorageHandler
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-20822.01.patch, HIVE-20822.02.patch, 
> HIVE-20822.02.patch, HIVE-20822.03.patch, HIVE-20822.04.patch, 
> HIVE-20822.05.patch, HIVE-20822.06.patch, HIVE-20822.07.patch, 
> HIVE-20822.patch
>
>






[jira] [Commented] (HIVE-20901) running compactor when there is nothing to do produces duplicate data

2019-04-04 Thread Abhishek Somani (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16810472#comment-16810472
 ] 

Abhishek Somani commented on HIVE-20901:


Right. The patch I have uploaded will avoid the duplicate compaction in the 
first place.

I also think it is worth fixing this in the 3.1 branch where we do not have 
visibility ids. 

> running compactor when there is nothing to do produces duplicate data
> -
>
> Key: HIVE-20901
> URL: https://issues.apache.org/jira/browse/HIVE-20901
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 4.0.0
>Reporter: Eugene Koifman
>Assignee: Abhishek Somani
>Priority: Major
> Attachments: HIVE-20901.1.patch
>
>
> Suppose we run minor compaction 2 times, via alter table.
> The 2nd compaction request should have nothing to do, but I don't think 
> there is a check for that.  It's visible in the context of HIVE-20823, where 
> each compactor run produces a delta with a new visibility suffix, so we end 
> up with something like
> {noformat}
> target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands3-1541810844849/warehouse/t/
> ├── delete_delta_001_002_v019
> │   ├── _orc_acid_version
> │   └── bucket_0
> ├── delete_delta_001_002_v021
> │   ├── _orc_acid_version
> │   └── bucket_0
> ├── delta_001_001_
> │   ├── _orc_acid_version
> │   └── bucket_0
> ├── delta_001_002_v019
> │   ├── _orc_acid_version
> │   └── bucket_0
> ├── delta_001_002_v021
> │   ├── _orc_acid_version
> │   └── bucket_0
> └── delta_002_002_
>     ├── _orc_acid_version
>     └── bucket_0{noformat}
> i.e. 2 deltas with the same write ID range.
> This is bad.  It probably happens today as well, but the new run produces a 
> delta with the same name and clobbers the previous one, which may interfere 
> with writers.
>  
> Need to investigate.
>  
> -The issue (I think) is that {{AcidUtils.getAcidState()}} then returns both 
> deltas as if they were distinct and it effectively duplicates data.-  There 
> is no data duplication - {{getAcidState()}} will not use 2 deltas with the 
> same {{writeid}} range
>  
>  
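The reader-side behaviour described above (two deltas covering the same write-id range, only one of which should be used) can be sketched as a toy selection pass. This is not the actual AcidUtils.getAcidState() logic; it assumes the simplified name format delta_<min>_<max>_v<visibilityTxn> seen in the listing and keeps the delta with the highest visibility suffix per range.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DeltaPicker {
    // For each write-id range, keep only one delta directory: the one
    // with the lexicographically largest name, i.e. the highest
    // visibility transaction suffix. Duplicated ranges are thus never
    // read twice, so no data duplication occurs.
    public static List<String> pick(List<String> deltaDirs) {
        Map<String, String> bestPerRange = new TreeMap<>();
        for (String d : deltaDirs) {
            String[] parts = d.split("_");           // [delta, min, max, vNNN]
            String range = parts[1] + "_" + parts[2];
            String prev = bestPerRange.get(range);
            if (prev == null || d.compareTo(prev) > 0) {
                bestPerRange.put(range, d);
            }
        }
        return new ArrayList<>(bestPerRange.values());
    }
}
```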





[jira] [Updated] (HIVE-20901) running compactor when there is nothing to do produces duplicate data

2019-04-04 Thread Abhishek Somani (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Somani updated HIVE-20901:
---
Status: Patch Available  (was: Open)

> running compactor when there is nothing to do produces duplicate data
> -
>
> Key: HIVE-20901
> URL: https://issues.apache.org/jira/browse/HIVE-20901
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 4.0.0
>Reporter: Eugene Koifman
>Assignee: Abhishek Somani
>Priority: Major
> Attachments: HIVE-20901.1.patch
>
>
> Suppose we run minor compaction 2 times, via alter table.
> The 2nd compaction request should have nothing to do, but I don't think 
> there is a check for that.  It's visible in the context of HIVE-20823, where 
> each compactor run produces a delta with a new visibility suffix, so we end 
> up with something like
> {noformat}
> target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands3-1541810844849/warehouse/t/
> ├── delete_delta_001_002_v019
> │   ├── _orc_acid_version
> │   └── bucket_0
> ├── delete_delta_001_002_v021
> │   ├── _orc_acid_version
> │   └── bucket_0
> ├── delta_001_001_
> │   ├── _orc_acid_version
> │   └── bucket_0
> ├── delta_001_002_v019
> │   ├── _orc_acid_version
> │   └── bucket_0
> ├── delta_001_002_v021
> │   ├── _orc_acid_version
> │   └── bucket_0
> └── delta_002_002_
>     ├── _orc_acid_version
>     └── bucket_0{noformat}
> i.e. 2 deltas with the same write ID range.
> This is bad.  It probably happens today as well, but the new run produces a 
> delta with the same name and clobbers the previous one, which may interfere 
> with writers.
>  
> Need to investigate.
>  
> -The issue (I think) is that {{AcidUtils.getAcidState()}} then returns both 
> deltas as if they were distinct and it effectively duplicates data.-  There 
> is no data duplication - {{getAcidState()}} will not use 2 deltas with the 
> same {{writeid}} range
>  
>  





[jira] [Commented] (HIVE-21291) Restore historical way of handling timestamps in Avro while keeping the new semantics at the same time

2019-04-04 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16810463#comment-16810463
 ] 

Hive QA commented on HIVE-21291:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12964874/HIVE-21291.1.patch

{color:green}SUCCESS:{color} +1 due to 4 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 15896 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[avro_schema_evolution_native]
 (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_ppd_non_deterministic]
 (batchId=77)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_orc_acid_part_update]
 (batchId=178)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_orc_acid_part_update_llap_io]
 (batchId=181)
org.apache.hadoop.hive.metastore.TestPartitionManagement.testPartitionDiscoveryTransactionalTable
 (batchId=220)
org.apache.hadoop.hive.ql.io.avro.TestAvroGenericRecordReader.emptyFile 
(batchId=300)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/16853/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16853/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16853/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12964874 - PreCommit-HIVE-Build

> Restore historical way of handling timestamps in Avro while keeping the new 
> semantics at the same time
> --
>
> Key: HIVE-21291
> URL: https://issues.apache.org/jira/browse/HIVE-21291
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Ivanfi
>Assignee: Karen Coppage
>Priority: Major
> Attachments: HIVE-21291.1.patch
>
>
> This sub-task is for implementing the Avro-specific parts of the following 
> plan:
> h1. Problem
> Historically, the semantics of the TIMESTAMP type in Hive depended on the 
> file format. Timestamps in Avro, Parquet and RCFiles with a binary SerDe had 
> _Instant_ semantics, while timestamps in ORC, textfiles and RCFiles with a 
> text SerDe had _LocalDateTime_ semantics.
> The Hive community wanted to get rid of this inconsistency and have 
> _LocalDateTime_ semantics in Avro, Parquet and RCFiles with a binary SerDe as 
> well. *Hive 3.1 turned off normalization to UTC* to achieve this. While this 
> leads to the desired new semantics, it also leads to incorrect results when 
> new Hive versions read timestamps written by old Hive versions or when old 
> Hive versions or any other component not aware of this change (including 
> legacy Impala and Spark versions) read timestamps written by new Hive 
> versions.
> h1. Solution
> To work around this issue, Hive *should restore the practice of normalizing 
> to UTC* when writing timestamps to Avro, Parquet and RCFiles with a binary 
> SerDe. In itself, this would restore the historical _Instant_ semantics, 
> which is undesirable. In order to achieve the desired _LocalDateTime_ 
> semantics in spite of normalizing to UTC, newer Hive versions should record 
> the session-local local time zone in the file metadata fields serving 
> arbitrary key-value storage purposes.
> When reading back files with this time zone metadata, newer Hive versions (or 
> any other new component aware of this extra metadata) can achieve 
> _LocalDateTime_ semantics by *converting from UTC to the saved time zone 
> (instead of to the local time zone)*. Legacy components that are unaware of 
> the new metadata can read the files without any problem and the timestamps 
> will show the historical Instant behaviour to them.
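The round-trip described in the plan can be sketched with java.time (class and method names here are illustrative, not the actual Hive SerDe code): writing normalizes the wall-clock value to UTC millis; reading converts back using the writer's time zone saved in the file metadata, which recovers the original LocalDateTime regardless of the reader's local zone.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

public class AvroTimestampSketch {
    // Write path: normalize the session-local wall-clock timestamp to
    // UTC epoch millis (the historical on-disk representation).
    public static long write(LocalDateTime wallClock, ZoneId writerZone) {
        return wallClock.atZone(writerZone).toInstant().toEpochMilli();
    }

    // Read path: convert UTC back using the writer's zone recorded in
    // the file metadata, instead of the reader's local zone. This gives
    // LocalDateTime semantics to new readers, while legacy readers that
    // ignore the metadata see the historical Instant behaviour.
    public static LocalDateTime read(long utcMillis, ZoneId savedWriterZone) {
        return Instant.ofEpochMilli(utcMillis)
                      .atZone(savedWriterZone)
                      .toLocalDateTime();
    }
}
```

The key property is that write followed by read with the saved zone is the identity on the wall-clock value.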





[jira] [Commented] (HIVE-21291) Restore historical way of handling timestamps in Avro while keeping the new semantics at the same time

2019-04-04 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16810462#comment-16810462
 ] 

Hive QA commented on HIVE-21291:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
56s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
32s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
45s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
57s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
46s{color} | {color:blue} serde in master has 197 extant Findbugs warnings. 
{color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m  
7s{color} | {color:blue} ql in master has 2258 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  7m 
55s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
27s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
34s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
16s{color} | {color:red} serde: The patch generated 2 new + 131 unchanged - 0 
fixed = 133 total (was 131) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m 
59s{color} | {color:red} root: The patch generated 2 new + 157 unchanged - 0 
fixed = 159 total (was 157) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  7m 
42s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
13s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 67m  7s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-16853/dev-support/hive-personality.sh
 |
| git revision | master / 5a95b0a |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16853/yetus/diff-checkstyle-serde.txt
 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16853/yetus/diff-checkstyle-root.txt
 |
| modules | C: serde ql . U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16853/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Restore historical way of handling timestamps in Avro while keeping the new 
> semantics at the same time
> --
>
> Key: HIVE-21291
> URL: https://issues.apache.org/jira/browse/HIVE-21291
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Ivanfi
>Assignee: Karen Coppage
>Priority: Major
> Attachments: HIVE-21291.1.patch
>
>
> This sub-task is for implementing the Avro-specific parts of the following 
> plan:
> h1. Problem
> Historically, the semantics of the TIMESTAMP type in Hive depended on the 
> file format. Timestamps in Avro, Parquet and RCFiles with a binary SerDe had 
> 

[jira] [Updated] (HIVE-21568) HiveRelOptUtil.isRowFilteringPlan should skip Project

2019-04-04 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-21568:
---
Attachment: HIVE-21568.patch

> HiveRelOptUtil.isRowFilteringPlan should skip Project
> -
>
> Key: HIVE-21568
> URL: https://issues.apache.org/jira/browse/HIVE-21568
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-21568.patch, HIVE-21568.patch
>
>
> Project operator should not return true in any case; this may trigger 
> additional rewritings in the presence of constraints.





[jira] [Updated] (HIVE-20901) running compactor when there is nothing to do produces duplicate data

2019-04-04 Thread Abhishek Somani (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Somani updated HIVE-20901:
---
Attachment: HIVE-20901.1.patch

> running compactor when there is nothing to do produces duplicate data
> -
>
> Key: HIVE-20901
> URL: https://issues.apache.org/jira/browse/HIVE-20901
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 4.0.0
>Reporter: Eugene Koifman
>Assignee: Abhishek Somani
>Priority: Major
> Attachments: HIVE-20901.1.patch
>
>
> Suppose we run minor compaction 2 times via alter table.
> The 2nd compaction request should have nothing to do, but I don't think 
> there is a check for that.  It's visible in the context of HIVE-20823, where 
> each compactor run produces a delta with a new visibility suffix, so we end up 
> with something like
> {noformat}
> target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands3-1541810844849/warehouse/t/
> ├── delete_delta_001_002_v019
> │   ├── _orc_acid_version
> │   └── bucket_0
> ├── delete_delta_001_002_v021
> │   ├── _orc_acid_version
> │   └── bucket_0
> ├── delta_001_001_
> │   ├── _orc_acid_version
> │   └── bucket_0
> ├── delta_001_002_v019
> │   ├── _orc_acid_version
> │   └── bucket_0
> ├── delta_001_002_v021
> │   ├── _orc_acid_version
> │   └── bucket_0
> └── delta_002_002_
>     ├── _orc_acid_version
>     └── bucket_0{noformat}
> i.e. 2 deltas with the same write ID range
> This is bad.  It probably happens today as well, but a new run produces a delta 
> with the same name and clobbers the previous one, which may interfere with 
> writers.
>  
> need to investigate
>  
> -The issue (I think) is that {{AcidUtils.getAcidState()}} then returns both 
> deltas as if they were distinct and it effectively duplicates data.-  There 
> is no data duplication - {{getAcidState()}} will not use 2 deltas with the 
> same {{writeid}} range
>  
>  
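The getAcidState() behaviour mentioned above — using only one of several deltas covering the same write-ID range — can be sketched as follows (the directory-name parsing is simplified and the helper is hypothetical, not the real AcidUtils code):

```java
import java.util.*;
import java.util.regex.*;

// Hypothetical sketch: among deltas covering the same (minWriteId, maxWriteId)
// range, keep only the one with the highest visibility txn id.
public class DeltaChooser {
    private static final Pattern DELTA =
        Pattern.compile("delta_(\\d+)_(\\d+)(?:_v(\\d+))?");

    static List<String> chooseDeltas(List<String> dirs) {
        Map<String, String> best = new TreeMap<>();   // write-ID range -> chosen dir
        Map<String, Long> bestVisibility = new HashMap<>();
        for (String dir : dirs) {
            Matcher m = DELTA.matcher(dir);
            if (!m.matches()) continue;               // skip non-delta entries
            String range = m.group(1) + "_" + m.group(2);
            // No visibility suffix sorts lowest.
            long vis = m.group(3) == null ? -1 : Long.parseLong(m.group(3));
            if (!bestVisibility.containsKey(range) || vis > bestVisibility.get(range)) {
                bestVisibility.put(range, vis);
                best.put(range, dir);
            }
        }
        return new ArrayList<>(best.values());
    }
}
```

Under this selection the two v019/v021 deltas in the listing above collapse to one, so no data is duplicated even though both directories exist on disk.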





[jira] [Assigned] (HIVE-20901) running compactor when there is nothing to do produces duplicate data

2019-04-04 Thread Abhishek Somani (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Somani reassigned HIVE-20901:
--

Assignee: Abhishek Somani  (was: Eugene Koifman)

> running compactor when there is nothing to do produces duplicate data
> -
>
> Key: HIVE-20901
> URL: https://issues.apache.org/jira/browse/HIVE-20901
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 4.0.0
>Reporter: Eugene Koifman
>Assignee: Abhishek Somani
>Priority: Major
>
> Suppose we run minor compaction 2 times via alter table.
> The 2nd compaction request should have nothing to do, but I don't think 
> there is a check for that.  It's visible in the context of HIVE-20823, where 
> each compactor run produces a delta with a new visibility suffix, so we end up 
> with something like
> {noformat}
> target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands3-1541810844849/warehouse/t/
> ├── delete_delta_001_002_v019
> │   ├── _orc_acid_version
> │   └── bucket_0
> ├── delete_delta_001_002_v021
> │   ├── _orc_acid_version
> │   └── bucket_0
> ├── delta_001_001_
> │   ├── _orc_acid_version
> │   └── bucket_0
> ├── delta_001_002_v019
> │   ├── _orc_acid_version
> │   └── bucket_0
> ├── delta_001_002_v021
> │   ├── _orc_acid_version
> │   └── bucket_0
> └── delta_002_002_
>     ├── _orc_acid_version
>     └── bucket_0{noformat}
> i.e. 2 deltas with the same write ID range
> This is bad.  It probably happens today as well, but a new run produces a delta 
> with the same name and clobbers the previous one, which may interfere with 
> writers.
>  
> need to investigate
>  
> -The issue (I think) is that {{AcidUtils.getAcidState()}} then returns both 
> deltas as if they were distinct and it effectively duplicates data.-  There 
> is no data duplication - {{getAcidState()}} will not use 2 deltas with the 
> same {{writeid}} range
>  
>  





[jira] [Updated] (HIVE-21583) KillTriggerActionHandler should use "hive" credential

2019-04-04 Thread Daniel Dai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-21583:
--
Status: Patch Available  (was: Open)

> KillTriggerActionHandler should use "hive" credential
> -
>
> Key: HIVE-21583
> URL: https://issues.apache.org/jira/browse/HIVE-21583
> Project: Hive
>  Issue Type: Bug
>Reporter: Daniel Dai
>Assignee: Daniel Dai
>Priority: Major
> Attachments: HIVE-21583.1.patch
>
>
> Currently SessionState.username is set to null, which is invalid, as 
> KillQueryImplementation will validate the user privilege.





[jira] [Updated] (HIVE-21583) KillTriggerActionHandler should use "hive" credential

2019-04-04 Thread Daniel Dai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-21583:
--
Attachment: HIVE-21583.1.patch

> KillTriggerActionHandler should use "hive" credential
> -
>
> Key: HIVE-21583
> URL: https://issues.apache.org/jira/browse/HIVE-21583
> Project: Hive
>  Issue Type: Bug
>Reporter: Daniel Dai
>Assignee: Daniel Dai
>Priority: Major
> Attachments: HIVE-21583.1.patch
>
>
> Currently SessionState.username is set to null, which is invalid, as 
> KillQueryImplementation will validate the user privilege.





[jira] [Updated] (HIVE-21583) KillTriggerActionHandler should use "hive" credential

2019-04-04 Thread Daniel Dai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-21583:
--
Attachment: (was: HIVE-21583.1.patch)

> KillTriggerActionHandler should use "hive" credential
> -
>
> Key: HIVE-21583
> URL: https://issues.apache.org/jira/browse/HIVE-21583
> Project: Hive
>  Issue Type: Bug
>Reporter: Daniel Dai
>Assignee: Daniel Dai
>Priority: Major
> Attachments: HIVE-21583.1.patch
>
>
> Currently SessionState.username is set to null, which is invalid, as 
> KillQueryImplementation will validate the user privilege.





[jira] [Updated] (HIVE-21583) KillTriggerActionHandler should use "hive" credential

2019-04-04 Thread Daniel Dai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-21583:
--
Attachment: HIVE-21583.1.patch

> KillTriggerActionHandler should use "hive" credential
> -
>
> Key: HIVE-21583
> URL: https://issues.apache.org/jira/browse/HIVE-21583
> Project: Hive
>  Issue Type: Bug
>Reporter: Daniel Dai
>Assignee: Daniel Dai
>Priority: Major
> Attachments: HIVE-21583.1.patch
>
>
> Currently SessionState.username is set to null, which is invalid, as 
> KillQueryImplementation will validate the user privilege.





[jira] [Assigned] (HIVE-21583) KillTriggerActionHandler should use "hive" credential

2019-04-04 Thread Daniel Dai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai reassigned HIVE-21583:
-


> KillTriggerActionHandler should use "hive" credential
> -
>
> Key: HIVE-21583
> URL: https://issues.apache.org/jira/browse/HIVE-21583
> Project: Hive
>  Issue Type: Bug
>Reporter: Daniel Dai
>Assignee: Daniel Dai
>Priority: Major
>
> Currently SessionState.username is set to null, which is invalid, as 
> KillQueryImplementation will validate the user privilege.





[jira] [Commented] (HIVE-21582) Prefix msck configs with metastore

2019-04-04 Thread Jason Dere (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16810311#comment-16810311
 ] 

Jason Dere commented on HIVE-21582:
---

+1

> Prefix msck configs with metastore
> --
>
> Key: HIVE-21582
> URL: https://issues.apache.org/jira/browse/HIVE-21582
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
> Attachments: HIVE-21582.1.patch
>
>
> HIVE-20707 moved msck configs to metastore but the configs are not prefixed 
> with "metastore". It would be good to prefix them with "metastore" for 
> consistency with other configs.





[jira] [Commented] (HIVE-21372) Use Apache Commons IO To Read Stream To String

2019-04-04 Thread David Mollitor (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16810308#comment-16810308
 ] 

David Mollitor commented on HIVE-21372:
---

[~kgyrtkirk] [~ashutoshc] Can you please assist me with this patch?  Thanks!!

> Use Apache Commons IO To Read Stream To String
> --
>
> Key: HIVE-21372
> URL: https://issues.apache.org/jira/browse/HIVE-21372
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Trivial
> Fix For: 4.0.0
>
> Attachments: HIVE-21372.1.patch
>
>






[jira] [Updated] (HIVE-21427) Syslog storage handler

2019-04-04 Thread Prasanth Jayachandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-21427:
-
Attachment: HIVE-21427.3.patch

> Syslog storage handler
> --
>
> Key: HIVE-21427
> URL: https://issues.apache.org/jira/browse/HIVE-21427
> Project: Hive
>  Issue Type: New Feature
>  Components: Serializers/Deserializers
>Affects Versions: 4.0.0
>Reporter: Ashutosh Chauhan
>Assignee: Prasanth Jayachandran
>Priority: Major
> Attachments: HIVE-21427.1.patch, HIVE-21427.2.patch, 
> HIVE-21427.3.patch
>
>
> It will be useful to read syslog-generated log files in Hive. Hive generates 
> logs in the RFC5424 log4j2 layout and stores them as an external table in 
> sys.db. This includes a SyslogSerde that can parse RFC5424-formatted logs and 
> map them to the logs table schema for query processing by Hive.





[jira] [Commented] (HIVE-21582) Prefix msck configs with metastore

2019-04-04 Thread Prasanth Jayachandran (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16810271#comment-16810271
 ] 

Prasanth Jayachandran commented on HIVE-21582:
--

[~jdere] could you please review this small patch?

> Prefix msck configs with metastore
> --
>
> Key: HIVE-21582
> URL: https://issues.apache.org/jira/browse/HIVE-21582
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
> Attachments: HIVE-21582.1.patch
>
>
> HIVE-20707 moved msck configs to metastore but the configs are not prefixed 
> with "metastore". It would be good to prefix them with "metastore" for 
> consistency with other configs.





[jira] [Updated] (HIVE-21582) Prefix msck configs with metastore

2019-04-04 Thread Prasanth Jayachandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-21582:
-
Status: Patch Available  (was: Open)

> Prefix msck configs with metastore
> --
>
> Key: HIVE-21582
> URL: https://issues.apache.org/jira/browse/HIVE-21582
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
> Attachments: HIVE-21582.1.patch
>
>
> HIVE-20707 moved msck configs to metastore but the configs are not prefixed 
> with "metastore". It would be good to prefix them with "metastore" for 
> consistency with other configs.





[jira] [Updated] (HIVE-21582) Prefix msck configs with metastore

2019-04-04 Thread Prasanth Jayachandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-21582:
-
Attachment: HIVE-21582.1.patch

> Prefix msck configs with metastore
> --
>
> Key: HIVE-21582
> URL: https://issues.apache.org/jira/browse/HIVE-21582
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
> Attachments: HIVE-21582.1.patch
>
>
> HIVE-20707 moved msck configs to metastore but the configs are not prefixed 
> with "metastore". It would be good to prefix them with "metastore" for 
> consistency with other configs.





[jira] [Assigned] (HIVE-21582) Prefix msck configs with metastore

2019-04-04 Thread Prasanth Jayachandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran reassigned HIVE-21582:



> Prefix msck configs with metastore
> --
>
> Key: HIVE-21582
> URL: https://issues.apache.org/jira/browse/HIVE-21582
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
>
> HIVE-20707 moved msck configs to metastore but the configs are not prefixed 
> with "metastore". It would be good to prefix them with "metastore" for 
> consistency with other configs.





[jira] [Updated] (HIVE-21561) Revert removal of TableType.INDEX_TABLE enum

2019-04-04 Thread Jason Dere (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-21561:
--
   Resolution: Fixed
Fix Version/s: 4.0.0
   Status: Resolved  (was: Patch Available)

Committed to master

> Revert removal of TableType.INDEX_TABLE enum
> 
>
> Key: HIVE-21561
> URL: https://issues.apache.org/jira/browse/HIVE-21561
> Project: Hive
>  Issue Type: Bug
>Reporter: Jason Dere
>Assignee: Jason Dere
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21561.1.patch, HIVE-21561.2.patch
>
>
> Index tables have been removed from Hive as of HIVE-18715.
> However, in case users still have index tables defined in the metastore, we 
> should keep the TableType.INDEX_TABLE enum around so that users can drop 
> these tables. Without the enum defined, Hive cannot do anything with them, as 
> it fails with IllegalArgumentException errors when trying to call 
> TableType.valueOf() on INDEX_TABLE.





[jira] [Updated] (HIVE-21581) Remove Lock in GetInputSummary

2019-04-04 Thread David Mollitor (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-21581:
--
Attachment: HIVE-21581.1.patch

> Remove Lock in GetInputSummary
> --
>
> Key: HIVE-21581
> URL: https://issues.apache.org/jira/browse/HIVE-21581
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0, 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21581.1.patch
>
>
> Now that the Hive compile lock has been relaxed in [HIVE-20535], remove the 
> {{getInputSummary}} lock:
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2459]
>  
>  





[jira] [Updated] (HIVE-21581) Remove Lock in GetInputSummary

2019-04-04 Thread David Mollitor (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-21581:
--
Status: Patch Available  (was: Open)

> Remove Lock in GetInputSummary
> --
>
> Key: HIVE-21581
> URL: https://issues.apache.org/jira/browse/HIVE-21581
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0, 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21581.1.patch
>
>
> Now that the Hive compile lock has been relaxed in [HIVE-20535], remove the 
> {{getInputSummary}} lock:
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2459]
>  
>  





[jira] [Assigned] (HIVE-21581) Remove Lock in GetInputSummary

2019-04-04 Thread David Mollitor (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HIVE-21581:
-


> Remove Lock in GetInputSummary
> --
>
> Key: HIVE-21581
> URL: https://issues.apache.org/jira/browse/HIVE-21581
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0, 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Fix For: 4.0.0
>
>
> Now that the Hive compile lock has been relaxed in [HIVE-20535], remove the 
> {{getInputSummary}} lock:
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2459]
>  
>  





[jira] [Commented] (HIVE-21561) Revert removal of TableType.INDEX_TABLE enum

2019-04-04 Thread Prasanth Jayachandran (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16810216#comment-16810216
 ] 

Prasanth Jayachandran commented on HIVE-21561:
--

+1

> Revert removal of TableType.INDEX_TABLE enum
> 
>
> Key: HIVE-21561
> URL: https://issues.apache.org/jira/browse/HIVE-21561
> Project: Hive
>  Issue Type: Bug
>Reporter: Jason Dere
>Assignee: Jason Dere
>Priority: Major
> Attachments: HIVE-21561.1.patch, HIVE-21561.2.patch
>
>
> Index tables have been removed from Hive as of HIVE-18715.
> However, in case users still have index tables defined in the metastore, we 
> should keep the TableType.INDEX_TABLE enum around so that users can drop 
> these tables. Without the enum defined, Hive cannot do anything with them, as 
> it fails with IllegalArgumentException errors when trying to call 
> TableType.valueOf() on INDEX_TABLE.





[jira] [Commented] (HIVE-19875) increase LLAP IO queue size for perf

2019-04-04 Thread Prasanth Jayachandran (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16810184#comment-16810184
 ] 

Prasanth Jayachandran commented on HIVE-19875:
--

[~klcopp] I reverted the commit from branch-3.0 that broke the build. Is there 
a release planned from branch-3.0?

> increase LLAP IO queue size for perf
> 
>
> Key: HIVE-19875
> URL: https://issues.apache.org/jira/browse/HIVE-19875
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.0, 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
> Fix For: 3.1.0, 4.0.0
>
> Attachments: HIVE-19875.1.patch, HIVE-19875.2.patch
>
>
> According to [~gopalv], the queue limit has a perf impact, especially during 
> hashtable load for mapjoin, where in the past IO used to queue up more data 
> for processors to process.
> 1) Overall the default limit could be adjusted higher.
> 2) Depending on Decimal64 availability, the weight for decimal columns could 
> be reduced.





[jira] [Updated] (HIVE-19875) increase LLAP IO queue size for perf

2019-04-04 Thread Prasanth Jayachandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-19875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-19875:
-
Fix Version/s: (was: 3.0.1)

> increase LLAP IO queue size for perf
> 
>
> Key: HIVE-19875
> URL: https://issues.apache.org/jira/browse/HIVE-19875
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.0, 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
> Fix For: 3.1.0, 4.0.0
>
> Attachments: HIVE-19875.1.patch, HIVE-19875.2.patch
>
>
> According to [~gopalv], the queue limit has a perf impact, especially during 
> hashtable load for mapjoin, where in the past IO used to queue up more data 
> for processors to process.
> 1) Overall the default limit could be adjusted higher.
> 2) Depending on Decimal64 availability, the weight for decimal columns could 
> be reduced.





[jira] [Commented] (HIVE-21506) Memory based TxnHandler implementation

2019-04-04 Thread Todd Lipcon (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16810171#comment-16810171
 ] 

Todd Lipcon commented on HIVE-21506:


http://www.vldb.org/pvldb/2/vldb09-157.pdf is a good paper that discusses 
techniques like the above, where a lock is temporally extended across multiple 
transactions to reduce lock manager contention.

> Memory based TxnHandler implementation
> --
>
> Key: HIVE-21506
> URL: https://issues.apache.org/jira/browse/HIVE-21506
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Peter Vary
>Priority: Major
>
> The current TxnHandler implementations use the backend RDBMS to store 
> every Hive lock and all transaction data, so multiple TxnHandler instances can 
> run simultaneously and serve requests. The continuous 
> communication/locking done on the RDBMS side puts serious load on the backend 
> databases and also restricts the possible throughput.
> If it is possible to have only a single active TxnHandler instance (with the 
> current design, HMS), then we can provide much better performance (using only 
> Java-based locking). We still have to store the committed write transactions 
> in the RDBMS (or later some other persistent storage), but other lock and 
> transaction operations could remain memory-only.
> The most important drawbacks of this solution are that we definitely lose 
> scalability when one instance of TxnHandler is no longer able to serve the 
> requests (see NameNode), and fault tolerance, in the sense that ongoing 
> transactions should be terminated when the TxnHandler fails. If these 
> drawbacks are acceptable in certain situations, then we can provide better 
> throughput for the users.





[jira] [Updated] (HIVE-21427) Syslog storage handler

2019-04-04 Thread Prasanth Jayachandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-21427:
-
Attachment: HIVE-21427.2.patch

> Syslog storage handler
> --
>
> Key: HIVE-21427
> URL: https://issues.apache.org/jira/browse/HIVE-21427
> Project: Hive
>  Issue Type: New Feature
>  Components: Serializers/Deserializers
>Affects Versions: 4.0.0
>Reporter: Ashutosh Chauhan
>Assignee: Prasanth Jayachandran
>Priority: Major
> Attachments: HIVE-21427.1.patch, HIVE-21427.2.patch
>
>
> It will be useful to read syslog-generated log files in Hive. Hive generates 
> logs in the RFC5424 log4j2 layout and stores them as an external table in 
> sys.db. This includes a SyslogSerde that can parse RFC5424-formatted logs and 
> map them to the logs table schema for query processing by Hive.





[jira] [Commented] (HIVE-21573) Binary transport shall ignore principal if auth is set to delegationToken

2019-04-04 Thread Daniel Dai (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16810145#comment-16810145
 ] 

Daniel Dai commented on HIVE-21573:
---

Now it seems you only try username/password if AUTH_TYPE=AUTH_TOKEN; is that an 
intended change?

> Binary transport shall ignore principal if auth is set to delegationToken
> -
>
> Key: HIVE-21573
> URL: https://issues.apache.org/jira/browse/HIVE-21573
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 3.0.0
>Reporter: Denes Bodo
>Assignee: Denes Bodo
>Priority: Critical
> Attachments: HIVE-21573.3.patch
>
>
> When Beeline is used by Sqoop from Oozie sqoop action in a kerberized 
> cluster, Sqoop passes Hive delegation token to Beeline when invokes the 
> *beeline* command. Unfortunately, Beeline puts principal=XY parameter to JDBC 
> url so when binary transport is needed it will use principal based 
> authentication instead of token based.
> Related code: 
> [https://github.com/apache/hive/blob/master/jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java#L688L705]
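The behaviour the issue title asks for can be sketched as below, assuming the JDBC URL parameters auth=delegationToken and principal= (this is an illustrative helper, not the actual HiveConnection code):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the fix described above: when auth is set to
// delegationToken, ignore any principal= parameter in the JDBC URL and
// use token-based authentication for the binary transport.
public class AuthSelector {

    // "auth" / "delegationToken" / "principal" mirror the JDBC URL
    // parameter names; treat the token auth setting as overriding.
    static boolean useTokenAuth(Map<String, String> sessConf) {
        return "delegationToken".equalsIgnoreCase(sessConf.get("auth"));
    }

    static String chooseAuth(Map<String, String> sessConf) {
        if (useTokenAuth(sessConf)) {
            return "token";      // principal= in the URL is ignored
        }
        return sessConf.containsKey("principal") ? "kerberos" : "plain";
    }
}
```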





[jira] [Commented] (HIVE-21506) Memory based TxnHandler implementation

2019-04-04 Thread Todd Lipcon (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16810130#comment-16810130
 ] 

Todd Lipcon commented on HIVE-21506:


Does this imply that you'd move the transaction manager out of the HMS into a 
standalone daemon? Then we need to worry about HA for that daemon as well as 
what happens to locks if the daemon crashes, right? It's probably possible but 
would be quite a bit of work.

Another potential design for scaling lock management is to use revocable 
"sticky" locks. I think there's a decent amount of literature from the 
shared-nothing DBMS world on this technique. With some quick googling I found 
that the Frangipani DFS paper has some discussion of the technique, but I think 
we can probably find a more detailed description of it elsewhere.

At a very high level, the idea would look something like this for shared locks:
- when a user wants to acquire a shared lock, the HMS checks if any other 
transaction already has the same shared lock held. If not, we acquire the lock 
in the database, and associate it not with any particular transaction, but with 
the HMS's lock manager itself. The HMS then becomes responsible for 
heartbeating the lock while it's held. In essence, the HMS has now taken out a 
"lease" on this lock.
- if the HMS already has this shared lock held on behalf of another 
transaction, increment an in-memory reference count.
- when a lock is released, if the refcount > 1, simply decrement the in-memory 
ref count.
- if the lock is released and the refcount goes to 0, the HMS can be lazy about 
releasing the lock in the DB (either forever or for some amount of time). In 
essence the lock is "sticky".

Given that most locks are shared locks, this should mean that the majority of 
locking operations do not require any trip to the RDBMS and can be processed in 
memory, but are backed persistently by a lock in the DB.

If a caller wants to acquire an exclusive lock which conflicts with an existing 
shared lock in the DB, we need to implement revocation:
- add a record indicating that there's a waiter on the lock, blocked on the 
existing shared lock(s)
- send a revocation request to any HMS(s) holding the shared locks. In the case 
that they're just being held in "sticky" mode, they can be revoked immediately. 
If there is actually an active refcount, this will just enforce that new shared 
lockers need to wait instead of incrementing the refcount.
- in the case that an HMS holding a sticky lock has crashed or partitioned, we 
need to wait out the "lease" to expire before we can revoke its lock.

There's some trickiness to think through about client->HMS "stickiness" in HA 
scenarios, as well. Right now, the lock requests may be sent to a different HMS 
than the 'commit/abort' request for a transaction, but that could be difficult 
with "sticky locks".
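
The shared-lock bookkeeping sketched in the bullets above boils down to a refcounted map in front of the RDBMS. A minimal, hypothetical sketch (DbLockStore stands in for the RDBMS round trip; none of this is actual HMS/TxnHandler code):

```java
import java.util.HashMap;
import java.util.Map;

public class StickyLockManager {
    interface DbLockStore { void acquire(String key); void release(String key); }

    private final Map<String, Integer> refCounts = new HashMap<>();
    private final DbLockStore db;
    int dbRoundTrips = 0; // counts trips to the backing store

    StickyLockManager(DbLockStore db) { this.db = db; }

    // Shared lock: only the first holder pays the RDBMS round trip;
    // the HMS then holds a lease it heartbeats while the lock is held.
    synchronized void acquireShared(String key) {
        Integer c = refCounts.get(key);
        if (c == null) {
            db.acquire(key);
            dbRoundTrips++;
            refCounts.put(key, 1);
        } else {
            refCounts.put(key, c + 1);  // in-memory only
        }
    }

    // Release is lazy: a refcount of 0 keeps the DB lock "sticky".
    synchronized void releaseShared(String key) {
        refCounts.merge(key, -1, Integer::sum);
    }

    // Revocation path, driven by a waiter wanting an exclusive lock.
    synchronized boolean revoke(String key) {
        Integer c = refCounts.get(key);
        if (c == null || c <= 0) {      // sticky or unheld: drop immediately
            refCounts.remove(key);
            db.release(key);
            return true;
        }
        return false;                   // active holders: waiter must queue
    }

    public static void main(String[] args) {
        StickyLockManager m = new StickyLockManager(new DbLockStore() {
            public void acquire(String k) {} public void release(String k) {}
        });
        m.acquireShared("tbl");
        m.acquireShared("tbl");
        System.out.println(m.dbRoundTrips); // 1: second acquire was memory-only
    }
}
```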


All of the above is a bit complicated, so maybe a first step is to just look at 
some kind of stress test/benchmark and understand if we can do any changes to 
the way we manage the RDBMS table to be more efficient? Perhaps if we 
specialize the implementation for a specific RDBMS (eg postgres) we could get 
some benefits here (eg stored procedures to avoid round trips if that turns out 
to be the bottleneck)

> Memory based TxnHandler implementation
> --
>
> Key: HIVE-21506
> URL: https://issues.apache.org/jira/browse/HIVE-21506
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Peter Vary
>Priority: Major
>
> The current TxnHandler implementations are using the backend RDBMS to store 
> every Hive lock and transaction data, so multiple TxnHandler instances can 
> run simultaneously and can serve requests. The continuous 
> communication/locking done on the RDBMS side puts serious load on the backend 
> databases and also restricts the possible throughput.
> If it is possible to have only a single active TxnHandler (with the current 
> design HMS) instance then we can provide much better (using only java based 
> locking) performance. We still have to store the committed write transactions 
> to the RDBMS (or later some other persistent storage), but other lock and 
> transaction operations could remain memory only.
> The most important drawbacks of this solution are that we definitely lose 
> scalability when one instance of TxnHandler is no longer able to serve the 
> requests (see NameNode), and fault tolerance in the sense that ongoing 
> transactions have to be terminated when the TxnHandler fails. If these 
> drawbacks are acceptable in certain situations then we can provide better 
> throughput for the users.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20901) running compactor when there is nothing to do produces duplicate data

2019-04-04 Thread Eugene Koifman (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16810093#comment-16810093
 ] 

Eugene Koifman commented on HIVE-20901:
---

Go ahead. Looking at the description, there is no data duplication issue here, 
and now that the compactor runs in a transaction the 2 compactor runs will 
output distinct directories.

> running compactor when there is nothing to do produces duplicate data
> -
>
> Key: HIVE-20901
> URL: https://issues.apache.org/jira/browse/HIVE-20901
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 4.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
>
> suppose we run minor compaction 2 times, via alter table
> The 2nd request to compaction should have nothing to do but I don't think 
> there is a check for that.  It's visible in the context of HIVE-20823, where 
> each compactor run produces a delta with new visibility suffix so we end up 
> with something like
> {noformat}
> target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands3-1541810844849/warehouse/t/
> ├── delete_delta_001_002_v019
> │   ├── _orc_acid_version
> │   └── bucket_0
> ├── delete_delta_001_002_v021
> │   ├── _orc_acid_version
> │   └── bucket_0
> ├── delta_001_001_
> │   ├── _orc_acid_version
> │   └── bucket_0
> ├── delta_001_002_v019
> │   ├── _orc_acid_version
> │   └── bucket_0
> ├── delta_001_002_v021
> │   ├── _orc_acid_version
> │   └── bucket_0
> └── delta_002_002_
>     ├── _orc_acid_version
>     └── bucket_0{noformat}
> i.e. 2 deltas with the same write ID range
> this is bad.  Probably happens today as well but new run produces a delta 
> with the same name and clobbers the previous one, which may interfere with 
> writers
>  
> need to investigate
>  
> -The issue (I think) is that {{AcidUtils.getAcidState()}} then returns both 
> deltas as if they were distinct and it effectively duplicates data.-  There 
> is no data duplication - {{getAcidState()}} will not use 2 deltas with the 
> same {{writeid}} range
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20822) Improvements to push computation to JDBC from Calcite

2019-04-04 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-20822:
---
Attachment: HIVE-20822.07.patch

> Improvements to push computation to JDBC from Calcite
> -
>
> Key: HIVE-20822
> URL: https://issues.apache.org/jira/browse/HIVE-20822
> Project: Hive
>  Issue Type: Improvement
>  Components: StorageHandler
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-20822.01.patch, HIVE-20822.02.patch, 
> HIVE-20822.02.patch, HIVE-20822.03.patch, HIVE-20822.04.patch, 
> HIVE-20822.05.patch, HIVE-20822.06.patch, HIVE-20822.07.patch, 
> HIVE-20822.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21573) Binary transport shall ignore principal if auth is set to delegationToken

2019-04-04 Thread Zoltan Haindrich (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16810033#comment-16810033
 ] 

Zoltan Haindrich commented on HIVE-21573:
-

+1
I think we may throw an exception instead of logging it - however, as it seems 
like "transport" is an instance field... it could be handled in a follow-up, to 
keep the cleanup separate.

> Binary transport shall ignore principal if auth is set to delegationToken
> -
>
> Key: HIVE-21573
> URL: https://issues.apache.org/jira/browse/HIVE-21573
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 3.0.0
>Reporter: Denes Bodo
>Assignee: Denes Bodo
>Priority: Critical
> Attachments: HIVE-21573.3.patch
>
>
> When Beeline is used by Sqoop from an Oozie sqoop action in a kerberized 
> cluster, Sqoop passes the Hive delegation token to Beeline when it invokes the 
> *beeline* command. Unfortunately, Beeline puts the principal=XY parameter into 
> the JDBC URL, so when binary transport is needed it will use principal-based 
> authentication instead of token-based authentication.
> Related code: 
> [https://github.com/apache/hive/blob/master/jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java#L688L705]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-21575) Add support for SQL:2016 datetime templates/patterns/masks and CAST(... AS ... FORMAT )

2019-04-04 Thread Karen Coppage (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage reassigned HIVE-21575:



> Add support for SQL:2016 datetime templates/patterns/masks and CAST(... AS 
> ... FORMAT )
> 
>
> Key: HIVE-21575
> URL: https://issues.apache.org/jira/browse/HIVE-21575
> Project: Hive
>  Issue Type: Improvement
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: SQL, datetime
>
> *Summary*
>  Timestamp and date handling and formatting is currently implemented in Hive 
> using (sometimes very specific) [Java SimpleDateFormat 
> patterns|http://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html]
>  , however, it is not what most standard SQL systems use. For example see 
> [Vertica|https://my.vertica.com/docs/7.2.x/HTML/Content/Authoring/SQLReferenceManual/Functions/Formatting/TemplatePatternsForDateTimeFormatting.htm],
>  
> [Netezza|http://www.ibm.com/support/knowledgecenter/SSULQD_7.2.1/com.ibm.nz.dbu.doc/r_dbuser_ntz_sql_extns_templ_patterns_date_time_conv.html],
>  
> [Oracle|https://docs.oracle.com/database/121/SQLRF/sql_elements004.htm#SQLRF00212],
>  and 
> [PostgreSQL|https://www.postgresql.org/docs/9.5/static/functions-formatting.html#FUNCTIONS-FORMATTING-DATETIME-TABLE].
> *Cast...Format*
> SQL:2016 also introduced the FORMAT clause for CAST which is the standard way 
> to do string <-> datetime conversions
> For example:
> {code:java}
> CAST(<datetime> AS <char string type> [FORMAT <template>])
> CAST(<char string type> AS <datetime> [FORMAT <template>])
> cast(dt as string format 'DD-MM-YYYY')
> cast('01-05-2017' as date format 'DD-MM-YYYY')
> {code}
> [Stuff like this|http://bigdataprogrammers.com/string-date-conversion-hive/] 
> wouldn't need to happen.
> *New SQL:2016 Patterns*
> Examples:
> {code:java}
> --old SimpleDateFormat
> select date_format('2015-05-15 12:00:00', 'MMM dd, yyyy HH:mm:ss');
> --new SQL:2016 format
> select date_format('2015-05-15 12:00:00', 'mon dd, yyyy hh:mi:ss');
> {code}
> Some other conflicting examples:
> SimpleDateFormat: 'MMM dd, yyyy HH:mm:ss'
>  SQL:2016: 'mon dd, yyyy hh:mi:ss'
> SimpleDateFormat: 'yyyy-MM-dd HH:mm:ss'
>  SQL:2016: 'yyyy-mm-dd hh24:mi:ss'
> We will have a session-level feature flag to revert to the legacy Java 
> SimpleDateFormat patterns. This would allow users to chose the behavior they 
> desire and scope it to a session if need be.
> For the full list of patterns, see subsection "Proposal for Impala’s datetime 
> patterns" in this doc: 
> [https://docs.google.com/document/d/1V7k6-lrPGW7_uhqM-FhKl3QsxwCRy69v2KIxPsGjc1k/edit]
> *Existing Hive functions affected*
> Other functions use SimpleDateFormat internally; these are the ones afaik 
> where SimpleDateFormat or some similar format is part of the input:
>  * from_unixtime(bigint unixtime[, string format])
>  * unix_timestamp(string date, string pattern)
>  * to_unix_timestamp(date[, pattern])
>  * add_months(string start_date, int num_months, output_date_format)
>  * trunc(string date, string format) - currently only supports 
> 'MONTH'/'MON'/'MM', 'QUARTER'/'Q' and 'YEAR'/'YYYY'/'YY' as format.
>  * date_format(date/timestamp/string ts, string fmt)
> This description is a heavily edited description of IMPALA-4018.
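
For context, the legacy behavior the proposed feature flag would preserve can be seen with plain SimpleDateFormat; SQL:2016 tokens like 'mon' and 'mi' mean nothing to it, which is why a new parser is needed. A self-contained sketch, not Hive code:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class LegacyDateFormatDemo {
    // Parses a canonical timestamp string and re-renders it with a
    // legacy SimpleDateFormat pattern, as Hive's date_format() does today.
    static String reformat(String ts, String outPattern) {
        try {
            SimpleDateFormat in = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.US);
            in.setTimeZone(TimeZone.getTimeZone("UTC"));
            Date d = in.parse(ts);
            SimpleDateFormat out = new SimpleDateFormat(outPattern, Locale.US);
            out.setTimeZone(TimeZone.getTimeZone("UTC"));
            return out.format(d);
        } catch (ParseException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // SimpleDateFormat: MMM = abbreviated month, HH = 24-hour clock
        System.out.println(reformat("2015-05-15 12:00:00", "MMM dd, yyyy HH:mm:ss"));
        // prints: May 15, 2015 12:00:00
    }
}
```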



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-14888) SparkClientImpl checks for "kerberos" string in hiveconf only when determining whether to use keytab file.

2019-04-04 Thread David McGinnis (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16809957#comment-16809957
 ] 

David McGinnis commented on HIVE-14888:
---

Looks like we are safe in the assumption that the login will either be simple 
or kerberos; anything else is rejected because the login type is not set. The 
error thrown comes from the line below:

[https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L1474]

Once the build issues in the pre-commit build are resolved, I'll resubmit the 
patch and continue with this submission.

> SparkClientImpl checks for "kerberos" string in hiveconf only when 
> determining whether to use keytab file.
> --
>
> Key: HIVE-14888
> URL: https://issues.apache.org/jira/browse/HIVE-14888
> Project: Hive
>  Issue Type: Bug
>  Components: spark-branch
>Affects Versions: 2.1.0
>Reporter: Thomas Rega
>Assignee: David McGinnis
>Priority: Major
> Attachments: HIVE-14888.1-spark.patch, HIVE-14888.2.patch
>
>   Original Estimate: 5m
>  Remaining Estimate: 5m
>
> The SparkClientImpl will only provide a principal and keytab argument if the 
> HADOOP_SECURITY_AUTHENTICATION in hive conf is set to "kerberos". This will 
> not work on clusters with Hadoop security enabled that are not configured as 
> "kerberos", for example, a cluster which is configured for "ldap".
> The solution is to call UserGroupInformation.isSecurityEnabled() instead.
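
A minimal sketch of the before/after decision; SecurityCheck is a stand-in for org.apache.hadoop.security.UserGroupInformation so the example has no Hadoop dependency:

```java
public class KeytabDecision {
    interface SecurityCheck { boolean isSecurityEnabled(); }

    // Before: only the literal "kerberos" auth value enabled the keytab
    // arguments, so e.g. auth=ldap on a secured cluster silently skipped them.
    static boolean useKeytabOld(String hadoopSecurityAuth) {
        return "kerberos".equalsIgnoreCase(hadoopSecurityAuth);
    }

    // After: any cluster with Hadoop security enabled gets the
    // principal/keytab arguments.
    static boolean useKeytabNew(SecurityCheck ugi) {
        return ugi.isSecurityEnabled();
    }

    public static void main(String[] args) {
        System.out.println(useKeytabOld("ldap"));     // false: the bug
        System.out.println(useKeytabNew(() -> true)); // true on a secured cluster
    }
}
```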



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21570) Convert llap iomem servlets output to json format

2019-04-04 Thread Antal Sinkovits (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16809938#comment-16809938
 ] 

Antal Sinkovits commented on HIVE-21570:


From:

Cache state: 
llap.src: 6/6, 24576/24576
llap.src2: 6/6, 24576/24576
llap.src_orc: 11/11, 2367488/2367488
LRFU eviction list: 0 items
LRFU eviction heap: 23 items (of max 262144)
LRFU data on heap: 2,30MB
LRFU metadata on heap: 8,00KB
LRFU data on eviction list: 0B
LRFU metadata on eviction list: 0B
LRFU data locked: 0B
LRFU metadata locked: 0B
ORC cache state 
  file [-499819754494858140, 1553251779000, 4506]: 0 locked, 9 unlocked, 0 
evicted, 0 being moved,2359296 total used byte
ORC cache summary: 0 locked, 9 unlocked, 0 evicted, 0 being moved,2359296total 
used space
SerDe cache state 4 columns, 1 stripes; 
  file [5231951861207551345, 1553181313000, 9718]: 0 locked, 6 unlocked, 0 
evicted, 0 being moved3 columns, 1 stripes; 
  file [-5708520830205528612, 1553180236000, 5812]: 0 locked, 6 unlocked, 0 
evicted, 0 being moved
SerDe cache summary: 0 locked, 12 unlocked, 0 evicted, 0 being moved
Metadata cache state: 2 files and stripes, 8192 total used bytes, 0 files w/ORC 
estimate
Defrag counters: 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
Allocator state:
Arena with free list lengths by size: 32768 => 1, 65536 => 1, 131072 => 1, 
524288 => 1, 1048576 => 1, 2097152 => 1, 4194304 => 1, 8388608 => 1, 16777216 
=> 7, 
Arena with free list lengths by size: 16777216 => 8, 
Arena with free list lengths by size: 16777216 => 8, 
Arena with free list lengths by size: 16777216 => 8, 
Arena with free list lengths by size: 8192 => 1, 32768 => 1, 65536 => 1, 131072 
=> 1, 524288 => 1, 1048576 => 1, 2097152 => 1, 4194304 => 1, 8388608 => 1, 
16777216 => 7, 
Arena with free list lengths by size: 16777216 => 8, 
Arena with free list lengths by size: 262144 => 1, 2097152 => 1, 4194304 => 1, 
8388608 => 1, 16777216 => 7, 
Arena with free list lengths by size: 16777216 => 8, 
Total available and allocated: 1071325184; unallocated arenas: 0; full arenas 0

To:

{
  "SerDeLowLevelCacheImpl" : {
"entries" : [ {
  "stripes" : "1",
  "file" : "[5231951861207551345, 1553181313000, 9718]",
  "columns" : "4",
  "being_moved" : "0",
  "locked" : "0",
  "unlocked" : "6",
  "evicted" : "0"
}, {
  "stripes" : "1",
  "file" : "[-5708520830205528612, 1553180236000, 5812]",
  "columns" : "3",
  "being_moved" : "0",
  "locked" : "0",
  "unlocked" : "6",
  "evicted" : "0"
} ],
"being_moved" : "0",
"locked" : "0",
"unlocked" : "12",
"evicted" : "0"
  },
  "LowLevelCacheImpl" : {
"entries" : [ {
  "file" : "[-499819754494858140, 1553251779000, 4506]",
  "being_moved" : "0",
  "locked" : "0",
  "total_used_byte" : "2359296",
  "unlocked" : "9",
  "evicted" : "0"
} ],
"total_used_space" : "2359296",
"moved" : "0",
"locked" : "0",
"unlocked" : "9",
"evicted" : "0"
  },
  "LowLevelLrfuCachePolicy" : {
"meta_locked" : "0B",
"data_on_heap" : "2,30MB",
"eviction_list" : "0",
"eviction_heap_size" : "23",
"meta_on_heap" : "8,00KB",
"data_on_eviction_list" : "0B",
"meta_on_eviction_list" : "0B",
"eviction_heap_size_max" : "262144",
"data_locked" : "0B"
  },
  "MetadataCache" : {
"files_and_stripes" : "2",
"total_used_bytes" : "8192",
"files_w_ORC_estimate" : "0"
  },
  "CacheContentsTracker" : [ {
"max_count" : "6",
"state_name" : "llap.src",
"total_size" : "24576",
"max_size" : "24576",
"buffer_count" : "6"
  }, {
"max_count" : "6",
"state_name" : "llap.src2",
"total_size" : "24576",
"max_size" : "24576",
"buffer_count" : "6"
  }, {
"max_count" : "11",
"state_name" : "llap.src_orc",
"total_size" : "2367488",
"max_size" : "2367488",
"buffer_count" : "11"
  } ],
  "BuddyAllocator" : {
"defrag_counters" : [ "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", 
"0", "0", "0" ],
"arenas" : [ {
  "name" : "arena0",
  "free" : [ {
"size" : "16777216",
"count" : "8"
  } ]
}, {
  "name" : "arena1",
  "free" : [ {
"size" : "262144",
"count" : "1"
  }, {
"size" : "2097152",
"count" : "1"
  }, {
"size" : "4194304",
"count" : "1"
  }, {
"size" : "8388608",
"count" : "1"
  }, {
"size" : "16777216",
"count" : "7"
  } ]
}, {
  "name" : "arena2",
  "free" : [ {
"size" : "16777216",
"count" : "8"
  } ]
}, {
  "name" : "arena3",
  "free" : [ {
"size" : "16777216",
"count" : "8"
  } ]
}, {
  "name" : "arena4",
  "free" : [ {
"size" : "8192",
"count" : "1"
  }, {
"size" : "32768",
"count" : "1"
  }, {
"size" 
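
The text-to-JSON conversion shown above can be sketched with the JDK alone; the actual patch presumably uses a JSON library, and toJson here only illustrates the target shape for one cache entry:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class IomemJsonSketch {
    // Renders ordered key/value fields in the " \"k\" : \"v\" " style
    // used by the servlet output above.
    static String toJson(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) sb.append(", ");
            sb.append('"').append(e.getKey()).append("\" : \"")
              .append(e.getValue()).append('"');
            first = false;
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        Map<String, String> entry = new LinkedHashMap<>();
        entry.put("file", "[-499819754494858140, 1553251779000, 4506]");
        entry.put("locked", "0");
        entry.put("unlocked", "9");
        System.out.println(toJson(entry));
    }
}
```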

[jira] [Comment Edited] (HIVE-21291) Restore historical way of handling timestamps in Avro while keeping the new semantics at the same time

2019-04-04 Thread Karen Coppage (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16809917#comment-16809917
 ] 

Karen Coppage edited comment on HIVE-21291 at 4/4/19 2:35 PM:
--

Patch 1 is a WIP because the AvroSerDe seems to be used by KafkaSerDe and 
HBaseSerde to deserialize files and/or structs.

This would mean structs deserialized by Avro in HBase 
[(example)|https://blog.cloudera.com/blog/2016/05/how-to-improve-apache-hbase-performance-via-data-serialization-with-apache-avro/]
 would have Instant semantics, which is backwards compatible, if not optimal.

Kafka integration might cause further problems with interoperability, depending 
on which components might be reading from the Kafka clusters that Hive is 
reading/writing from. Still waiting on an expert opinion.


was (Author: klcopp):
This is a WIP because the AvroSerDe seems to be used by KafkaSerDe and 
HBaseSerde to deserialize files and/or structs.

This would mean structs deserialized by Avro in HBase 
[(example)|https://blog.cloudera.com/blog/2016/05/how-to-improve-apache-hbase-performance-via-data-serialization-with-apache-avro/]
 would have Instant semantics, which is backwards compatible, if not optimal.

Kafka integration might cause further problems with interoperability, depending 
on which components might be reading from the Kafka clusters that Hive is 
reading/writing from. Still waiting on an expert opinion.

> Restore historical way of handling timestamps in Avro while keeping the new 
> semantics at the same time
> --
>
> Key: HIVE-21291
> URL: https://issues.apache.org/jira/browse/HIVE-21291
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Ivanfi
>Assignee: Karen Coppage
>Priority: Major
> Attachments: HIVE-21291.1.patch
>
>
> This sub-task is for implementing the Avro-specific parts of the following 
> plan:
> h1. Problem
> Historically, the semantics of the TIMESTAMP type in Hive depended on the 
> file format. Timestamps in Avro, Parquet and RCFiles with a binary SerDe had 
> _Instant_ semantics, while timestamps in ORC, textfiles and RCFiles with a 
> text SerDe had _LocalDateTime_ semantics.
> The Hive community wanted to get rid of this inconsistency and have 
> _LocalDateTime_ semantics in Avro, Parquet and RCFiles with a binary SerDe as 
> well. *Hive 3.1 turned off normalization to UTC* to achieve this. While this 
> leads to the desired new semantics, it also leads to incorrect results when 
> new Hive versions read timestamps written by old Hive versions or when old 
> Hive versions or any other component not aware of this change (including 
> legacy Impala and Spark versions) read timestamps written by new Hive 
> versions.
> h1. Solution
> To work around this issue, Hive *should restore the practice of normalizing 
> to UTC* when writing timestamps to Avro, Parquet and RCFiles with a binary 
> SerDe. In itself, this would restore the historical _Instant_ semantics, 
> which is undesirable. In order to achieve the desired _LocalDateTime_ 
> semantics in spite of normalizing to UTC, newer Hive versions should record 
> the session's local time zone in the file metadata fields serving 
> arbitrary key-value storage purposes.
> When reading back files with this time zone metadata, newer Hive versions (or 
> any other new component aware of this extra metadata) can achieve 
> _LocalDateTime_ semantics by *converting from UTC to the saved time zone 
> (instead of to the local time zone)*. Legacy components that are unaware of 
> the new metadata can read the files without any problem and the timestamps 
> will show the historical Instant behaviour to them.
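
The write/read round trip described above can be sketched with java.time; method names are illustrative, not actual SerDe code:

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class TimestampRoundTrip {
    // Writer side: local wall-clock time -> UTC instant stored in the file
    // (the restored legacy normalization), with writerZone also recorded
    // in the file metadata.
    static Instant write(LocalDateTime ts, ZoneId writerZone) {
        return ts.atZone(writerZone).toInstant();
    }

    // Metadata-aware reader: converts with the *saved* writer zone,
    // recovering LocalDateTime semantics.
    static LocalDateTime readNew(Instant stored, ZoneId savedWriterZone) {
        return LocalDateTime.ofInstant(stored, savedWriterZone);
    }

    // Legacy reader: unaware of the metadata, converts with its own zone
    // and so sees the historical Instant semantics.
    static LocalDateTime readLegacy(Instant stored, ZoneId readerLocalZone) {
        return LocalDateTime.ofInstant(stored, readerLocalZone);
    }

    public static void main(String[] args) {
        LocalDateTime original = LocalDateTime.of(2019, 4, 4, 12, 0);
        ZoneId writer = ZoneId.of("Europe/Budapest");
        Instant onDisk = write(original, writer);
        // Metadata-aware reader gets the original wall-clock time back:
        System.out.println(readNew(onDisk, writer).equals(original)); // true
        // Legacy reader in another zone sees the shifted instant:
        System.out.println(readLegacy(onDisk, ZoneOffset.UTC));
    }
}
```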



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21291) Restore historical way of handling timestamps in Avro while keeping the new semantics at the same time

2019-04-04 Thread Karen Coppage (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage updated HIVE-21291:
-
Attachment: HIVE-21291.1.patch
Status: Patch Available  (was: In Progress)

This is a WIP because the AvroSerDe seems to be used by KafkaSerDe and 
HBaseSerde to deserialize files and/or structs.

This would mean structs deserialized by Avro in HBase 
[(example)|https://blog.cloudera.com/blog/2016/05/how-to-improve-apache-hbase-performance-via-data-serialization-with-apache-avro/]
 would have Instant semantics, which is backwards compatible, if not optimal.

Kafka integration might cause further problems with interoperability, depending 
on which components might be reading from the Kafka clusters that Hive is 
reading/writing from. Still waiting on an expert opinion.

> Restore historical way of handling timestamps in Avro while keeping the new 
> semantics at the same time
> --
>
> Key: HIVE-21291
> URL: https://issues.apache.org/jira/browse/HIVE-21291
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Ivanfi
>Assignee: Karen Coppage
>Priority: Major
> Attachments: HIVE-21291.1.patch
>
>
> This sub-task is for implementing the Avro-specific parts of the following 
> plan:
> h1. Problem
> Historically, the semantics of the TIMESTAMP type in Hive depended on the 
> file format. Timestamps in Avro, Parquet and RCFiles with a binary SerDe had 
> _Instant_ semantics, while timestamps in ORC, textfiles and RCFiles with a 
> text SerDe had _LocalDateTime_ semantics.
> The Hive community wanted to get rid of this inconsistency and have 
> _LocalDateTime_ semantics in Avro, Parquet and RCFiles with a binary SerDe as 
> well. *Hive 3.1 turned off normalization to UTC* to achieve this. While this 
> leads to the desired new semantics, it also leads to incorrect results when 
> new Hive versions read timestamps written by old Hive versions or when old 
> Hive versions or any other component not aware of this change (including 
> legacy Impala and Spark versions) read timestamps written by new Hive 
> versions.
> h1. Solution
> To work around this issue, Hive *should restore the practice of normalizing 
> to UTC* when writing timestamps to Avro, Parquet and RCFiles with a binary 
> SerDe. In itself, this would restore the historical _Instant_ semantics, 
> which is undesirable. In order to achieve the desired _LocalDateTime_ 
> semantics in spite of normalizing to UTC, newer Hive versions should record 
> the session's local time zone in the file metadata fields serving 
> arbitrary key-value storage purposes.
> When reading back files with this time zone metadata, newer Hive versions (or 
> any other new component aware of this extra metadata) can achieve 
> _LocalDateTime_ semantics by *converting from UTC to the saved time zone 
> (instead of to the local time zone)*. Legacy components that are unaware of 
> the new metadata can read the files without any problem and the timestamps 
> will show the historical Instant behaviour to them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21570) Convert llap iomem servlets output to json format

2019-04-04 Thread Antal Sinkovits (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antal Sinkovits updated HIVE-21570:
---
Attachment: HIVE-21570.01.patch

> Convert llap iomem servlets output to json format
> -
>
> Key: HIVE-21570
> URL: https://issues.apache.org/jira/browse/HIVE-21570
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Minor
> Attachments: HIVE-21570.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21570) Convert llap iomem servlets output to json format

2019-04-04 Thread Antal Sinkovits (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antal Sinkovits updated HIVE-21570:
---
Status: Patch Available  (was: In Progress)

> Convert llap iomem servlets output to json format
> -
>
> Key: HIVE-21570
> URL: https://issues.apache.org/jira/browse/HIVE-21570
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Minor
> Attachments: HIVE-21570.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HIVE-14669) Have the actual error reported when a q test fails instead of having to go through the logs

2019-04-04 Thread Laszlo Bodor (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16809882#comment-16809882
 ] 

Laszlo Bodor edited comment on HIVE-14669 at 4/4/19 2:01 PM:
-

[~sseth]: I think it's still an issue; a proper solution would help all the 
developers a lot.
Currently what I'm seeing is that only an error code and the source query are 
dropped to the console:
 !01_mvn_out.png!

but hive.log contains the full trace ( !02_hive_log.png! ).

I think displaying the latter in the console output would be great. What's 
your opinion?


was (Author: abstractdog):
[~sseth]: I think it's still an issue, a proper solution would help a lot all 
the developers
currently what I'm seeing is: only an error code and the source query is 
dropped to the console ( !01_mvn_out.png! ), but hive.log contains the full 
trace ( !02_hive_log.png! ), I think, displaying the latter on the console 
output would be great, what's your opinion?

> Have the actual error reported when a q test fails instead of having to go 
> through the logs
> ---
>
> Key: HIVE-14669
> URL: https://issues.apache.org/jira/browse/HIVE-14669
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Laszlo Bodor
>Priority: Major
> Attachments: 01_mvn_out.png, 02_hive_log.png
>
>
> QTest runs end up invoking CliDriver.processLine. This, in most cases, 
> reports a numeric exit code - 0 for success. Non-zero for various different 
> error types (which are defined everywhere in the code).
> Internally CliDriver does have more information via CommandResult. A lot of 
> this is not exposed though. That's alright for the end user cli - (Command 
> line tool translating the error to a code and message). However, it makes 
> debugging very difficult for QTests - since the log needs to be looked at 
> each time.
> Errors generated by the actual backend execution are mostly available to the 
> client, and information about these should show up as well. (If they don't, 
> we have a usability issue to fix).
> cc [~ekoifman]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-14669) Have the actual error reported when a q test fails instead of having to go through the logs

2019-04-04 Thread Laszlo Bodor (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-14669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16809882#comment-16809882
 ] 

Laszlo Bodor commented on HIVE-14669:
-

[~sseth]: I think it's still an issue; a proper solution would help all the 
developers a lot.
Currently what I'm seeing is that only an error code and the source query are 
dropped to the console ( !01_mvn_out.png! ), but hive.log contains the full 
trace ( !02_hive_log.png! ). I think displaying the latter in the console 
output would be great. What's your opinion?

> Have the actual error reported when a q test fails instead of having to go 
> through the logs
> ---
>
> Key: HIVE-14669
> URL: https://issues.apache.org/jira/browse/HIVE-14669
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Laszlo Bodor
>Priority: Major
> Attachments: 01_mvn_out.png, 02_hive_log.png
>
>
> QTest runs end up invoking CliDriver.processLine. This, in most cases, 
> reports a numeric exit code - 0 for success. Non-zero for various different 
> error types (which are defined everywhere in the code).
> Internally CliDriver does have more information via CommandResult. A lot of 
> this is not exposed though. That's alright for the end user cli - (Command 
> line tool translating the error to a code and message). However, it makes 
> debugging very difficult for QTests - since the log needs to be looked at 
> each time.
> Errors generated by the actual backend execution are mostly available to the 
> client, and information about these should show up as well. (If they don't, 
> we have a usability issue to fix).
> cc [~ekoifman]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-14669) Have the actual error reported when a q test fails instead of having to go through the logs

2019-04-04 Thread Laszlo Bodor (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laszlo Bodor reassigned HIVE-14669:
---

Assignee: Laszlo Bodor






[jira] [Updated] (HIVE-14669) Have the actual error reported when a q test fails instead of having to go through the logs

2019-04-04 Thread Laszlo Bodor (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laszlo Bodor updated HIVE-14669:

Attachment: 02_hive_log.png
01_mvn_out.png






[jira] [Comment Edited] (HIVE-20979) Fix memory leak in hive streaming

2019-04-04 Thread Gauthier Leonard (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16809868#comment-16809868
 ] 

Gauthier Leonard edited comment on HIVE-20979 at 4/4/19 1:51 PM:
-

It appears that the fixes are not present in the 3.1.1 release. Can I update 
the *Fix Version/s* accordingly? I assume the fixes will be in 3.1.2 as it is 
on master?


was (Author: nuttymoon):
It appears that the fixes are not present on the 3.1.1 release. Can I update 
the *Fix Version/s* accordingly? I assume the fixes will be in 3.1.2 as it is 
on master?

> Fix memory leak in hive streaming
> -
>
> Key: HIVE-20979
> URL: https://issues.apache.org/jira/browse/HIVE-20979
> Project: Hive
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 3.1.1
>Reporter: Shubham Chaurasia
>Assignee: Shubham Chaurasia
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 4.0.0, 3.1.1
>
> Attachments: HIVE-20979.1.patch, HIVE-20979.1.patch, 
> HIVE-20979.2.patch, HIVE-20979.3.patch, HIVE-20979.4.patch, HIVE-20979.5.patch
>
>
> 1) HiveStreamingConnection.Builder#init() adds a shutdown hook handler via 
> ShutdownHookManager.addShutdownHook, but it is never removed, which causes 
> all the handlers to accumulate and hence a memory leak.
> 2) AbstractRecordWriter creates an instance of FileSystem but does not close 
> it, which in turn causes a leak due to accumulation in FileSystem$Cache#map
>  
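The leak pattern in point 1 and its fix can be sketched like this; it is a minimal sketch using the JDK's Runtime hooks directly, not Hive's ShutdownHookManager:

```java
// Minimal sketch of the leak in point 1: a connection registers a JVM
// shutdown hook at construction time; unless close() de-registers it,
// every closed connection leaves a handler behind for the process lifetime.
public class StreamingConnectionSketch implements AutoCloseable {
    private final Thread shutdownHook;

    public StreamingConnectionSketch() {
        shutdownHook = new Thread(this::abortQuietly);
        Runtime.getRuntime().addShutdownHook(shutdownHook);
    }

    private void abortQuietly() { /* would abort open transactions */ }

    @Override
    public void close() {
        // The fix: remove the hook so handlers do not accumulate.
        Runtime.getRuntime().removeShutdownHook(shutdownHook);
    }

    // Test helper: true while the hook is still registered.
    boolean hookStillRegistered() {
        if (Runtime.getRuntime().removeShutdownHook(shutdownHook)) {
            Runtime.getRuntime().addShutdownHook(shutdownHook); // re-register
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        StreamingConnectionSketch c = new StreamingConnectionSketch();
        assert c.hookStillRegistered();
        c.close();
        assert !c.hookStillRegistered();
    }
}
```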





[jira] [Comment Edited] (HIVE-20979) Fix memory leak in hive streaming

2019-04-04 Thread Gauthier Leonard (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16809868#comment-16809868
 ] 

Gauthier Leonard edited comment on HIVE-20979 at 4/4/19 1:47 PM:
-

It appears that the fixes are not present on the 3.1.1 release. Can I update 
the *Fix Version/s* accordingly? I assume the fixes will be in 3.1.2 as it is 
on master?


was (Author: nuttymoon):
It appears that the fixes are not present on the 3.1.1 release. Can I update 
the *Fix Version/s* accordingly? I assume the fix will be in 3.1.2 as it is on 
master?






[jira] [Commented] (HIVE-20979) Fix memory leak in hive streaming

2019-04-04 Thread Gauthier Leonard (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16809868#comment-16809868
 ] 

Gauthier Leonard commented on HIVE-20979:
-

It appears that the fixes are not present on the 3.1.1 release. Can I update 
the *Fix Version/s* accordingly? I assume the fix will be in 3.1.2 as it is on 
master?






[jira] [Updated] (HIVE-21573) Binary transport shall ignore principal if auth is set to delegationToken

2019-04-04 Thread Denes Bodo (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denes Bodo updated HIVE-21573:
--
Attachment: HIVE-21573.3.patch

> Binary transport shall ignore principal if auth is set to delegationToken
> -
>
> Key: HIVE-21573
> URL: https://issues.apache.org/jira/browse/HIVE-21573
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 3.0.0
>Reporter: Denes Bodo
>Assignee: Denes Bodo
>Priority: Critical
> Attachments: HIVE-21573.3.patch
>
>
> When Beeline is used by Sqoop from an Oozie sqoop action in a kerberized 
> cluster, Sqoop passes the Hive delegation token to Beeline when it invokes the 
> *beeline* command. Unfortunately, Beeline puts a principal=XY parameter into 
> the JDBC URL, so when binary transport is needed it uses principal-based 
> authentication instead of token-based.
> Related code: 
> [https://github.com/apache/hive/blob/master/jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java#L688L705]
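A minimal sketch of the intended precedence (this is not the actual HiveConnection code; the parameter names come from the JDBC URL described above, but the method and its return values are hypothetical):

```java
import java.util.Map;

// Sketch of the fix: when auth=delegationToken is present in the session
// configuration, the binary transport should ignore any principal=... entry
// and use token-based authentication. Method and return values are
// illustrative, not Hive's actual API.
public class TransportAuthChooser {
    static String chooseAuth(Map<String, String> sessConf) {
        // Check the delegation token first, so a stray principal cannot win.
        if ("delegationToken".equals(sessConf.get("auth"))) {
            return "TOKEN";
        }
        if (sessConf.get("principal") != null) {
            return "KERBEROS";
        }
        return "PLAIN";
    }

    public static void main(String[] args) {
        // Both parameters present: the token must take precedence.
        assert chooseAuth(Map.of("auth", "delegationToken",
                                 "principal", "hive/host@REALM")).equals("TOKEN");
        assert chooseAuth(Map.of("principal", "hive/host@REALM")).equals("KERBEROS");
    }
}
```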





[jira] [Updated] (HIVE-21573) Binary transport shall ignore principal if auth is set to delegationToken

2019-04-04 Thread Denes Bodo (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denes Bodo updated HIVE-21573:
--
Attachment: (was: HIVE-21573.2.patch)






[jira] [Updated] (HIVE-21573) Binary transport shall ignore principal if auth is set to delegationToken

2019-04-04 Thread Denes Bodo (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denes Bodo updated HIVE-21573:
--
Attachment: (was: HIVE-21573.1.patch)






[jira] [Commented] (HIVE-21573) Binary transport shall ignore principal if auth is set to delegationToken

2019-04-04 Thread Denes Bodo (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16809850#comment-16809850
 ] 

Denes Bodo commented on HIVE-21573:
---

[~daijy], [~kgyrtkirk], [~hagleitn] Can you please take a look at this? Thanks

> Binary transport shall ignore principal if auth is set to delegationToken
> -
>
> Key: HIVE-21573
> URL: https://issues.apache.org/jira/browse/HIVE-21573
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 3.0.0
>Reporter: Denes Bodo
>Assignee: Denes Bodo
>Priority: Critical
> Attachments: HIVE-21573.1.patch, HIVE-21573.2.patch
>
>
> When Beeline is used by Sqoop from an Oozie sqoop action in a kerberized 
> cluster, Sqoop passes the Hive delegation token to Beeline when it invokes the 
> *beeline* command. Unfortunately, Beeline puts a principal=XY parameter into 
> the JDBC URL, so when binary transport is needed it uses principal-based 
> authentication instead of token-based.
> Related code: 
> [https://github.com/apache/hive/blob/master/jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java#L688L705]





[jira] [Updated] (HIVE-21573) Binary transport shall ignore principal if auth is set to delegationToken

2019-04-04 Thread Denes Bodo (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denes Bodo updated HIVE-21573:
--
Attachment: HIVE-21573.2.patch






[jira] [Commented] (HIVE-21509) LLAP may cache corrupted column vectors and return wrong query result

2019-04-04 Thread Adam Szita (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16809694#comment-16809694
 ] 

Adam Szita commented on HIVE-21509:
---

Thanks Slim!

> LLAP may cache corrupted column vectors and return wrong query result
> -
>
> Key: HIVE-21509
> URL: https://issues.apache.org/jira/browse/HIVE-21509
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Adam Szita
>Assignee: Adam Szita
>Priority: Major
> Attachments: HIVE-21509.0.wip.patch, HIVE-21509.1.wip.patch, 
> HIVE-21509.2.patch, HIVE-21509.3.patch, HIVE-21509.4.patch
>
>
> In some scenarios, LLAP might store column vectors in cache that are getting 
> reused and reset just before their original content would be written.
> The issue is a concurrency issue and is thereby flaky. It is not easy to 
> reproduce, but the odds of surfacing this issue can be improved by setting 
> LLAP executor and IO thread counts this way:
>  * set hive.llap.daemon.num.executors=32;
>  * set hive.llap.io.threadpool.size=1;
>  * using TPCDS input data of store_sales table, have at least a couple of 
> 100k's of rows, and use text format:
> {code:java}
> ROW FORMAT SERDE    'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'  
> WITH SERDEPROPERTIES (    'field.delim'='|',    'serialization.format'='|')  
> STORED AS INPUTFORMAT    'org.apache.hadoop.mapred.TextInputFormat'  
> OUTPUTFORMAT    
> 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'{code}
>  * having more splits makes the issue more likely to show itself, so it is 
> worth running _set tez.grouping.min-size=1024; set tez.grouping.max-size=1024;_
>  * run this query on the table: select min(ss_sold_date_sk) from store_sales;
> The first query result is correct (2450816 in my case). Repeating the query 
> will trigger reading from LLAP cache and produce a wrong result: 0.
> If one wants to make sure of running into this issue, place a 
> Thread.sleep(250) at the beginning of VectorDeserializeOrcWriter#run().
>  
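The corruption mode (a cached vector being reset before its content is consumed) can be illustrated with a deliberately simplified, single-threaded Java sketch; this is not LLAP code, only the reference-vs-copy hazard:

```java
import java.util.Arrays;

// Simplified illustration of the hazard: caching a *reference* to a
// reusable column vector is unsafe, because the producer resets the
// vector for the next batch; the cache needs a defensive copy.
public class VectorCacheSketch {
    static long[] cacheByReference(long[] reusable) {
        return reusable; // unsafe: aliases the producer's buffer
    }

    static long[] cacheByCopy(long[] reusable) {
        return Arrays.copyOf(reusable, reusable.length); // safe snapshot
    }

    public static void main(String[] args) {
        long[] vector = {2450816L};                // the correct value
        long[] unsafe = cacheByReference(vector);
        long[] safe = cacheByCopy(vector);
        Arrays.fill(vector, 0L);                   // producer reuses the buffer
        assert unsafe[0] == 0L;                    // cached entry corrupted to 0
        assert safe[0] == 2450816L;                // the copy survives
    }
}
```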





[jira] [Updated] (HIVE-21574) return wrong result when execute left join sql

2019-04-04 Thread Panda Song (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panda Song updated HIVE-21574:
--
Summary: return wrong result when execute left join sql  (was: return wrong 
result when excuting left join sql)

> return wrong result when execute left join sql
> --
>
> Key: HIVE-21574
> URL: https://issues.apache.org/jira/browse/HIVE-21574
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.0
> Environment: hive 3.1.0 hdfs 3.1.1
>Reporter: Panda Song
>Priority: Blocker
>
> When I use a table instead of the sub-select, I get the right result; with 
> the sub-select, many more rows are joined together (the old_uv metric is bigger!).
> Is there a bug here?
> Please help me, thanks a lot!
> {code:java}
> select 
> a.event_date,
> count(distinct a.device_id) as uv,
> count(distinct case when b.device_id is not null then b.device_id end) as 
> old_uv,
> count(distinct a.device_id) - count(distinct case when b.device_id is not 
> null then b.device_id end) as new_uv
> from
> (
> select
> event_date,
> device_id,
> qingting_id
> from datacenter.bl_page_chain_day
> where event_date = '2019-03-31'
> and (current_content like 'https://a.qingting.fm/membership5%'
> or current_content like 'https://m.qingting.fm/vips/members%'
> or current_content like 'https://sss.qingting.fm/vips/members/v2/%')
> )a
> left join
> (select
>   b.device_id
> from
> lzq_test.first_buy_vip a
> inner join datacenter.device_qingting b on a.qingting_id = b.qingting_id
> where a.first_buy < '2019-03-31'
> group by b.device_id
> )b
> on a.device_id = b.device_id
> group by a.event_date;
> {code}
> plan:
> {code:java}
> Plan optimized by CBO. 
> 
>  Vertex dependency in root stage
>  Map 1 <- Map 3 (BROADCAST_EDGE)
>  Reducer 2 <- Map 1 (SIMPLE_EDGE)   
>  Reducer 5 <- Map 4 (CUSTOM_SIMPLE_EDGE), Reducer 2 (ONE_TO_ONE_EDGE) 
>  Reducer 6 <- Reducer 5 (SIMPLE_EDGE)   
> 
>  Stage-0
>Fetch Operator   
>  limit:-1   
>  Stage-1
>Reducer 6
>File Output Operator [FS_26] 
>  Select Operator [SEL_25] (rows=35527639 width=349) 
>Output:["_col0","_col1","_col2","_col3"] 
>Group By Operator [GBY_24] (rows=35527639 width=349) 
>  Output:["_col0","_col1","_col2"],aggregations:["count(DISTINCT 
> KEY._col1:0._col0)","count(DISTINCT KEY._col1:1._col0)"],keys:KEY._col0 
><-Reducer 5 [SIMPLE_EDGE]
>  SHUFFLE [RS_23]
>PartitionCols:_col0  
>Group By Operator [GBY_22] (rows=71055278 width=349) 
>  
> Output:["_col0","_col1","_col2","_col3","_col4"],aggregations:["count(DISTINCT
>  _col1)","count(DISTINCT _col2)"],keys:true, _col1, _col2 
>  Select Operator [SEL_20] (rows=71055278 width=349) 
>Output:["_col1","_col2"] 
>Map Join Operator [MAPJOIN_45] (rows=71055278 width=349) 
>  
> Conds:RS_17.KEY.reducesinkkey0=RS_18.KEY.reducesinkkey0(Right 
> Outer),Output:["_col0","_col1"] 
><-Reducer 2 [ONE_TO_ONE_EDGE]
>  FORWARD [RS_17]
>PartitionCols:_col0  
>Group By Operator [GBY_12] (rows=21738609 width=235) 
>  Output:["_col0"],keys:KEY._col0 
><-Map 1 [SIMPLE_EDGE]
>  SHUFFLE [RS_11]
>PartitionCols:_col0  
>Group By Operator [GBY_10] (rows=43477219 
> width=235) 
>  Output:["_col0"],keys:_col0 
>  Map Join Operator [MAPJOIN_44] (rows=43477219 
> width=235) 
>
> Conds:SEL_2._col1=RS_7._col0(Inner),Output:["_col0"] 
>  <-Map 3 [BROADCAST_EDGE] 
>BROADCAST [RS_7] 
>  PartitionCols:_col0 
>  Select Operator [SEL_5] (rows=301013 
> width=228) 
>Output:["_col0"] 
>Filter Operator [FIL_32] (rows=301013 
> width=228) 
>  predicate:((first_buy < 
> DATE'2019-03-31') and qingting_id is not null) 
>

[jira] [Updated] (HIVE-21574) return wrong result when excuting left join sql

2019-04-04 Thread Panda Song (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panda Song updated HIVE-21574:
--
Description: 
When I use a table instead of the sub-select, I get the right result; with the 
sub-select, many more rows are joined together (the old_uv metric is bigger!).

Is there a bug here?

Please help me, thanks a lot!
{code:java}
select 
a.event_date,
count(distinct a.device_id) as uv,
count(distinct case when b.device_id is not null then b.device_id end) as 
old_uv,
count(distinct a.device_id) - count(distinct case when b.device_id is not null 
then b.device_id end) as new_uv
from
(
select
event_date,
device_id,
qingting_id
from datacenter.bl_page_chain_day
where event_date = '2019-03-31'
and (current_content like 'https://a.qingting.fm/membership5%'
or current_content like 'https://m.qingting.fm/vips/members%'
or current_content like 'https://sss.qingting.fm/vips/members/v2/%')
)a
left join
(select
  b.device_id
from
lzq_test.first_buy_vip a
inner join datacenter.device_qingting b on a.qingting_id = b.qingting_id
where a.first_buy < '2019-03-31'
group by b.device_id
)b
on a.device_id = b.device_id
group by a.event_date;
{code}
plan:
{code:java}
Plan optimized by CBO. 

 Vertex dependency in root stage
 Map 1 <- Map 3 (BROADCAST_EDGE)
 Reducer 2 <- Map 1 (SIMPLE_EDGE)   
 Reducer 5 <- Map 4 (CUSTOM_SIMPLE_EDGE), Reducer 2 (ONE_TO_ONE_EDGE) 
 Reducer 6 <- Reducer 5 (SIMPLE_EDGE)   

 Stage-0
   Fetch Operator   
 limit:-1   
 Stage-1
   Reducer 6
   File Output Operator [FS_26] 
 Select Operator [SEL_25] (rows=35527639 width=349) 
   Output:["_col0","_col1","_col2","_col3"] 
   Group By Operator [GBY_24] (rows=35527639 width=349) 
 Output:["_col0","_col1","_col2"],aggregations:["count(DISTINCT 
KEY._col1:0._col0)","count(DISTINCT KEY._col1:1._col0)"],keys:KEY._col0 
   <-Reducer 5 [SIMPLE_EDGE]
 SHUFFLE [RS_23]
   PartitionCols:_col0  
   Group By Operator [GBY_22] (rows=71055278 width=349) 
 
Output:["_col0","_col1","_col2","_col3","_col4"],aggregations:["count(DISTINCT 
_col1)","count(DISTINCT _col2)"],keys:true, _col1, _col2 
 Select Operator [SEL_20] (rows=71055278 width=349) 
   Output:["_col1","_col2"] 
   Map Join Operator [MAPJOIN_45] (rows=71055278 width=349) 
 
Conds:RS_17.KEY.reducesinkkey0=RS_18.KEY.reducesinkkey0(Right 
Outer),Output:["_col0","_col1"] 
   <-Reducer 2 [ONE_TO_ONE_EDGE]
 FORWARD [RS_17]
   PartitionCols:_col0  
   Group By Operator [GBY_12] (rows=21738609 width=235) 
 Output:["_col0"],keys:KEY._col0 
   <-Map 1 [SIMPLE_EDGE]
 SHUFFLE [RS_11]
   PartitionCols:_col0  
   Group By Operator [GBY_10] (rows=43477219 width=235) 
 Output:["_col0"],keys:_col0 
 Map Join Operator [MAPJOIN_44] (rows=43477219 
width=235) 
   
Conds:SEL_2._col1=RS_7._col0(Inner),Output:["_col0"] 
 <-Map 3 [BROADCAST_EDGE] 
   BROADCAST [RS_7] 
 PartitionCols:_col0 
 Select Operator [SEL_5] (rows=301013 
width=228) 
   Output:["_col0"] 
   Filter Operator [FIL_32] (rows=301013 
width=228) 
 predicate:((first_buy < DATE'2019-03-31') 
and qingting_id is not null) 
 TableScan [TS_3] (rows=1062401 width=228) 
   lzq_test@first_buy_vip,a, transactional 
table,Tbl:COMPLETE,Col:NONE,Output:["qingting_id","first_buy"] 
 <-Select Operator [SEL_2] (rows=39524744 
width=235) 
 Output:["_col0","_col1"] 
 Filter Operator [FIL_31] (rows=39524744 
width=235) 
   predicate:qingting_id is not null 
   TableScan [TS_0] (rows=39524744 width=235) 
 datacenter@device_qingting,b, ACID 
table,Tbl:COMPLETE,Col:COM

[jira] [Updated] (HIVE-21574) return wrong result when excuting left join sql

2019-04-04 Thread Panda Song (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panda Song updated HIVE-21574:
--
Description: 
When I use a table instead of the sub-select, I get the right result; with the 
sub-select, many more rows are joined together (the old_uv metric is bigger!).

Is there a bug here?

Please help me, thanks a lot!
{code:java}
select 
a.event_date,
count(distinct a.device_id) as uv,
count(distinct case when b.device_id is not null then b.device_id end) as 
old_uv,
count(distinct a.device_id) - count(distinct case when b.device_id is not null 
then b.device_id end) as new_uv
from
(
select
event_date,
device_id,
qingting_id
from datacenter.bl_page_chain_day
where event_date = '2019-03-31'
and (current_content like 'https://a.qingting.fm/membership5%'
or current_content like 'https://m.qingting.fm/vips/members%'
or current_content like 'https://sss.qingting.fm/vips/members/v2/%')
)a
left join
(select
  b.device_id
from
lzq_test.first_buy_vip a
inner join datacenter.device_qingting b on a.qingting_id = b.qingting_id
where a.first_buy < '2019-03-31'
group by b.device_id
)b
on a.device_id = b.device_id
group by a.event_date;
{code}
plan:
{code:java}
Plan optimized by CBO. 

 Vertex dependency in root stage
 Map 1 <- Map 3 (BROADCAST_EDGE)
 Reducer 2 <- Map 1 (SIMPLE_EDGE)   
 Reducer 5 <- Map 4 (CUSTOM_SIMPLE_EDGE), Reducer 2 (ONE_TO_ONE_EDGE) 
 Reducer 6 <- Reducer 5 (SIMPLE_EDGE)   

 Stage-0
   Fetch Operator   
 limit:-1   
 Stage-1
   Reducer 6
   File Output Operator [FS_26] 
 Select Operator [SEL_25] (rows=35527639 width=349) 
   Output:["_col0","_col1","_col2","_col3"] 
   Group By Operator [GBY_24] (rows=35527639 width=349) 
 Output:["_col0","_col1","_col2"],aggregations:["count(DISTINCT 
KEY._col1:0._col0)","count(DISTINCT KEY._col1:1._col0)"],keys:KEY._col0 
   <-Reducer 5 [SIMPLE_EDGE]
 SHUFFLE [RS_23]
   PartitionCols:_col0  
   Group By Operator [GBY_22] (rows=71055278 width=349) 
 
Output:["_col0","_col1","_col2","_col3","_col4"],aggregations:["count(DISTINCT 
_col1)","count(DISTINCT _col2)"],keys:true, _col1, _col2 
 Select Operator [SEL_20] (rows=71055278 width=349) 
   Output:["_col1","_col2"] 
   Map Join Operator [MAPJOIN_45] (rows=71055278 width=349) 
 
Conds:RS_17.KEY.reducesinkkey0=RS_18.KEY.reducesinkkey0(Right 
Outer),Output:["_col0","_col1"] 
   <-Reducer 2 [ONE_TO_ONE_EDGE]
 FORWARD [RS_17]
   PartitionCols:_col0  
   Group By Operator [GBY_12] (rows=21738609 width=235) 
 Output:["_col0"],keys:KEY._col0 
   <-Map 1 [SIMPLE_EDGE]
 SHUFFLE [RS_11]
   PartitionCols:_col0  
   Group By Operator [GBY_10] (rows=43477219 width=235) 
 Output:["_col0"],keys:_col0 
 Map Join Operator [MAPJOIN_44] (rows=43477219 
width=235) 
   
Conds:SEL_2._col1=RS_7._col0(Inner),Output:["_col0"] 
 <-Map 3 [BROADCAST_EDGE] 
   BROADCAST [RS_7] 
 PartitionCols:_col0 
 Select Operator [SEL_5] (rows=301013 
width=228) 
   Output:["_col0"] 
   Filter Operator [FIL_32] (rows=301013 
width=228) 
 predicate:((first_buy < DATE'2019-03-31') 
and qingting_id is not null) 
 TableScan [TS_3] (rows=1062401 width=228) 
   lzq_test@first_buy_vip,a, transactional 
table,Tbl:COMPLETE,Col:NONE,Output:["qingting_id","first_buy"] 
 <-Select Operator [SEL_2] (rows=39524744 
width=235) 
 Output:["_col0","_col1"] 
 Filter Operator [FIL_31] (rows=39524744 
width=235) 
   predicate:qingting_id is not null 
   TableScan [TS_0] (rows=39524744 width=235) 
 datacenter@device_qingting,b, ACID 
table,Tbl:COMPLETE,Col:CO

[jira] [Updated] (HIVE-21574) return wrong result when excuting left join sql

2019-04-04 Thread Panda Song (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panda Song updated HIVE-21574:
--
Description: 
When I use a table instead of the sub-select, I get the right result; with the 
sub-select, many more rows are joined together (the old_uv metric is bigger!).

Is there a bug here?

Please help me, thanks a lot!

{code:java}
select 
a.event_date,
count(distinct a.device_id) as uv,
count(distinct case when b.device_id is not null then b.device_id end) as 
old_uv,
count(distinct a.device_id) - count(distinct case when b.device_id is not 
null then b.device_id end) as new_uv
from
(
select
event_date,
device_id,
qingting_id
from datacenter.bl_page_chain_day
where event_date = '2019-03-31'
and (current_content like 'https://a.qingting.fm/membership5%'
or current_content like 'https://m.qingting.fm/vips/members%'
or current_content like 'https://sss.qingting.fm/vips/members/v2/%')
)a
left join
(select
  b.device_id
from
lzq_test.first_buy_vip a
inner join datacenter.device_qingting b on a.qingting_id = b.qingting_id
where a.first_buy < '2019-03-31'
group by b.device_id
)b
on a.device_id = b.device_id
group by a.event_date;
{code}

plan:
{quote}Plan optimized by CBO.

Vertex dependency in root stage 
 Map 1 <- Map 3 (BROADCAST_EDGE) 
 Reducer 2 <- Map 1 (SIMPLE_EDGE) 
 Reducer 5 <- Map 4 (CUSTOM_SIMPLE_EDGE), Reducer 2 (ONE_TO_ONE_EDGE) 
 Reducer 6 <- Reducer 5 (SIMPLE_EDGE)

Stage-0 
 Fetch Operator 
 limit:-1 
 Stage-1 
 Reducer 6 
 File Output Operator [FS_26] 
 Select Operator [SEL_25] (rows=35527639 width=349) 
 Output:["_col0","_col1","_col2","_col3"] 
 Group By Operator [GBY_24] (rows=35527639 width=349) 
 Output:["_col0","_col1","_col2"],aggregations:["count(DISTINCT 
KEY._col1:0._col0)","count(DISTINCT KEY._col1:1._col0)"],keys:KEY._col0 
 <-Reducer 5 [SIMPLE_EDGE] 
 SHUFFLE [RS_23] 
 PartitionCols:_col0 
 Group By Operator [GBY_22] (rows=71055278 width=349) 
 Output:["_col0","_col1","_col2","_col3","_col4"],aggregations:["count(DISTINCT 
_col1)","count(DISTINCT _col2)"],keys:true, _col1, _col2 
 Select Operator [SEL_20] (rows=71055278 width=349) 
 Output:["_col1","_col2"] 
 Map Join Operator [MAPJOIN_45] (rows=71055278 width=349) 
 Conds:RS_17.KEY.reducesinkkey0=RS_18.KEY.reducesinkkey0(Right 
Outer),Output:["_col0","_col1"] 
 <-Reducer 2 [ONE_TO_ONE_EDGE] 
 FORWARD [RS_17] 
 PartitionCols:_col0 
 Group By Operator [GBY_12] (rows=21738609 width=235) 
 Output:["_col0"],keys:KEY._col0 
 <-Map 1 [SIMPLE_EDGE] 
 SHUFFLE [RS_11] 
 PartitionCols:_col0 
 Group By Operator [GBY_10] (rows=43477219 width=235) 
 Output:["_col0"],keys:_col0 
 Map Join Operator [MAPJOIN_44] (rows=43477219 width=235) 
 Conds:SEL_2._col1=RS_7._col0(Inner),Output:["_col0"] 
 <-Map 3 [BROADCAST_EDGE] 
 BROADCAST [RS_7] 
 PartitionCols:_col0 
 Select Operator [SEL_5] (rows=301013 width=228) 
 Output:["_col0"] 
 Filter Operator [FIL_32] (rows=301013 width=228) 
 predicate:((first_buy < DATE'2019-03-31') and qingting_id is not null) 
 TableScan [TS_3] (rows=1062401 width=228) 
 lzq_test@first_buy_vip,a, transactional 
table,Tbl:COMPLETE,Col:NONE,Output:["qingting_id","first_buy"] 
 <-Select Operator [SEL_2] (rows=39524744 width=235) 
 Output:["_col0","_col1"] 
 Filter Operator [FIL_31] (rows=39524744 width=235) 
 predicate:qingting_id is not null 
 TableScan [TS_0] (rows=39524744 width=235) 
 datacenter@device_qingting,b, ACID 
table,Tbl:COMPLETE,Col:COMPLETE,Output:["device_id","qingting_id"] 
 <-Map 4 [CUSTOM_SIMPLE_EDGE] 
 PARTITION_ONLY_SHUFFLE [RS_18] 
 PartitionCols:_col0 
 Select Operator [SEL_16] (rows=64595706 width=349) 
 Output:["_col0"] 
 Filter Operator [FIL_33] (rows=64595706 width=349) 
 predicate:((current_content like 'https://a.qingting.fm/membership5%') or 
(current_content like 'https://m.qingting.fm/vips/members%') or 
(current_content like 'https://sss.qingting.fm/vips/members/v2/%')) 
 TableScan [TS_14] (rows=64595706 width=349) 
 
datacenter@bl_page_chain_day,bl_page_chain_day,Tbl:COMPLETE,Col:NONE,Output:["device_id","current_content"]
{quote}

  was:
when I use a table instead of the sub select,I get the right result,much more 
rows are joined together(metrics old_uv is bigger!!!) 

Is there some bugs here?

Please help me ,thanks a lot!!
{quote}select 
 {{a.event_date,}}
 {{count(distinct a.device_id) as uv,}}
 {{count(distinct case when b.device_id is not null then b.device_id end) as 
old_uv,}}
 {{count(distinct a.device_id) - count(distinct case when b.device_id is not 
null then b.device_id end) as new_uv}}
 {{from}}
 {{(}}
 {{select}}
 {{event_date,}}
 {{device_id,}}
 {{qingting_id}}
 {{from datacenter.bl_page_chain_day}}
 {{where event_date = '2019-03-31'}}
 {{and (current_content like 'https://a.qingting.fm/membership5%'}}
 {{or current_content like 'https://m.qingting.fm/vips/members%'}}
 {{or curren

[jira] [Updated] (HIVE-20968) Support conversion of managed to external where location set was not owned by hive

2019-04-04 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-20968:
---
Description: 
As per migration rule, if a location is outside the default managed table 
directory and the location is not owned by "hive" user, then it should be 
converted to external table after upgrade.
 The same rule applies to Hive replication when the data of the source 
managed table resides outside the default warehouse directory and is not 
owned by the "hive" user.
 During this conversion, the path should be preserved in target as well so that 
failover works seamlessly.
 # If the table location is outside the hive warehouse and is not owned by hive, 
then the table at target will be converted to an external table. The original 
location cannot be retained; the path is instead kept relative to the hive 
external warehouse directory. 
 # As the table is not an external table at source, only the data added 
through events will be replicated.
 # The ownership of the location will be stored in the create table event and 
compared with strict.managed.tables.migration.owner to decide whether the 
flag in the replication scope can be set. This flag is used to convert the 
managed table to an external table at target.

Some scenarios need to be blocked if the database is set up for replication from 
a cluster without the strict managed table setting to one with strict managed 
tables.

1. Block alter table / partition set location for databases that are a source of 
replication, for managed tables.
2. If the user manually changes the ownership of the location, Hive replication 
may end up in a non-recoverable state.
3. Block add partition if the location ownership differs from the table 
location's ownership, for managed tables.
4. The user needs to set strict.managed.tables.migration.owner along with the 
dump command (defaulting to the hive user). This value will be used during dump 
to decide the ownership, which will be used during load to decide the table 
type. The location owner information can be stored in the events during create 
table. The flag can be stored in the replication spec. Check other such configs 
used in the upgrade tool.
5. The replication flow also sets the additional parameter 
"external.table.purge"="true", but only when migrating to an external table.
6. Block conversion from managed to external and vice versa; pass a flag in the 
upgrade flow to allow this conversion only during upgrade.
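
To make rules 1 and 3 above concrete, here is a sketch of statements that would 
be rejected on a database that is a source of replication (the database, table, 
and path names are hypothetical, chosen only for illustration):

{code:sql}
-- Hypothetical example: on a replication-source database, relocating a
-- managed table would be blocked, because the new location's ownership
-- could break the managed-to-external conversion logic at the target.
ALTER TABLE repl_src_db.sales_managed SET LOCATION '/data/custom/sales';

-- Rule 3: adding a partition whose location ownership differs from the
-- table location's ownership would likewise be blocked.
ALTER TABLE repl_src_db.sales_managed ADD PARTITION (ds='2019-04-04')
  LOCATION '/data/other_owner/sales/ds=2019-04-04';
{code}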

  was:
As per migration rule, if a location is outside the default managed table 
directory and the location is not owned by "hive" user, then it should be 
converted to external table after upgrade.
So, the same rule is applicable for Hive replication where the data of source 
managed table is residing outside the default warehouse directory and is not 
owned by "hive" user.
During this conversion, the path should be preserved in target as well so that 
failover works seamlessly.


> Support conversion of managed to external where location set was not owned by 
> hive
> --
>
> Key: HIVE-20968
> URL: https://issues.apache.org/jira/browse/HIVE-20968
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: DR
>
> As per migration rule, if a location is outside the default managed table 
> directory and the location is not owned by "hive" user, then it should be 
> converted to external table after upgrade.
>  So, the same rule is applicable for Hive replication where the data of 
> source managed table is residing outside the default warehouse directory and 
> is not owned by "hive" user.
>  During this conversion, the path should be preserved in target as well so 
> that failover works seamlessly.
>  # If the table location is outside the Hive warehouse and is not owned by 
> hive, then the table at the target will be converted to an external table. 
> However, the location cannot be retained; it will be recreated relative to the 
> Hive external warehouse directory. 
>  #  As the table is not an external table at the source, only data added 
> through events will be replicated.
>  # The ownership of the location will be stored in the create table event and 
> will be compared with strict.managed.tables.migration.owner to decide whether 
> the flag in the replication scope can be set. This flag is used to 
> convert the managed table to an external table at the target.
> Some scenarios need to be blocked if the database is set up for 
> replication from a cluster without the strict managed table setting to one 
> with strict managed tables.
> 1. Block alter table / partition set location for databases that are a source 
> of replication, for managed tables.
> 2. If user m

[jira] [Commented] (HIVE-19875) increase LLAP IO queue size for perf

2019-04-04 Thread Karen Coppage (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16809604#comment-16809604
 ] 

Karen Coppage commented on HIVE-19875:
--

Hi [~prasanth_j], branch 3.0 can't compile because of this error:
 Compilation failure
 [ERROR] 
/hive/llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapRecordReader.java:[228,54]
 cannot find symbol
 [ERROR] symbol: variable MAX_DECIMAL64_PRECISION
 [ERROR] location: class org.apache.orc.TypeDescription

It seems that in 3.0.x the ORC version is 1.4.3, in which 
org.apache.orc.TypeDescription.MAX_DECIMAL64_PRECISION had not yet been 
introduced. ORC was upgraded starting from branch-3.1 (HIVE-19669) to a version 
where MAX_DECIMAL64_PRECISION exists.

> increase LLAP IO queue size for perf
> 
>
> Key: HIVE-19875
> URL: https://issues.apache.org/jira/browse/HIVE-19875
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.0, 4.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
> Fix For: 3.1.0, 3.0.1, 4.0.0
>
> Attachments: HIVE-19875.1.patch, HIVE-19875.2.patch
>
>
> According to [~gopalv] queue limit has perf impact, esp. during hashtable 
> load for mapjoin where in the past IO used to queue up more data for 
> processing to process.
> 1) Overall the default limit could be adjusted higher.
> 2) Depending on Decimal64 availability, the weight for decimal columns could 
> be reduced.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21574) return wrong result when excuting left join sql

2019-04-04 Thread Panda Song (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panda Song updated HIVE-21574:
--
Priority: Blocker  (was: Major)

> return wrong result when excuting left join sql
> ---
>
> Key: HIVE-21574
> URL: https://issues.apache.org/jira/browse/HIVE-21574
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.0
> Environment: hive 3.1.0 hdfs 3.1.1
>Reporter: Panda Song
>Priority: Blocker
>
> When I use a table instead of the subselect, I get the right result: many more 
> rows are joined together (the old_uv metric is bigger!). 
> Is there a bug here?
> Please help me, thanks a lot!
> {quote}select 
>  {{a.event_date,}}
>  {{count(distinct a.device_id) as uv,}}
>  {{count(distinct case when b.device_id is not null then b.device_id end) as 
> old_uv,}}
>  {{count(distinct a.device_id) - count(distinct case when b.device_id is not 
> null then b.device_id end) as new_uv}}
>  {{from}}
>  {{(}}
>  {{select}}
>  {{event_date,}}
>  {{device_id,}}
>  {{qingting_id}}
>  {{from datacenter.bl_page_chain_day}}
>  {{where event_date = '2019-03-31'}}
>  {{and (current_content like 'https://a.qingting.fm/membership5%'}}
>  {{or current_content like 'https://m.qingting.fm/vips/members%'}}
>  {{or current_content like 'https://sss.qingting.fm/vips/members/v2/%')}}
>  {{)a}}
>  {{left join}}
>  {{(select}}
>   b.device_id
>  {{from}}
>  {{lzq_test.first_buy_vip a}}
>  {{inner join datacenter.device_qingting b on a.qingting_id = b.qingting_id}}
>  {{where a.first_buy < '2019-03-31'}}
>  {{group by b.device_id}}
>  {{)b}}
>  {{on a.device_id = b.device_id}}
>  {{group by a.event_date;}}
> {quote}
> plan:
> {quote}Plan optimized by CBO. |
> Vertex dependency in root stage 
>  Map 1 <- Map 3 (BROADCAST_EDGE) 
>  Reducer 2 <- Map 1 (SIMPLE_EDGE) 
>  Reducer 5 <- Map 4 (CUSTOM_SIMPLE_EDGE), Reducer 2 (ONE_TO_ONE_EDGE) 
>  Reducer 6 <- Reducer 5 (SIMPLE_EDGE)
> Stage-0 
>  Fetch Operator 
>  limit:-1 
>  Stage-1 
>  Reducer 6 
>  File Output Operator [FS_26] 
>  Select Operator [SEL_25] (rows=35527639 width=349) 
>  Output:["_col0","_col1","_col2","_col3"] 
>  Group By Operator [GBY_24] (rows=35527639 width=349) 
>  Output:["_col0","_col1","_col2"],aggregations:["count(DISTINCT 
> KEY._col1:0._col0)","count(DISTINCT KEY._col1:1._col0)"],keys:KEY._col0 
>  <-Reducer 5 [SIMPLE_EDGE] 
>  SHUFFLE [RS_23] 
>  PartitionCols:_col0 
>  Group By Operator [GBY_22] (rows=71055278 width=349) 
>  
> Output:["_col0","_col1","_col2","_col3","_col4"],aggregations:["count(DISTINCT
>  _col1)","count(DISTINCT _col2)"],keys:true, _col1, _col2 
>  Select Operator [SEL_20] (rows=71055278 width=349) 
>  Output:["_col1","_col2"] 
>  Map Join Operator [MAPJOIN_45] (rows=71055278 width=349) 
>  Conds:RS_17.KEY.reducesinkkey0=RS_18.KEY.reducesinkkey0(Right 
> Outer),Output:["_col0","_col1"] 
>  <-Reducer 2 [ONE_TO_ONE_EDGE] 
>  FORWARD [RS_17] 
>  PartitionCols:_col0 
>  Group By Operator [GBY_12] (rows=21738609 width=235) 
>  Output:["_col0"],keys:KEY._col0 
>  <-Map 1 [SIMPLE_EDGE] 
>  SHUFFLE [RS_11] 
>  PartitionCols:_col0 
>  Group By Operator [GBY_10] (rows=43477219 width=235) 
>  Output:["_col0"],keys:_col0 
>  Map Join Operator [MAPJOIN_44] (rows=43477219 width=235) 
>  Conds:SEL_2._col1=RS_7._col0(Inner),Output:["_col0"] 
>  <-Map 3 [BROADCAST_EDGE] 
>  BROADCAST [RS_7] 
>  PartitionCols:_col0 
>  Select Operator [SEL_5] (rows=301013 width=228) 
>  Output:["_col0"] 
>  Filter Operator [FIL_32] (rows=301013 width=228) 
>  predicate:((first_buy < DATE'2019-03-31') and qingting_id is not null) 
>  TableScan [TS_3] (rows=1062401 width=228) 
>  lzq_test@first_buy_vip,a, transactional 
> table,Tbl:COMPLETE,Col:NONE,Output:["qingting_id","first_buy"] 
>  <-Select Operator [SEL_2] (rows=39524744 width=235) 
>  Output:["_col0","_col1"] 
>  Filter Operator [FIL_31] (rows=39524744 width=235) 
>  predicate:qingting_id is not null 
>  TableScan [TS_0] (rows=39524744 width=235) 
>  datacenter@device_qingting,b, ACID 
> table,Tbl:COMPLETE,Col:COMPLETE,Output:["device_id","qingting_id"] 
>  <-Map 4 [CUSTOM_SIMPLE_EDGE] 
>  PARTITION_ONLY_SHUFFLE [RS_18] 
>  PartitionCols:_col0 
>  Select Operator [SEL_16] (rows=64595706 width=349) 
>  Output:["_col0"] 
>  Filter Operator [FIL_33] (rows=64595706 width=349) 
>  predicate:((current_content like 'https://a.qingting.fm/membership5%') or 
> (current_content like 'https://m.qingting.fm/vips/members%') or 
> (current_content like 'https://sss.qingting.fm/vips/members/v2/%')) 
>  TableScan [TS_14] (rows=64595706 width=349) 
>  
> datacenter@bl_page_chain_day,bl_page_chain_day,Tbl:COMPLETE,Col:NONE,Output:["device_id","current_content"]
> {quote}





[jira] [Updated] (HIVE-21574) return wrong result when excuting left join sql

2019-04-04 Thread Panda Song (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panda Song updated HIVE-21574:
--
Description: 
When I use a table instead of the subselect, I get the right result: many more 
rows are joined together (the old_uv metric is bigger!). 

Is there a bug here?

Please help me, thanks a lot!
{code:sql}
select
  a.event_date,
  count(distinct a.device_id) as uv,
  count(distinct case when b.device_id is not null then b.device_id end) as old_uv,
  count(distinct a.device_id)
    - count(distinct case when b.device_id is not null then b.device_id end) as new_uv
from
(
  select
    event_date,
    device_id,
    qingting_id
  from datacenter.bl_page_chain_day
  where event_date = '2019-03-31'
    and (current_content like 'https://a.qingting.fm/membership5%'
      or current_content like 'https://m.qingting.fm/vips/members%'
      or current_content like 'https://sss.qingting.fm/vips/members/v2/%')
) a
left join
(
  select b.device_id
  from lzq_test.first_buy_vip a
  inner join datacenter.device_qingting b on a.qingting_id = b.qingting_id
  where a.first_buy < '2019-03-31'
  group by b.device_id
) b
on a.device_id = b.device_id
group by a.event_date;
{code}
plan:
{quote}Plan optimized by CBO.

Vertex dependency in root stage 
 Map 1 <- Map 3 (BROADCAST_EDGE) 
 Reducer 2 <- Map 1 (SIMPLE_EDGE) 
 Reducer 5 <- Map 4 (CUSTOM_SIMPLE_EDGE), Reducer 2 (ONE_TO_ONE_EDGE) 
 Reducer 6 <- Reducer 5 (SIMPLE_EDGE)

Stage-0 
 Fetch Operator 
 limit:-1 
 Stage-1 
 Reducer 6 
 File Output Operator [FS_26] 
 Select Operator [SEL_25] (rows=35527639 width=349) 
 Output:["_col0","_col1","_col2","_col3"] 
 Group By Operator [GBY_24] (rows=35527639 width=349) 
 Output:["_col0","_col1","_col2"],aggregations:["count(DISTINCT 
KEY._col1:0._col0)","count(DISTINCT KEY._col1:1._col0)"],keys:KEY._col0 
 <-Reducer 5 [SIMPLE_EDGE] 
 SHUFFLE [RS_23] 
 PartitionCols:_col0 
 Group By Operator [GBY_22] (rows=71055278 width=349) 
 Output:["_col0","_col1","_col2","_col3","_col4"],aggregations:["count(DISTINCT 
_col1)","count(DISTINCT _col2)"],keys:true, _col1, _col2 
 Select Operator [SEL_20] (rows=71055278 width=349) 
 Output:["_col1","_col2"] 
 Map Join Operator [MAPJOIN_45] (rows=71055278 width=349) 
 Conds:RS_17.KEY.reducesinkkey0=RS_18.KEY.reducesinkkey0(Right 
Outer),Output:["_col0","_col1"] 
 <-Reducer 2 [ONE_TO_ONE_EDGE] 
 FORWARD [RS_17] 
 PartitionCols:_col0 
 Group By Operator [GBY_12] (rows=21738609 width=235) 
 Output:["_col0"],keys:KEY._col0 
 <-Map 1 [SIMPLE_EDGE] 
 SHUFFLE [RS_11] 
 PartitionCols:_col0 
 Group By Operator [GBY_10] (rows=43477219 width=235) 
 Output:["_col0"],keys:_col0 
 Map Join Operator [MAPJOIN_44] (rows=43477219 width=235) 
 Conds:SEL_2._col1=RS_7._col0(Inner),Output:["_col0"] 
 <-Map 3 [BROADCAST_EDGE] 
 BROADCAST [RS_7] 
 PartitionCols:_col0 
 Select Operator [SEL_5] (rows=301013 width=228) 
 Output:["_col0"] 
 Filter Operator [FIL_32] (rows=301013 width=228) 
 predicate:((first_buy < DATE'2019-03-31') and qingting_id is not null) 
 TableScan [TS_3] (rows=1062401 width=228) 
 lzq_test@first_buy_vip,a, transactional 
table,Tbl:COMPLETE,Col:NONE,Output:["qingting_id","first_buy"] 
 <-Select Operator [SEL_2] (rows=39524744 width=235) 
 Output:["_col0","_col1"] 
 Filter Operator [FIL_31] (rows=39524744 width=235) 
 predicate:qingting_id is not null 
 TableScan [TS_0] (rows=39524744 width=235) 
 datacenter@device_qingting,b, ACID 
table,Tbl:COMPLETE,Col:COMPLETE,Output:["device_id","qingting_id"] 
 <-Map 4 [CUSTOM_SIMPLE_EDGE] 
 PARTITION_ONLY_SHUFFLE [RS_18] 
 PartitionCols:_col0 
 Select Operator [SEL_16] (rows=64595706 width=349) 
 Output:["_col0"] 
 Filter Operator [FIL_33] (rows=64595706 width=349) 
 predicate:((current_content like 'https://a.qingting.fm/membership5%') or 
(current_content like 'https://m.qingting.fm/vips/members%') or 
(current_content like 'https://sss.qingting.fm/vips/members/v2/%')) 
 TableScan [TS_14] (rows=64595706 width=349) 
 
datacenter@bl_page_chain_day,bl_page_chain_day,Tbl:COMPLETE,Col:NONE,Output:["device_id","current_content"]
{quote}
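
The workaround the reporter describes (replacing the subselect with a 
pre-materialized table) could be sketched as follows; the temporary table name 
is hypothetical:

{code:sql}
-- Materialize the first subselect, then join against this table instead
-- of the inline subquery; per the report, this rewrite produces the
-- expected (larger) old_uv.
CREATE TABLE tmp_membership_visits AS
SELECT event_date, device_id, qingting_id
FROM datacenter.bl_page_chain_day
WHERE event_date = '2019-03-31'
  AND (current_content LIKE 'https://a.qingting.fm/membership5%'
    OR current_content LIKE 'https://m.qingting.fm/vips/members%'
    OR current_content LIKE 'https://sss.qingting.fm/vips/members/v2/%');
{code}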

  was:
when I use a table instead of the sub select,I get the right result,much more 
rows are joined together(metrics old_uv is bigger!!!) 

Is there some bugs here?

Please help me ,thanks a lot!!
{quote}{{select }}
{{a.event_date,}}
{{count(distinct a.device_id) as uv,}}
{{count(distinct case when b.device_id is not null then b.device_id end) as 
old_uv,}}
{{count(distinct a.device_id) - count(distinct case when b.device_id is not 
null then b.device_id end) as new_uv}}
{{from}}
{{(}}
{{select}}
{{event_date,}}
{{device_id,}}
{{qingting_id}}
{{from datacenter.bl_page_chain_day}}
{{where event_date = '2019-03-31'}}
{{and (current_content like 'https://a.qingting.fm/membership5%'}}
{{or current_content like 'https://m.qingting.fm/vips/members%'}}
{{or curre

[jira] [Updated] (HIVE-21529) Hive support bootstrap of ACID/MM tables on an existing policy.

2019-04-04 Thread Ashutosh Bapat (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Bapat updated HIVE-21529:
--
Attachment: HIVE-21529.02.patch
Status: Patch Available  (was: Open)

> Hive support bootstrap of ACID/MM tables on an existing policy.
> ---
>
> Key: HIVE-21529
> URL: https://issues.apache.org/jira/browse/HIVE-21529
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl, Transactions
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Ashutosh Bapat
>Priority: Major
>  Labels: DR, pull-request-available, replication
> Attachments: HIVE-21529.01.patch, HIVE-21529.02.patch
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> If ACID/MM tables are to be enabled (hive.repl.dump.include.acid.tables) on an 
> existing repl policy, then the bootstrap dump of these tables needs to be 
> combined with the ongoing incremental dump. 
>  A one-time config "hive.repl.bootstrap.acid.tables" shall be added to include 
> the bootstrap in the given dump.
> The support for hive.repl.bootstrap.cleanup.type for ACID tables to clean-up 
> partially bootstrapped tables in case of retry is already in place, thanks to 
> the work done during external tables. Need to test that it actually works.





[jira] [Updated] (HIVE-21404) MSSQL upgrade script alters the wrong column

2019-04-04 Thread David Lavati (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Lavati updated HIVE-21404:

Status: In Progress  (was: Patch Available)

> MSSQL upgrade script alters the wrong column
> 
>
> Key: HIVE-21404
> URL: https://issues.apache.org/jira/browse/HIVE-21404
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.2.0
>Reporter: David Lavati
>Assignee: David Lavati
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.2.0
>
> Attachments: HIVE-21404.1.patch, HIVE-21404.2.patch, 
> HIVE-21404.3.patch, HIVE-21404.4.patch, HIVE-21404.4.patch, 
> HIVE-21404.4.patch, HIVE-21404.4.patch, HIVE-21404.4.patch, 
> HIVE-21404.4.patch, HIVE-21404.4.patch, HIVE-21404.4.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HIVE-20221 changes PARTITION_PARAMS, so the following command is modifying 
> the wrong table:
> {{ALTER TABLE "SERDE_PARAMS" ALTER COLUMN "PARAM_VALUE" nvarchar(MAX);}}
> https://github.com/apache/hive/blob/d3b036920acde7bb04840697eb13038103b062b4/standalone-metastore/metastore-server/src/main/sql/mssql/upgrade-3.1.0-to-3.2.0.mssql.sql#L21
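
Based on the description, the intended statement would presumably target 
PARTITION_PARAMS instead; this is a sketch of the apparent intent, not the 
committed fix:

{code:sql}
-- HIVE-20221 changed PARTITION_PARAMS, so the upgrade script presumably
-- meant to widen PARAM_VALUE in that table rather than in SERDE_PARAMS:
ALTER TABLE "PARTITION_PARAMS" ALTER COLUMN "PARAM_VALUE" nvarchar(MAX);
{code}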





[jira] [Updated] (HIVE-21404) MSSQL upgrade script alters the wrong column

2019-04-04 Thread David Lavati (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Lavati updated HIVE-21404:

Attachment: HIVE-21404.4.patch
Status: Patch Available  (was: In Progress)

> MSSQL upgrade script alters the wrong column
> 
>
> Key: HIVE-21404
> URL: https://issues.apache.org/jira/browse/HIVE-21404
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.2.0
>Reporter: David Lavati
>Assignee: David Lavati
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.2.0
>
> Attachments: HIVE-21404.1.patch, HIVE-21404.2.patch, 
> HIVE-21404.3.patch, HIVE-21404.4.patch, HIVE-21404.4.patch, 
> HIVE-21404.4.patch, HIVE-21404.4.patch, HIVE-21404.4.patch, 
> HIVE-21404.4.patch, HIVE-21404.4.patch, HIVE-21404.4.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HIVE-20221 changes PARTITION_PARAMS, so the following command is modifying 
> the wrong table:
> {{ALTER TABLE "SERDE_PARAMS" ALTER COLUMN "PARAM_VALUE" nvarchar(MAX);}}
> https://github.com/apache/hive/blob/d3b036920acde7bb04840697eb13038103b062b4/standalone-metastore/metastore-server/src/main/sql/mssql/upgrade-3.1.0-to-3.2.0.mssql.sql#L21





[jira] [Updated] (HIVE-21567) Break up DDLTask - extract Function related operations

2019-04-04 Thread Miklos Gergely (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely updated HIVE-21567:
--
Status: Patch Available  (was: Open)

> Break up DDLTask - extract Function related operations
> --
>
> Key: HIVE-21567
> URL: https://issues.apache.org/jira/browse/HIVE-21567
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 3.1.1
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: refactor-ddl
> Fix For: 4.0.0
>
> Attachments: HIVE-21567.01.patch, HIVE-21567.02.patch
>
>
> DDLTask is a huge class, more than 5000 lines long. The related DDLWork is 
> also a huge class, which has a field for each DDL operation it supports. The 
> goal is to refactor these so that everything is cut into smaller, more 
> maintainable classes under the package org.apache.hadoop.hive.ql.exec.ddl:
>  * have a separate class for each operation
>  * have a package for each operation group (database ddl, table ddl, etc), so 
> the amount of classes under a package is more manageable
>  * make all the requests (DDLDesc subclasses) immutable
>  * DDLTask should be agnostic to the actual operations
>  * right now let's ignore the issue of having some operations handled by 
> DDLTask which are not actual DDL operations (lock, unlock, desc...)
> In the interim, while there are two DDLTask and DDLWork classes in the 
> code base, the new ones in the new package are called DDLTask2 and DDLWork2, 
> thus avoiding fully qualified class names where both the old and 
> the new classes are in use.
> Step #4: extract all the function related operations from the old DDLTask, 
> and move them under the new package.





[jira] [Updated] (HIVE-21567) Break up DDLTask - extract Function related operations

2019-04-04 Thread Miklos Gergely (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely updated HIVE-21567:
--
Attachment: HIVE-21567.02.patch

> Break up DDLTask - extract Function related operations
> --
>
> Key: HIVE-21567
> URL: https://issues.apache.org/jira/browse/HIVE-21567
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 3.1.1
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: refactor-ddl
> Fix For: 4.0.0
>
> Attachments: HIVE-21567.01.patch, HIVE-21567.02.patch
>
>
> DDLTask is a huge class, more than 5000 lines long. The related DDLWork is 
> also a huge class, which has a field for each DDL operation it supports. The 
> goal is to refactor these so that everything is cut into smaller, more 
> maintainable classes under the package org.apache.hadoop.hive.ql.exec.ddl:
>  * have a separate class for each operation
>  * have a package for each operation group (database ddl, table ddl, etc), so 
> the amount of classes under a package is more manageable
>  * make all the requests (DDLDesc subclasses) immutable
>  * DDLTask should be agnostic to the actual operations
>  * right now let's ignore the issue of having some operations handled by 
> DDLTask which are not actual DDL operations (lock, unlock, desc...)
> In the interim, while there are two DDLTask and DDLWork classes in the 
> code base, the new ones in the new package are called DDLTask2 and DDLWork2, 
> thus avoiding fully qualified class names where both the old and 
> the new classes are in use.
> Step #4: extract all the function related operations from the old DDLTask, 
> and move them under the new package.





[jira] [Updated] (HIVE-21567) Break up DDLTask - extract Function related operations

2019-04-04 Thread Miklos Gergely (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely updated HIVE-21567:
--
Status: Open  (was: Patch Available)

> Break up DDLTask - extract Function related operations
> --
>
> Key: HIVE-21567
> URL: https://issues.apache.org/jira/browse/HIVE-21567
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 3.1.1
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: refactor-ddl
> Fix For: 4.0.0
>
> Attachments: HIVE-21567.01.patch
>
>
> DDLTask is a huge class, more than 5000 lines long. The related DDLWork is 
> also a huge class, which has a field for each DDL operation it supports. The 
> goal is to refactor these so that everything is cut into smaller, more 
> maintainable classes under the package org.apache.hadoop.hive.ql.exec.ddl:
>  * have a separate class for each operation
>  * have a package for each operation group (database ddl, table ddl, etc), so 
> the amount of classes under a package is more manageable
>  * make all the requests (DDLDesc subclasses) immutable
>  * DDLTask should be agnostic to the actual operations
>  * right now let's ignore the issue of having some operations handled by 
> DDLTask which are not actual DDL operations (lock, unlock, desc...)
> In the interim, while there are two DDLTask and DDLWork classes in the 
> code base, the new ones in the new package are called DDLTask2 and DDLWork2, 
> thus avoiding fully qualified class names where both the old and 
> the new classes are in use.
> Step #4: extract all the function related operations from the old DDLTask, 
> and move them under the new package.





[jira] [Updated] (HIVE-21231) HiveJoinAddNotNullRule support for range predicates

2019-04-04 Thread Miklos Gergely (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely updated HIVE-21231:
--
Attachment: HIVE-21231.10.patch

> HiveJoinAddNotNullRule support for range predicates
> ---
>
> Key: HIVE-21231
> URL: https://issues.apache.org/jira/browse/HIVE-21231
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: newbie, pull-request-available
> Attachments: HIVE-21231.01.patch, HIVE-21231.02.patch, 
> HIVE-21231.03.patch, HIVE-21231.04.patch, HIVE-21231.05.patch, 
> HIVE-21231.06.patch, HIVE-21231.07.patch, HIVE-21231.08.patch, 
> HIVE-21231.09.patch, HIVE-21231.10.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> For instance, given the following query:
> {code:sql}
> SELECT t0.col0, t0.col1
> FROM
>   (
> SELECT col0, col1 FROM tab
>   ) AS t0
>   INNER JOIN
>   (
> SELECT col0, col1 FROM tab
>   ) AS t1
> ON t0.col0 < t1.col0 AND t0.col1 > t1.col1
> {code}
> we could still infer that col0 and col1 cannot be null for any of the inputs. 
> Currently we do not.
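
The inference requested here amounts to adding IS NOT NULL filters below the 
join, since a row whose col0 or col1 is NULL can never satisfy a < or > 
predicate. A sketch of the rewrite the rule could produce, expressed as SQL:

{code:sql}
-- Range predicates reject NULLs just as equality predicates do, so the
-- rule could safely push IS NOT NULL filters into both join inputs:
SELECT t0.col0, t0.col1
FROM
  (SELECT col0, col1 FROM tab
   WHERE col0 IS NOT NULL AND col1 IS NOT NULL) AS t0
  INNER JOIN
  (SELECT col0, col1 FROM tab
   WHERE col0 IS NOT NULL AND col1 IS NOT NULL) AS t1
  ON t0.col0 < t1.col0 AND t0.col1 > t1.col1;
{code}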





[jira] [Updated] (HIVE-21231) HiveJoinAddNotNullRule support for range predicates

2019-04-04 Thread Miklos Gergely (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely updated HIVE-21231:
--
Status: Open  (was: Patch Available)

> HiveJoinAddNotNullRule support for range predicates
> ---
>
> Key: HIVE-21231
> URL: https://issues.apache.org/jira/browse/HIVE-21231
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: newbie, pull-request-available
> Attachments: HIVE-21231.01.patch, HIVE-21231.02.patch, 
> HIVE-21231.03.patch, HIVE-21231.04.patch, HIVE-21231.05.patch, 
> HIVE-21231.06.patch, HIVE-21231.07.patch, HIVE-21231.08.patch, 
> HIVE-21231.09.patch, HIVE-21231.10.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> For instance, given the following query:
> {code:sql}
> SELECT t0.col0, t0.col1
> FROM
>   (
> SELECT col0, col1 FROM tab
>   ) AS t0
>   INNER JOIN
>   (
> SELECT col0, col1 FROM tab
>   ) AS t1
> ON t0.col0 < t1.col0 AND t0.col1 > t1.col1
> {code}
> we could still infer that col0 and col1 cannot be null for any of the inputs. 
> Currently we do not.





[jira] [Updated] (HIVE-21231) HiveJoinAddNotNullRule support for range predicates

2019-04-04 Thread Miklos Gergely (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely updated HIVE-21231:
--
Status: Patch Available  (was: Open)

> HiveJoinAddNotNullRule support for range predicates
> ---
>
> Key: HIVE-21231
> URL: https://issues.apache.org/jira/browse/HIVE-21231
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: newbie, pull-request-available
> Attachments: HIVE-21231.01.patch, HIVE-21231.02.patch, 
> HIVE-21231.03.patch, HIVE-21231.04.patch, HIVE-21231.05.patch, 
> HIVE-21231.06.patch, HIVE-21231.07.patch, HIVE-21231.08.patch, 
> HIVE-21231.09.patch, HIVE-21231.10.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> For instance, given the following query:
> {code:sql}
> SELECT t0.col0, t0.col1
> FROM
>   (
> SELECT col0, col1 FROM tab
>   ) AS t0
>   INNER JOIN
>   (
> SELECT col0, col1 FROM tab
>   ) AS t1
> ON t0.col0 < t1.col0 AND t0.col1 > t1.col1
> {code}
> we could still infer that col0 and col1 cannot be null for any of the inputs. 
> Currently we do not.


