[jira] [Assigned] (HIVE-24435) Vectorized unix_timestamp is inconsistent with non-vectorized counterpart

2020-11-26 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-24435:
---


> Vectorized unix_timestamp is inconsistent with non-vectorized counterpart
> -
>
> Key: HIVE-24435
> URL: https://issues.apache.org/jira/browse/HIVE-24435
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> {code}
> create table t (d string);
> insert into t values('2020-11-16 22:18:40 UTC');
> select
>   '>' || d || '<' , unix_timestamp(d), from_unixtime(unix_timestamp(d)), 
> to_date(from_unixtime(unix_timestamp(d)))
> from t
> ;
> set hive.fetch.task.conversion=none;
> select
>   '>' || d || '<' , unix_timestamp(d), from_unixtime(unix_timestamp(d)), 
> to_date(from_unixtime(unix_timestamp(d)))
> from t
> ;
> {code}
> results:
> {code}
> -- std udf:
> >2020-11-16 22:18:40 UTC<   1605593920   2020-11-16 22:18:40   2020-11-16
> -- vectorized udf
> >2020-11-16 22:18:40 UTC<   NULL   NULL   NULL
> {code}
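A hedged sketch of what the repro suggests (not Hive's actual implementation): the non-vectorized path parses the string leniently, tolerating the trailing zone name, while the vectorized path applies a single strict pattern and maps any parse failure to NULL. The Python analogue below uses `datetime.strptime` to illustrate the divergence.

```python
from datetime import datetime

# Lenient path: try several patterns, tolerating an optional trailing
# zone name (roughly how a permissive parser still yields a timestamp).
def parse_lenient(s):
    for fmt in ("%Y-%m-%d %H:%M:%S %Z", "%Y-%m-%d %H:%M:%S"):
        try:
            return datetime.strptime(s, fmt)
        except ValueError:
            continue
    return None  # stands in for SQL NULL

# Strict path: one fixed pattern; any trailing text makes parsing fail.
def parse_strict(s):
    try:
        return datetime.strptime(s, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return None  # stands in for SQL NULL

value = "2020-11-16 22:18:40 UTC"
print(parse_lenient(value))  # parses successfully
print(parse_strict(value))   # None -> NULL, matching the vectorized output
```

The same input string thus yields a timestamp on one code path and NULL on the other, which is the inconsistency the issue describes.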



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24428) Concurrent add_partitions requests may lead to data loss

2020-11-25 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-24428:
---


> Concurrent add_partitions requests may lead to data loss
> 
>
> Key: HIVE-24428
> URL: https://issues.apache.org/jira/browse/HIVE-24428
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> In case multiple clients are adding partitions to the same table, when the 
> same partition is being added concurrently there is a chance that the data 
> directory is removed after the other client has already written its data.
> https://github.com/apache/hive/blob/5e96b14a2357c66a0640254d5414bc706d8be852/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L3958
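A hedged sketch of the check-then-act hazard (illustrative Python, not the HiveMetaStore code; names are hypothetical): both clients observe the partition directory as missing, one writes data and commits metadata, and the loser's cleanup removes the directory it believes it created.

```python
import shutil
import tempfile
from pathlib import Path

warehouse = Path(tempfile.mkdtemp())
existing_partitions = set()          # stands in for metastore metadata
part_dir = warehouse / "p=1"

# Both clients observe that the directory does not exist yet.
a_created = not part_dir.exists()
b_created = not part_dir.exists()

# Client A creates the directory, writes its data, and commits metadata.
part_dir.mkdir(parents=True, exist_ok=True)
(part_dir / "000000_0").write_text("client A rows")
existing_partitions.add("p=1")

# Client B now tries to commit the same partition: the metadata insert
# fails (duplicate), and B's cleanup removes the directory it believes it
# created -- taking client A's already-written data with it.
if "p=1" in existing_partitions and b_created:
    shutil.rmtree(part_dir)

print(part_dir.exists())  # False: client A's data is lost
```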





[jira] [Assigned] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs

2020-11-25 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-23965:
---

Assignee: Stamatis Zampetakis  (was: Zoltan Haindrich)

> Improve plan regression tests using TPCDS30TB metastore dump and custom 
> configs
> ---
>
> Key: HIVE-23965
> URL: https://issues.apache.org/jira/browse/HIVE-23965
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: master355.tgz
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> The existing regression tests (HIVE-12586) based on TPC-DS have certain 
> shortcomings:
> The table statistics do not reflect cardinalities from a specific TPC-DS 
> scale factor (SF). Some tables are from a 30TB dataset, others from 200GB 
> dataset, and others from a 3GB dataset. This mix leads to plans that may 
> never appear when using an actual TPC-DS dataset. 
> The existing statistics do not contain information about partitions, 
> something that can have a big impact on the resulting plans.
> The existing regression tests rely more or less on the default 
> configuration (hive-site.xml). In real-life scenarios, though, some of the 
> configurations differ and may impact the choices of the optimizer.
> This issue aims to address the above shortcomings by using a curated 
> TPCDS30TB metastore dump along with some custom hive configurations. 





[jira] [Updated] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs

2020-11-25 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-23965:

Attachment: master355.tgz

> Improve plan regression tests using TPCDS30TB metastore dump and custom 
> configs
> ---
>
> Key: HIVE-23965
> URL: https://issues.apache.org/jira/browse/HIVE-23965
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: master355.tgz
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> The existing regression tests (HIVE-12586) based on TPC-DS have certain 
> shortcomings:
> The table statistics do not reflect cardinalities from a specific TPC-DS 
> scale factor (SF). Some tables are from a 30TB dataset, others from 200GB 
> dataset, and others from a 3GB dataset. This mix leads to plans that may 
> never appear when using an actual TPC-DS dataset. 
> The existing statistics do not contain information about partitions, 
> something that can have a big impact on the resulting plans.
> The existing regression tests rely more or less on the default 
> configuration (hive-site.xml). In real-life scenarios, though, some of the 
> configurations differ and may impact the choices of the optimizer.
> This issue aims to address the above shortcomings by using a curated 
> TPCDS30TB metastore dump along with some custom hive configurations. 





[jira] [Reopened] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs

2020-11-25 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reopened HIVE-23965:
-

I've reverted the patch for now because it has exposed some issues with our 
test environment (master builds got stuck).

> Improve plan regression tests using TPCDS30TB metastore dump and custom 
> configs
> ---
>
> Key: HIVE-23965
> URL: https://issues.apache.org/jira/browse/HIVE-23965
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: master355.tgz
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> The existing regression tests (HIVE-12586) based on TPC-DS have certain 
> shortcomings:
> The table statistics do not reflect cardinalities from a specific TPC-DS 
> scale factor (SF). Some tables are from a 30TB dataset, others from 200GB 
> dataset, and others from a 3GB dataset. This mix leads to plans that may 
> never appear when using an actual TPC-DS dataset. 
> The existing statistics do not contain information about partitions, 
> something that can have a big impact on the resulting plans.
> The existing regression tests rely more or less on the default 
> configuration (hive-site.xml). In real-life scenarios, though, some of the 
> configurations differ and may impact the choices of the optimizer.
> This issue aims to address the above shortcomings by using a curated 
> TPCDS30TB metastore dump along with some custom hive configurations. 





[jira] [Assigned] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs

2020-11-25 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-23965:
---

Assignee: Zoltan Haindrich  (was: Stamatis Zampetakis)

> Improve plan regression tests using TPCDS30TB metastore dump and custom 
> configs
> ---
>
> Key: HIVE-23965
> URL: https://issues.apache.org/jira/browse/HIVE-23965
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: master355.tgz
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> The existing regression tests (HIVE-12586) based on TPC-DS have certain 
> shortcomings:
> The table statistics do not reflect cardinalities from a specific TPC-DS 
> scale factor (SF). Some tables are from a 30TB dataset, others from 200GB 
> dataset, and others from a 3GB dataset. This mix leads to plans that may 
> never appear when using an actual TPC-DS dataset. 
> The existing statistics do not contain information about partitions, 
> something that can have a big impact on the resulting plans.
> The existing regression tests rely more or less on the default 
> configuration (hive-site.xml). In real-life scenarios, though, some of the 
> configurations differ and may impact the choices of the optimizer.
> This issue aims to address the above shortcomings by using a curated 
> TPCDS30TB metastore dump along with some custom hive configurations. 





[jira] [Commented] (HIVE-22087) HMS Translation: Translate getDatabase() API to alter warehouse location

2020-11-20 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17236245#comment-17236245
 ] 

Zoltan Haindrich commented on HIVE-22087:
-

[~ngangam]: this patch has removed the `firePreEvent` call from the 
`get_database` method - was that intentional?

https://github.com/apache/hive/commit/3934de09cd0a72e20a091a8bfa06e80b76ac197c#diff-00e70b6958060aa36762b21bf16676f83af01c1e09b56816aecc6abe7c8ac866L1518

> HMS Translation: Translate getDatabase() API to alter warehouse location
> 
>
> Key: HIVE-22087
> URL: https://issues.apache.org/jira/browse/HIVE-22087
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22087.1.patch, HIVE-22087.2.patch, 
> HIVE-22087.3.patch, HIVE-22087.5.patch, HIVE-22087.6.patch, HIVE-22087.7.patch
>
>
> It makes sense to translate getDatabase() calls as well, to alter the 
> location for the Database based on whether or not the processor has 
> capabilities to write to the managed warehouse directory. Every DB has 2 
> locations, one external and the other in the managed warehouse directory. If 
> the processor has any AcidWrite capability, then the location remains 
> unchanged for the database.





[jira] [Commented] (HIVE-24167) TPC-DS query 14 fails while generating plan for the filter

2020-11-18 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17234870#comment-17234870
 ] 

Zoltan Haindrich commented on HIVE-24167:
-

I'll fix this - it's most likely connected to some new operator which is not 
properly linked in the earlier phases of planning

> TPC-DS query 14 fails while generating plan for the filter
> --
>
> Key: HIVE-24167
> URL: https://issues.apache.org/jira/browse/HIVE-24167
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Stamatis Zampetakis
>Assignee: Zoltan Haindrich
>Priority: Major
>
> TPC-DS query 14 (cbo_query14.q and query4.q) fails with an NPE on the 
> metastore with the partitioned TPC-DS 30TB dataset while generating the plan 
> for the filter.
> The problem can be reproduced using the PR in HIVE-23965.
> The current stacktrace shows that the NPE appears while trying to display 
> the debug message, but even if this line didn't exist it would fail again 
> later on.
> {noformat}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:10867)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11765)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11622)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11649)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11622)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11649)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11635)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlanForSubQueryPredicate(SemanticAnalyzer.java:3375)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFilterPlan(SemanticAnalyzer.java:3473)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:10819)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11765)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11622)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11625)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11625)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11649)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11622)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11649)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11635)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:12417)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:718)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12519)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:443)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:301)
> at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:171)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:301)
> at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:220)
> at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:173)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:414)
> at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:363)
> at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:357)
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:129)
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:231)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:203)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:129)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:424)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:355)
> at 
> 

[jira] [Assigned] (HIVE-24167) TPC-DS query 14 fails while generating plan for the filter

2020-11-18 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-24167:
---

Assignee: Zoltan Haindrich

> TPC-DS query 14 fails while generating plan for the filter
> --
>
> Key: HIVE-24167
> URL: https://issues.apache.org/jira/browse/HIVE-24167
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Stamatis Zampetakis
>Assignee: Zoltan Haindrich
>Priority: Major
>
> TPC-DS query 14 (cbo_query14.q and query4.q) fails with an NPE on the 
> metastore with the partitioned TPC-DS 30TB dataset while generating the plan 
> for the filter.
> The problem can be reproduced using the PR in HIVE-23965.
> The current stacktrace shows that the NPE appears while trying to display 
> the debug message, but even if this line didn't exist it would fail again 
> later on.
> {noformat}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:10867)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11765)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11622)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11649)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11622)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11649)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11635)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlanForSubQueryPredicate(SemanticAnalyzer.java:3375)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFilterPlan(SemanticAnalyzer.java:3473)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:10819)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11765)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11622)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11625)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11625)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11649)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11622)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11649)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11635)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:12417)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:718)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12519)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:443)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:301)
> at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:171)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:301)
> at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:220)
> at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:173)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:414)
> at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:363)
> at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:357)
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:129)
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:231)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:203)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:129)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:424)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:355)
> at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:740)
> at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:710)
> at 

[jira] [Commented] (HIVE-21843) UNION query with regular expressions for column name does not work

2020-11-17 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17233739#comment-17233739
 ] 

Zoltan Haindrich commented on HIVE-21843:
-

[~osayankin]: I've rebased your patch and opened a PR to run the tests

> UNION query with regular expressions for column name does not work
> --
>
> Key: HIVE-21843
> URL: https://issues.apache.org/jira/browse/HIVE-21843
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Reporter: Oleksiy Sayankin
>Assignee: Oleksiy Sayankin
>Priority: Critical
>  Labels: pull-request-available
> Attachments: HIVE-21843.1.patch, HIVE-21843.10.patch, 
> HIVE-21843.4.patch, HIVE-21843.6.patch, HIVE-21843.7.patch, 
> HIVE-21843.8.patch, HIVE-21843.9.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> *STEPS TO REPRODUCE:*
> 1. Create a table:
> {code:java}CREATE TABLE t (a1 INT, a2 INT);
> INSERT INTO TABLE t VALUES (1,1),(1,2),(2,1),(2,2);{code}
> 2. SET hive.support.quoted.identifiers to "none":
> {code:java}SET hive.support.quoted.identifiers=none;{code}
> 3. Run the query:
> {code:java}SELECT `(a1)?+.+` FROM t
> UNION
> SELECT `(a2)?+.+` FROM t;{code}
> *ACTUAL RESULT:*
> The query fails with an exception:
> {code:java}2019-05-23T01:36:53,604 ERROR 
> [9aa457a9-1c74-466e-abef-ec2f007117f3 main] ql.Driver: FAILED: 
> SemanticException Line 0:-1 Invalid column reference '`(a1)?+.+`'
> org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Invalid column 
> reference '`(a1)?+.+`'
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:11734)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:11674)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:11642)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:11620)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapGroupByOperator(SemanticAnalyzer.java:5225)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapAggrNoSkew(SemanticAnalyzer.java:6330)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:9659)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:10579)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:10457)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:11202)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:481)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11215)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:286)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:512)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:187)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:409)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:836)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:774)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:697)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:692)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:136){code}
> FYI: [~sershe]
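The column selection the repro exercises can be sketched outside Hive: with quoted identifiers disabled, a backquoted pattern selects every column whose name fully matches it, and the Java pattern `(a1)?+.+` uses a possessive quantifier to mean "all columns except a1". A hedged Python analogue (substituting a negative lookahead for the possessive quantifier, since that form is the stdlib-portable equivalent):

```python
import re

# Stand-in for Java's (a1)?+.+ : match any column name except exactly "a1".
pattern = re.compile(r"(?!a1$).+")

columns = ["a1", "a2"]
selected = [c for c in columns if pattern.fullmatch(c)]
print(selected)  # ['a2']
```

The bug report is about this expansion failing inside a UNION, where the resolver raises "Invalid column reference" instead of expanding the pattern per branch.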





[jira] [Assigned] (HIVE-24388) Enhance swo optimizations to merge EventOperators

2020-11-16 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-24388:
---


> Enhance swo optimizations to merge EventOperators
> -
>
> Key: HIVE-24388
> URL: https://issues.apache.org/jira/browse/HIVE-24388
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> {code}
> EVENT1->TS1
> EVENT2->TS2
> {code}
> are not merged because a TS may only handle the first event properly; 
> sending 2 events would cause one of them to be ignored





[jira] [Commented] (HIVE-24269) In SharedWorkOptimizer run simplification after merging TS filter expressions

2020-11-13 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231358#comment-17231358
 ] 

Zoltan Haindrich commented on HIVE-24269:
-

HIVE-24365 has reduced redundant expression creation a lot - this might no 
longer be needed

> In SharedWorkOptimizer run simplification after merging TS filter expressions
> -
>
> Key: HIVE-24269
> URL: https://issues.apache.org/jira/browse/HIVE-24269
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> https://github.com/apache/hive/pull/1553#discussion_r503837757





[jira] [Resolved] (HIVE-24241) Enable SharedWorkOptimizer to merge downstream operators after an optimization step

2020-11-13 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-24241.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

merged into master. Thank you Jesus for reviewing the changes!

> Enable SharedWorkOptimizer to merge downstream operators after an 
> optimization step
> ---
>
> Key: HIVE-24241
> URL: https://issues.apache.org/jira/browse/HIVE-24241
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>






[jira] [Updated] (HIVE-24360) SharedWorkOptimizer may create incorrect plans with DPPUnion

2020-11-13 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-24360:

Parent: HIVE-24384
Issue Type: Sub-task  (was: Bug)

> SharedWorkOptimizer may create incorrect plans with DPPUnion
> 
>
> Key: HIVE-24360
> URL: https://issues.apache.org/jira/browse/HIVE-24360
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Updated] (HIVE-24295) Apply schema merge to all shared work optimizations

2020-11-13 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-24295:

Parent: HIVE-24384
Issue Type: Sub-task  (was: Improvement)

> Apply schema merge to all shared work optimizations
> ---
>
> Key: HIVE-24295
> URL: https://issues.apache.org/jira/browse/HIVE-24295
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (HIVE-24382) Organize replaceTabAlias methods

2020-11-13 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-24382:

Parent: HIVE-24384
Issue Type: Sub-task  (was: Improvement)

> Organize replaceTabAlias methods
> 
>
> Key: HIVE-24382
> URL: https://issues.apache.org/jira/browse/HIVE-24382
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> * move to the OperatorDesc / etc
> https://github.com/apache/hive/pull/1661#discussion_r522693729





[jira] [Updated] (HIVE-24357) Exchange SWO table/algorithm strategy

2020-11-13 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-24357:

Parent: HIVE-24384
Issue Type: Sub-task  (was: Improvement)

> Exchange SWO table/algorithm strategy
> -
>
> Key: HIVE-24357
> URL: https://issues.apache.org/jira/browse/HIVE-24357
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: swo.before.jointree.dot.png
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> SWO right now runs like: 
> {code}
> for every strategy s: for every table t: try s for t
> {code}
> this results in an earlier strategy possibly leaving a more entangled 
> operator tree behind - in case it's able to merge for a less prioritized table
> it would probably make more sense to do:
> {code}
> for every table t: for every strategy s: try s for t
> {code}
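The two iteration orders above can be contrasted in a minimal sketch (not the SWO code; the strategy and table names are hypothetical placeholders):

```python
strategies = ["subtree-merge", "remove-semijoin", "dppunion"]
tables = ["store_sales", "date_dim"]   # listed in priority order

# Current order: every strategy is tried across all tables before the next
# strategy runs, so a strategy can fire for a low-priority table first.
def attempts_strategy_outer():
    return [(s, t) for s in strategies for t in tables]

# Proposed order: all strategies are exhausted for the highest-priority
# table before moving on to the next table.
def attempts_table_outer():
    return [(s, t) for t in tables for s in strategies]

print(attempts_strategy_outer()[:2])
print(attempts_table_outer()[:3])
```

In the proposed order, the first three attempts all target `store_sales`, so the high-priority table is fully merged before any lower-priority table is touched.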





[jira] [Updated] (HIVE-24269) In SharedWorkOptimizer run simplification after merging TS filter expressions

2020-11-13 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-24269:

Parent: HIVE-24384
Issue Type: Sub-task  (was: Improvement)

> In SharedWorkOptimizer run simplification after merging TS filter expressions
> -
>
> Key: HIVE-24269
> URL: https://issues.apache.org/jira/browse/HIVE-24269
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> https://github.com/apache/hive/pull/1553#discussion_r503837757





[jira] [Updated] (HIVE-24242) Relax safety checks in SharedWorkOptimizer

2020-11-13 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-24242:

Parent: HIVE-24384
Issue Type: Sub-task  (was: Improvement)

> Relax safety checks in SharedWorkOptimizer
> --
>
> Key: HIVE-24242
> URL: https://issues.apache.org/jira/browse/HIVE-24242
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> there are some checks to lock out problematic cases
> For UnionOperator 
> [here|https://github.com/apache/hive/blob/1507d80fd47aad38b87bba4fd58c1427ba89dbbf/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SharedWorkOptimizer.java#L1571]
> This check could prevent the optimization even if the Union is visible 
> from only 1 of the TS ops.





[jira] [Updated] (HIVE-24241) Enable SharedWorkOptimizer to merge downstream operators after an optimization step

2020-11-13 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-24241:

Parent: HIVE-24384
Issue Type: Sub-task  (was: Improvement)

> Enable SharedWorkOptimizer to merge downstream operators after an 
> optimization step
> ---
>
> Key: HIVE-24241
> URL: https://issues.apache.org/jira/browse/HIVE-24241
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (HIVE-24355) Implement hashCode and equals for Partition

2020-11-13 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-24355:

Parent: HIVE-24384
Issue Type: Sub-task  (was: Bug)

> Implement hashCode and equals for Partition 
> 
>
> Key: HIVE-24355
> URL: https://issues.apache.org/jira/browse/HIVE-24355
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> this might cause some issues - it also prevents the SWO from merging TS 
> operators which have partitions in the "pruned list"
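Hive's actual `Partition` wrapper is not shown here; the sketch below is a generic illustration of the value-based `hashCode`/`equals` the ticket asks for (the class and field names are hypothetical), so that identical pruned-partition lists on different TS operators compare equal by value rather than by object identity:

```java
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for a partition descriptor: identity is the owning
// table plus the partition key values, so entries of a "pruned list" can be
// compared by value across operators.
final class PartitionKey {
    private final String tableName;
    private final List<String> values;

    PartitionKey(String tableName, List<String> values) {
        this.tableName = tableName;
        this.values = values;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PartitionKey)) return false;
        PartitionKey other = (PartitionKey) o;
        return tableName.equals(other.tableName) && values.equals(other.values);
    }

    @Override
    public int hashCode() {
        // Must be consistent with equals: equal keys hash equally.
        return Objects.hash(tableName, values);
    }
}
```

With this, two pruned lists built independently for the same scan hash to the same buckets and compare equal, which is what the merge check needs.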





[jira] [Updated] (HIVE-24365) SWO should not create complex and redundant filter expressions

2020-11-13 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-24365:

Parent: HIVE-24384
Issue Type: Sub-task  (was: Improvement)

> SWO should not create complex and redundant filter expressions
> --
>
> Key: HIVE-24365
> URL: https://issues.apache.org/jira/browse/HIVE-24365
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> for q88 we have complex and mostly unreadable filter expressions, because 
> before merging 2 branches the TS filter expression is pushed into a FIL 
> operator.
> consider 3 scans with filters: (A,B,C)
> initially we have
> {code} 
> T(A)
> T(B)
> T(C)
> {code}
> after merging A,B
> {code}
> T(A || B) -> FIL(A)
>   -> FIL(B)
> T(C)
> {code}
> right now if we merge C as well:
> {code}
> T(A || B || C) -> FIL(A AND (A || B))
>-> FIL(B AND (A || B))
>-> FIL(C)
> {code}
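The redundancy in the last plan can be checked directly: since A implies (A || B), the predicate `A AND (A || B)` is equivalent to plain `A`. A minimal sketch of that equivalence, with plain booleans standing in for the TS/FIL filter expressions (not Hive's expression classes):

```java
public class RedundantFilterCheck {
    // The conjunction produced after naively merging scan C into T(A || B):
    // the old branch filter A is AND-ed with the previous TS filter (A || B).
    static boolean naiveFilter(boolean a, boolean b) {
        return a && (a || b);
    }

    // The simplified form the ticket aims for: A alone suffices,
    // because A already implies (A || B).
    static boolean simplifiedFilter(boolean a, boolean b) {
        return a;
    }
}
```

Enumerating all four truth assignments shows the two filters agree everywhere, so running simplification after the merge loses nothing.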





[jira] [Updated] (HIVE-24231) Enhance shared work optimizer to merge scans with filters on both sides

2020-11-13 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-24231:

Parent: HIVE-24384
Issue Type: Sub-task  (was: Improvement)

> Enhance shared work optimizer to merge scans with filters on both sides
> ---
>
> Key: HIVE-24231
> URL: https://issues.apache.org/jira/browse/HIVE-24231
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>






[jira] [Assigned] (HIVE-24384) SharedWorkOptimizer improvements

2020-11-13 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-24384:
---


> SharedWorkOptimizer improvements
> 
>
> Key: HIVE-24384
> URL: https://issues.apache.org/jira/browse/HIVE-24384
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> this started as a small feature addition, but due to the sheer volume of the 
> q.out changes it's better to do smaller changes at a time; which means more 
> tickets...





[jira] [Resolved] (HIVE-24357) Exchange SWO table/algorithm strategy

2020-11-13 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-24357.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

merged into master. Thank you Jesus for reviewing the changes!

> Exchange SWO table/algorithm strategy
> -
>
> Key: HIVE-24357
> URL: https://issues.apache.org/jira/browse/HIVE-24357
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: swo.before.jointree.dot.png
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> SWO right now runs like: 
> {code}
> for every strategy s: for every table t: try s for t
> {code}
> this means an earlier strategy may leave a more entangled operator 
> tree behind - in case it's able to merge for a less prioritized table.
> It would probably make more sense to do:
> {code}
> for every table t: for every strategy s: try s for t
> {code}
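The exchanged iteration order described above can be sketched as plain nested loops; the `strategyOuter`/`tableOuter` names and the string-based attempt log are illustrative stand-ins for the SWO internals, not the actual optimizer code:

```java
import java.util.ArrayList;
import java.util.List;

public class SwoIterationOrder {

    // Old order: for every strategy s, for every table t, try s for t.
    // A high-priority strategy may merge a less prioritized table first,
    // leaving a more entangled operator tree behind.
    static List<String> strategyOuter(List<String> strategies, List<String> tables) {
        List<String> attempts = new ArrayList<>();
        for (String s : strategies) {
            for (String t : tables) {
                attempts.add(s + ":" + t);
            }
        }
        return attempts;
    }

    // New order: for every table t, for every strategy s, try s for t.
    // All strategies are exhausted on the most prioritized table before
    // the optimizer moves on to the next table.
    static List<String> tableOuter(List<String> strategies, List<String> tables) {
        List<String> attempts = new ArrayList<>();
        for (String t : tables) {
            for (String s : strategies) {
                attempts.add(s + ":" + t);
            }
        }
        return attempts;
    }
}
```

The attempted merges are the same in both orders; only their sequencing changes, which matters because each successful merge mutates the operator tree seen by later attempts.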





[jira] [Resolved] (HIVE-24365) SWO should not create complex and redundant filter expressions

2020-11-13 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-24365.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

merged into master. Thank you Jesus for reviewing the changes!

> SWO should not create complex and redundant filter expressions
> --
>
> Key: HIVE-24365
> URL: https://issues.apache.org/jira/browse/HIVE-24365
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> for q88 we have complex and mostly unreadable filter expressions, because 
> before merging 2 branches the TS filter expression is pushed into a FIL 
> operator.
> consider 3 scans with filters: (A,B,C)
> initially we have
> {code} 
> T(A)
> T(B)
> T(C)
> {code}
> after merging A,B
> {code}
> T(A || B) -> FIL(A)
>   -> FIL(B)
> T(C)
> {code}
> right now if we merge C as well:
> {code}
> T(A || B || C) -> FIL(A AND (A || B))
>-> FIL(B AND (A || B))
>-> FIL(C)
> {code}





[jira] [Assigned] (HIVE-24382) Organize replaceTabAlias methods

2020-11-12 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-24382:
---


> Organize replaceTabAlias methods
> 
>
> Key: HIVE-24382
> URL: https://issues.apache.org/jira/browse/HIVE-24382
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> * move to the OperatorDesc / etc
> https://github.com/apache/hive/pull/1661#discussion_r522693729





[jira] [Updated] (HIVE-24365) SWO should not create complex and redundant filter expressions

2020-11-12 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-24365:

Issue Type: Improvement  (was: Bug)

> SWO should not create complex and redundant filter expressions
> --
>
> Key: HIVE-24365
> URL: https://issues.apache.org/jira/browse/HIVE-24365
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> for q88 we have complex and mostly unreadable filter expressions, because 
> before merging 2 branches the TS filter expression is pushed into a FIL 
> operator.
> consider 3 scans with filters: (A,B,C)
> initially we have
> {code} 
> T(A)
> T(B)
> T(C)
> {code}
> after merging A,B
> {code}
> T(A || B) -> FIL(A)
>   -> FIL(B)
> T(C)
> {code}
> right now if we merge C as well:
> {code}
> T(A || B || C) -> FIL(A AND (A || B))
>-> FIL(B AND (A || B))
>-> FIL(C)
> {code}





[jira] [Commented] (HIVE-24365) SWO should not create complex and redundant filter expressions

2020-11-10 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17229364#comment-17229364
 ] 

Zoltan Haindrich commented on HIVE-24365:
-


* in case 2 TS ops are merged by SWO, the FIL expression is corrected to filter 
for only that branch
* note that any further changes to the FIL expression created above may not 
change the results



> SWO should not create complex and redundant filter expressions
> --
>
> Key: HIVE-24365
> URL: https://issues.apache.org/jira/browse/HIVE-24365
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> for q88 we have complex and mostly unreadable filter expressions, because 
> before merging 2 branches the TS filter expression is pushed into a FIL 
> operator.





[jira] [Updated] (HIVE-24365) SWO should not create complex and redundant filter expressions

2020-11-10 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-24365:

Description: 
for q88 we have complex and mostly unreadable filter expressions, because 
before merging 2 branches the TS filter expression is pushed into a FIL operator.

consider 3 scans with filters: (A,B,C)
initially we have
{code} 
T(A)
T(B)
T(C)
{code}

after merging A,B
{code}
T(A || B) -> FIL(A)
  -> FIL(B)
T(C)
{code}

right now if we merge C as well:
{code}
T(A || B || C) -> FIL(A AND (A || B))
   -> FIL(B AND (A || B))
   -> FIL(C)
{code}


  was:
for q88 we have complex and mostly unreadable filter expressions, because 
before merging 2 branches the TS filter expression is pushed into a FIL operator.




> SWO should not create complex and redundant filter expressions
> --
>
> Key: HIVE-24365
> URL: https://issues.apache.org/jira/browse/HIVE-24365
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> for q88 we have complex and mostly unreadable filter expressions, because 
> before merging 2 branches the TS filter expression is pushed into a FIL 
> operator.
> consider 3 scans with filters: (A,B,C)
> initially we have
> {code} 
> T(A)
> T(B)
> T(C)
> {code}
> after merging A,B
> {code}
> T(A || B) -> FIL(A)
>   -> FIL(B)
> T(C)
> {code}
> right now if we merge C as well:
> {code}
> T(A || B || C) -> FIL(A AND (A || B))
>-> FIL(B AND (A || B))
>-> FIL(C)
> {code}





[jira] [Assigned] (HIVE-24365) SWO should not create complex and redundant filter expressions

2020-11-10 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-24365:
---


> SWO should not create complex and redundant filter expressions
> --
>
> Key: HIVE-24365
> URL: https://issues.apache.org/jira/browse/HIVE-24365
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> for q88 we have complex and mostly unreadable filter expressions, because 
> before merging 2 branches the TS filter expression is pushed into a FIL 
> operator.





[jira] [Resolved] (HIVE-24360) SharedWorkOptimizer may create incorrect plans with DPPUnion

2020-11-09 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-24360.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

merged into master. Thank you Jesus for reviewing the changes!

> SharedWorkOptimizer may create incorrect plans with DPPUnion
> 
>
> Key: HIVE-24360
> URL: https://issues.apache.org/jira/browse/HIVE-24360
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Commented] (HIVE-24355) Implement hashCode and equals for Partition

2020-11-08 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17227951#comment-17227951
 ] 

Zoltan Haindrich commented on HIVE-24355:
-

merged into master. Thank you Miklos for reviewing the changes!

> Implement hashCode and equals for Partition 
> 
>
> Key: HIVE-24355
> URL: https://issues.apache.org/jira/browse/HIVE-24355
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> this might cause some issues - it also prevents the SWO from merging TS 
> operators which have partitions in the "pruned list"





[jira] [Resolved] (HIVE-24355) Implement hashCode and equals for Partition

2020-11-08 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-24355.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

> Implement hashCode and equals for Partition 
> 
>
> Key: HIVE-24355
> URL: https://issues.apache.org/jira/browse/HIVE-24355
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> this might cause some issues - it also prevents the SWO from merging TS 
> operators which have partitions in the "pruned list"





[jira] [Commented] (HIVE-24360) SharedWorkOptimizer may create incorrect plans with DPPUnion

2020-11-05 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17226577#comment-17226577
 ] 

Zoltan Haindrich commented on HIVE-24360:
-

unfortunately, without this fix the plan changes are incorrect as well - so it's 
harder to see the improvements

> SharedWorkOptimizer may create incorrect plans with DPPUnion
> 
>
> Key: HIVE-24360
> URL: https://issues.apache.org/jira/browse/HIVE-24360
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>






[jira] [Assigned] (HIVE-24360) SharedWorkOptimizer may create incorrect plans with DPPUnion

2020-11-05 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-24360:
---


> SharedWorkOptimizer may create incorrect plans with DPPUnion
> 
>
> Key: HIVE-24360
> URL: https://issues.apache.org/jira/browse/HIVE-24360
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>






[jira] [Updated] (HIVE-24357) Exchange SWO table/algorithm strategy

2020-11-04 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-24357:

Attachment: swo.before.jointree.dot.png

> Exchange SWO table/algorithm strategy
> -
>
> Key: HIVE-24357
> URL: https://issues.apache.org/jira/browse/HIVE-24357
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: swo.before.jointree.dot.png
>
>
> SWO right now runs like: 
> {code}
> for every strategy s: for every table t: try s for t
> {code}
> this means an earlier strategy may leave a more entangled operator 
> tree behind - in case it's able to merge for a less prioritized table.
> It would probably make more sense to do:
> {code}
> for every table t: for every strategy s: try s for t
> {code}





[jira] [Updated] (HIVE-24355) Implement hashCode and equals for Partition

2020-11-04 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-24355:

Summary: Implement hashCode and equals for Partition   (was: Partition 
doesn't have hashCode/equals)

> Implement hashCode and equals for Partition 
> 
>
> Key: HIVE-24355
> URL: https://issues.apache.org/jira/browse/HIVE-24355
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> this might cause some issues - it also prevents the SWO from merging TS 
> operators which have partitions in the "pruned list"





[jira] [Assigned] (HIVE-24357) Exchange SWO table/algorithm strategy

2020-11-04 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-24357:
---

Assignee: Zoltan Haindrich

> Exchange SWO table/algorithm strategy
> -
>
> Key: HIVE-24357
> URL: https://issues.apache.org/jira/browse/HIVE-24357
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> SWO right now runs like: 
> {code}
> for every strategy s: for every table t: try s for t
> {code}
> this means an earlier strategy may leave a more entangled operator 
> tree behind - in case it's able to merge for a less prioritized table.
> It would probably make more sense to do:
> {code}
> for every table t: for every strategy s: try s for t
> {code}





[jira] [Assigned] (HIVE-24355) Partition doesn't have hashCode/equals

2020-11-04 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-24355:
---


> Partition doesn't have hashCode/equals
> --
>
> Key: HIVE-24355
> URL: https://issues.apache.org/jira/browse/HIVE-24355
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> this might cause some issues - it also prevents the SWO from merging TS 
> operators which have partitions in the "pruned list"





[jira] [Commented] (HIVE-22307) Upgrade Hadoop version to 3.1.3

2020-11-02 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224718#comment-17224718
 ] 

Zoltan Haindrich commented on HIVE-22307:
-

I think this is caused by the Guava version bump - you probably need to 
mark the method parameter nullable:
https://stackoverflow.com/questions/12422473/nullable-input-in-google-guava-function-interface-triggers-findbugs-warning
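The suggestion amounts to annotating the `apply` parameter as nullable so the null-analysis tooling stops flagging it. A sketch of the pattern; the `@Nullable` annotation and `Function` interface are declared locally as stand-ins for the JSR-305 and Guava types, since the actual call site is not shown here:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class NullableFunctionSketch {
    // Stand-in for javax.annotation.Nullable (JSR-305).
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.PARAMETER)
    @interface Nullable {}

    // Stand-in for com.google.common.base.Function. Implementations that may
    // receive null should annotate the parameter so FindBugs/SpotBugs does not
    // flag the override as violating a non-null contract.
    interface Function<F, T> {
        T apply(@Nullable F input);
    }

    // Example implementation that explicitly handles a null input.
    static final Function<String, Integer> LENGTH_OR_ZERO =
        new Function<String, Integer>() {
            @Override
            public Integer apply(@Nullable String input) {
                return input == null ? 0 : input.length();
            }
        };
}
```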

> Upgrade Hadoop version to 3.1.3
> ---
>
> Key: HIVE-22307
> URL: https://issues.apache.org/jira/browse/HIVE-22307
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Lavati
>Assignee: David Lavati
>Priority: Major
> Attachments: HIVE-22307.patch, HIVE-22307.patch
>
>






[jira] [Resolved] (HIVE-24217) HMS storage backend for HPL/SQL stored procedures

2020-10-29 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-24217.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

merged into master. Thank you Attila Magyar!

> HMS storage backend for HPL/SQL stored procedures
> -
>
> Key: HIVE-24217
> URL: https://issues.apache.org/jira/browse/HIVE-24217
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, hpl/sql, Metastore
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HPL_SQL storedproc HMS storage.pdf
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> HPL/SQL procedures are currently stored in text files. The goal of this Jira 
> is to implement a Metastore backend for storing and loading these procedures. 
> This is an incremental step towards having fully capable stored procedures in 
> Hive.
>  
> See the attached design for more information.





[jira] [Commented] (HIVE-24253) HMS and HS2 needs to support keystore/truststores types besides JKS by config

2020-10-28 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1730#comment-1730
 ] 

Zoltan Haindrich commented on HIVE-24253:
-

[~ychena] the PR seems to have been merged around a week ago - I think we should 
close this ticket.

but...I was coming here for a different reason: it seems to me that you've made 
some improvements to the `TestSSL` test; do you think we can remove the Ignore 
from it?
(some testcases from TestSSL were frequent failures in unrelated test runs - so 
the class was marked as ignored)
https://github.com/apache/hive/blob/375433510b73c5a22bde4e13485dfc16eaa24706/itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestSSL.java#L56

> HMS and HS2 needs to support keystore/truststores types besides JKS by config
> -
>
> Key: HIVE-24253
> URL: https://issues.apache.org/jira/browse/HIVE-24253
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Standalone Metastore
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> When HiveMetaStoreClient connects to HMS with enabled SSL, HMS should support 
> the Keystore type configurable and default to keystore type specified for the 
> JDK and not always use JKS. Same as HIVE-23958 for hive, HMS should support 
> to set additional keystore/truststore types used for different applications 
> like for FIPS crypto algorithms.
> Also, make hive keystore type and algorithm configurable.
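The described change boils down to resolving the store type from configuration and falling back to the JDK-wide default instead of a hard-coded "JKS". A sketch under the assumption that configuration is a simple key/value map; the config key name and class are hypothetical, not the actual HMS/HS2 code:

```java
import java.security.KeyStore;
import java.util.Map;

public class KeystoreTypeResolver {
    // Returns the configured keystore type, or the JDK default
    // (e.g. PKCS12 on modern JDKs, or a FIPS provider's type)
    // when nothing is configured - never a hard-coded "JKS".
    static String resolveType(Map<String, String> conf, String key) {
        String configured = conf.get(key);
        return (configured == null || configured.isEmpty())
                ? KeyStore.getDefaultType()
                : configured;
    }
}
```

`KeyStore.getInstance(resolveType(conf, key))` would then pick up whatever type the deployment requires (for example a FIPS keystore type) without code changes.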





[jira] [Commented] (HIVE-24320) TestMiniLlapLocal sometimes hangs because of some derby issues

2020-10-28 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17222124#comment-17222124
 ] 

Zoltan Haindrich commented on HIVE-24320:
-

attached the last 1000 lines of the hive log and the full jstack trace

there seem to be some Derby issues prior to this state - not sure if those are 
merely caused by the hang or are part of the root cause

the issues seem to start with:
{code}
2020-10-28T01:24:33,767  WARN [Heartbeater-3] pool.ProxyConnection: 
HikariPool-3 - Connection org.apache.derby.impl.jdbc.EmbedConnection@1913174287 
(XID = null), (SESSIONID = 68), (DATABASE = 
/home/jenkins/agent/workspace/internal-hive-precommit_PR-2/itests/qtest/target/tmp/junit_metastore_db),
 (DRDAID = null)  marked as broken because of SQLSTATE(08003), ErrorCode(4)
java.sql.SQLNonTransientConnectionException: No current connection.
at 
org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source) 
~[derby-10.14.2.0.jar:?]
at 
org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source) 
~[derby-10.14.2.0.jar:?]
at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown 
Source) ~[derby-10.14.2.0.jar:?]
at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown 
Source) ~[derby-10.14.2.0.jar:?]
at org.apache.derby.impl.jdbc.Util.noCurrentConnection(Unknown Source) 
~[derby-10.14.2.0.jar:?]
at org.apache.derby.impl.jdbc.EmbedConnection.checkIfClosed(Unknown 
Source) ~[derby-10.14.2.0.jar:?]
at org.apache.derby.impl.jdbc.EmbedConnection.setupContextStack(Unknown 
Source) ~[derby-10.14.2.0.jar:?]
at org.apache.derby.impl.jdbc.EmbedConnection.rollback(Unknown Source) 
~[derby-10.14.2.0.jar:?]
at 
com.zaxxer.hikari.pool.ProxyConnection.rollback(ProxyConnection.java:362) 
~[HikariCP-2.6.1.jar:?]
at 
com.zaxxer.hikari.pool.HikariProxyConnection.rollback(HikariProxyConnection.java)
 ~[HikariCP-2.6.1.jar:?]
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.rollbackDBConn(TxnHandler.java:3787)
 ~[hive-standalone-metastore-3.1.3000.7.2.3.0-212.jar:3.1.3000.7.2.3.0-212]
at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.heartbeat(TxnHandler.java:2912) 
~[hive-standalone-metastore-3.1.3000.7.2.3.0-212.jar:3.1.3000.7.2.3.0-212]
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.heartbeat(HiveMetaStore.java:8440)
 ~[hive-standalone-metastore-3.1.3000.7.2.3.0-212.jar:3.1.3000.7.2.3.0-212]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
~[?:1.8.0_262]
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
~[?:1.8.0_262]
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 ~[?:1.8.0_262]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_262]
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
 ~[hive-standalone-metastore-3.1.3000.7.2.3.0-212.jar:3.1.3000.7.2.3.0-212]
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
 ~[hive-standalone-metastore-3.1.3000.7.2.3.0-212.jar:3.1.3000.7.2.3.0-212]
at com.sun.proxy.$Proxy58.heartbeat(Unknown Source) ~[?:?]
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.heartbeat(HiveMetaStoreClient.java:3250)
 ~[hive-standalone-metastore-3.1.3000.7.2.3.0-212.jar:3.1.3000.7.2.3.0-212]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
~[?:1.8.0_262]
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
~[?:1.8.0_262]
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 ~[?:1.8.0_262]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_262]
at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:212)
 ~[hive-standalone-metastore-3.1.3000.7.2.3.0-212.jar:3.1.3000.7.2.3.0-212]
at com.sun.proxy.$Proxy59.heartbeat(Unknown Source) ~[?:?]
at 
org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.heartbeat(DbTxnManager.java:665) 
~[hive-exec-3.1.3000.7.2.3.0-212.jar:3.1.3000.7.2.3.0-212]
at 
org.apache.hadoop.hive.ql.lockmgr.DbTxnManager$Heartbeater.lambda$run$0(DbTxnManager.java:1085)
 ~[hive-exec-3.1.3000.7.2.3.0-212.jar:3.1.3000.7.2.3.0-212]
at java.security.AccessController.doPrivileged(Native Method) 
[?:1.8.0_262]
at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_262]
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
 [hadoop-common-3.1.1.7.2.3.0-212.jar:?]
at 
org.apache.hadoop.hive.ql.lockmgr.DbTxnManager$Heartbeater.run(DbTxnManager.java:1084)
 [hive-exec-3.1.3000.7.2.3.0-212.jar:3.1.3000.7.2.3.0-212]
at 

[jira] [Updated] (HIVE-24320) TestMiniLlapLocal sometimes hangs because of some derby issues

2020-10-28 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-24320:

Attachment: 3hr.jstack
3hr.hive.log

> TestMiniLlapLocal sometimes hangs because of some derby issues
> --
>
> Key: HIVE-24320
> URL: https://issues.apache.org/jira/browse/HIVE-24320
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Attachments: 3hr.hive.log, 3hr.jstack
>
>
> the code in question is a slightly modified version of branch-3.
> Opening this ticket to make notes about the investigation.
> {code}
> "dcce5fec-2365-4697-8a8f-04a4dfa5d9f5 main" #1 prio=5 os_prio=0 
> tid=0x7fd7c000a800 nid=0x1de23 waiting on condition [0x7fd7c4b7]
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0xc61635f0> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitUninterruptibly(AbstractQueuedSynchronizer.java:1981)
> at 
> org.apache.derby.impl.services.cache.CacheEntry.waitUntilIdentityIsSet(Unknown
>  Source)
> at 
> org.apache.derby.impl.services.cache.ConcurrentCache.getEntry(Unknown Source)
> at org.apache.derby.impl.services.cache.ConcurrentCache.find(Unknown 
> Source)
> at 
> org.apache.derby.impl.store.raw.data.BaseDataFileFactory.openContainer(Unknown
>  Source)
> at 
> org.apache.derby.impl.store.raw.data.BaseDataFileFactory.openContainer(Unknown
>  Source)
> at org.apache.derby.impl.store.raw.xact.Xact.openContainer(Unknown 
> Source)
> at 
> org.apache.derby.impl.store.access.conglomerate.OpenConglomerate.init(Unknown 
> Source)
> at org.apache.derby.impl.store.access.heap.Heap.open(Unknown Source)
> at 
> org.apache.derby.impl.store.access.RAMTransaction.openConglomerate(Unknown 
> Source)
> at 
> org.apache.derby.impl.store.access.RAMTransaction.openConglomerate(Unknown 
> Source)
> at 
> org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getDescriptorViaIndexMinion(Unknown
>  Source)
> at 
> org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getDescriptorViaIndex(Unknown
>  Source)
> at 
> org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getSubKeyConstraint(Unknown
>  Source)
> at 
> org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getConstraintDescriptorViaIndex(Unknown
>  Source)
> at 
> org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getConstraintDescriptorsScan(Unknown
>  Source)
> at 
> org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getConstraintDescriptors(Unknown
>  Source)
> - locked <0xc615c9a8> (a 
> org.apache.derby.iapi.sql.dictionary.ConstraintDescriptorList)
> at 
> org.apache.derby.iapi.sql.dictionary.TableDescriptor.getAllRelevantConstraints(Unknown
>  Source)
> at 
> org.apache.derby.impl.sql.compile.DMLModStatementNode.getAllRelevantConstraints(Unknown
>  Source)
> at 
> org.apache.derby.impl.sql.compile.DMLModStatementNode.bindConstraints(Unknown 
> Source)
> at org.apache.derby.impl.sql.compile.DeleteNode.bindStatement(Unknown 
> Source)
> at org.apache.derby.impl.sql.GenericStatement.prepMinion(Unknown 
> Source)
> at org.apache.derby.impl.sql.GenericStatement.prepare(Unknown Source)
> at 
> org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.prepareInternalStatement(Unknown
>  Source)
> at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
> - locked <0xc4bb5fd0> (a 
> org.apache.derby.impl.jdbc.EmbedConnection)
> at 
> org.apache.derby.impl.jdbc.EmbedStatement.executeBatchElement(Unknown Source)
> at 
> org.apache.derby.impl.jdbc.EmbedStatement.executeLargeBatch(Unknown Source)
> - locked <0xc4bb5fd0> (a 
> org.apache.derby.impl.jdbc.EmbedConnection)
> at org.apache.derby.impl.jdbc.EmbedStatement.executeBatch(Unknown 
> Source)
> at 
> com.zaxxer.hikari.pool.ProxyStatement.executeBatch(ProxyStatement.java:125)
> at 
> com.zaxxer.hikari.pool.HikariProxyStatement.executeBatch(HikariProxyStatement.java)
> at 
> org.apache.hadoop.hive.metastore.txn.TxnDbUtil.executeQueriesInBatch(TxnDbUtil.java:658)
> at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.updateCommitIdAndCleanUpMetadata(TxnHandler.java:1338)
> at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.commitTxn(TxnHandler.java:1236)
> at 
> 

[jira] [Assigned] (HIVE-24320) TestMiniLlapLocal sometimes hangs because of some derby issues

2020-10-28 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-24320:
---


> TestMiniLlapLocal sometimes hangs because of some derby issues
> --
>
> Key: HIVE-24320
> URL: https://issues.apache.org/jira/browse/HIVE-24320
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> code in question is a slightly modified version of branch-3
> opening ticket to make notes about the investigation
> {code}
> "dcce5fec-2365-4697-8a8f-04a4dfa5d9f5 main" #1 prio=5 os_prio=0 
> tid=0x7fd7c000a800 nid=0x1de23 waiting on condition [0x7fd7c4b7]
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0xc61635f0> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitUninterruptibly(AbstractQueuedSynchronizer.java:1981)
> at 
> org.apache.derby.impl.services.cache.CacheEntry.waitUntilIdentityIsSet(Unknown
>  Source)
> at 
> org.apache.derby.impl.services.cache.ConcurrentCache.getEntry(Unknown Source)
> at org.apache.derby.impl.services.cache.ConcurrentCache.find(Unknown 
> Source)
> at 
> org.apache.derby.impl.store.raw.data.BaseDataFileFactory.openContainer(Unknown
>  Source)
> at 
> org.apache.derby.impl.store.raw.data.BaseDataFileFactory.openContainer(Unknown
>  Source)
> at org.apache.derby.impl.store.raw.xact.Xact.openContainer(Unknown 
> Source)
> at 
> org.apache.derby.impl.store.access.conglomerate.OpenConglomerate.init(Unknown 
> Source)
> at org.apache.derby.impl.store.access.heap.Heap.open(Unknown Source)
> at 
> org.apache.derby.impl.store.access.RAMTransaction.openConglomerate(Unknown 
> Source)
> at 
> org.apache.derby.impl.store.access.RAMTransaction.openConglomerate(Unknown 
> Source)
> at 
> org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getDescriptorViaIndexMinion(Unknown
>  Source)
> at 
> org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getDescriptorViaIndex(Unknown
>  Source)
> at 
> org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getSubKeyConstraint(Unknown
>  Source)
> at 
> org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getConstraintDescriptorViaIndex(Unknown
>  Source)
> at 
> org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getConstraintDescriptorsScan(Unknown
>  Source)
> at 
> org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getConstraintDescriptors(Unknown
>  Source)
> - locked <0xc615c9a8> (a 
> org.apache.derby.iapi.sql.dictionary.ConstraintDescriptorList)
> at 
> org.apache.derby.iapi.sql.dictionary.TableDescriptor.getAllRelevantConstraints(Unknown
>  Source)
> at 
> org.apache.derby.impl.sql.compile.DMLModStatementNode.getAllRelevantConstraints(Unknown
>  Source)
> at 
> org.apache.derby.impl.sql.compile.DMLModStatementNode.bindConstraints(Unknown 
> Source)
> at org.apache.derby.impl.sql.compile.DeleteNode.bindStatement(Unknown 
> Source)
> at org.apache.derby.impl.sql.GenericStatement.prepMinion(Unknown 
> Source)
> at org.apache.derby.impl.sql.GenericStatement.prepare(Unknown Source)
> at 
> org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.prepareInternalStatement(Unknown
>  Source)
> at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
> - locked <0xc4bb5fd0> (a 
> org.apache.derby.impl.jdbc.EmbedConnection)
> at 
> org.apache.derby.impl.jdbc.EmbedStatement.executeBatchElement(Unknown Source)
> at 
> org.apache.derby.impl.jdbc.EmbedStatement.executeLargeBatch(Unknown Source)
> - locked <0xc4bb5fd0> (a 
> org.apache.derby.impl.jdbc.EmbedConnection)
> at org.apache.derby.impl.jdbc.EmbedStatement.executeBatch(Unknown 
> Source)
> at 
> com.zaxxer.hikari.pool.ProxyStatement.executeBatch(ProxyStatement.java:125)
> at 
> com.zaxxer.hikari.pool.HikariProxyStatement.executeBatch(HikariProxyStatement.java)
> at 
> org.apache.hadoop.hive.metastore.txn.TxnDbUtil.executeQueriesInBatch(TxnDbUtil.java:658)
> at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.updateCommitIdAndCleanUpMetadata(TxnHandler.java:1338)
> at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.commitTxn(TxnHandler.java:1236)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.commit_txn(HiveMetaStore.java:8315)
> at 

[jira] [Assigned] (HIVE-24312) Use column stats to remove "x is not null" filter conditions if they are redundant

2020-10-26 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-24312:
---


> Use column stats to remove "x is not null" filter conditions if they are 
> redundant
> --
>
> Key: HIVE-24312
> URL: https://issues.apache.org/jira/browse/HIVE-24312
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> with HIVE-24241 SharedWorkOptimizer could further merge branches for some 
> queries (ex: 
> [query32|https://github.com/apache/hive/blob/db895f374bf63b77b683574fdf678bfac91a5ac6/ql/src/test/results/clientpositive/perf/tez/query32.q.out#L118-L163]
>  )
> ...but a little `is not null` difference prevents it from proceeding.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24295) Apply schema merge to all shared work optimizations

2020-10-22 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-24295:
---


> Apply schema merge to all shared work optimizations
> ---
>
> Key: HIVE-24295
> URL: https://issues.apache.org/jira/browse/HIVE-24295
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24231) Enhance shared work optimizer to merge scans with filters on both sides

2020-10-20 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-24231.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

merged into master. Thank you Jesus for reviewing the changes!

> Enhance shared work optimizer to merge scans with filters on both sides
> ---
>
> Key: HIVE-24231
> URL: https://issues.apache.org/jira/browse/HIVE-24231
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23667) Incorrect output with option hive.auto.convert.join=fasle

2020-10-19 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17216552#comment-17216552
 ] 

Zoltan Haindrich commented on HIVE-23667:
-

well; that doesn't sound right to me... Hive 1 should have written data with 
the hash algorithm corresponding to bucketing_version=1, which is the standard hash

the selector logic had a few issues, which I think were addressed in HIVE-21304

> Incorrect output with option hive.auto.convert.join=fasle
> -
>
> Key: HIVE-23667
> URL: https://issues.apache.org/jira/browse/HIVE-23667
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: gaozhan ding
>Priority: Critical
>
> We use Hive 3.1.0 with the Tez engine 0.9.1.3.
> I encountered an error when executing a Hive SQL query. The SQL is as follows:
> {code:java}
> set mapreduce.job.queuename=root.xxx;
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.exec.dynamic.partition=true;
> set hive.exec.max.dynamic.partitions.pernode=1;
> set hive.exec.max.dynamic.partitions=1;
> set hive.fileformat.check=false;
> set mapred.reduce.tasks=50;
> set hive.auto.convert.join=true;
> use xxx;
> select count(*) from   230_dim_site  join dw_fact_inverter_detail on  
> dw_fact_inverter_detail.site=230_dim_site.id;{code}
> with the following output:
> {code:java}
> +--+
> | _c0 |
> +--+
> | 4954736 |
> +--+
> {code}
> But when the hive.auto.convert.join option is set to false, the output is not 
> as expected.
> The SQL is as follows
> {code:java}
> set mapreduce.job.queuename=root.xxx;
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.exec.dynamic.partition=true;
> set hive.exec.max.dynamic.partitions.pernode=1;
> set hive.exec.max.dynamic.partitions=1;
> set hive.fileformat.check=false;  
> set mapred.reduce.tasks=50;
> set hive.auto.convert.join=false; //changed
> use xxx;
> select count(*) from   230_dim_site  join dw_fact_inverter_detail on  
> dw_fact_inverter_detail.site=230_dim_site.id;{code}
> with output:
> {code:java}
> +--+
> | _c0 |
> +--+
> | 0 |
> +--+
> {code}
> Besides, both tables participating in the join are partitioned tables.
> Notably, if the option mapred.reduce.tasks=50 was not set, all of the above 
> SQL statements produced the expected results.
> We just upgraded Hive from 1.2 to 3.1.0, and we found that these problems 
> only occurred in the old Hive tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22098) Data loss occurs when multiple tables are join with different bucket_version

2020-10-19 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17216551#comment-17216551
 ] 

Zoltan Haindrich commented on HIVE-22098:
-

I somehow missed this ticket - note that HIVE-21304 has fixed a few issues 
with bucketing_version-related behavior... so this might be fixed on master

> Data loss occurs when multiple tables are join with different bucket_version
> 
>
> Key: HIVE-22098
> URL: https://issues.apache.org/jira/browse/HIVE-22098
> Project: Hive
>  Issue Type: Bug
>  Components: Operators
>Affects Versions: 3.1.0, 3.1.2
>Reporter: GuangMing Lu
>Assignee: yongtaoliao
>Priority: Blocker
>  Labels: data-loss, wrongresults
> Attachments: HIVE-22098.1.patch, image-2019-08-12-18-45-15-771.png, 
> join_test.sql, table_a_data.orc, table_b_data.orc, table_c_data.orc
>
>
> When tables with different bucket versions are joined and the number of 
> reducers is greater than 2, the result is incorrect (*data loss*).
>  *Scenario 1*: Three-table join. The intermediate result of joining table_a 
> (the first table) with table_b (the second table) is recorded as tmp_a_b. When 
> tmp_a_b is joined with the third table (which has bucket_version=2, the 
> default for tables created after Hive 3.0.0), tmp_a_b is initialized with 
> bucketVersion=-1, and the ReduceSinkOperator then joins with bucketVersion=-1. 
> In the init method, the hash algorithm for the join column is selected 
> according to bucketVersion: if bucketVersion=2 and the operation is not an 
> ACID operation, the new hash algorithm is acquired; otherwise, the old hash 
> algorithm is acquired. Because the hash algorithms are inconsistent, rows are 
> assigned to different partitions. At the reduce stage, rows with the same key 
> cannot be paired, resulting in data loss.
> *Scenario 2*: Create two test tables: create table 
> table_bucketversion_1(col_1 string, col_2 string) TBLPROPERTIES 
> ('bucketing_version'='1'); and create table table_bucketversion_2(col_1 
> string, col_2 string) TBLPROPERTIES ('bucketing_version'='2');
>  When table_bucketversion_1 is joined with table_bucketversion_2, part of the 
> result data is lost because the bucket versions differ.
>  
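The hash inconsistency described above can be reproduced outside Hive. Below is a minimal sketch of the failure mode; note that the `newHash` mixing function merely stands in for the Murmur3-based version-2 hash and is NOT Hive's actual algorithm.

```java
// Two different hash functions route the same join key to different
// reducers; matching rows then never meet, so the join silently drops them.
public class BucketHashDemo {
    // stand-in for the old (bucketing_version=1) hash: identity-like for ints
    static int oldHash(int key) { return key; }

    // stand-in for the new (bucketing_version=2) hash; Hive uses Murmur3,
    // this is just an arbitrary mixing function for illustration
    static int newHash(int key) {
        int h = key * 0x9E3779B9;
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int reducers = 50; // as with mapred.reduce.tasks=50 in the report
        int key = 12345;
        int oldTarget = Math.floorMod(oldHash(key), reducers);
        int newTarget = Math.floorMod(newHash(key), reducers);
        // When one side of the join is partitioned with oldHash and the other
        // with newHash, rows with the same key land on different reducers and
        // produce no join output.
        System.out.println("old reducer: " + oldTarget + ", new reducer: " + newTarget);
    }
}
```

This also explains why the problem disappears when the reducer count is small: with a single reducer every row lands in the same place regardless of the hash used.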



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24106) Abort polling on the operation state when the current thread is interrupted

2020-10-14 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-24106.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

merged into master. Thank you [~dengzh]!

> Abort polling on the operation state when the current thread is interrupted
> ---
>
> Key: HIVE-24106
> URL: https://issues.apache.org/jira/browse/HIVE-24106
> Project: Hive
>  Issue Type: Improvement
>  Components: JDBC
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> If running HiveStatement asynchronously as a task, such as in a thread or 
> future, and we interrupt the task, the HiveStatement would continue to poll 
> on the operation state until it finishes. It may be better to provide a way 
> to abort the execution in such cases.
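The proposed behavior can be sketched as follows. This is hypothetical code, not HiveStatement's actual implementation: the polling loop checks the thread's interrupt status and bails out instead of waiting for the server-side operation to finish.

```java
import java.util.concurrent.TimeUnit;

// Minimal sketch of an interrupt-aware polling loop.
public class InterruptiblePoll {
    interface StateSource { String fetchState(); } // stand-in for the RPC call

    static String waitForFinish(StateSource src) throws InterruptedException {
        while (true) {
            if (Thread.currentThread().isInterrupted()) {
                // abort polling instead of looping until the server finishes
                throw new InterruptedException("polling aborted");
            }
            String state = src.fetchState();
            if ("FINISHED".equals(state)) {
                return state;
            }
            TimeUnit.MILLISECONDS.sleep(10); // also throws when interrupted
        }
    }

    public static void main(String[] args) throws Exception {
        Thread worker = new Thread(() -> {
            try {
                waitForFinish(() -> "RUNNING"); // an operation that never finishes
                System.out.println("finished");
            } catch (InterruptedException e) {
                System.out.println("aborted");
            }
        });
        worker.start();
        worker.interrupt(); // interrupt the task...
        worker.join();      // ...and the poll loop exits promptly, printing "aborted"
    }
}
```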



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-23800) Add hooks when HiveServer2 stops due to OutOfMemoryError

2020-10-14 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-23800.
-
Fix Version/s: 4.0.0
 Assignee: Zhihua Deng
   Resolution: Fixed

merged into master. Thank you [~dengzh]!

> Add hooks when HiveServer2 stops due to OutOfMemoryError
> 
>
> Key: HIVE-23800
> URL: https://issues.apache.org/jira/browse/HIVE-23800
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> Make the OOM hook an interface of HiveServer2, so users can implement the 
> hook to do something before HS2 stops, such as dumping the heap or alerting 
> the devops team.
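A hedged sketch of the idea in this ticket: expose OOM handling as an interface so deployments can plug in their own action (heap dump, alert, etc.). The names below (OomHook, OomHookRunner) are illustrative, not Hive's actual API.

```java
import java.util.ArrayList;
import java.util.List;

public class OomHookRunner {
    // the pluggable hook users would implement
    public interface OomHook {
        void run(OutOfMemoryError error);
    }

    private final List<OomHook> hooks = new ArrayList<>();

    public void register(OomHook hook) { hooks.add(hook); }

    // called from the server's OOM handler just before it stops the service
    public void fire(OutOfMemoryError error) {
        for (OomHook hook : hooks) {
            try {
                hook.run(error);
            } catch (Throwable t) {
                // a misbehaving hook must not block shutdown
            }
        }
    }

    public static void main(String[] args) {
        OomHookRunner runner = new OomHookRunner();
        runner.register(e -> System.out.println("alerting ops: " + e.getMessage()));
        runner.fire(new OutOfMemoryError("Java heap space"));
        // prints: alerting ops: Java heap space
    }
}
```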



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24265) Fix acid_stats2 test

2020-10-13 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213219#comment-17213219
 ] 

Zoltan Haindrich commented on HIVE-24265:
-

* I've run the test 3 times without the patch (all passed)
* and 3 times with the patch (all failed)

so it seems like this is definitely related to HIVE-24202

> Fix acid_stats2 test
> 
>
> Key: HIVE-24265
> URL: https://issues.apache.org/jira/browse/HIVE-24265
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Priority: Major
>
> This test's failure started to create incorrect JUnit XMLs, which were not 
> counted correctly by Jenkins.
> I'll disable the test for now, and provide details on when it first failed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24269) In SharedWorkOptimizer run simplification after merging TS filter expressions

2020-10-13 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-24269:

Description: https://github.com/apache/hive/pull/1553#discussion_r503837757

> In SharedWorkOptimizer run simplification after merging TS filter expressions
> -
>
> Key: HIVE-24269
> URL: https://issues.apache.org/jira/browse/HIVE-24269
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> https://github.com/apache/hive/pull/1553#discussion_r503837757



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24269) In SharedWorkOptimizer run simplification after merging TS filter expressions

2020-10-13 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-24269:
---


> In SharedWorkOptimizer run simplification after merging TS filter expressions
> -
>
> Key: HIVE-24269
> URL: https://issues.apache.org/jira/browse/HIVE-24269
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24268) Investigate srcpart scans in dynamic_partition_pruning test

2020-10-13 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-24268:

Description: 
there seem to be some opportunities missed by the shared work optimizer

see srcpart scans around 
[here|https://github.com/apache/hive/pull/1553/files/31ddb97bf412a2679489a5a0d45d335d2708005c#diff-cb6322e933f130f318d462f2c3af839dac60f8acca2074b2685f1847066e3565R4313]

https://github.com/apache/hive/pull/1553#discussion_r503834803


  was:
there seems to be some opportunities missed by shared work optimizer

see srcpart scans around 
[here|https://github.com/apache/hive/pull/1553/files/31ddb97bf412a2679489a5a0d45d335d2708005c#diff-cb6322e933f130f318d462f2c3af839dac60f8acca2074b2685f1847066e3565R4313]




> Investigate srcpart scans in dynamic_partition_pruning test
> ---
>
> Key: HIVE-24268
> URL: https://issues.apache.org/jira/browse/HIVE-24268
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> there seem to be some opportunities missed by the shared work optimizer
> see srcpart scans around 
> [here|https://github.com/apache/hive/pull/1553/files/31ddb97bf412a2679489a5a0d45d335d2708005c#diff-cb6322e933f130f318d462f2c3af839dac60f8acca2074b2685f1847066e3565R4313]
> https://github.com/apache/hive/pull/1553#discussion_r503834803



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24268) Investigate srcpart scans in dynamic_partition_pruning test

2020-10-13 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-24268:
---


> Investigate srcpart scans in dynamic_partition_pruning test
> ---
>
> Key: HIVE-24268
> URL: https://issues.apache.org/jira/browse/HIVE-24268
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> there seem to be some opportunities missed by the shared work optimizer
> see srcpart scans around 
> [here|https://github.com/apache/hive/pull/1553/files/31ddb97bf412a2679489a5a0d45d335d2708005c#diff-cb6322e933f130f318d462f2c3af839dac60f8acca2074b2685f1847066e3565R4313]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24265) Fix acid_stats2 test

2020-10-13 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212986#comment-17212986
 ] 

Zoltan Haindrich commented on HIVE-24265:
-

this is odd... HIVE-24202 doesn't look like something that would have made 
stat-related changes.
I'll validate the bisect's result

> Fix acid_stats2 test
> 
>
> Key: HIVE-24265
> URL: https://issues.apache.org/jira/browse/HIVE-24265
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Priority: Major
>
> This test's failure started to create incorrect JUnit XMLs, which were not 
> counted correctly by Jenkins.
> I'll disable the test for now, and provide details on when it first failed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24265) Fix acid_stats2 test

2020-10-13 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212983#comment-17212983
 ] 

Zoltan Haindrich commented on HIVE-24265:
-

bisect ended up with 8f4f3b90fa5987e82025ecf81f8084c90130fd6b / HIVE-24202
cc: [~jcamachorodriguez]

> Fix acid_stats2 test
> 
>
> Key: HIVE-24265
> URL: https://issues.apache.org/jira/browse/HIVE-24265
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Priority: Major
>
> This test's failure started to create incorrect JUnit XMLs, which were not 
> counted correctly by Jenkins.
> I'll disable the test for now, and provide details on when it first failed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24248) TestMiniLlapLocalCliDriver[subquery_join_rewrite] is flaky

2020-10-13 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-24248:
---

Assignee: Zhihua Deng

> TestMiniLlapLocalCliDriver[subquery_join_rewrite] is flaky
> --
>
> Key: HIVE-24248
> URL: https://issues.apache.org/jira/browse/HIVE-24248
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-1205/26/tests]
> {code:java}
> java.lang.AssertionError:
> Client Execution succeeded but contained differences (error code = 1) after 
> executing subquery_join_rewrite.q
> 241,244d240
> < 1 1
> < 1 2
> < 2 1
> < 2 2
> 245a242,243
> > 2 2
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24248) TestMiniLlapLocalCliDriver[subquery_join_rewrite] is flaky

2020-10-13 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-24248.
-
Resolution: Fixed

merged into master. Thank you Zhihua Deng for fixing this and Krisztian for 
reviewing the changes!

> TestMiniLlapLocalCliDriver[subquery_join_rewrite] is flaky
> --
>
> Key: HIVE-24248
> URL: https://issues.apache.org/jira/browse/HIVE-24248
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-1205/26/tests]
> {code:java}
> java.lang.AssertionError:
> Client Execution succeeded but contained differences (error code = 1) after 
> executing subquery_join_rewrite.q
> 241,244d240
> < 1 1
> < 1 2
> < 2 1
> < 2 2
> 245a242,243
> > 2 2
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24146) Cleanup TaskExecutionException in GenericUDTFExplode

2020-10-13 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-24146.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

merged into master. Thank you Zhihua Deng!

> Cleanup TaskExecutionException in GenericUDTFExplode
> 
>
> Key: HIVE-24146
> URL: https://issues.apache.org/jira/browse/HIVE-24146
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> - Remove TaskExecutionException, which may be not used anymore;
> - Remove the default handling in GenericUDTFExplode#process, which has been 
> verified during the function initializing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24107) Fix typo in ReloadFunctionsOperation

2020-10-13 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-24107.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

merged into master. Thank you [~dengzh]!

> Fix typo in ReloadFunctionsOperation
> 
>
> Key: HIVE-24107
> URL: https://issues.apache.org/jira/browse/HIVE-24107
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Hive.get() will register all functions because doRegisterAllFns is true, so 
> Hive.get().reloadFunctions() may load all functions from the metastore twice; 
> using Hive.get(false) instead may be better.
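A toy model of the double-loading pattern described above. The names mirror the ticket, but this is illustrative code, not the actual Hive.java: a getter that eagerly registers all functions, followed by an explicit reload, does the expensive metastore scan twice.

```java
public class RegistryDemo {
    static int loadCount = 0; // counts the expensive "load all functions" scans

    static RegistryDemo get(boolean doRegisterAllFns) {
        RegistryDemo d = new RegistryDemo();
        if (doRegisterAllFns) {
            d.reloadFunctions(); // eager registration on construction
        }
        return d;
    }

    static RegistryDemo get() { return get(true); }

    void reloadFunctions() { loadCount++; } // stands in for a metastore scan

    public static void main(String[] args) {
        get().reloadFunctions();      // pattern before the fix: scans twice
        System.out.println("eager: " + loadCount);
        loadCount = 0;
        get(false).reloadFunctions(); // pattern after the fix: scans once
        System.out.println("lazy: " + loadCount);
    }
}
```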



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24069) HiveHistory should log the task that ends abnormally

2020-10-13 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-24069.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

merged into master. Thank you [~dengzh]!

> HiveHistory should log the task that ends abnormally
> 
>
> Key: HIVE-24069
> URL: https://issues.apache.org/jira/browse/HIVE-24069
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> When a task returns with an exitVal not equal to 0, the executor skips 
> marking the task return code and calling endTask. This may leave the 
> history log incomplete for such tasks.
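The direction of the fix can be sketched with a try/finally, so that the history entry for the task's end is written on both the normal and the abnormal path. This is a toy model, not Hive's actual Executor code.

```java
public class TaskHistoryDemo {
    static final StringBuilder history = new StringBuilder();

    static void runTask(int exitVal) {
        history.append("start;");
        try {
            if (exitVal != 0) {
                // abnormal end: record the return code, then fall through
                history.append("returnCode=" + exitVal + ";");
                return;
            }
            history.append("returnCode=0;");
        } finally {
            history.append("end;"); // endTask runs on both paths
        }
    }

    public static void main(String[] args) {
        runTask(0); // normal completion
        runTask(9); // abnormal completion: end is still logged
        System.out.println(history);
    }
}
```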



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24264) Fix failed-to-read errors in precommit runs

2020-10-12 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-24264:
---


> Fix failed-to-read errors in precommit runs
> ---
>
> Key: HIVE-24264
> URL: https://issues.apache.org/jira/browse/HIVE-24264
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> the following happens:
> * this seems to be caused by tests outputting a lot of messages
> * some error happens in surefire, and the system-err is discarded
> * the junit xml becomes corrupted
> * jenkins does report the failure, but doesn't take it into account when 
> setting the build result, so the result remains green



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-23851) MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions

2020-10-09 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-23851.
-
Resolution: Fixed

merged into master. Thank you Syed Shameerur Rahman!

> MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
> 
>
> Key: HIVE-23851
> URL: https://issues.apache.org/jira/browse/HIVE-23851
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> *Steps to reproduce:*
> # Create an external table
> # Run the msck command to sync all the partitions with the metastore
> # Remove one of the partition paths
> # Run msck repair with partition filtering
> *Stack Trace:*
> {code:java}
>  2020-07-15T02:10:29,045 ERROR [4dad298b-28b1-4e6b-94b6-aa785b60c576 main] 
> ppr.PartitionExpressionForMetastore: Failed to deserialize the expression
>  java.lang.IndexOutOfBoundsException: Index: 110, Size: 0
>  at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_192]
>  at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_192]
>  at 
> org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:857)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:707) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:806)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:96)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.convertExprToFilter(PartitionExpressionForMetastore.java:52)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.PartFilterExprUtil.makeExpressionTree(PartFilterExprUtil.java:48)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:3593)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.VerifyingObjectStore.getPartitionsByExpr(VerifyingObjectStore.java:80)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT-tests.jar:4.0.0-SNAPSHOT]
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_192]
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_192]
> {code}
> *Cause:*
> In case of msck repair with partition filtering we expect expression proxy 
> class to be set as PartitionExpressionForMetastore ( 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/ddl/misc/msck/MsckAnalyzer.java#L78
>  ), While dropping partition we serialize the drop partition filter 
> expression as ( 
> https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java#L589
>  ) which is incompatible during deserializtion happening in 
> PartitionExpressionForMetastore ( 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionExpressionForMetastore.java#L52
>  ), hence the query fails with "Failed to deserialize the expression".
> *Solutions*:
> I could think of two approaches to this problem
> # Since PartitionExpressionForMetastore is required only during the partition 
> pruning step, we can switch the expression proxy class back to 
> MsckPartitionExpressionProxy once the partition pruning step is done.
> # The other solution is to make the serialization of the msck drop-partition 
> filter expression compatible with the one used by 
> PartitionExpressionForMetastore. We can do this via reflection, since the drop 
> partition serialization happens in the Msck class (standalone-metastore); this 
> way we can completely remove the need for class 
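The core of the failure described above is a writer and a reader that disagree on the serialization scheme. Below is a toy Python sketch of that mismatch (purely illustrative: pickle and json stand in for the two incompatible schemes, while Hive actually uses Kryo on both sides):

```python
import json
import pickle

# Writer side serializes a partition filter expression with scheme A;
# reader side expects scheme B - deserialization fails, much like the
# "Failed to deserialize the expression" stack trace above.
expr = {"op": "=", "col": "ds", "val": "2020-10-01"}

payload = pickle.dumps(expr)      # writer: binary scheme A

try:
    json.loads(payload)           # reader: expects textual scheme B
    decoded = True
except Exception:
    decoded = False

print("decoded:", decoded)        # decoded: False
```

Both fix directions listed above amount to making the two sides agree on a single scheme (or switching the reader back once the incompatible one is no longer needed).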

[jira] [Comment Edited] (HIVE-24248) TestMiniLlapLocalCliDriver[subquery_join_rewrite] is flaky

2020-10-09 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17210686#comment-17210686
 ] 

Zoltan Haindrich edited comment on HIVE-24248 at 10/9/20, 7:45 AM:
---

Thank you [~dengzh] for opening this ticket - I was also about to do the same :)

I've disabled this test - please also run a flaky check before enabling it back
http://ci.hive.apache.org/job/hive-flaky-check/124/


was (Author: kgyrtkirk):
I've disabled this test - please also run a flaky check before enabling it back
http://ci.hive.apache.org/job/hive-flaky-check/124/

> TestMiniLlapLocalCliDriver[subquery_join_rewrite] is flaky
> --
>
> Key: HIVE-24248
> URL: https://issues.apache.org/jira/browse/HIVE-24248
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-1205/26/tests]
> {code:java}
> java.lang.AssertionError:
> Client Execution succeeded but contained differences (error code = 1) after 
> executing subquery_join_rewrite.q
> 241,244d240
> < 1 1
> < 1 2
> < 2 1
> < 2 2
> 245a242,243
> > 2 2
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24248) TestMiniLlapLocalCliDriver[subquery_join_rewrite] is flaky

2020-10-09 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17210686#comment-17210686
 ] 

Zoltan Haindrich commented on HIVE-24248:
-

I've disabled this test - please also run a flaky check before enabling it back
http://ci.hive.apache.org/job/hive-flaky-check/124/

> TestMiniLlapLocalCliDriver[subquery_join_rewrite] is flaky
> --
>
> Key: HIVE-24248
> URL: https://issues.apache.org/jira/browse/HIVE-24248
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-1205/26/tests]
> {code:java}
> java.lang.AssertionError:
> Client Execution succeeded but contained differences (error code = 1) after 
> executing subquery_join_rewrite.q
> 241,244d240
> < 1 1
> < 1 2
> < 2 1
> < 2 2
> 245a242,243
> > 2 2
> {code}
>  
>  





[jira] [Assigned] (HIVE-24242) Relax safety checks in SharedWorkOptimizer

2020-10-08 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-24242:
---


> Relax safety checks in SharedWorkOptimizer
> --
>
> Key: HIVE-24242
> URL: https://issues.apache.org/jira/browse/HIVE-24242
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> there are some checks to lock out problematic cases
> For UnionOperator 
> [here|https://github.com/apache/hive/blob/1507d80fd47aad38b87bba4fd58c1427ba89dbbf/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SharedWorkOptimizer.java#L1571]
> This check could prevent the optimization even if the Union is visible 
> from only one of the TS ops.





[jira] [Updated] (HIVE-24242) Relax safety checks in SharedWorkOptimizer

2020-10-08 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-24242:

Description: 
there are some checks to lock out problematic cases

For UnionOperator 
[here|https://github.com/apache/hive/blob/1507d80fd47aad38b87bba4fd58c1427ba89dbbf/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SharedWorkOptimizer.java#L1571]

This check could prevent the optimization even if the Union is visible 
from only one of the TS ops.



  was:
there are some checks to lock out problematic cases

For UnionOperator 
[here|https://github.com/apache/hive/blob/1507d80fd47aad38b87bba4fd58c1427ba89dbbf/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SharedWorkOptimizer.java#L1571])

This check could prevent the optimization even if the Union is visible 
from only one of the TS ops.




> Relax safety checks in SharedWorkOptimizer
> --
>
> Key: HIVE-24242
> URL: https://issues.apache.org/jira/browse/HIVE-24242
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> there are some checks to lock out problematic cases
> For UnionOperator 
> [here|https://github.com/apache/hive/blob/1507d80fd47aad38b87bba4fd58c1427ba89dbbf/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SharedWorkOptimizer.java#L1571]
> This check could prevent the optimization even if the Union is visible 
> from only one of the TS ops.





[jira] [Assigned] (HIVE-24241) Enable SharedWorkOptimizer to merge downstream operators after an optimization step

2020-10-08 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-24241:
---


> Enable SharedWorkOptimizer to merge downstream operators after an 
> optimization step
> ---
>
> Key: HIVE-24241
> URL: https://issues.apache.org/jira/browse/HIVE-24241
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>






[jira] [Updated] (HIVE-24231) Enhance shared work optimizer to merge scans with filters on both sides

2020-10-08 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-24231:

Summary: Enhance shared work optimizer to merge scans with filters on both 
sides  (was: Enhance shared work optimizer to merge scans with semijoin filters 
on both sides)

> Enhance shared work optimizer to merge scans with filters on both sides
> ---
>
> Key: HIVE-24231
> URL: https://issues.apache.org/jira/browse/HIVE-24231
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (HIVE-24229) DirectSql fails in case of OracleDB

2020-10-07 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-24229.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Merged into master. Thank you [~ayushtkn]!

> DirectSql fails in case of OracleDB
> ---
>
> Key: HIVE-24229
> URL: https://issues.apache.org/jira/browse/HIVE-24229
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Direct Sql fails due to a different data type mapping in case of Oracle DB





[jira] [Commented] (HIVE-24040) Slightly odd behaviour with CHAR comparisons and string literals

2020-10-07 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17209610#comment-17209610
 ] 

Zoltan Haindrich commented on HIVE-24040:
-

{code}
select cast('a' as char(10)) = cast('a ' as varchar(50))
{code}

in psql I got some interesting results:
{code}
select length(cast('a ' as varchar(10))),length(cast('a ' as char(10) ) 
),cast('a ' as varchar(10))=cast('a ' as char(10) );
 length | length | ?column? 
++--
  2 |  1 | t
{code}

in Hive, for the above case, the comparison should happen in "string", for which 
the lengths are different => they will not match
{code}
select length(cast(cast('a' as char(10)) as string)),length(cast(cast('a ' as 
varchar(50)) as string))
+--+--+
| _c0  | _c1  |
+--+--+
| 1| 2|
+--+--+
{code}

I feel that this is somewhere in the gray zone...will dig into the sql specs...
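The behaviour above can be modelled with a small sketch of the comparison rule (an assumption-laden toy model, not Hive code): when widened to the common type "string", CHAR values lose their trailing padding while VARCHAR/STRING values keep theirs.

```python
def widen_to_string(value: str, sql_type: str) -> str:
    """Toy model: widening CHAR to string strips the blank padding;
    VARCHAR and STRING keep trailing spaces as-is. Not Hive code."""
    return value.rstrip(" ") if sql_type == "char" else value

def sql_eq(left, left_type, right, right_type):
    # both operands are widened to "string" before comparing
    return widen_to_string(left, left_type) == widen_to_string(right, right_type)

# cast('a' as char(10)) = cast('a ' as varchar(50)): lengths 1 vs 2 -> no match
print(sql_eq("a".ljust(10), "char", "a ", "varchar"))  # False
# two CHAR values: padding stripped on both sides -> match
print(sql_eq("a".ljust(10), "char", "a ", "char"))     # True
```

Under this model the psql result above differs because postgres compares in the CHAR domain (padding ignored on both sides) rather than widening to string first.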

> Slightly odd behaviour with CHAR comparisons and string literals
> 
>
> Key: HIVE-24040
> URL: https://issues.apache.org/jira/browse/HIVE-24040
> Project: Hive
>  Issue Type: Bug
>Reporter: Tim Armstrong
>Priority: Major
>
> If t is a char column, this statement behaves a bit strangely - since the RHS 
> is a STRING, I would have expected it to behave consistently with other 
> CHAR/STRING comparisons, where the CHAR column has its trailing spaces 
> removed and the STRING does not have its trailing spaces removed.
> {noformat}
> select count(*) from ax where t = cast('a ' as string);
> {noformat}
> Instead it seems to be treated the same as if it was a plain literal, 
> interpreted as CHAR, i.e.
> {noformat}
> select count(*) from ax where t = 'a ';
> {noformat}
> Here are some more experiments I did based on 
> https://github.com/apache/hive/blob/master/ql/src/test/queries/clientpositive/in_typecheck_char.q
>  that seem to show some inconsistencies.
> {noformat}
> -- Hive version 3.1.3000.7.2.1.0-287 r4e72e59f1c2a51a64e0ff37b14bd396cd4e97b98
> create table ax(s char(1),t char(10));
> insert into ax values ('a','a'),('a','a '),('b','bb');
> -- varchar literal preserves trailing space
> select count(*) from ax where t = cast('a ' as varchar(50));
> +--+
> | _c0  |
> +--+
> | 0|
> +--+
> -- explicit cast of literal to string removes trailing space
> select count(*) from ax where t = cast('a ' as string);
> +--+
> | _c0  |
> +--+
> | 2|
> +--+
> -- other string expressions preserve trailing space
> select count(*) from ax where t = concat('a', ' ');
> +--+
> | _c0  |
> +--+
> | 0|
> +--+
> -- varchar col preserves trailing space
> create table stringv as select cast('a  ' as varchar(50));
> select count(*) from ax, stringv where t = `_c0`;
> +--+
> | _c0  |
> +--+
> | 0|
> +--+
> -- string col preserves trailing space
> create table stringa as select 'a  ';
> select count(*) from ax, stringa where t = `_c0`;
> +--+
> | _c0  |
> +--+
> | 0|
> +--+
> {noformat}
> [~jcamachorodriguez] [~kgyrtkirk]





[jira] [Commented] (HIVE-23667) Incorrect output with option hive.auto.convert.join=false

2020-10-07 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17209579#comment-17209579
 ] 

Zoltan Haindrich commented on HIVE-23667:
-

could you please give a complete example to reproduce the issue?

> Incorrect output with option hive.auto.convert.join=false
> -
>
> Key: HIVE-23667
> URL: https://issues.apache.org/jira/browse/HIVE-23667
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: gaozhan ding
>Priority: Critical
>
> We use Hive version 3.1.0 with Tez engine 0.9.1.3.
> I encountered an error when executing a Hive SQL query. The SQL is as follows:
> {code:java}
> set mapreduce.job.queuename=root.xxx;
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.exec.dynamic.partition=true;
> set hive.exec.max.dynamic.partitions.pernode=1;
> set hive.exec.max.dynamic.partitions=1;
> set hive.fileformat.check=false;
> set mapred.reduce.tasks=50;
> set hive.auto.convert.join=true;
> use xxx;
> select count(*) from   230_dim_site  join dw_fact_inverter_detail on  
> dw_fact_inverter_detail.site=230_dim_site.id;{code}
> with the output.
> {code:java}
> +--+
> | _c0 |
> +--+
> | 4954736 |
> +--+
> {code}
> But when the hive.auto.convert.join option is set to false, the output is not 
> as expected.
> The SQL is as follows
> {code:java}
> set mapreduce.job.queuename=root.xxx;
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.exec.dynamic.partition=true;
> set hive.exec.max.dynamic.partitions.pernode=1;
> set hive.exec.max.dynamic.partitions=1;
> set hive.fileformat.check=false;  
> set mapred.reduce.tasks=50;
> set hive.auto.convert.join=false; //changed
> use xxx;
> select count(*) from   230_dim_site  join dw_fact_inverter_detail on  
> dw_fact_inverter_detail.site=230_dim_site.id;{code}
> with output:
> {code:java}
> +--+
> | _c0 |
> +--+
> | 0 |
> +--+
> {code}
> Besides, both tables participating in the join are partitioned tables.
> Notably, if the option mapred.reduce.tasks=50 was not set, both of the above 
> SQL statements produced the expected results.
> We just upgraded Hive from 1.2 to 3.1.0, and we found that these problems 
> only occurred in the old Hive tables.





[jira] [Assigned] (HIVE-24231) Enhance shared work optimizer to merge scans with semijoin filters on both sides

2020-10-05 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-24231:
---


> Enhance shared work optimizer to merge scans with semijoin filters on both 
> sides
> 
>
> Key: HIVE-24231
> URL: https://issues.apache.org/jira/browse/HIVE-24231
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>






[jira] [Resolved] (HIVE-24193) Select query on renamed hive acid table does not produce any output

2020-10-05 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-24193.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

merged into master. Thank you [~Rajkumar Singh] for fixing this, and Peter for 
reviewing the changes!

> Select query on renamed hive acid table does not produce any output
> ---
>
> Key: HIVE-24193
> URL: https://issues.apache.org/jira/browse/HIVE-24193
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.2
>Reporter: Rajkumar Singh
>Assignee: Rajkumar Singh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> During onRename, HMS updates COMPLETED_TXN_COMPONENTS, which fails with 
> "CTC_DATABASE column does not exist". Upon investigation I found that the 
> enclosing quotes are missing for the columns, so the DB query fails with this 
> exception.
> Steps to repro:
> 1. create table test(id int);
> 2. insert into table test values(1);
> 3. alter table test rename to test1;
> 4. select * from test1 produces no output





[jira] [Resolved] (HIVE-24157) Strict mode to fail on CAST timestamp <-> numeric

2020-10-02 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-24157.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

merged into master. Thank you Jesus for reviewing the changes!

> Strict mode to fail on CAST timestamp <-> numeric
> -
>
> Key: HIVE-24157
> URL: https://issues.apache.org/jira/browse/HIVE-24157
> Project: Hive
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Jesus Camacho Rodriguez
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> There is some interest in enforcing that CAST numeric <\-> timestamp is 
> disallowed to avoid confusion among users, e.g., SQL standard does not allow 
> numeric <\-> timestamp casting, timestamp type is timezone agnostic, etc.
> We should introduce a strict config for timestamp (similar to others before): 
> If the config is true, we shall fail while compiling the query with a 
> meaningful message.
> To provide similar behavior, Hive has multiple functions that provide clearer 
> semantics for numeric to timestamp conversion (and vice versa):
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions
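A hedged sketch of why the explicit functions are clearer than a bare CAST: an integer-to-timestamp conversion must pick an epoch and a timezone, which the explicit functions name and a CAST leaves implicit. The Python stand-ins below mirror the names of Hive's UDFs but are not Hive code, and they assume UTC throughout:

```python
from datetime import datetime, timezone

def unix_timestamp(ts: datetime) -> int:
    """Timestamp -> seconds since the Unix epoch, interpreting ts as UTC."""
    return int(ts.replace(tzinfo=timezone.utc).timestamp())

def from_unixtime(seconds: int) -> datetime:
    """Epoch seconds -> naive UTC timestamp (inverse of unix_timestamp)."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(tzinfo=None)

ts = datetime(2020, 11, 16, 22, 18, 40)
secs = unix_timestamp(ts)
print(secs)                       # 1605565120
print(from_unixtime(secs) == ts)  # True: the conversion round-trips
```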





[jira] [Resolved] (HIVE-24161) Support Oracle CLOB type in beeline

2020-09-28 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-24161.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

merged into master. Thank you [~robbiezhang]!

> Support Oracle CLOB type in beeline
> ---
>
> Key: HIVE-24161
> URL: https://issues.apache.org/jira/browse/HIVE-24161
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline
>Reporter: Robbie Zhang
>Assignee: Robbie Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We can use beeline as a JDBC client to access an RDBMS such as Oracle. Sometimes 
> the Oracle JDBC driver will return a CLOB object instead of a String object if 
> the string is too long. Beeline used to work well with the CLOB type, but it was 
> broken by HIVE-14786:
> [https://github.com/apache/hive/blob/2a760dd607e206d7f1061c01075767ecfff40d0c/beeline/src/java/org/apache/hive/beeline/Rows.java#L169]
> In the above line, when Oracle JDBC driver returns a CLOB object, it returns 
> a string like "oracle.sql.CLOB@2f7c7260". In this case, we should use 
> ResultSet.getString() rather than ResultSet.getObject().toString().
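The failure mode can be mimicked with a toy stand-in for the LOB handle (illustrative Python, not the beeline or Oracle code): stringifying an opaque handle yields its identity, while the dedicated accessor returns the actual contents.

```python
class ToyClob:
    """Stand-in for an opaque LOB handle such as oracle.sql.CLOB."""
    def __init__(self, text: str):
        self._text = text

    def get_sub_string(self, pos: int, length: int) -> str:
        # mirrors java.sql.Clob.getSubString (1-based position)
        return self._text[pos - 1 : pos - 1 + length]

    def length(self) -> int:
        return len(self._text)

cell = ToyClob("a string too long to inline")

# generic path (like ResultSet.getObject().toString()): object identity only
print(str(cell))                              # e.g. <...ToyClob object at 0x...>
# dedicated accessor (like ResultSet.getString()): the real contents
print(cell.get_sub_string(1, cell.length()))  # a string too long to inline
```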





[jira] [Commented] (HIVE-24189) Enable logging executed raw sql statements for HiveServer2 before parsing

2020-09-23 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200697#comment-17200697
 ] 

Zoltan Haindrich commented on HIVE-24189:
-

I think we already log the statement in a few places - so I don't really see the 
need to create a conf key to toggle this one

I know we might have some confusion in our contribution guide - the earlier 
approach was to attach patches to the jira...
a few months ago we've moved testing/etc to github - so could you open a PR 
with your changes?


> Enable logging executed raw sql statements  for HiveServer2 before parsing
> --
>
> Key: HIVE-24189
> URL: https://issues.apache.org/jira/browse/HIVE-24189
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 3.1.0
>Reporter: ji.chen
>Assignee: ji.chen
>Priority: Trivial
> Attachments: 
> 0001-Enable-logging-executed-sql-statements-for-HiveServe.patch
>
>
> Currently HiveServer2 lacks the ability to log command-related statements in 
> the HiveServer2 log files, which makes it inconvenient to debug 
> command-related statements. 
> We had the following case:
> HiveServer2 crashed when our customer executed a SET statement, but we 
> couldn't find the exact SET statement the customer had executed. If the raw 
> statement were recorded in the HiveServer2 logs, it would help us debug 
> command-related statement issues.
> So we propose an improvement to write the raw statement to the HiveServer2 log.
> The property hive.session.raw.statement.verbose is introduced to control the 
> output of raw SQL statements; by default, it is enabled.
> It can be disabled by setting hive.session.raw.statement.verbose to false.
>  
>  
> We have attached patch





[jira] [Resolved] (HIVE-24084) Push Aggregates thru joins in case it re-groups previously unique columns

2020-09-16 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-24084.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

merged into master. Thank you Jesus and Vineet for reviewing the changes!

> Push Aggregates thru joins in case it re-groups previously unique columns
> -
>
> Key: HIVE-24084
> URL: https://issues.apache.org/jira/browse/HIVE-24084
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (HIVE-24160) Scheduled executions must allow state transition EXECUTING->TIMED_OUT

2020-09-16 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-24160.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

merged into master. Thank you Krisztian for reviewing the changes!

> Scheduled executions must allow state transition EXECUTING->TIMED_OUT
> -
>
> Key: HIVE-24160
> URL: https://issues.apache.org/jira/browse/HIVE-24160
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (HIVE-23847) Extracting hive-parser module broke exec jar upload in tez

2020-09-15 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-23847.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

seems like I've merged the patch around a month ago - but I left this 
ticket open by mistake...
thank you [~asinkovits] for fixing this!

> Extracting hive-parser module broke exec jar upload in tez
> --
>
> Key: HIVE-23847
> URL: https://issues.apache.org/jira/browse/HIVE-23847
> Project: Hive
>  Issue Type: Bug
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> 2020-07-13 16:53:50,551 [INFO] [Dispatcher thread {Central}] 
> |HistoryEventHandler.criticalEvents|: 
> [HISTORY][DAG:dag_1594632473849_0001_1][Event:TASK_ATTEMPT_FINISHED]: 
> vertexName=Map 1, taskAttemptId=attempt_1594632473849_0001_1_00_00_0, 
> creationTime=1594652027059, allocationTime=1594652028460, 
> startTime=1594652029356, finishTime=1594652030546, timeTaken=1190, 
> status=FAILED, taskFailureType=NON_FATAL, errorEnum=FRAMEWORK_ERROR, 
> diagnostics=Error: Error while running task ( failure ) : 
> attempt_1594632473849_0001_1_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: Map operator initialization failed
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:62)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:62)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:38)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
>   at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: Map operator initialization failed
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:340)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
>   ... 16 more
> Caused by: java.lang.NoClassDefFoundError: 
> org/apache/hadoop/hive/ql/parse/ParseException
>   at java.lang.Class.getDeclaredConstructors0(Native Method)
>   at java.lang.Class.privateGetDeclaredConstructors(Class.java:2671)
>   at java.lang.Class.getConstructor0(Class.java:3075)
>   at java.lang.Class.getDeclaredConstructor(Class.java:2178)
>   at 
> org.apache.hive.common.util.ReflectionUtil.newInstance(ReflectionUtil.java:79)
>   at 
> org.apache.hadoop.hive.ql.exec.Registry.registerGenericUDTF(Registry.java:225)
>   at 
> org.apache.hadoop.hive.ql.exec.Registry.registerGenericUDTF(Registry.java:217)
>   at 
> org.apache.hadoop.hive.ql.exec.FunctionRegistry.(FunctionRegistry.java:544)
>   at 
> org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.isDeterministic(ExprNodeGenericFuncEvaluator.java:154)
>   at 
> org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.isConsistentWithinQuery(ExprNodeEvaluator.java:117)
>   at 
> org.apache.hadoop.hive.ql.exec.ExprNodeEvaluatorFactory.iterate(ExprNodeEvaluatorFactory.java:102)
>   at 
> org.apache.hadoop.hive.ql.exec.ExprNodeEvaluatorFactory.toCachedEvals(ExprNodeEvaluatorFactory.java:76)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:69)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:359)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:548)
>  

[jira] [Updated] (HIVE-24160) Scheduled executions must allow state transition EXECUTING->TIMED_OUT

2020-09-15 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-24160:

Summary: Scheduled executions must allow state transition 
EXECUTING->TIMED_OUT  (was: Scheduled executions must allow state transitions 
to TIMED_OUT from any state)

> Scheduled executions must allow state transition EXECUTING->TIMED_OUT
> -
>
> Key: HIVE-24160
> URL: https://issues.apache.org/jira/browse/HIVE-24160
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (HIVE-24039) Update jquery version to mitigate CVE-2020-11023

2020-09-15 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-24039.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

merged into master. Thank you [~Rajkumar Singh]!

> Update jquery version to mitigate CVE-2020-11023
> 
>
> Key: HIVE-24039
> URL: https://issues.apache.org/jira/browse/HIVE-24039
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Rajkumar Singh
>Assignee: Rajkumar Singh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> there is a known vulnerability in the jquery version used by hive; with this 
> jira the plan is to upgrade to jquery version 3.5.0, where it's been fixed. More 
> details about the vulnerability can be found here:
> https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-11023





[jira] [Assigned] (HIVE-24157) Strict mode to fail on CAST timestamp <-> numeric

2020-09-14 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-24157:
---

Assignee: Zoltan Haindrich

> Strict mode to fail on CAST timestamp <-> numeric
> -
>
> Key: HIVE-24157
> URL: https://issues.apache.org/jira/browse/HIVE-24157
> Project: Hive
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Jesus Camacho Rodriguez
>Assignee: Zoltan Haindrich
>Priority: Major
>
> There is some interest in enforcing that CAST numeric <\-> timestamp is 
> disallowed to avoid confusion among users, e.g., SQL standard does not allow 
> numeric <\-> timestamp casting, timestamp type is timezone agnostic, etc.
> We should introduce a strict config for timestamp (similar to others before): 
> If the config is true, we shall fail while compiling the query with a 
> meaningful message.
> To provide similar behavior, Hive has multiple functions that provide clearer 
> semantics for numeric to timestamp conversion (and vice versa):
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions





[jira] [Assigned] (HIVE-24160) Scheduled executions must allow state transitions to TIMED_OUT from any state

2020-09-14 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-24160:
---


> Scheduled executions must allow state transitions to TIMED_OUT from any state
> -
>
> Key: HIVE-24160
> URL: https://issues.apache.org/jira/browse/HIVE-24160
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>






[jira] [Resolved] (HIVE-24072) HiveAggregateJoinTransposeRule may try to create an invalid transformation

2020-09-09 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-24072.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

pushed to master. Thank you Jesus for reviewing the changes!

> HiveAggregateJoinTransposeRule may try to create an invalid transformation
> --
>
> Key: HIVE-24072
> URL: https://issues.apache.org/jira/browse/HIVE-24072
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> {code}
> java.lang.AssertionError: 
> Cannot add expression of different type to set:
> set type is RecordType(INTEGER NOT NULL o_orderkey, DECIMAL(10, 0) 
> o_totalprice, DATE o_orderdate, INTEGER NOT NULL c_custkey, VARCHAR(25) 
> CHARACTER SET "UTF-16LE" c_name, DOUBLE $f5) NOT NULL
> expression type is RecordType(INTEGER NOT NULL o_orderkey, INTEGER NOT NULL 
> o_custkey, DECIMAL(10, 0) o_totalprice, DATE o_orderdate, INTEGER NOT NULL 
> c_custkey, DOUBLE $f1) NOT NULL
> set is rel#567:HiveAggregate.HIVE.[].any(input=HepRelVertex#490,group={2, 4, 
> 5, 6, 7},agg#0=sum($1))
> expression is HiveProject(o_orderkey=[$2], o_custkey=[$3], o_totalprice=[$4], 
> o_orderdate=[$5], c_custkey=[$6], $f1=[$1])
>   HiveJoin(condition=[=($2, $0)], joinType=[inner], algorithm=[none], 
> cost=[{2284.5 rows, 0.0 cpu, 0.0 io}])
> HiveAggregate(group=[{0}], agg#0=[sum($1)])
>   HiveProject(l_orderkey=[$0], l_quantity=[$4])
> HiveTableScan(table=[[tpch_0_001, lineitem]], table:alias=[l])
> HiveJoin(condition=[=($0, $6)], joinType=[inner], algorithm=[none], 
> cost=[{1.9115E15 rows, 0.0 cpu, 0.0 io}])
>   HiveJoin(condition=[=($4, $1)], joinType=[inner], algorithm=[none], 
> cost=[{1650.0 rows, 0.0 cpu, 0.0 io}])
> HiveProject(o_orderkey=[$0], o_custkey=[$1], o_totalprice=[$3], 
> o_orderdate=[$4])
>   HiveTableScan(table=[[tpch_0_001, orders]], table:alias=[orders])
> HiveProject(c_custkey=[$0], c_name=[$1])
>   HiveTableScan(table=[[tpch_0_001, customer]], 
> table:alias=[customer])
>   HiveProject($f0=[$0])
> HiveFilter(condition=[>($1, 3E2)])
>   HiveAggregate(group=[{0}], agg#0=[sum($4)])
> HiveTableScan(table=[[tpch_0_001, lineitem]], 
> table:alias=[lineitem])
>   at 
> org.apache.calcite.plan.RelOptUtil.verifyTypeEquivalence(RelOptUtil.java:383)
>   at 
> org.apache.calcite.plan.hep.HepRuleCall.transformTo(HepRuleCall.java:57)
>   at 
> org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:236)
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveAggregateJoinTransposeRule.onMatch(HiveAggregateJoinTransposeRule.java:300)
> {code}





[jira] [Assigned] (HIVE-24130) Support datasets for non-default database

2020-09-08 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-24130:
---


> Support datasets for non-default database
> -
>
> Key: HIVE-24130
> URL: https://issues.apache.org/jira/browse/HIVE-24130
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> tpch datasets were added in a different database - but QTestDatasetHandler 
> only considers tables by "name" and ignores the "db" - so the protection 
> mechanism doesn't fully work: the tpch tables are wiped out after the first 
> test using them and are never loaded back
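The fix direction above suggests tracking datasets by database as well as table name. A minimal sketch of that idea, assuming hypothetical names (`DatasetKey`, `qualified` - these are illustrative, not the actual QTestDatasetHandler API):

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: key loaded datasets by "db.table" instead of the bare
// table name, so same-named tables in different databases are tracked apart.
public class DatasetKey {
    private final Set<String> loaded = new HashSet<>();

    // Build the key from both the database and the table name.
    static String qualified(String db, String table) {
        return (db == null ? "default" : db) + "." + table;
    }

    public boolean markLoaded(String db, String table) {
        return loaded.add(qualified(db, table));
    }

    public boolean isLoaded(String db, String table) {
        return loaded.contains(qualified(db, table));
    }
}
```

With a bare table-name key, `tpch_0_001.lineitem` and `default.lineitem` would collide and one could shadow the other; the qualified key keeps them distinct.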





[jira] [Updated] (HIVE-24084) Push Aggregates thru joins in case it re-groups previously unique columns

2020-09-07 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-24084:

Summary: Push Aggregates thru joins in case it re-groups previously unique 
columns  (was: Enhance cost model to push down more Aggregates)

> Push Aggregates thru joins in case it re-groups previously unique columns
> -
>
> Key: HIVE-24084
> URL: https://issues.apache.org/jira/browse/HIVE-24084
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>






[jira] [Assigned] (HIVE-24123) Improve cost model for Aggregates

2020-09-07 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-24123:
---


> Improve cost model for Aggregates
> -
>
> Key: HIVE-24123
> URL: https://issues.apache.org/jira/browse/HIVE-24123
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>






[jira] [Resolved] (HIVE-24104) NPE due to null key columns in ReduceSink after deduplication

2020-09-07 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-24104.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

merged into master. Thank you [~zabetak]!

> NPE due to null key columns in ReduceSink after deduplication
> -
>
> Key: HIVE-24104
> URL: https://issues.apache.org/jira/browse/HIVE-24104
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> In some cases the {{ReduceSinkDeDuplication}} optimization creates ReduceSink 
> operators where the key columns are null. This can lead to NPE in various 
> places in the code. 
> The following stracktraces show some places where a NPE appears. Note that 
> the stacktraces do not correspond to the same query.
> +NPE  during planning+
> {noformat}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.plan.ExprNodeDesc$ExprNodeDescEqualityWrapper.equals(ExprNodeDesc.java:141)
>   at java.util.AbstractList.equals(AbstractList.java:523)
>   at 
> org.apache.hadoop.hive.ql.optimizer.SetReducerParallelism.process(SetReducerParallelism.java:101)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
>   at 
> org.apache.hadoop.hive.ql.lib.ForwardWalker.walk(ForwardWalker.java:74)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
>   at 
> org.apache.hadoop.hive.ql.parse.TezCompiler.runStatsDependentOptimizations(TezCompiler.java:492)
>   at 
> org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:226)
>   at 
> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:161)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12643)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:443)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:301)
>   at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:171)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:301)
>   at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:220)
>   at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:173)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:414)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:363)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:357)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:129)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:231)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:203)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:129)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:424)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:355)
>   at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:740)
>   at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:710)
>   at 
> org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:170)
>   at 
> org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157)
>   at 
> org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> 
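The NPE in `ExprNodeDescEqualityWrapper.equals` above is the classic pattern of comparing key-column lists where one side may be null. A minimal illustration of a null-safe comparison (this is a sketch of the general technique, not the actual Hive fix):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

public class NullSafeKeys {
    // Objects.equals handles null == null and null vs. non-null without
    // dereferencing either argument, so a null key-column list cannot NPE.
    static boolean sameKeys(List<String> a, List<String> b) {
        return Objects.equals(a, b);
    }
}
```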

[jira] [Assigned] (HIVE-24084) Enhance cost model to push down more Aggregates

2020-08-27 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-24084:
---


> Enhance cost model to push down more Aggregates
> ---
>
> Key: HIVE-24084
> URL: https://issues.apache.org/jira/browse/HIVE-24084
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>






[jira] [Commented] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-08-25 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184162#comment-17184162
 ] 

Zoltan Haindrich commented on HIVE-23725:
-

* I think using hooks is better - because they could give you more context about 
the actual system's state... if some information is not yet accessible via the 
hooks, please try to extend the hooks with it... so that you can be notified/etc
 * making the plugin optional could have been useful in that case... when a 
plugin doesn't work as expected, you could disable it - but in case it's burned 
in... you can't disable it...
 * I don't know why you would need to go over, say, 
HIVE_QUERY_MAX_REEXECUTION_COUNT - I think if we try something 3 times, further 
attempts will likely not succeed either...

 

> ValidTxnManager snapshot outdating causing partial reads in merge insert
> 
>
> Key: HIVE-23725
> URL: https://issues.apache.org/jira/browse/HIVE-23725
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> When the ValidTxnManager invalidates the snapshot during a merge insert and 
> starts to read committed transactions that were not committed when the query 
> compilation happened, it can cause partial read problems if the committed 
> transaction created a new partition in the source or target table.
> The solution should not only fix the snapshot but also recompile the query 
> and acquire the locks again.
> You could construct an example like this:
> 1. Open and compile transaction 1, which merge inserts data from a 
> partitioned source table that has a few partitions.
> 2. Open, run and commit transaction 2, which inserts data into an old and a 
> new partition of the source table.
> 3. Open, run and commit transaction 3, which inserts data into the target 
> table of the merge statement; that will retrigger a snapshot generation in 
> transaction 1.
> 4. Run transaction 1: the snapshot will be regenerated, and it will read 
> partial data from transaction 2, breaking the ACID properties.
> A different setup, switching the transaction order:
> 1. Compile transaction 1, which inserts data into an old and a new partition 
> of the source table.
> 2. Compile transaction 2, which inserts data into the target table.
> 3. Compile transaction 3, which merge inserts data from the source table 
> into the target table.
> 4. Run and commit transaction 1.
> 5. Run and commit transaction 2.
> 6. Run transaction 3: since it contains 1 and 2 in its snapshot, the 
> isValidTxnListState check will be triggered and we do a partial read of 
> transaction 1 for the same reasons.





[jira] [Updated] (HIVE-23649) Fix FindBug issues in hive-service-rpc

2020-08-25 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-23649:

Fix Version/s: 4.0.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

merged to master. Thank you [~mustafaiman]!

> Fix FindBug issues in hive-service-rpc
> --
>
> Key: HIVE-23649
> URL: https://issues.apache.org/jira/browse/HIVE-23649
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Mustafa Iman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: spotbugsXml.xml
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Assigned] (HIVE-24072) HiveAggregateJoinTransposeRule may try to create an invalid transformation

2020-08-25 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-24072:
---


> HiveAggregateJoinTransposeRule may try to create an invalid transformation
> --
>
> Key: HIVE-24072
> URL: https://issues.apache.org/jira/browse/HIVE-24072
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> {code}
> java.lang.AssertionError: 
> Cannot add expression of different type to set:
> set type is RecordType(INTEGER NOT NULL o_orderkey, DECIMAL(10, 0) 
> o_totalprice, DATE o_orderdate, INTEGER NOT NULL c_custkey, VARCHAR(25) 
> CHARACTER SET "UTF-16LE" c_name, DOUBLE $f5) NOT NULL
> expression type is RecordType(INTEGER NOT NULL o_orderkey, INTEGER NOT NULL 
> o_custkey, DECIMAL(10, 0) o_totalprice, DATE o_orderdate, INTEGER NOT NULL 
> c_custkey, DOUBLE $f1) NOT NULL
> set is rel#567:HiveAggregate.HIVE.[].any(input=HepRelVertex#490,group={2, 4, 
> 5, 6, 7},agg#0=sum($1))
> expression is HiveProject(o_orderkey=[$2], o_custkey=[$3], o_totalprice=[$4], 
> o_orderdate=[$5], c_custkey=[$6], $f1=[$1])
>   HiveJoin(condition=[=($2, $0)], joinType=[inner], algorithm=[none], 
> cost=[{2284.5 rows, 0.0 cpu, 0.0 io}])
> HiveAggregate(group=[{0}], agg#0=[sum($1)])
>   HiveProject(l_orderkey=[$0], l_quantity=[$4])
> HiveTableScan(table=[[tpch_0_001, lineitem]], table:alias=[l])
> HiveJoin(condition=[=($0, $6)], joinType=[inner], algorithm=[none], 
> cost=[{1.9115E15 rows, 0.0 cpu, 0.0 io}])
>   HiveJoin(condition=[=($4, $1)], joinType=[inner], algorithm=[none], 
> cost=[{1650.0 rows, 0.0 cpu, 0.0 io}])
> HiveProject(o_orderkey=[$0], o_custkey=[$1], o_totalprice=[$3], 
> o_orderdate=[$4])
>   HiveTableScan(table=[[tpch_0_001, orders]], table:alias=[orders])
> HiveProject(c_custkey=[$0], c_name=[$1])
>   HiveTableScan(table=[[tpch_0_001, customer]], 
> table:alias=[customer])
>   HiveProject($f0=[$0])
> HiveFilter(condition=[>($1, 3E2)])
>   HiveAggregate(group=[{0}], agg#0=[sum($4)])
> HiveTableScan(table=[[tpch_0_001, lineitem]], 
> table:alias=[lineitem])
>   at 
> org.apache.calcite.plan.RelOptUtil.verifyTypeEquivalence(RelOptUtil.java:383)
>   at 
> org.apache.calcite.plan.hep.HepRuleCall.transformTo(HepRuleCall.java:57)
>   at 
> org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:236)
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveAggregateJoinTransposeRule.onMatch(HiveAggregateJoinTransposeRule.java:300)
> {code}





[jira] [Commented] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-08-25 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17183835#comment-17183835
 ] 

Zoltan Haindrich commented on HIVE-23725:
-

* this patch added an arbitrary MAX_EXECUTION of 10?
* it's enabled by default - it shouldn't be... it should be pluggable - so 
that you don't mess up other plugins like this patch has done
* it uses CommandProcessorException instead of tapping into the hooks? I see 
no benefit in that...
* and it changed ALL existing plugins to check for max executions?

why didn't you guys ping me?


> ValidTxnManager snapshot outdating causing partial reads in merge insert
> 
>
> Key: HIVE-23725
> URL: https://issues.apache.org/jira/browse/HIVE-23725
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> When the ValidTxnManager invalidates the snapshot during a merge insert and 
> starts to read committed transactions that were not committed when the query 
> compilation happened, it can cause partial read problems if the committed 
> transaction created a new partition in the source or target table.
> The solution should not only fix the snapshot but also recompile the query 
> and acquire the locks again.
> You could construct an example like this:
> 1. Open and compile transaction 1, which merge inserts data from a 
> partitioned source table that has a few partitions.
> 2. Open, run and commit transaction 2, which inserts data into an old and a 
> new partition of the source table.
> 3. Open, run and commit transaction 3, which inserts data into the target 
> table of the merge statement; that will retrigger a snapshot generation in 
> transaction 1.
> 4. Run transaction 1: the snapshot will be regenerated, and it will read 
> partial data from transaction 2, breaking the ACID properties.
> A different setup, switching the transaction order:
> 1. Compile transaction 1, which inserts data into an old and a new partition 
> of the source table.
> 2. Compile transaction 2, which inserts data into the target table.
> 3. Compile transaction 3, which merge inserts data from the source table 
> into the target table.
> 4. Run and commit transaction 1.
> 5. Run and commit transaction 2.
> 6. Run transaction 3: since it contains 1 and 2 in its snapshot, the 
> isValidTxnListState check will be triggered and we do a partial read of 
> transaction 1 for the same reasons.





[jira] [Commented] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs

2020-08-12 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17176300#comment-17176300
 ] 

Zoltan Haindrich commented on HIVE-23965:
-

* the description clearly describes that the metastore data is a composition of 
questionable-quality stuff... so we are running our planning tests against some 
weird metastore content
* I don't think adding more tests will increase test coverage - in this case we 
are talking about queries which are already run 2 times - I've seen people 
updating q.out's like crazy... so adding an extra 100 q.out-s will not 
necessarily increase coverage...
* the independence from having a docker setup is great - the new approach uses 
docker - but if that's a problem we could try to come up with some other 
approach - I'm wondering about using an archived derby database with metastore 
data
* the metastore content loader approach is quite unfortunate - IIRC I had to 
fix up something in the loader once... because I made some changes to the 
column statistics

I think we should remove the old approach... and run tests against the new, 
more-realistic schema.




> Improve plan regression tests using TPCDS30TB metastore dump and custom 
> configs
> ---
>
> Key: HIVE-23965
> URL: https://issues.apache.org/jira/browse/HIVE-23965
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The existing regression tests (HIVE-12586) based on TPC-DS have certain 
> shortcomings:
> The table statistics do not reflect cardinalities from a specific TPC-DS 
> scale factor (SF). Some tables are from a 30TB dataset, others from 200GB 
> dataset, and others from a 3GB dataset. This mix leads to plans that may 
> never appear when using an actual TPC-DS dataset. 
> The existing statistics do not contain information about partitions something 
> that can have a big impact on the resulting plans.
> The existing regression tests rely on more or less on the default 
> configuration (hive-site.xml). In real-life scenarios though some of the 
> configurations differ and may impact the choices of the optimizer.
> This issue aims to address the above shortcomings by using a curated 
> TPCDS30TB metastore dump along with some custom hive configurations. 





[jira] [Commented] (HIVE-22126) hive-exec packaging should shade guava

2020-08-12 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17176126#comment-17176126
 ] 

Zoltan Haindrich commented on HIVE-22126:
-

[~zhengchenyu]: yes, that's an option - however I haven't seen this in hive-3 
before - but for hive-4 the same was done (HIVE-23593)

you may already know it - the issue behind this is that a shaded calcite may 
pull in, through reflection, classes from the "non-shaded" version, and that 
will wreak some havoc

there was an attempt to make a better fix than that; but it's not yet ready 
(HIVE-23772)
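For context, shading guava in this way is typically done with the maven-shade-plugin's class relocation. A sketch of the kind of pom fragment involved (illustrative only - see the actual ql/pom.xml for the real shading configuration):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <relocations>
      <!-- Rewrite guava's packages (and all references to them) inside the
           shaded jar, so downstream users can bring their own guava. -->
      <relocation>
        <pattern>com.google.common</pattern>
        <shadedPattern>org.apache.hive.com.google.common</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</plugin>
```

Relocation rewrites the bytecode references as well as the package names, which is what plain repackaging misses - and, as noted above, reflection-based lookups of the original names can still escape it.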

> hive-exec packaging should shade guava
> --
>
> Key: HIVE-22126
> URL: https://issues.apache.org/jira/browse/HIVE-22126
> Project: Hive
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Assignee: Eugene Chung
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22126.01.patch, HIVE-22126.02.patch, 
> HIVE-22126.03.patch, HIVE-22126.04.patch, HIVE-22126.05.patch, 
> HIVE-22126.06.patch, HIVE-22126.07.patch, HIVE-22126.08.patch, 
> HIVE-22126.09.patch, HIVE-22126.09.patch, HIVE-22126.09.patch, 
> HIVE-22126.09.patch, HIVE-22126.09.patch
>
>
> The ql/pom.xml includes complete guava library into hive-exec.jar 
> https://github.com/apache/hive/blob/master/ql/pom.xml#L990 This causes a 
> problems for downstream clients of hive which have hive-exec.jar in their 
> classpath since they are pinned to the same guava version as that of hive. 
> We should shade guava classes so that other components which depend on 
> hive-exec can independently use a different version of guava as needed.




