[jira] [Updated] (HIVE-16924) Support distinct in presence Gby

2017-07-20 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-16924:

Attachment: HIVE-16924.01.patch

Patch 01 is intended to run the existing ptests and surface any serious issues.
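
For context, the requested behavior amounts to treating the DISTINCT as an outer GROUP BY 
over the inner aggregate's output. A minimal Calcite RelBuilder sketch of that plan shape, 
for illustration only (it is not necessarily how the patch implements the rewrite; the 
RelBuilder instance is assumed to be configured against a schema containing e011_01):
{code}
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.tools.RelBuilder;

// Sketch only: SELECT DISTINCT c1, count(*) FROM e011_01 GROUP BY c1
// expressed as two stacked aggregates.
public final class DistinctOverGbySketch {
  public static RelNode sketch(RelBuilder b) {
    return b
        .scan("e011_01")
        .aggregate(b.groupKey("c1"), b.count(false, "cnt")) // inner: GROUP BY c1, COUNT(*) AS cnt
        .aggregate(b.groupKey("c1", "cnt"))                 // outer: DISTINCT == GROUP BY c1, cnt
        .build();
  }
}
{code}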

> Support distinct in presence Gby 
> -
>
> Key: HIVE-16924
> URL: https://issues.apache.org/jira/browse/HIVE-16924
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Planning
>Reporter: Carter Shanklin
>Assignee: Remus Rusanu
> Attachments: HIVE-16924.01.patch
>
>
> create table e011_01 (c1 int, c2 smallint);
> insert into e011_01 values (1, 1), (2, 2);
> These queries should work:
> select distinct c1, count(*) from e011_01 group by c1;
> select distinct c1, avg(c2) from e011_01 group by c1;
> Currently, you get : 
> FAILED: SemanticException 1:52 SELECT DISTINCT and GROUP BY can not be in the 
> same query. Error encountered near token 'c1'



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16924) Support distinct in presence Gby

2017-07-20 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-16924:

Status: Patch Available  (was: Open)

> Support distinct in presence Gby 
> -
>
> Key: HIVE-16924
> URL: https://issues.apache.org/jira/browse/HIVE-16924
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Planning
>Reporter: Carter Shanklin
>Assignee: Remus Rusanu
>
> create table e011_01 (c1 int, c2 smallint);
> insert into e011_01 values (1, 1), (2, 2);
> These queries should work:
> select distinct c1, count(*) from e011_01 group by c1;
> select distinct c1, avg(c2) from e011_01 group by c1;
> Currently, you get : 
> FAILED: SemanticException 1:52 SELECT DISTINCT and GROUP BY can not be in the 
> same query. Error encountered near token 'c1'



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17109) Remove calls to RelMetadataQuery.instance() after Calcite 1.13 upgrade

2017-07-18 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16091438#comment-16091438
 ] 

Remus Rusanu commented on HIVE-17109:
-

I fixed errata.txt; I hope it's all good.
https://git-wip-us.apache.org/repos/asf?p=hive.git;a=commit;h=9807a560c20c27b131f703cf74e8bc7a2199edb2

> Remove calls to RelMetadataQuery.instance() after Calcite 1.13 upgrade
> --
>
> Key: HIVE-17109
> URL: https://issues.apache.org/jira/browse/HIVE-17109
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Fix For: 3.0.0
>
> Attachments: HIVE-17019.01.patch
>
>
> After CALCITE-1812 the code should retrieve the RelMetadataQuery from the 
> planner, if needed. Calling {{RelMetadataQuery.instance()}} invalidates the 
> Calcite RelNode properties memoization cache. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17109) Remove calls to RelMetadataQuery.instance() after Calcite 1.13 upgrade

2017-07-18 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-17109:

Affects Version/s: 3.0.0
Fix Version/s: 3.0.0

> Remove calls to RelMetadataQuery.instance() after Calcite 1.13 upgrade
> --
>
> Key: HIVE-17109
> URL: https://issues.apache.org/jira/browse/HIVE-17109
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Fix For: 3.0.0
>
> Attachments: HIVE-17019.01.patch
>
>
> After CALCITE-1812 the code should retrieve the RelMetadataQuery from the 
> planner, if needed. Calling {{RelMetadataQuery.instance()}} invalidates the 
> Calcite RelNode properties memoization cache. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17109) Remove calls to RelMetadataQuery.instance() after Calcite 1.13 upgrade

2017-07-17 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-17109:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Resolved via 
https://git-wip-us.apache.org/repos/asf?p=hive.git;a=commit;h=e7081035bb9768bc014f0aba11417418ececbaf0

> Remove calls to RelMetadataQuery.instance() after Calcite 1.13 upgrade
> --
>
> Key: HIVE-17109
> URL: https://issues.apache.org/jira/browse/HIVE-17109
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-17019.01.patch
>
>
> After CALCITE-1812 the code should retrieve the RelMetadataQuery from the 
> planner, if needed. Calling {{RelMetadataQuery.instance()}} invalidates the 
> Calcite RelNode properties memoization cache. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17109) Remove calls to RelMetadataQuery.instance() after Calcite 1.13 upgrade

2017-07-17 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16090249#comment-16090249
 ] 

Remus Rusanu commented on HIVE-17109:
-

Failed test diffs are transient and do not repro locally.

> Remove calls to RelMetadataQuery.instance() after Calcite 1.13 upgrade
> --
>
> Key: HIVE-17109
> URL: https://issues.apache.org/jira/browse/HIVE-17109
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-17019.01.patch
>
>
> After CALCITE-1812 the code should retrieve the RelMetadataQuery from the 
> planner, if needed. Calling {{RelMetadataQuery.instance()}} invalidates the 
> Calcite RelNode properties memoization cache. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17109) Remove calls to RelMetadataQuery.instance() after Calcite 1.13 upgrade

2017-07-17 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-17109:

Attachment: HIVE-17019.01.patch

> Remove calls to RelMetadataQuery.instance() after Calcite 1.13 upgrade
> --
>
> Key: HIVE-17109
> URL: https://issues.apache.org/jira/browse/HIVE-17109
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-17019.01.patch
>
>
> After CALCITE-1812 the code should retrieve the RelMetadataQuery from the 
> planner, if needed. Calling {{RelMetadataQuery.instance()}} invalidates the 
> Calcite RelNode properties memoization cache. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17109) Remove calls to RelMetadataQuery.instance() after Calcite 1.13 upgrade

2017-07-17 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-17109:

Status: Patch Available  (was: Open)

Patch 01 replaces all RelMetadataQuery.instance() calls in QL
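
For illustration, the shape of such a replacement (a sketch assuming Calcite 1.13's 
{{RelOptCluster.getMetadataQuery()}} added by CALCITE-1812; not copied from the actual patch):
{code}
import org.apache.calcite.plan.RelOptCluster;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.metadata.RelMetadataQuery;

final class MetadataQuerySketch {
  // Before: RelMetadataQuery.instance() builds a fresh query and discards the
  // planner's memoization cache:
  //   RelMetadataQuery mq = RelMetadataQuery.instance();
  //   Double rows = mq.getRowCount(rel);

  // After: reuse the query (and its cache) owned by the RelNode's cluster.
  static Double rowCount(RelNode rel) {
    RelOptCluster cluster = rel.getCluster();
    RelMetadataQuery mq = cluster.getMetadataQuery(); // provided since CALCITE-1812
    return mq.getRowCount(rel);
  }
}
{code}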

> Remove calls to RelMetadataQuery.instance() after Calcite 1.13 upgrade
> --
>
> Key: HIVE-17109
> URL: https://issues.apache.org/jira/browse/HIVE-17109
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
>
> After CALCITE-1812 the code should retrieve the RelMetadataQuery from the 
> planner, if needed. Calling {{RelMetadataQuery.instance()}} invalidates the 
> Calcite RelNode properties memoization cache. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17109) Remove calls to RelMetadataQuery.instance() after Calcite 1.13 upgrade

2017-07-17 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu reassigned HIVE-17109:
---


> Remove calls to RelMetadataQuery.instance() after Calcite 1.13 upgrade
> --
>
> Key: HIVE-17109
> URL: https://issues.apache.org/jira/browse/HIVE-17109
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
>
> After CALCITE-1812 the code should retrieve the RelMetadataQuery from the 
> planner, if needed. Calling {{RelMetadataQuery.instance()}} invalidates the 
> Calcite RelNode properties memoization cache. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17051) Each table metadata is requested twice during query compile

2017-07-06 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076398#comment-16076398
 ] 

Remus Rusanu commented on HIVE-17051:
-

This happens even for a simple query:
{noformat}
SELECT DISTINCT * FROM src;
{noformat}
If multiple tables are present (e.g. in a JOIN), each table's metadata is requested 
twice.
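
One way the duplicate fetch could be avoided is a per-compilation memo keyed by table name. 
A hypothetical sketch only (the class and method names below are illustrative, not Hive's 
actual fix):
{code}
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical per-compilation cache: the second getMetaData() pass would reuse the
// table object fetched by the first pass instead of asking the metastore again.
final class TableMetadataMemo<T> {
  private final Map<String, T> cache = new HashMap<>();
  private final Function<String, T> fetchFromMetastore; // e.g. "db.table" -> Table

  TableMetadataMemo(Function<String, T> fetchFromMetastore) {
    this.fetchFromMetastore = fetchFromMetastore;
  }

  T getTable(String dbAndTableName) {
    return cache.computeIfAbsent(dbAndTableName, fetchFromMetastore);
  }
}
{code}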


> Each table metadata is requested twice during query compile
> ---
>
> Key: HIVE-17051
> URL: https://issues.apache.org/jira/browse/HIVE-17051
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
>  Labels: performance
>
> As far as I can tell, for each table referenced in a query the metadata is 
> retrieved twice during compilation:
> first call:
> {noformat}
>   at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1320)
>   at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1275)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getTableObjectByName(SemanticAnalyzer.java:10943)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1992)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1942)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:11178)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11309)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:295)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:261)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:566)
> {noformat}
> second call:
> {noformat}
>   at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1320)
>   at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1275)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getTableObjectByName(SemanticAnalyzer.java:10943)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1992)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1942)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1934)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:431)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11320)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:295)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:261)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:566)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17051) Each table metadata is requested twice during query compile

2017-07-06 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu reassigned HIVE-17051:
---


> Each table metadata is requested twice during query compile
> ---
>
> Key: HIVE-17051
> URL: https://issues.apache.org/jira/browse/HIVE-17051
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
>  Labels: performance
>
> As far as I can tell, for each table referenced in a query the metadata is 
> retrieved twice during compilation:
> first call:
> {noformat}
>   at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1320)
>   at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1275)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getTableObjectByName(SemanticAnalyzer.java:10943)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1992)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1942)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:11178)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11309)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:295)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:261)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:566)
> {noformat}
> second call:
> {noformat}
>   at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1320)
>   at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1275)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getTableObjectByName(SemanticAnalyzer.java:10943)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1992)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1942)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1934)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:431)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11320)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:295)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:261)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:566)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16888) Upgrade Calcite to 1.13 and Avatica to 1.10

2017-06-27 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-16888:

Attachment: HIVE-16888.06.patch

Patch 06 is rebased to current master; the Druid golden files (GF) are updated.

> Upgrade Calcite to 1.13 and Avatica to 1.10
> ---
>
> Key: HIVE-16888
> URL: https://issues.apache.org/jira/browse/HIVE-16888
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16888.01.patch, HIVE-16888.02.patch, 
> HIVE-16888.03.patch, HIVE-16888.04.patch, HIVE-16888.05.patch, 
> HIVE-16888.06.patch
>
>
> I'm creating this early to be able to ptest the current Calcite 
> 1.13.0-SNAPSHOT



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16888) Upgrade Calcite to 1.13 and Avatica to 1.10

2017-06-23 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061039#comment-16061039
 ] 

Remus Rusanu commented on HIVE-16888:
-

So should I include HIVE-16751, or will it be committed separately?

> Upgrade Calcite to 1.13 and Avatica to 1.10
> ---
>
> Key: HIVE-16888
> URL: https://issues.apache.org/jira/browse/HIVE-16888
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16888.01.patch, HIVE-16888.02.patch, 
> HIVE-16888.03.patch, HIVE-16888.04.patch, HIVE-16888.05.patch
>
>
> I'm creating this early to be able to ptest the current Calcite 
> 1.13.0-SNAPSHOT



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16888) Upgrade Calcite to 1.13 and Avatica to 1.10

2017-06-23 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060948#comment-16060948
 ] 

Remus Rusanu commented on HIVE-16888:
-

[~jcamachorodriguez] yes, with HIVE-16751 it passes. It also updates the golden file; I am not 
sure whether that comes from HIVE-16751 or from HIVE-16888:

{noformat}
-druid.query.json 
{"queryType":"groupBy","dataSource":"wikipedia","granularity":"all","dimensions":["robot"],"limitSpec":{"type":"default"},"filter":{"type":"selector","dimension":"language","value":"en"},"aggregations":[{"type":"longSum","name":"dummy_agg","fieldName":"dummy_agg"}],"intervals":["1900-01-01T00:00:00.000/3000-01-01T00:00:00.000"]}
+druid.query.json 
{"queryType":"groupBy","dataSource":"wikipedia","granularity":"all","dimensions":[{"type":"default","dimension":"robot"}],"limitSpec":{"type":"default"},"filter":{"type":"selector","dimension":"language","value":"en"},"aggregations":[{"type":"longSum","name":"dummy_agg","fieldName":"dummy_agg"}],"intervals":["1900-01-01T00:00:00.000/3000-01-01T00:00:00.000"]}
{noformat}

> Upgrade Calcite to 1.13 and Avatica to 1.10
> ---
>
> Key: HIVE-16888
> URL: https://issues.apache.org/jira/browse/HIVE-16888
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16888.01.patch, HIVE-16888.02.patch, 
> HIVE-16888.03.patch, HIVE-16888.04.patch, HIVE-16888.05.patch
>
>
> I'm creating this early to be able to ptest the current Calcite 
> 1.13.0-SNAPSHOT



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HIVE-16888) Upgrade Calcite to 1.13 and Avatica to 1.10

2017-06-23 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060948#comment-16060948
 ] 

Remus Rusanu edited comment on HIVE-16888 at 6/23/17 2:13 PM:
--

[~jcamachorodriguez] yes, with HIVE-16751 it passes. It also updates the golden file, similar 
to the other HIVE-16888 Druid changes:

{noformat}
-druid.query.json 
{"queryType":"groupBy","dataSource":"wikipedia","granularity":"all","dimensions":["robot"],"limitSpec":{"type":"default"},"filter":{"type":"selector","dimension":"language","value":"en"},"aggregations":[{"type":"longSum","name":"dummy_agg","fieldName":"dummy_agg"}],"intervals":["1900-01-01T00:00:00.000/3000-01-01T00:00:00.000"]}
+druid.query.json 
{"queryType":"groupBy","dataSource":"wikipedia","granularity":"all","dimensions":[{"type":"default","dimension":"robot"}],"limitSpec":{"type":"default"},"filter":{"type":"selector","dimension":"language","value":"en"},"aggregations":[{"type":"longSum","name":"dummy_agg","fieldName":"dummy_agg"}],"intervals":["1900-01-01T00:00:00.000/3000-01-01T00:00:00.000"]}
{noformat}


was (Author: rusanu):
[~jcamachorodriguez] yes, with HIVE-16751 it passes. It also updates, not sure 
if from HIVE-16751 or from HIVE-16888:

{noformat}
-druid.query.json 
{"queryType":"groupBy","dataSource":"wikipedia","granularity":"all","dimensions":["robot"],"limitSpec":{"type":"default"},"filter":{"type":"selector","dimension":"language","value":"en"},"aggregations":[{"type":"longSum","name":"dummy_agg","fieldName":"dummy_agg"}],"intervals":["1900-01-01T00:00:00.000/3000-01-01T00:00:00.000"]}
+druid.query.json 
{"queryType":"groupBy","dataSource":"wikipedia","granularity":"all","dimensions":[{"type":"default","dimension":"robot"}],"limitSpec":{"type":"default"},"filter":{"type":"selector","dimension":"language","value":"en"},"aggregations":[{"type":"longSum","name":"dummy_agg","fieldName":"dummy_agg"}],"intervals":["1900-01-01T00:00:00.000/3000-01-01T00:00:00.000"]}
{noformat}

> Upgrade Calcite to 1.13 and Avatica to 1.10
> ---
>
> Key: HIVE-16888
> URL: https://issues.apache.org/jira/browse/HIVE-16888
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16888.01.patch, HIVE-16888.02.patch, 
> HIVE-16888.03.patch, HIVE-16888.04.patch, HIVE-16888.05.patch
>
>
> I'm creating this early to be able to ptest the current Calcite 
> 1.13.0-SNAPSHOT



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-16924) Support distinct in presence Gby

2017-06-23 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu reassigned HIVE-16924:
---

Assignee: Remus Rusanu

> Support distinct in presence Gby 
> -
>
> Key: HIVE-16924
> URL: https://issues.apache.org/jira/browse/HIVE-16924
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Planning
>Reporter: Carter Shanklin
>Assignee: Remus Rusanu
>
> create table e011_01 (c1 int, c2 smallint);
> insert into e011_01 values (1, 1), (2, 2);
> These queries should work:
> select distinct c1, count(*) from e011_01 group by c1;
> select distinct c1, avg(c2) from e011_01 group by c1;
> Currently, you get : 
> FAILED: SemanticException 1:52 SELECT DISTINCT and GROUP BY can not be in the 
> same query. Error encountered near token 'c1'



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HIVE-16888) Upgrade Calcite to 1.13 and Avatica to 1.10

2017-06-23 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060804#comment-16060804
 ] 

Remus Rusanu edited comment on HIVE-16888 at 6/23/17 12:22 PM:
---

{{druid_basic2}} fails with an exception:

{noformat}
2017-06-23T05:18:31,795 DEBUG [ecae368c-2c5d-46af-b4b3-4da8b9c0d23b main] 
parse.CalcitePlanner: Created Table Plan for druid_table_1 TS[0]
2017-06-23T05:18:31,795 DEBUG [ecae368c-2c5d-46af-b4b3-4da8b9c0d23b main] 
exec.FunctionRegistry: Method didn't match: passed = [string] accepted = 
[timestamp] method = public org.apache.hadoop.hive.serde2.io.TimestampWritable 
org.apache.hadoop.hive.ql.udf.UDFDateFloor.evaluate(org.apache.hadoop.hive.serde2.io.TimestampWritable)
2017-06-23T05:18:31,796 ERROR [ecae368c-2c5d-46af-b4b3-4da8b9c0d23b main] 
parse.CalcitePlanner: CBO failed, skipping CBO. 
org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Wrong arguments 
'extract': No matching method for class 
org.apache.hadoop.hive.ql.udf.UDFDateFloorDay with (string). Possible choices: 
_FUNC_(timestamp)  
at 
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1363)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.lib.ExpressionWalker.walk(ExpressionWalker.java:76) 
~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:229)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:176)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:11746)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:11701)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:11669)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFilterPlan(SemanticAnalyzer.java:3325)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFilterPlan(SemanticAnalyzer.java:3305)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:9695)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:10652)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:10530)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:433)
 [hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11269)
 [hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:294)
 [hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:261)
 [hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:169)
 [hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
{noformat}
on this plan:
{noformat}
2017-06-23T05:18:31,790 DEBUG [ecae368c-2c5d-46af-b4b3-4da8b9c0d23b main] 
translator.PlanModifierForASTConv: Final plan after modifier
 HiveSortLimit(sort0=[$0], dir0=[ASC-nulls-first], fetch=[10])
  HiveProject(robot=[$1], __time=[$0])
HiveFilter(condition=[BETWEEN(false, FLOOR_DAY($0, FLAG(DAY)), 
CAST(1999-11-01 08:00:00):TIMESTAMP(9), CAST(1999-11-10 
08:00:00):TIMESTAMP(9))])
  DruidQuery(table=[[default.druid_table_1]], 
intervals=[[1900-01-01T00:00:00.000/3000-01-01T00:00:00.000]], groups=[{0, 1}], 
aggs=[[]])
{noformat}


was (Author: rusanu):
{{druid_basic2}} fails with an exception:


[jira] [Commented] (HIVE-16888) Upgrade Calcite to 1.13 and Avatica to 1.10

2017-06-23 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060804#comment-16060804
 ] 

Remus Rusanu commented on HIVE-16888:
-

{{druid_basic2}} fails with an exception:

{noformat}
2017-06-23T05:18:31,795 DEBUG [ecae368c-2c5d-46af-b4b3-4da8b9c0d23b main] 
parse.CalcitePlanner: Created Table Plan for druid_table_1 TS[0]
2017-06-23T05:18:31,795 DEBUG [ecae368c-2c5d-46af-b4b3-4da8b9c0d23b main] 
exec.FunctionRegistry: Method didn't match: passed = [string] accepted = 
[timestamp] method = public org.apache.hadoop.hive.serde2.io.TimestampWritable 
org.apache.hadoop.hive.ql.udf.UDFDateFloor.evaluate(org.apache.hadoop.hive.serde2.io.TimestampWritable)
2017-06-23T05:18:31,796 ERROR [ecae368c-2c5d-46af-b4b3-4da8b9c0d23b main] 
parse.CalcitePlanner: CBO failed, skipping CBO. 
org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Wrong arguments 
'extract': No matching method for class 
org.apache.hadoop.hive.ql.udf.UDFDateFloorDay with (string). Possible choices: 
_FUNC_(timestamp)  
at 
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1363)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.lib.ExpressionWalker.walk(ExpressionWalker.java:76) 
~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:229)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:176)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:11746)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:11701)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:11669)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFilterPlan(SemanticAnalyzer.java:3325)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFilterPlan(SemanticAnalyzer.java:3305)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:9695)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:10652)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:10530)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:433)
 [hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11269)
 [hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:294)
 [hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:261)
 [hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:169)
 [hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
{noformat}

> Upgrade Calcite to 1.13 and Avatica to 1.10
> ---
>
> Key: HIVE-16888
> URL: https://issues.apache.org/jira/browse/HIVE-16888
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16888.01.patch, HIVE-16888.02.patch, 
> HIVE-16888.03.patch, HIVE-16888.04.patch, HIVE-16888.05.patch
>
>
> I'm creating this early to be able to ptest the current Calcite 
> 1.13.0-SNAPSHOT



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16888) Upgrade Calcite to 1.13 and Avatica to 1.10

2017-06-23 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060801#comment-16060801
 ] 

Remus Rusanu commented on HIVE-16888:
-

[~jcamachorodriguez] can you tell whether the druid* ptest diffs are OK in the 
latest test run?

> Upgrade Calcite to 1.13 and Avatica to 1.10
> ---
>
> Key: HIVE-16888
> URL: https://issues.apache.org/jira/browse/HIVE-16888
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16888.01.patch, HIVE-16888.02.patch, 
> HIVE-16888.03.patch, HIVE-16888.04.patch, HIVE-16888.05.patch
>
>
> I'm creating this early to be able to ptest the current Calcite 
> 1.13.0-SNAPSHOT



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16888) Upgrade Calcite to 1.13 and Avatica to 1.10

2017-06-23 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060794#comment-16060794
 ] 

Remus Rusanu commented on HIVE-16888:
-

MiniLlapLocal/tez_smb_join fails with an exception:
{noformat}
2017-06-23T05:07:55,395  INFO 
[TezTaskEventRouter{attempt_1498219646456_0001_38_01_03_0}] 
impl.LlapRecordReader: Received fragment id: 1498219646456_0001_38_01_03_0
2017-06-23T05:07:55,390  WARN 
[TezTaskEventRouter{attempt_1498219646456_0001_38_01_01_0}] 
runtime.LogicalIOProcessorRuntimeTask: Failed to handle event
java.lang.RuntimeException: java.io.IOException: java.io.IOException: 
java.io.IOException: cannot find dir = 
file:/Users/rrusanu/hive/itests/qtest/target/localfs/warehouse/tab/ds=2008-04-08/01_0
 in pathToPartitionInfo: 
[file:/Users/rrusanu/hive/itests/qtest/target/localfs/warehouse/tab_part/ds=2008-04-08]
at 
org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:206)
 ~[tez-mapreduce-0.8.4.jar:0.8.4]
at 
org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.(TezGroupedSplitsInputFormat.java:145)
 ~[tez-mapreduce-0.8.4.jar:0.8.4]
at 
org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:111)
 ~[tez-mapreduce-0.8.4.jar:0.8.4]
at 
org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:157)
 ~[tez-mapreduce-0.8.4.jar:0.8.4]
at 
org.apache.tez.mapreduce.lib.MRReaderMapred.(MRReaderMapred.java:76) 
~[tez-mapreduce-0.8.4.jar:0.8.4]
at 
org.apache.tez.mapreduce.input.MultiMRInput.initFromEvent(MultiMRInput.java:195)
 ~[tez-mapreduce-0.8.4.jar:0.8.4]
at 
org.apache.tez.mapreduce.input.MultiMRInput.handleEvents(MultiMRInput.java:154) 
~[tez-mapreduce-0.8.4.jar:0.8.4]
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.handleEvent(LogicalIOProcessorRuntimeTask.java:715)
 [tez-runtime-internals-0.8.4.jar:0.8.4]
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.access$600(LogicalIOProcessorRuntimeTask.java:105)
 [tez-runtime-internals-0.8.4.jar:0.8.4]
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$1.runInternal(LogicalIOProcessorRuntimeTask.java:792)
 [tez-runtime-internals-0.8.4.jar:0.8.4]
at org.apache.tez.common.RunnableWithNdc.run(RunnableWithNdc.java:35) 
[tez-common-0.8.4.jar:0.8.4]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112]
{noformat}
I suspect this is caused by time offsets.

> Upgrade Calcite to 1.13 and Avatica to 1.10
> ---
>
> Key: HIVE-16888
> URL: https://issues.apache.org/jira/browse/HIVE-16888
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16888.01.patch, HIVE-16888.02.patch, 
> HIVE-16888.03.patch, HIVE-16888.04.patch, HIVE-16888.05.patch
>
>
> I'm creating this early to be able to ptest the current Calcite 
> 1.13.0-SNAPSHOT



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16888) Upgrade Calcite to 1.13 and Avatica to 1.10

2017-06-23 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-16888:

Attachment: HIVE-16888.05.patch

Patch 05 uses 1.13.0-RC0.

> Upgrade Calcite to 1.13 and Avatica to 1.10
> ---
>
> Key: HIVE-16888
> URL: https://issues.apache.org/jira/browse/HIVE-16888
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16888.01.patch, HIVE-16888.02.patch, 
> HIVE-16888.03.patch, HIVE-16888.04.patch, HIVE-16888.05.patch
>
>
> I'm creating this early to be able to ptest the current Calcite 
> 1.13.0-SNAPSHOT



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16888) Upgrade Calcite to 1.13 and Avatica to 1.10

2017-06-23 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-16888:

Attachment: HIVE-16888.04.patch

patch 04 rebased to current master and updated ~50 GF with diffs only in more 
aggressive predicate reduction ({{key > 10 and key > 15}} -> {{key > 15}})

> Upgrade Calcite to 1.13 and Avatica to 1.10
> ---
>
> Key: HIVE-16888
> URL: https://issues.apache.org/jira/browse/HIVE-16888
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16888.01.patch, HIVE-16888.02.patch, 
> HIVE-16888.03.patch, HIVE-16888.04.patch
>
>
> I'm creating this early to be able to ptest the current Calcite 
> 1.13.0-SNAPSHOT



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HIVE-16888) Upgrade Calcite to 1.13 and Avatica to 1.10

2017-06-23 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060527#comment-16060527
 ] 

Remus Rusanu edited comment on HIVE-16888 at 6/23/17 7:46 AM:
--

patch 04 rebased to current master and updated ~50 GF with diffs only in more 
aggressive predicate reduction ({{key > 10 and key > 15}} -> {{key > 15}})


was (Author: rusanu):
patch 04 rebased to current master and updated ~50 GF with diffs only in more 
aggresiive predicate reduction ({{ key > 10 and key > 15}} -> {{key > 15}})

> Upgrade Calcite to 1.13 and Avatica to 1.10
> ---
>
> Key: HIVE-16888
> URL: https://issues.apache.org/jira/browse/HIVE-16888
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16888.01.patch, HIVE-16888.02.patch, 
> HIVE-16888.03.patch, HIVE-16888.04.patch
>
>
> I'm creating this early to be able to ptest the current Calcite 
> 1.13.0-SNAPSHOT



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16888) Upgrade Calcite to 1.13 and Avatica to 1.10

2017-06-22 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059708#comment-16059708
 ] 

Remus Rusanu commented on HIVE-16888:
-

I'm going through the safe golden-file updates (i.e. better-reduced predicates) and I'll put 
up a new patch soon, so that only the more problematic diffs remain (result diffs and some 
Tez graph diffs).

> Upgrade Calcite to 1.13 and Avatica to 1.10
> ---
>
> Key: HIVE-16888
> URL: https://issues.apache.org/jira/browse/HIVE-16888
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16888.01.patch, HIVE-16888.02.patch, 
> HIVE-16888.03.patch
>
>
> I'm creating this early to be able to ptest the current Calcite 
> 1.13.0-SNAPSHOT



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16888) Upgrade Calcite to 1.13 and Avatica to 1.10

2017-06-22 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-16888:

Attachment: HIVE-16888.03.patch

Patch 03 treats VARCHAR(HiveTypeSystemImpl.MAX_VARCHAR_PRECISION) as TOK_STRING in 
TypeConverter.hiveToken.
Calcite 1.13 literals now come back as CAST(... AS VARCHAR()) and the existing type 
conversion only considered TOK_STRING for VARCHAR(Integer.MAX_VALUE).
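
Roughly, the mapping boils down to treating a maximum-precision Calcite VARCHAR as a Hive 
STRING. A standalone sketch only (65535 stands for HiveTypeSystemImpl.MAX_VARCHAR_PRECISION; 
the enum and method names are placeholders, not the patch code):
{code}
// Illustrative sketch of the decision described above; not Hive's actual TypeConverter.
final class VarcharToHiveTokenSketch {
  enum HiveTypeToken { TOK_STRING, TOK_VARCHAR }

  static final int MAX_VARCHAR_PRECISION = 65535; // HiveTypeSystemImpl.MAX_VARCHAR_PRECISION

  static HiveTypeToken hiveToken(int calciteVarcharPrecision) {
    // Calcite 1.13 wraps string literals as CAST(... AS VARCHAR(65535)); treat that
    // maximum-precision VARCHAR (and the old Integer.MAX_VALUE form) as a plain Hive
    // STRING, and keep genuine VARCHAR(n) types as VARCHAR.
    return (calciteVarcharPrecision == MAX_VARCHAR_PRECISION
            || calciteVarcharPrecision == Integer.MAX_VALUE)
        ? HiveTypeToken.TOK_STRING
        : HiveTypeToken.TOK_VARCHAR;
  }
}
{code}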

> Upgrade Calcite to 1.13 and Avatica to 1.10
> ---
>
> Key: HIVE-16888
> URL: https://issues.apache.org/jira/browse/HIVE-16888
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16888.01.patch, HIVE-16888.02.patch, 
> HIVE-16888.03.patch
>
>
> I'm creating this early to be able to ptest the current Calcite 
> 1.13.0-SNAPSHOT



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16888) Upgrade Calcite to 1.13 and Avatica to 1.10

2017-06-20 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16055735#comment-16055735
 ] 

Remus Rusanu commented on HIVE-16888:
-

[~jcamachorodriguez] this failure seems more serious:

{noformat}
2017-06-20T06:09:25,430 ERROR [a85d0a12-3cb2-451b-b09a-b0ee528d6af9 main] 
parse.CalcitePlanner: CBO failed, skipping CBO. 
org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException: Only numeric or string 
type arguments are accepted but varchar(65535) is passed.
at 
org.apache.hadoop.hive.ql.udf.generic.GenericUDAFStdSample.getEvaluator(GenericUDAFStdSample.java:65)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.udf.generic.AbstractGenericUDAFResolver.getEvaluator(AbstractGenericUDAFResolver.java:47)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.FunctionRegistry.getGenericUDAFEvaluator(FunctionRegistry.java:982)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getGenericUDAFEvaluator(SemanticAnalyzer.java:4640)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.optimizer.calcite.translator.HiveGBOpConvUtil.getGBInfo(HiveGBOpConvUtil.java:269)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.optimizer.calcite.translator.HiveGBOpConvUtil.translateGB(HiveGBOpConvUtil.java:289)
 ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
{noformat}
This is the {{string}} -> {{VARCHAR(65535)}} type change having side effects on the Hive side.


> Upgrade Calcite to 1.13 and Avatica to 1.10
> ---
>
> Key: HIVE-16888
> URL: https://issues.apache.org/jira/browse/HIVE-16888
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16888.01.patch, HIVE-16888.02.patch
>
>
> I'm creating this early to be able to ptest the current Calcite 
> 1.13.0-SNAPSHOT



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16888) Upgrade Calcite to 1.13 and Avatica to 1.10

2017-06-20 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16055704#comment-16055704
 ] 

Remus Rusanu commented on HIVE-16888:
-

Another frequent diff is:
{noformat}
-  expressions: _col0 (type: string), '11' (type: string)
+  expressions: _col0 (type: string), '11' (type: varchar(65535))
{noformat}
It looks like string literals are now recognized as {{VARCHAR(65535)}}.

> Upgrade Calcite to 1.13 and Avatica to 1.10
> ---
>
> Key: HIVE-16888
> URL: https://issues.apache.org/jira/browse/HIVE-16888
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16888.01.patch, HIVE-16888.02.patch
>
>
> I'm creating this early to be able to ptest the current Calcite 
> 1.13.0-SNAPSHOT



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16888) Upgrade Calcite to 1.13 and Avatica to 1.10

2017-06-20 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16055648#comment-16055648
 ] 

Remus Rusanu commented on HIVE-16888:
-

The diffs I looked at seem legitimate, e.g.:
{noformat}
-  predicate: ((UDFToDouble(key) < 100.0) and (UDFToDouble(key) < 
80.0)) (type: boolean)
-  Statistics: Num rows: 55 Data size: 584 Basic stats: COMPLETE 
Column stats: NONE
+  predicate: (UDFToDouble(key) < 80.0) (type: boolean)
+  Statistics: Num rows: 166 Data size: 1763 Basic stats: COMPLETE 
Column stats: NONE
{noformat}
I'll update some golden files and do a new run

> Upgrade Calcite to 1.13 and Avatica to 1.10
> ---
>
> Key: HIVE-16888
> URL: https://issues.apache.org/jira/browse/HIVE-16888
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16888.01.patch, HIVE-16888.02.patch
>
>
> I'm creating this early to be able to ptest the current Calcite 
> 1.13.0-SNAPSHOT



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16888) Upgrade Calcite to 1.13 and Avatica to 1.10

2017-06-20 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-16888:

Attachment: HIVE-16888.02.patch

> Upgrade Calcite to 1.13 and Avatica to 1.10
> ---
>
> Key: HIVE-16888
> URL: https://issues.apache.org/jira/browse/HIVE-16888
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16888.01.patch, HIVE-16888.02.patch
>
>
> I'm creating this early to be able to ptest the current Calcite 
> 1.13.0-SNAPSHOT



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16888) Upgrade Calcite to 1.13 and Avatica to 1.10

2017-06-19 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054077#comment-16054077
 ] 

Remus Rusanu commented on HIVE-16888:
-

[~jcamachorodriguez] I will look into it asap. I was side-tracked with the 
on-call rotation stuff and did not find time to investigate these failures.

> Upgrade Calcite to 1.13 and Avatica to 1.10
> ---
>
> Key: HIVE-16888
> URL: https://issues.apache.org/jira/browse/HIVE-16888
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16888.01.patch
>
>
> I'm creating this early to be able to ptest the current Calcite 
> 1.13.0-SNAPSHOT



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Work started] (HIVE-16888) Upgrade Calcite to 1.13 and Avatica to 1.10

2017-06-13 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-16888 started by Remus Rusanu.
---
> Upgrade Calcite to 1.13 and Avatica to 1.10
> ---
>
> Key: HIVE-16888
> URL: https://issues.apache.org/jira/browse/HIVE-16888
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16888.01.patch
>
>
> I'm creating this early to be able to ptest the current Calcite 
> 1.13.0-SNAPSHOT



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16888) Upgrade Calcite to 1.13 and Avatica to 1.10

2017-06-13 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-16888:

Status: Patch Available  (was: In Progress)

> Upgrade Calcite to 1.13 and Avatica to 1.10
> ---
>
> Key: HIVE-16888
> URL: https://issues.apache.org/jira/browse/HIVE-16888
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16888.01.patch
>
>
> I'm creating this early to be able to ptest the current Calcite 
> 1.13.0-SNAPSHOT



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16888) Upgrade Calcite to 1.13 and Avatica to 1.10

2017-06-13 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-16888:

Attachment: HIVE-16888.01.patch

> Upgrade Calcite to 1.13 and Avatica to 1.10
> ---
>
> Key: HIVE-16888
> URL: https://issues.apache.org/jira/browse/HIVE-16888
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16888.01.patch
>
>
> I'm creating this early to be able to ptest the current Calcite 
> 1.13.0-SNAPSHOT



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-16888) Upgrade Calcite to 1.13 and Avatica to 1.10

2017-06-13 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu reassigned HIVE-16888:
---


> Upgrade Calcite to 1.13 and Avatica to 1.10
> ---
>
> Key: HIVE-16888
> URL: https://issues.apache.org/jira/browse/HIVE-16888
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 3.0.0
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
>
> I'm creating this early to be able to ptest the current Calcite 
> 1.13.0-SNAPSHOT



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16667) PostgreSQL metastore handling of CLOB types for COLUMNS_V2.TYPE_NAME and other field is incorrect

2017-06-10 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-16667:

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Fixed with 
https://git1-us-west.apache.org/repos/asf?p=hive.git;a=commit;h=5861b6af52839794c18f5aa686c24aabdb737b93

Note that any PostgreSQL metastore DB created between 
{{b3462503ec6cc6aebb375c30d9295e59411a4ea7}} and 
{{5861b6af52839794c18f5aa686c24aabdb737b93}} will have to be recreated, as it 
contains invalid values (LOB IDs stored where strings are expected).
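
For reference, the dereference the issue description mentions (reading a stored LOB oid via 
PostgreSQL's {{lo_get}} and converting it to text) can be done from JDBC roughly as below; a 
diagnostic sketch only, assuming a PostgreSQL metastore and the standard 
{{lo_get}}/{{convert_from}} built-ins:
{code}
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Diagnostic sketch: dereference a large-object handle that ended up stored where a
// string was expected (e.g. COLUMNS_V2.TYPE_NAME on an affected metastore).
final class PgClobProbe {
  static String readClobByOid(Connection pgConn, long oid) throws Exception {
    String sql = "SELECT convert_from(lo_get(CAST(? AS oid)), 'UTF8')";
    try (PreparedStatement ps = pgConn.prepareStatement(sql)) {
      ps.setLong(1, oid);
      try (ResultSet rs = ps.executeQuery()) {
        return rs.next() ? rs.getString(1) : null;
      }
    }
  }
}
{code}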

> PostgreSQL metastore handling of CLOB types for COLUMNS_V2.TYPE_NAME and 
> other field is incorrect
> -
>
> Key: HIVE-16667
> URL: https://issues.apache.org/jira/browse/HIVE-16667
> Project: Hive
>  Issue Type: Bug
>Reporter: Remus Rusanu
>Assignee: Naveen Gangam
> Fix For: 3.0.0
>
> Attachments: HIVE-16667.2.patch, HIVE-16667.3.patch, 
> HIVE-16667.patch, HiveCLIOutput.txt, PostgresDBOutput.txt
>
>
> The CLOB JDO type introduced with HIVE-12274 does not work correctly with 
> PostgreSQL. The value is written out-of-band and the LOB handle is written, as 
> an INT, into the table. SELECTs return the INT value, which should have been 
> read via the {{lo_get}} PG built-in and then cast into a string.
> Furthermore, the behavior is different between fields upgraded from earlier 
> metastore versions (they retain their string storage) vs. values inserted 
> after the upgrade (inserted as LOB roots).
> The code in 
> {{MetaStoreDirectSql.getPartitionsFromPartitionIds/extractSqlClob}} expects 
> the underlying JDO/DataNucleus layer to map the column to a {{Clob}}, but that 
> does not happen; the value is a Java String containing the int that is the LOB 
> root saved by PG.
> This manifests at runtime with errors like:
> {code}
> hive> select * from srcpart;
> Failed with exception java.io.IOException:java.lang.IllegalArgumentException: 
> Error: type expected at the position 0 of '24030:24031' but '24030' is found.
> {code}
> the 24030:24031 should be 'string:string'.
> repro:
> {code}
> CREATE TABLE srcpart (key STRING COMMENT 'default', value STRING COMMENT 
> 'default') PARTITIONED BY (ds STRING, hr STRING) STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt" OVERWRITE INTO 
> TABLE srcpart PARTITION (ds="2008-04-09", hr="11");
> select * from srcpart;
> {code}
> I did not see the issue being hit by non-partitioned/textfile tables, but 
> that is just the luck of the path taken by the code. Inspection of my PG 
> metastore shows all the CLOB fields suffering from this issue.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16667) PostgreSQL metastore handling of CLOB types for COLUMNS_V2.TYPE_NAME and other field is incorrect

2017-06-07 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16041390#comment-16041390
 ] 

Remus Rusanu commented on HIVE-16667:
-

LGTM

+1

Thanks!

> PostgreSQL metastore handling of CLOB types for COLUMNS_V2.TYPE_NAME and 
> other field is incorrect
> -
>
> Key: HIVE-16667
> URL: https://issues.apache.org/jira/browse/HIVE-16667
> Project: Hive
>  Issue Type: Bug
>Reporter: Remus Rusanu
>Assignee: Naveen Gangam
> Attachments: HIVE-16667.2.patch, HIVE-16667.3.patch, 
> HIVE-16667.patch, HiveCLIOutput.txt, PostgresDBOutput.txt
>
>
> The CLOB JDO type introduced with HIVE-12274 does not work correctly with 
> PostgreSQL. The value is written out-of-band and the LOB handle is written, as 
> an INT, into the table. SELECTs return the INT value, which should have been 
> read via the {{lo_get}} PG built-in and then cast into a string.
> Furthermore, the behavior is different between fields upgraded from earlier 
> metastore versions (they retain their string storage) vs. values inserted 
> after the upgrade (inserted as LOB roots).
> The code in 
> {{MetaStoreDirectSql.getPartitionsFromPartitionIds/extractSqlClob}} expects 
> the underlying JDO/DataNucleus layer to map the column to a {{Clob}}, but that 
> does not happen; the value is a Java String containing the int that is the LOB 
> root saved by PG.
> This manifests at runtime with errors like:
> {code}
> hive> select * from srcpart;
> Failed with exception java.io.IOException:java.lang.IllegalArgumentException: 
> Error: type expected at the position 0 of '24030:24031' but '24030' is found.
> {code}
> the 24030:24031 should be 'string:string'.
> repro:
> {code}
> CREATE TABLE srcpart (key STRING COMMENT 'default', value STRING COMMENT 
> 'default') PARTITIONED BY (ds STRING, hr STRING) STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt" OVERWRITE INTO 
> TABLE srcpart PARTITION (ds="2008-04-09", hr="11");
> select * from srcpart;
> {code}
> I did not see the issue being hit by non-partitioned/textfile tables, but 
> that is just the luck of the path taken by the code. Inspection of my PG 
> metastore shows all the CLOB fields suffering from this issue.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16757) Use of deprecated getRows() instead of new estimateRowCount(RelMetadataQuery..) has serious performance impact

2017-05-30 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-16757:

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Resolved with 
https://git-wip-us.apache.org/repos/asf?p=hive.git;a=commit;h=8aee8d4f2b124fcfa093724b4de0a54287a8084f

> Use of deprecated getRows() instead of new 
> estimateRowCount(RelMetadataQuery..) has serious performance impact
> --
>
> Key: HIVE-16757
> URL: https://issues.apache.org/jira/browse/HIVE-16757
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Fix For: 3.0.0
>
> Attachments: HIVE-16757.01.patch, HIVE-16757.02.patch, 
> HIVE-16757.03.patch, HIVE-16757.04.patch, HIVE-16757.05.patch, 
> HIVE-16757.06.patch
>
>
> Calling Calcite's {{RelMetadataQuery.instance()}} is very expensive because 
> it places a new memoization cache on the stack. Hidden in the deprecated 
> {{AbstractRelNode.getRows()}} call is a call to {{instance()}}. In Hive we 
> have a number of places where we're calling the deprecated {{getRows()}} 
> instead of the new API {{estimateRowCount(RelMetadataQuery mq)}}, which 
> accepts the RelMetadataQuery that in most places we already have handy to 
> pass. Looking at a complex query (49 joins), there are 2995340 calls to 
> {{AbstractRelNode.getRows}}, each one busting the current memoization cache 
> away.
> Was: -On complex queries HiveRelMdRowCount.getRowCount can get called many 
> times. Since it does not memoize its result and the call is recursive, it 
> results in an explosion of calls. For example, for a query with 49 joins, during 
> join ordering (LoptOptimizeJoinRule) HiveRelMdRowCount.getRowCount gets 
> called 6442 times as a top-level call, but the recursion exploded this to 501729 
> calls. Memoization of the result would stop the recursion early. In my 
> testing this reduced the join reordering time for said query from 11s to 
> <1s.-
> Note there is no need for {{HiveRelMdRowCount}} memoization because the 
> function is called in stacks similar to this:
> {code}
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.stats.HiveRelMdRowCount.getRowCount(HiveRelMdRowCount.java:66)
>   at GeneratedMetadataHandler_RowCount.getRowCount_$
>   at GeneratedMetadataHandler_RowCount.getRowCount
>   at 
> org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:204)
>   at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.swapInputs(LoptOptimizeJoinRule.java:1865)
>   at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.createJoinSubtree(LoptOptimizeJoinRule.java:1739)
> {code}
> and {{GeneratedMetadataHandler_RowCount.getRowCount}} handles memoization.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16757) Use of deprecated getRows() instead of new estimateRowCount(RelMetadataQuery..) has serious performance impact

2017-05-30 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-16757:

Attachment: HIVE-16757.06.patch

Use mq.getRowCount when appropriate
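
For reference, the shape of the change is roughly the following. This is a hedged sketch (the surrounding class is hypothetical, not the attached patch), but it shows the difference between the deprecated call and reusing the RelMetadataQuery that the planner already holds.

{code}
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.metadata.RelMetadataQuery;

class RowCountUsageSketch {
  // Before: the deprecated call builds a fresh RelMetadataQuery internally,
  // discarding the memoization cache the planner already has.
  double rowsDeprecated(RelNode rel) {
    return rel.getRows();               // deprecated, hides an instance() call
  }

  // After: reuse the RelMetadataQuery handed down by the planner/rule, so
  // repeated lookups over the same RelNode tree hit the memoized results.
  double rows(RelNode rel, RelMetadataQuery mq) {
    return mq.getRowCount(rel);         // or rel.estimateRowCount(mq)
  }
}
{code}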

> Use of deprecated getRows() instead of new 
> estimateRowCount(RelMetadataQuery..) has serious performance impact
> --
>
> Key: HIVE-16757
> URL: https://issues.apache.org/jira/browse/HIVE-16757
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16757.01.patch, HIVE-16757.02.patch, 
> HIVE-16757.03.patch, HIVE-16757.04.patch, HIVE-16757.05.patch, 
> HIVE-16757.06.patch
>
>
> Calling Calcite's {{RelMetadataQuery.instance()}} is very expensive because 
> it places a new memoization cache on the stack. Hidden in the deprecated 
> {{AbstractRelNode.getRows()}} call is a call to {{instance()}}. In Hive we 
> have a number of places where we're calling the deprecated {{getRows()}} 
> instead of the new API {{estimateRowCount(RelMetadataQuery mq)}} which 
> accepts the RelMetadataQuery, which in most places we already have handy to 
> pass. Looking at a complex query (49 joins) there are 2995340 calls to 
> {{AbstractRelNode.getRows}}, each one busting the current memoization cache 
> away.
> Was: -On complex queries HiveRelMdRowCount.getRowCount can get called many 
> times. Since it does not memoize its result and the call is recursive, it 
> results in an explosion of calls. For example, for a query with 49 joins, during 
> join ordering (LoptOptimizeJoinRule) the HiveRelMdRowCount.getRowCount gets 
> called 6442 times as a top-level call, but the recursion exploded this to 501729 
> calls. Memoization of the result would stop the recursion early. In my 
> testing this reduced the join reordering time for said query from 11s to 
> <1s.-
> Note there is no need for {{HiveRelMdRowCount}} memoization because the 
> function is called in stacks similar to this:
> {code}
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.stats.HiveRelMdRowCount.getRowCount(HiveRelMdRowCount.java:66)
>   at GeneratedMetadataHandler_RowCount.getRowCount_$
>   at GeneratedMetadataHandler_RowCount.getRowCount
>   at 
> org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:204)
>   at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.swapInputs(LoptOptimizeJoinRule.java:1865)
>   at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.createJoinSubtree(LoptOptimizeJoinRule.java:1739)
> {code}
> and {{GeneratedMetadataHandler_RowCount.getRowCount}} handles memoization.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16757) Use of deprecated getRows() instead of new estimateRowCount(RelMetadataQuery..) has serious performance impact

2017-05-30 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-16757:

Attachment: HIVE-16757.05.patch

Patch 05 fixes an omitted instance() call in FilterSelectivityEstimator.getMaxNDV

> Use of deprecated getRows() instead of new 
> estimateRowCount(RelMetadataQuery..) has serious performance impact
> --
>
> Key: HIVE-16757
> URL: https://issues.apache.org/jira/browse/HIVE-16757
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16757.01.patch, HIVE-16757.02.patch, 
> HIVE-16757.03.patch, HIVE-16757.04.patch, HIVE-16757.05.patch
>
>
> Calling Calcite's {{RelMetadataQuery.instance()}} is very expensive because 
> it places a new memoization cache on the stack. Hidden in the deprecated 
> {{AbstractRelNode.getRows()}} call is a call to {{instance()}}. In Hive we 
> have a number of places where we're calling the deprecated {{getRows()}} 
> instead of the new API {{estimateRowCount(RelMetadataQuery mq)}} which 
> accepts the RelMetadataQuery, which in most places we already have handy to 
> pass. Looking at a complex query (49 joins) there are 2995340 calls to 
> {{AbstractRelNode.getRows}}, each one busting the current memoization cache 
> away.
> Was: -On complex queries HiveRelMdRowCount.getRowCount can get called many 
> times. Since it does not memoize its result and the call is recursive, it 
> results in an explosion of calls. For example, for a query with 49 joins, during 
> join ordering (LoptOptimizeJoinRule) the HiveRelMdRowCount.getRowCount gets 
> called 6442 times as a top-level call, but the recursion exploded this to 501729 
> calls. Memoization of the result would stop the recursion early. In my 
> testing this reduced the join reordering time for said query from 11s to 
> <1s.-
> Note there is no need for {{HiveRelMdRowCount}} memoization because the 
> function is called in stacks similar to this:
> {code}
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.stats.HiveRelMdRowCount.getRowCount(HiveRelMdRowCount.java:66)
>   at GeneratedMetadataHandler_RowCount.getRowCount_$
>   at GeneratedMetadataHandler_RowCount.getRowCount
>   at 
> org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:204)
>   at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.swapInputs(LoptOptimizeJoinRule.java:1865)
>   at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.createJoinSubtree(LoptOptimizeJoinRule.java:1739)
> {code}
> and {{GeneratedMetadataHandler_RowCount.getRowCount}} handles memoization.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16790) HiveCostModel.JoinAlgorithm and derivatives abuse RelMetadataQuery.instance

2017-05-30 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu reassigned HIVE-16790:
---


> HiveCostModel.JoinAlgorithm and derivatives abuse RelMetadataQuery.instance
> ---
>
> Key: HIVE-16790
> URL: https://issues.apache.org/jira/browse/HIVE-16790
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
>
> Calling {{RelMetadataQuery.instance()}} has serious performance implications, 
> as it invalidates the memoization cache used in Calcite; see HIVE-16757.
> {{HiveCostModel.JoinAlgorithm}} and the derived classes abuse this call, 
> sometimes multiple times per function. All methods in JoinAlgorithm that need 
> a RelMetadataQuery should accept it as an argument, not build a new instance.
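
As an illustration of the intended direction, a sketch with deliberately simplified signatures (not the actual HiveCostModel code) would have the cost methods take the query object as a parameter instead of building one:

{code}
import org.apache.calcite.plan.RelOptCost;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.metadata.RelMetadataQuery;

// Sketch of the signature change: callers supply the RelMetadataQuery they
// already have, so no method needs to call RelMetadataQuery.instance().
interface JoinAlgorithmSketch {
  boolean isExecutable(RelNode join, RelMetadataQuery mq);
  RelOptCost getCost(RelNode join, RelMetadataQuery mq);
  Double getMemory(RelNode join, RelMetadataQuery mq);
}
{code}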



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16757) Use of deprecated getRows() instead of new estimateRowCount(RelMetadataQuery..) has serious performance impact

2017-05-29 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-16757:

Attachment: (was: HIVE-16757.04.patch)

> Use of deprecated getRows() instead of new 
> estimateRowCount(RelMetadataQuery..) has serious performance impact
> --
>
> Key: HIVE-16757
> URL: https://issues.apache.org/jira/browse/HIVE-16757
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16757.01.patch, HIVE-16757.02.patch, 
> HIVE-16757.03.patch, HIVE-16757.04.patch
>
>
> Calling Calcite's {{RelMetadataQuery.instance()}} is very expensive because 
> it places a new memoization cache on the stack. Hidden in the deprecated 
> {{AbstractRelNode.getRows()}} call is a call to {{instance()}}. In Hive we 
> have a number of places where we're calling the deprecated {{getRows()}} 
> instead of the new API {{estimateRowCount(RelMetadataQuery mq)}} which 
> accepts the RelMetadataQuery, which in most places we already have handy to 
> pass. Looking at a complex query (49 joins) there are 2995340 calls to 
> {{AbstractRelNode.getRows}}, each one busting the current memoization cache 
> away.
> Was: -On complex queries HiveRelMdRowCount.getRowCount can get called many 
> times. Since it does not memoize its result and the call is recursive, it 
> results in an explosion of calls. For example, for a query with 49 joins, during 
> join ordering (LoptOptimizeJoinRule) the HiveRelMdRowCount.getRowCount gets 
> called 6442 times as a top-level call, but the recursion exploded this to 501729 
> calls. Memoization of the result would stop the recursion early. In my 
> testing this reduced the join reordering time for said query from 11s to 
> <1s.-
> Note there is no need for {{HiveRelMdRowCount}} memoization because the 
> function is called in stacks similar to this:
> {code}
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.stats.HiveRelMdRowCount.getRowCount(HiveRelMdRowCount.java:66)
>   at GeneratedMetadataHandler_RowCount.getRowCount_$
>   at GeneratedMetadataHandler_RowCount.getRowCount
>   at 
> org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:204)
>   at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.swapInputs(LoptOptimizeJoinRule.java:1865)
>   at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.createJoinSubtree(LoptOptimizeJoinRule.java:1739)
> {code}
> and {{GeneratedMetadataHandler_RowCount.getRowCount}} handles memoization.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16757) Use of deprecated getRows() instead of new estimateRowCount(RelMetadataQuery..) has serious performance impact

2017-05-29 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-16757:

Attachment: HIVE-16757.04.patch

Fix import order, same patch

> Use of deprecated getRows() instead of new 
> estimateRowCount(RelMetadataQuery..) has serious performance impact
> --
>
> Key: HIVE-16757
> URL: https://issues.apache.org/jira/browse/HIVE-16757
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16757.01.patch, HIVE-16757.02.patch, 
> HIVE-16757.03.patch, HIVE-16757.04.patch
>
>
> Calling Calcite's {{RelMetadataQuery.instance()}} is very expensive because 
> it places a new memoization cache on the stack. Hidden in the deprecated 
> {{AbstractRelNode.getRows()}} call is a call to {{instance()}}. In Hive we 
> have a number of places where we're calling the deprecated {{getRows()}} 
> instead of the new API {{estimateRowCount(RelMetadataQuery mq)}} which 
> accepts the RelMetadataQuery, which in most places we already have handy to 
> pass. Looking at a complex query (49 joins) there are 2995340 calls to 
> {{AbstractRelNode.getRows}}, each one busting the current memoization cache 
> away.
> Was: -On complex queries HiveRelMdRowCount.getRowCount can get called many 
> times. Since it does not memoize its result and the call is recursive, it 
> results in an explosion of calls. For example, for a query with 49 joins, during 
> join ordering (LoptOptimizeJoinRule) the HiveRelMdRowCount.getRowCount gets 
> called 6442 times as a top-level call, but the recursion exploded this to 501729 
> calls. Memoization of the result would stop the recursion early. In my 
> testing this reduced the join reordering time for said query from 11s to 
> <1s.-
> Note there is no need for {{HiveRelMdRowCount}} memoization because the 
> function is called in stacks similar to this:
> {code}
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.stats.HiveRelMdRowCount.getRowCount(HiveRelMdRowCount.java:66)
>   at GeneratedMetadataHandler_RowCount.getRowCount_$
>   at GeneratedMetadataHandler_RowCount.getRowCount
>   at 
> org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:204)
>   at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.swapInputs(LoptOptimizeJoinRule.java:1865)
>   at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.createJoinSubtree(LoptOptimizeJoinRule.java:1739)
> {code}
> and {{GeneratedMetadataHandler_RowCount.getRowCount}} handles memoization.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16757) Use of deprecated getRows() instead of new estimateRowCount(RelMetadataQuery..) has serious performance impact

2017-05-29 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16028116#comment-16028116
 ] 

Remus Rusanu commented on HIVE-16757:
-

https://reviews.apache.org/r/59624/

> Use of deprecated getRows() instead of new 
> estimateRowCount(RelMetadataQuery..) has serious performance impact
> --
>
> Key: HIVE-16757
> URL: https://issues.apache.org/jira/browse/HIVE-16757
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16757.01.patch, HIVE-16757.02.patch, 
> HIVE-16757.03.patch, HIVE-16757.04.patch
>
>
> Calling Calcite's {{RelMetadataQuery.instance()}} is very expensive because 
> it places a new memoization cache on the stack. Hidden in the deprecated 
> {{AbstractRelNode.getRows()}} call is a call to {{instance()}}. In Hive we 
> have a number of places where we're calling the deprecated {{getRows()}} 
> instead of the new API {{estimateRowCount(RelMetadataQuery mq)}} which 
> accepts the RelMetadataQuery, which in most places we already have handy to 
> pass. Looking at a complex query (49 joins) there are 2995340 calls to 
> {{AbstractRelNode.getRows}}, each one busting the current memoization cache 
> away.
> Was: -On complex queries HiveRelMdRowCount.getRowCount can get called many 
> times. Since it does not memoize its result and the call is recursive, it 
> results in an explosion of calls. For example, for a query with 49 joins, during 
> join ordering (LoptOptimizeJoinRule) the HiveRelMdRowCount.getRowCount gets 
> called 6442 times as a top-level call, but the recursion exploded this to 501729 
> calls. Memoization of the result would stop the recursion early. In my 
> testing this reduced the join reordering time for said query from 11s to 
> <1s.-
> Note there is no need for {{HiveRelMdRowCount}} memoization because the 
> function is called in stacks similar to this:
> {code}
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.stats.HiveRelMdRowCount.getRowCount(HiveRelMdRowCount.java:66)
>   at GeneratedMetadataHandler_RowCount.getRowCount_$
>   at GeneratedMetadataHandler_RowCount.getRowCount
>   at 
> org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:204)
>   at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.swapInputs(LoptOptimizeJoinRule.java:1865)
>   at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.createJoinSubtree(LoptOptimizeJoinRule.java:1739)
> {code}
> and {{GeneratedMetadataHandler_RowCount.getRowCount}} handles memoization.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16757) Use of deprecated getRows() instead of new estimateRowCount(RelMetadataQuery..) has serious performance impact

2017-05-29 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-16757:

Attachment: HIVE-16757.04.patch

Patch 04 removes use of getRows from the cost model and uses 
estimateRowCount(mq) instead

> Use of deprecated getRows() instead of new 
> estimateRowCount(RelMetadataQuery..) has serious performance impact
> --
>
> Key: HIVE-16757
> URL: https://issues.apache.org/jira/browse/HIVE-16757
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16757.01.patch, HIVE-16757.02.patch, 
> HIVE-16757.03.patch, HIVE-16757.04.patch
>
>
> Calling Calcite's {{RelMetadataQuery.instance()}} is very expensive because 
> it places a new memoization cache on the stack. Hidden in the deprecated 
> {{AbstractRelNode.getRows()}} call is a call to {{instance()}}. In Hive we 
> have a number of places where we're calling the deprecated {{getRows()}} 
> instead of the new API {{estimateRowCount(RelMetadataQuery mq)}} which 
> accepts the RelMetadataQuery, which in most places we already have handy to 
> pass. Looking at a complex query (49 joins) there are 2995340 calls to 
> {{AbstractRelNode.getRows}}, each one busting the current memoization cache 
> away.
> Was: -On complex queries HiveRelMdRowCount.getRowCount can get called many 
> times. Since it does not memoize its result and the call is recursive, it 
> results in an explosion of calls. For example, for a query with 49 joins, during 
> join ordering (LoptOptimizeJoinRule) the HiveRelMdRowCount.getRowCount gets 
> called 6442 times as a top-level call, but the recursion exploded this to 501729 
> calls. Memoization of the result would stop the recursion early. In my 
> testing this reduced the join reordering time for said query from 11s to 
> <1s.-
> Note there is no need for {{HiveRelMdRowCount}} memoization because the 
> function is called in stacks similar to this:
> {code}
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.stats.HiveRelMdRowCount.getRowCount(HiveRelMdRowCount.java:66)
>   at GeneratedMetadataHandler_RowCount.getRowCount_$
>   at GeneratedMetadataHandler_RowCount.getRowCount
>   at 
> org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:204)
>   at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.swapInputs(LoptOptimizeJoinRule.java:1865)
>   at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.createJoinSubtree(LoptOptimizeJoinRule.java:1739)
> {code}
> and {{GeneratedMetadataHandler_RowCount.getRowCount}} handles memoization.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16757) Use of deprecated getRows() instead of new estimateRowCount(RelMetadataQuery..) has serious performance impact

2017-05-29 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16028100#comment-16028100
 ] 

Remus Rusanu commented on HIVE-16757:
-

I can't repro any of the ptest failures locally, and they seem unrelated (like 
missing tblproperties?).

> Use of deprecated getRows() instead of new 
> estimateRowCount(RelMetadataQuery..) has serious performance impact
> --
>
> Key: HIVE-16757
> URL: https://issues.apache.org/jira/browse/HIVE-16757
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16757.01.patch, HIVE-16757.02.patch, 
> HIVE-16757.03.patch
>
>
> Calling Calcite's {{RelMetadataQuery.instance()}} is very expensive because 
> it places a new memoization cache on the stack. Hidden in the deprecated 
> {{AbstractRelNode.getRows()}} call is a call to {{instance()}}. In Hive we 
> have a number of places where we're calling the deprecated {{getRows()}} 
> instead of the new API {{estimateRowCount(RelMetadataQuery mq)}} which 
> accepts the RelMetadataQuery, which in most places we already have handy to 
> pass. Looking at a complex query (49 joins) there are 2995340 calls to 
> {{AbstractRelNode.getRows}}, each one busting the current memoization cache 
> away.
> Was: -On complex queries HiveRelMdRowCount.getRowCount can get called many 
> times. Since it does not memoize its result and the call is recursive, it 
> results in an explosion of calls. For example, for a query with 49 joins, during 
> join ordering (LoptOptimizeJoinRule) the HiveRelMdRowCount.getRowCount gets 
> called 6442 times as a top-level call, but the recursion exploded this to 501729 
> calls. Memoization of the result would stop the recursion early. In my 
> testing this reduced the join reordering time for said query from 11s to 
> <1s.-
> Note there is no need for {{HiveRelMdRowCount}} memoization because the 
> function is called in stacks similar to this:
> {code}
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.stats.HiveRelMdRowCount.getRowCount(HiveRelMdRowCount.java:66)
>   at GeneratedMetadataHandler_RowCount.getRowCount_$
>   at GeneratedMetadataHandler_RowCount.getRowCount
>   at 
> org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:204)
>   at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.swapInputs(LoptOptimizeJoinRule.java:1865)
>   at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.createJoinSubtree(LoptOptimizeJoinRule.java:1739)
> {code}
> and {{GeneratedMetadataHandler_RowCount.getRowCount}} handles memoization.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16757) Use of deprecated getRows() instead of new estimateRowCount(RelMetadataQuery..) has serious performance impact

2017-05-28 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-16757:

Attachment: HIVE-16757.03.patch

Patch 03 removes all 01/02 changes and instead replaces use of the deprecated 
getRows() with estimateRowCount and passes the available existing RelMetadataQuery

> Use of deprecated getRows() instead of new 
> estimateRowCount(RelMetadataQuery..) has serious performance impact
> --
>
> Key: HIVE-16757
> URL: https://issues.apache.org/jira/browse/HIVE-16757
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16757.01.patch, HIVE-16757.02.patch, 
> HIVE-16757.03.patch
>
>
> Calling Calcite's {{RelMetadataQuery.instance()}} is very expensive because 
> it places a new memoization cache on the stack. Hidden in the deprecated 
> {{AbstractRelNode.getRows()}} call is a call to {{instance()}}. In Hive we 
> have a number of places where we're calling the deprecated {{getRows()}} 
> instead of the new API {{estimateRowCount(RelMetadataQuery mq)}} which 
> accepts the RelMetadataQuery, which in most places we already have handy to 
> pass. Looking at a complex query (49 joins) there are 2995340 calls to 
> {{AbstractRelNode.getRows}}, each one busting the current memoization cache 
> away.
> Was: -On complex queries HiveRelMdRowCount.getRowCount can get called many 
> times. Since it does not memoize its result and the call is recursive, it 
> results in an explosion of calls. For example, for a query with 49 joins, during 
> join ordering (LoptOptimizeJoinRule) the HiveRelMdRowCount.getRowCount gets 
> called 6442 times as a top-level call, but the recursion exploded this to 501729 
> calls. Memoization of the result would stop the recursion early. In my 
> testing this reduced the join reordering time for said query from 11s to 
> <1s.-
> Note there is no need for {{HiveRelMdRowCount}} memoization because the 
> function is called in stacks similar to this:
> {code}
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.stats.HiveRelMdRowCount.getRowCount(HiveRelMdRowCount.java:66)
>   at GeneratedMetadataHandler_RowCount.getRowCount_$
>   at GeneratedMetadataHandler_RowCount.getRowCount
>   at 
> org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:204)
>   at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.swapInputs(LoptOptimizeJoinRule.java:1865)
>   at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.createJoinSubtree(LoptOptimizeJoinRule.java:1739)
> {code}
> and {{GeneratedMetadataHandler_RowCount.getRowCount}} handles memoization.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16757) Use of deprecated getRows() instead of new estimateRowCount(RelMetadataQuery..) has serious performance impact

2017-05-28 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-16757:

Summary: Use of deprecated getRows() instead of new 
estimateRowCount(RelMetadataQuery..) has serious performance impact  (was: Use 
of deprecated {{getRows()}} instead of new 
{{estimateRowCount(RelMetadataQuery..)}} has serious performance impact)

> Use of deprecated getRows() instead of new 
> estimateRowCount(RelMetadataQuery..) has serious performance impact
> --
>
> Key: HIVE-16757
> URL: https://issues.apache.org/jira/browse/HIVE-16757
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16757.01.patch, HIVE-16757.02.patch
>
>
> Calling Calcite's {{RelMetadataQuery.instance()}} is very expensive because 
> it places a new memoization cache on the stack. Hidden in the deprecated 
> {{AbstractRelNode.getRows()}} call is a call to {{instance()}}. In Hive we 
> have a number of places where we're calling the deprecated {{getRows()}} 
> instead of the new API {{estimateRowCount(RelMetadataQuery mq)}} which 
> accepts the RelMetadataQuery, which in most places we already have handy to 
> pass. Looking at a complex query (49 joins) there are 2995340 calls to 
> {{AbstractRelNode.getRows}}, each one busting the current memoization cache 
> away.
> Was: -On complex queries HiveRelMdRowCount.getRowCount can get called many 
> times. Since it does not memoize its result and the call is recursive, it 
> results in an explosion of calls. For example, for a query with 49 joins, during 
> join ordering (LoptOptimizeJoinRule) the HiveRelMdRowCount.getRowCount gets 
> called 6442 times as a top-level call, but the recursion exploded this to 501729 
> calls. Memoization of the result would stop the recursion early. In my 
> testing this reduced the join reordering time for said query from 11s to 
> <1s.-
> Note there is no need for {{HiveRelMdRowCount}} memoization because the 
> function is called in stacks similar to this:
> {code}
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.stats.HiveRelMdRowCount.getRowCount(HiveRelMdRowCount.java:66)
>   at GeneratedMetadataHandler_RowCount.getRowCount_$
>   at GeneratedMetadataHandler_RowCount.getRowCount
>   at 
> org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:204)
>   at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.swapInputs(LoptOptimizeJoinRule.java:1865)
>   at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.createJoinSubtree(LoptOptimizeJoinRule.java:1739)
> {code}
> and {{GeneratedMetadataHandler_RowCount.getRowCount}} handles memoization.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16757) Use of deprecated {{getRows()}} instead of new {{estimateRowCount(RelMetadataQuery..)}} has serious performance impact

2017-05-28 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-16757:

Summary: Use of deprecated {{getRows()}} instead of new 
{{estimateRowCount(RelMetadataQuery..)}} has serious performance impact  (was: 
Use memoization in HiveRelMdRowCount.getRowCount)

> Use of deprecated {{getRows()}} instead of new 
> {{estimateRowCount(RelMetadataQuery..)}} has serious performance impact
> --
>
> Key: HIVE-16757
> URL: https://issues.apache.org/jira/browse/HIVE-16757
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16757.01.patch, HIVE-16757.02.patch
>
>
> Calling Calcite's {{RelMetadataQuery.instance()}} is very expensive because 
> it places a new memoization cache on the stack. Hidden in the deprecated 
> {{AbstractRelNode.getRows()}} call is a call to {{instance()}}. In Hive we 
> have a number of places where we're calling the deprecated {{getRows()}} 
> instead of the new API {{estimateRowCount(RelMetadataQuery mq)}} which 
> accepts the RelMetadataQuery, which in most places we already have handy to 
> pass. Looking at a complex query (49 joins) there are 2995340 calls to 
> {{AbstractRelNode.getRows}}, each one busting the current memoization cache 
> away.
> Was: -On complex queries HiveRelMdRowCount.getRowCount can get called many 
> times. Since it does not memoize its result and the call is recursive, it 
> results in an explosion of calls. For example, for a query with 49 joins, during 
> join ordering (LoptOptimizeJoinRule) the HiveRelMdRowCount.getRowCount gets 
> called 6442 times as a top-level call, but the recursion exploded this to 501729 
> calls. Memoization of the result would stop the recursion early. In my 
> testing this reduced the join reordering time for said query from 11s to 
> <1s.-
> Note there is no need for {{HiveRelMdRowCount}} memoization because the 
> function is called in stacks similar to this:
> {code}
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.stats.HiveRelMdRowCount.getRowCount(HiveRelMdRowCount.java:66)
>   at GeneratedMetadataHandler_RowCount.getRowCount_$
>   at GeneratedMetadataHandler_RowCount.getRowCount
>   at 
> org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:204)
>   at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.swapInputs(LoptOptimizeJoinRule.java:1865)
>   at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.createJoinSubtree(LoptOptimizeJoinRule.java:1739)
> {code}
> and {{GeneratedMetadataHandler_RowCount.getRowCount}} handles memoization.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16757) Use memoization in HiveRelMdRowCount.getRowCount

2017-05-28 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-16757:

Description: 
Calling Calcite's {{RelMetadataQuery.instance()}} is very expensive because it 
places a new memoization cache on the stack. Hidden in the deprecated 
{{AbstractRelNode.getRows()}} call is a call to {{instance()}}. In Hive we have 
a number of places where we're calling the deprecated {{getRows()}} instead of 
the new API {{estimateRowCount(RelMetadataQuery mq)}} which accepts the 
RelMetadataQuery, which in most places we already have handy to pass. 
Looking at a complex query (49 joins) there are 2995340 calls to 
{{AbstractRelNode.getRows}}, each one busting the current memoization cache 
away.


Was: -On complex queries HiveRelMdRowCount.getRowCount can get called many 
times. Since it does not memoize its result and the call is recursive, it 
results in an explosion of calls. For example, for a query with 49 joins, during 
join ordering (LoptOptimizeJoinRule) the HiveRelMdRowCount.getRowCount gets 
called 6442 times as a top-level call, but the recursion exploded this to 501729 
calls. Memoization of the result would stop the recursion early. In my testing 
this reduced the join reordering time for said query from 11s to <1s.-

Note there is no need for {{HiveRelMdRowCount}} memoization because the 
function is called in stacks similar to this:
{code}
at 
org.apache.hadoop.hive.ql.optimizer.calcite.stats.HiveRelMdRowCount.getRowCount(HiveRelMdRowCount.java:66)
at GeneratedMetadataHandler_RowCount.getRowCount_$
at GeneratedMetadataHandler_RowCount.getRowCount
at 
org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:204)
at 
org.apache.calcite.rel.rules.LoptOptimizeJoinRule.swapInputs(LoptOptimizeJoinRule.java:1865)
at 
org.apache.calcite.rel.rules.LoptOptimizeJoinRule.createJoinSubtree(LoptOptimizeJoinRule.java:1739)
{code}
and {{GeneratedMetadataHandler_RowCount.getRowCount}} handles memoization.

  was:
Calling Calcite's {{RelMetadataQuery.instance()}} is very expensive because it 
places a new memoization cache on the stack. Hidden in the deprecated 
{{AbstractRelNode.getRows()}} call is a call to {{instance()}}. In Hive we have 
a number of places where we're calling the deprecated {{getRows()}} instead of 
the new API {{estimateRowCount(RelMetadataQuery mq)}} which accepts the 
RelMetadataQuery, which in most places we already have handy to pass. 
Looking at a complex query (49 joins) there are 2995340 calls to 
{{AbstractRelNode.getRows}}, each one busting the current memoization cache 
away.


Was: -On complex queries HiveRelMdRowCount.getRowCount can get called many 
times. Since it does not memoize its result and the call is recursive, it 
results in an explosion of calls. For example, for a query with 49 joins, during 
join ordering (LoptOptimizeJoinRule) the HiveRelMdRowCount.getRowCount gets 
called 6442 times as a top-level call, but the recursion exploded this to 501729 
calls. Memoization of the result would stop the recursion early. In my testing 
this reduced the join reordering time for said query from 11s to <1s.-




> Use memoization in HiveRelMdRowCount.getRowCount
> 
>
> Key: HIVE-16757
> URL: https://issues.apache.org/jira/browse/HIVE-16757
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16757.01.patch, HIVE-16757.02.patch
>
>
> Calling Calcite's {{RelMetadataQuery.instance()}} is very expensive because 
> it places a new memoization cache on the stack. Hidden in the deprecated 
> {{AbstractRelNode.getRows()}} call is a call to {{instance()}}. In Hive we 
> have a number of places where we're calling the deprecated {{getRows()}} 
> instead of the new API {{estimateRowCount(RelMetadataQuery mq)}} which 
> accepts the RelMetadataQuery, which in most places we already have handy to 
> pass. Looking at a complex query (49 joins) there are 2995340 calls to 
> {{AbstractRelNode.getRows}}, each one busting the current memoization cache 
> away.
> Was: -On complex queries HiveRelMdRowCount.getRowCount can get called many 
> times. Since it does not memoize its result and the call is recursive, it 
> results in an explosion of calls. For example, for a query with 49 joins, during 
> join ordering (LoptOptimizeJoinRule) the HiveRelMdRowCount.getRowCount gets 
> called 6442 times as a top-level call, but the recursion exploded this to 501729 
> calls. Memoization of the result would stop the recursion early. In my 
> testing this reduced the join reordering time for said query from 11s to 
> <1s.-
> Note there is no need for {{HiveRelMdRowCount}} memoization because the 
> 

[jira] [Updated] (HIVE-16757) Use memoization in HiveRelMdRowCount.getRowCount

2017-05-28 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-16757:

Description: 
Calling Calcite's {{RelMetadataQuery.instance()}} is very expensive because it 
places a new memoization cache on the stack. Hidden in the deprecated 
{{AbstractRelNode.getRows()}} call is a call to {{instance()}}. In Hive we have 
a number of places where we're calling the deprecated {{getRows()}} instead of 
the new API {{estimateRowCount(RelMetadataQuery mq)}} which accepts the 
RelMetadataQuery, which in most places we already have handy to pass. 
Looking at a complex query (49 joins) there are 2995340 calls to 
{{AbstractRelNode.getRows}}, each one busting the current memoization cache 
away.


Was: -On complex queries HiveRelMdRowCount.getRowCount can get called many 
times. Since it does not memoize its result and the call is recursive, it 
results in an explosion of calls. For example, for a query with 49 joins, during 
join ordering (LoptOptimizeJoinRule) the HiveRelMdRowCount.getRowCount gets 
called 6442 times as a top-level call, but the recursion exploded this to 501729 
calls. Memoization of the result would stop the recursion early. In my testing 
this reduced the join reordering time for said query from 11s to <1s.-



  was:On complex queries HiveRelMdRowCount.getRowCount can get called many 
times. Since it does not memoize its result and the call is recursive, it 
results in an explosion of calls. For example, for a query with 49 joins, during 
join ordering (LoptOptimizeJoinRule) the HiveRelMdRowCount.getRowCount gets 
called 6442 times as a top-level call, but the recursion exploded this to 501729 
calls. Memoization of the result would stop the recursion early. In my testing 
this reduced the join reordering time for said query from 11s to <1s.


> Use memoization in HiveRelMdRowCount.getRowCount
> 
>
> Key: HIVE-16757
> URL: https://issues.apache.org/jira/browse/HIVE-16757
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16757.01.patch, HIVE-16757.02.patch
>
>
> Calling Calcite's {{RelMetadataQuery.instance()}} is very expensive because 
> it places a new memoization cache on the stack. Hidden in the deprecated 
> {{AbstractRelNode.getRows()}} call is a call to {{instance()}}. In Hive we 
> have a number of places where we're calling the deprecated {{getRows()}} 
> instead of the new API {{estimateRowCount(RelMetadataQuery mq)}} which 
> accepts the RelMetadataQuery, which in most places we already have handy to 
> pass. Looking at a complex query (49 joins) there are 2995340 calls to 
> {{AbstractRelNode.getRows}}, each one busting the current memoization cache 
> away.
> Was: -On complex queries HiveRelMdRowCount.getRowCount can get called many 
> times. Since it does not memoize its result and the call is recursive, it 
> results in an explosion of calls. For example, for a query with 49 joins, during 
> join ordering (LoptOptimizeJoinRule) the HiveRelMdRowCount.getRowCount gets 
> called 6442 times as a top-level call, but the recursion exploded this to 501729 
> calls. Memoization of the result would stop the recursion early. In my 
> testing this reduced the join reordering time for said query from 11s to 
> <1s.-



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16757) Use memoization in HiveRelMdRowCount.getRowCount

2017-05-25 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16025141#comment-16025141
 ] 

Remus Rusanu commented on HIVE-16757:
-

From what I can see, this seems to be a problem in Calcite code as well. I 
will start a discussion on calcite-dev.

> Use memoization in HiveRelMdRowCount.getRowCount
> 
>
> Key: HIVE-16757
> URL: https://issues.apache.org/jira/browse/HIVE-16757
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16757.01.patch, HIVE-16757.02.patch
>
>
> On complex queries HiveRelMdRowCount.getRowCount can get called many times. 
> Since it does not memoize its result and the call is recursive, it results in 
> an explosion of calls. For example, for a query with 49 joins, during join 
> ordering (LoptOptimizeJoinRule) the HiveRelMdRowCount.getRowCount gets called 
> 6442 times as a top-level call, but the recursion exploded this to 501729 calls. 
> Memoization of the result would stop the recursion early. In my testing this 
> reduced the join reordering time for said query from 11s to <1s.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16757) Use memoization in HiveRelMdRowCount.getRowCount

2017-05-25 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16025130#comment-16025130
 ] 

Remus Rusanu commented on HIVE-16757:
-

I haven't looked into it, will see

> Use memoization in HiveRelMdRowCount.getRowCount
> 
>
> Key: HIVE-16757
> URL: https://issues.apache.org/jira/browse/HIVE-16757
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16757.01.patch, HIVE-16757.02.patch
>
>
> On complex queries HiveRelMdRowCount.getRowCount can get called many times. 
> Since it does not memoize its result and the call is recursive, it results in 
> an explosion of calls. For example, for a query with 49 joins, during join 
> ordering (LoptOptimizeJoinRule) the HiveRelMdRowCount.getRowCount gets called 
> 6442 times as a top-level call, but the recursion exploded this to 501729 calls. 
> Memoization of the result would stop the recursion early. In my testing this 
> reduced the join reordering time for said query from 11s to <1s.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16757) Use memoization in HiveRelMdRowCount.getRowCount

2017-05-25 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-16757:

Attachment: HIVE-16757.02.patch

Patch 2 uses a new HiveRelMetadataQueryProvider.get() instead of 
RelMetadataQuery.instance(), with thread-local access to the instance.
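
A minimal sketch of what such a provider could look like is below; the class and method names are assumptions based on this comment, not the attached patch.

{code}
import org.apache.calcite.rel.metadata.RelMetadataQuery;

// Sketch of a thread-local provider: one RelMetadataQuery (and therefore one
// memoization cache) per planning thread, instead of a fresh instance() call
// at every use site.
public final class HiveRelMetadataQueryProvider {
  private static final ThreadLocal<RelMetadataQuery> MQ =
      ThreadLocal.withInitial(RelMetadataQuery::instance);

  public static RelMetadataQuery get() {
    return MQ.get();
  }

  /** Drop the cached instance, e.g. after the plan has been transformed. */
  public static void reset() {
    MQ.remove();
  }

  private HiveRelMetadataQueryProvider() {}
}
{code}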

> Use memoization in HiveRelMdRowCount.getRowCount
> 
>
> Key: HIVE-16757
> URL: https://issues.apache.org/jira/browse/HIVE-16757
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16757.01.patch, HIVE-16757.02.patch
>
>
> On complex queries HiveRelMdRowCount.getRowCount can get called many times. 
> Since it does not memoize its result and the call is recursive, it results in 
> an explosion of calls. For example, for a query with 49 joins, during join 
> ordering (LoptOptimizeJoinRule) the HiveRelMdRowCount.getRowCount gets called 
> 6442 times as a top-level call, but the recursion exploded this to 501729 calls. 
> Memoization of the result would stop the recursion early. In my testing this 
> reduced the join reordering time for said query from 11s to <1s.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16757) Use memoization in HiveRelMdRowCount.getRowCount

2017-05-25 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-16757:

Status: Patch Available  (was: Open)

> Use memoization in HiveRelMdRowCount.getRowCount
> 
>
> Key: HIVE-16757
> URL: https://issues.apache.org/jira/browse/HIVE-16757
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16757.01.patch
>
>
> On complex queries HiveRelMdRowCount.getRowCount can get called many times. 
> Since it does not memoize its result and the call is recursive, it results in 
> an explosion of calls. For example, for a query with 49 joins, during join 
> ordering (LoptOptimizeJoinRule) the HiveRelMdRowCount.getRowCount gets called 
> 6442 times as a top-level call, but the recursion exploded this to 501729 calls. 
> Memoization of the result would stop the recursion early. In my testing this 
> reduced the join reordering time for said query from 11s to <1s.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16757) Use memoization in HiveRelMdRowCount.getRowCount

2017-05-25 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-16757:

Attachment: HIVE-16757.01.patch

Patch 01 for running the tests 

> Use memoization in HiveRelMdRowCount.getRowCount
> 
>
> Key: HIVE-16757
> URL: https://issues.apache.org/jira/browse/HIVE-16757
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16757.01.patch
>
>
> On complex queries HiveRelMdRowCount.getRowCount can get called many times. 
> Since it does not memoize its result and the call is recursive, it results in 
> an explosion of calls. For example, for a query with 49 joins, during join 
> ordering (LoptOptimizeJoinRule) the HiveRelMdRowCount.getRowCount gets called 
> 6442 times as a top-level call, but the recursion exploded this to 501729 calls. 
> Memoization of the result would stop the recursion early. In my testing this 
> reduced the join reordering time for said query from 11s to <1s.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16757) Use memoization in HiveRelMdRowCount.getRowCount

2017-05-25 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu reassigned HIVE-16757:
---


> Use memoization in HiveRelMdRowCount.getRowCount
> 
>
> Key: HIVE-16757
> URL: https://issues.apache.org/jira/browse/HIVE-16757
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
>
> On complex queries HiveRelMdRowCount.getRowCount can get called many times. 
> Since it does not memoize its result and the call is recursive, it results in 
> an explosion of calls. For example, for a query with 49 joins, during join 
> ordering (LoptOptimizeJoinRule) the HiveRelMdRowCount.getRowCount gets called 
> 6442 times as a top-level call, but the recursion exploded this to 501729 calls. 
> Memoization of the result would stop the recursion early. In my testing this 
> reduced the join reordering time for said query from 11s to <1s.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16667) PostgreSQL metastore handling of CLOB types for COLUMNS_V2.TYPE_NAME and other field is incorrect

2017-05-23 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16020759#comment-16020759
 ] 

Remus Rusanu commented on HIVE-16667:
-

I don't see how your patch will address the problem of mixing pre-v3.0 
upgraded tables and post-v3.0 created ones. The new ones will store the 
{{TYPE_NAME}} as a LOB handle, while the upgraded ones will have the inlined 
string value.

Can we explore instead reverting the column type mapping to {{VARCHAR}} in the 
package.jdo (or {{LONGVARCHAR}} if need be) instead of {{CLOB}}? Keep the 
metastore upgrade scripts, change the underlying storage types to the respective 
large types, but have JDO map them to String, not Clob. In my testing with PG, 
this works correctly in all cases I tested (my repro, your large serde DDL from 
HIVE-12274), and should handle upgrade correctly. But I did not test this with 
other metastore engines, Oracle, MySQL etc.

> PostgreSQL metastore handling of CLOB types for COLUMNS_V2.TYPE_NAME and 
> other field is incorrect
> -
>
> Key: HIVE-16667
> URL: https://issues.apache.org/jira/browse/HIVE-16667
> Project: Hive
>  Issue Type: Bug
>Reporter: Remus Rusanu
>Assignee: Naveen Gangam
> Attachments: HIVE-16667.patch, HiveCLIOutput.txt, PostgresDBOutput.txt
>
>
> The CLOB JDO type introduced with HIVE-12274 does not work correctly with 
> PostgreSQL. The value is written out-of-band and the LOB handle is written, as 
> an INT, into the table. SELECTs return the INT value, which should have been 
> read via the {{lo_get}} PG built-in and then cast to a string.
> Furthermore, the behavior is different between fields upgraded from earlier 
> metastore versions (they retain their string storage) vs. values inserted 
> after the upgrade (inserted as LOB roots).
> The code in 
> {{MetaStoreDirectSql.getPartitionsFromPartitionIds/extractSqlClob}} expects 
> the underlying JDO/Datanucleus to map the column to a {{Clob}} but that does 
> not happen, the value is a Java String containing the int which is the LOB 
> root saved by PG.
> This manifests at runtime with errors like:
> {code}
> hive> select * from srcpart;
> Failed with exception java.io.IOException:java.lang.IllegalArgumentException: 
> Error: type expected at the position 0 of '24030:24031' but '24030' is found.
> {code}
> The 24030:24031 should be 'string:string'.
> repro:
> {code}
> CREATE TABLE srcpart (key STRING COMMENT 'default', value STRING COMMENT 
> 'default') PARTITIONED BY (ds STRING, hr STRING) STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt" OVERWRITE INTO 
> TABLE srcpart PARTITION (ds="2008-04-09", hr="11");
> select * from srcpart;
> {code}
> I did not see the issue being hit by non-partitioned/textfile tables, but 
> that is just the luck of the path taken by the code. Inspection of my PG 
> metastore shows all the CLOB fields suffering from this issue.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16113) PartitionPruner::removeNonPartCols needs to handle AND/OR cases

2017-05-20 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-16113:

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Resolved via 
https://git-wip-us.apache.org/repos/asf?p=hive.git;a=commit;h=5f4eaa9b13e7beec8bb16fea94fec386e2bc1e00

> PartitionPruner::removeNonPartCols needs to handle AND/OR cases
> ---
>
> Key: HIVE-16113
> URL: https://issues.apache.org/jira/browse/HIVE-16113
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 1.2.1, 2.1.1, 2.2.0
>Reporter: Gopal V
>Assignee: Remus Rusanu
> Fix For: 3.0.0
>
> Attachments: HIVE-16113.1.patch, HIVE-16113.2.patch, 
> HIVE-16113.3.patch, HIVE-16113.4.patch
>
>
> {code}
> create table daysales (customer int) partitioned by (dt string);
> insert into daysales partition(dt='2001-01-01') values(1);
> select * from daysales where nvl(dt='2001-01-01' and customer=1, false);
> 0 ROWS
> {code}
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java#L384
> {code}
> 2017-03-05T12:37:47,153  WARN [6f053d71-6ad6-4ad0-833d-337f2d499c82 main] 
> ppr.PartitionPruner: The expr = NVL(((dt = '2001-01-01') and null),false)
> {code}
> Because {{true and null => null}}, this turns into {{NVL(null, false)}}.
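
The root of the bug is three-valued logic: pruning is only safe when the rewritten predicate is provably false for a partition. A small self-contained illustration follows (not the Hive pruner itself; a Java Boolean null stands in for SQL NULL/unknown).

{code}
// TRUE AND UNKNOWN evaluates to UNKNOWN, so blindly substituting null for the
// non-partition conjunct (customer=1) makes NVL(UNKNOWN, false) come out false
// and the partition is wrongly pruned. A pruner must treat UNKNOWN as
// "cannot prune", i.e. keep the partition.
class ThreeValuedLogicDemo {
  static Boolean and(Boolean a, Boolean b) {
    if (Boolean.FALSE.equals(a) || Boolean.FALSE.equals(b)) return false;
    if (a == null || b == null) return null;   // UNKNOWN
    return true;
  }

  static boolean nvl(Boolean v, boolean dflt) {
    return v == null ? dflt : v;
  }

  public static void main(String[] args) {
    Boolean dtMatches = true;        // dt = '2001-01-01' holds for this partition
    Boolean customerMatches = null;  // non-partition column: unknown at prune time
    Boolean expr = and(dtMatches, customerMatches);
    System.out.println(nvl(expr, false));               // false -> partition wrongly dropped
    System.out.println(expr == null ? "keep partition"  // conservative: unknown must not prune
                                    : "prune decision: " + expr);
  }
}
{code}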



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16113) PartitionPruner::removeNonPartCols needs to handle AND/OR cases

2017-05-20 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-16113:

Attachment: HIVE-16113.4.patch

Patch 4 adds the explainuser_3/analyze_3 GF diffs (although they're gen 88).

> PartitionPruner::removeNonPartCols needs to handle AND/OR cases
> ---
>
> Key: HIVE-16113
> URL: https://issues.apache.org/jira/browse/HIVE-16113
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 1.2.1, 2.1.1, 2.2.0
>Reporter: Gopal V
>Assignee: Remus Rusanu
> Attachments: HIVE-16113.1.patch, HIVE-16113.2.patch, 
> HIVE-16113.3.patch, HIVE-16113.4.patch
>
>
> {code}
> create table daysales (customer int) partitioned by (dt string);
> insert into daysales partition(dt='2001-01-01') values(1);
> select * from daysales where nvl(dt='2001-01-01' and customer=1, false);
> 0 ROWS
> {code}
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java#L384
> {code}
> 2017-03-05T12:37:47,153  WARN [6f053d71-6ad6-4ad0-833d-337f2d499c82 main] 
> ppr.PartitionPruner: The expr = NVL(((dt = '2001-01-01') and null),false)
> {code}
> Because {{true and null => null}}, this turns into {{NVL(null, false)}} 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16113) PartitionPruner::removeNonPartCols needs to handle AND/OR cases

2017-05-20 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-16113:

Attachment: HIVE-16113.3.patch

Patch 3 rebased to current master for a new test run

> PartitionPruner::removeNonPartCols needs to handle AND/OR cases
> ---
>
> Key: HIVE-16113
> URL: https://issues.apache.org/jira/browse/HIVE-16113
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 1.2.1, 2.1.1, 2.2.0
>Reporter: Gopal V
>Assignee: Remus Rusanu
> Attachments: HIVE-16113.1.patch, HIVE-16113.2.patch, 
> HIVE-16113.3.patch
>
>
> {code}
> create table daysales (customer int) partitioned by (dt string);
> insert into daysales partition(dt='2001-01-01') values(1);
> select * from daysales where nvl(dt='2001-01-01' and customer=1, false);
> 0 ROWS
> {code}
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java#L384
> {code}
> 2017-03-05T12:37:47,153  WARN [6f053d71-6ad6-4ad0-833d-337f2d499c82 main] 
> ppr.PartitionPruner: The expr = NVL(((dt = '2001-01-01') and null),false)
> {code}
> Because {{true and null => null}}, this turns into {{NVL(null, false)}} 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16667) PostgreSQL metastore handling of CLOB types for COLUMNS_V2.TYPE_NAME and other field is incorrect

2017-05-18 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015957#comment-16015957
 ] 

Remus Rusanu commented on HIVE-16667:
-

While debugging this I found that there is a relevant JDO driver setting, 
{{DatastoreAdapter.CLOB_SET_USING_SETSTRING}}, which controls this; see 
[ClobRDBMSMapping.setString|https://github.com/datanucleus/datanucleus-rdbms/blob/master/src/main/java/org/datanucleus/store/rdbms/mapping/datastore/ClobRDBMSMapping.java#L60]:
{code}
public void setString(PreparedStatement ps, int param, String value)
{
if 
(getDatastoreAdapter().supportsOption(DatastoreAdapter.CLOB_SET_USING_SETSTRING))
{
super.setString(ps, param ,value);
}
else
{
setObject(ps, param, value);
}
}
...
public String getString(ResultSet rs, int param)
{
if 
(getDatastoreAdapter().supportsOption(DatastoreAdapter.CLOB_SET_USING_SETSTRING))
{
return super.getString(rs, param);
}
return (String) getObject(rs, param);
}
{code}

However, I could not find any way to *configure* this. It is pre-set for [MySQL 
Adapter|https://github.com/datanucleus/datanucleus-rdbms/blob/master/src/main/java/org/datanucleus/store/rdbms/adapter/MySQLAdapter.java#L119],
 but not for PG. I don't know if the connection URL/string can somehow set this 
preference/setting. My experience with Datanucleus is rather limited.
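
For anyone who wants to experiment, one hedged possibility (untested here; the property wiring and the accessibility of {{supportedOptions}} are assumptions about DataNucleus, not something verified against Hive) would be a custom adapter subclass that declares the option, selected via the datastore adapter class name property:

{code}
import java.sql.DatabaseMetaData;

import org.datanucleus.store.rdbms.adapter.DatastoreAdapter;
import org.datanucleus.store.rdbms.adapter.PostgreSQLAdapter;

// Hedged sketch: subclass the PostgreSQL adapter and declare that CLOBs can be
// written/read with setString/getString, mirroring what MySQLAdapter does.
// Treat this as an experiment, not a verified fix.
public class PostgreSQLClobAsStringAdapter extends PostgreSQLAdapter {
  public PostgreSQLClobAsStringAdapter(DatabaseMetaData metadata) {
    super(metadata);
    supportedOptions.add(DatastoreAdapter.CLOB_SET_USING_SETSTRING);
  }
}
// Selecting it (assumption): set
//   datanucleus.rdbms.datastoreAdapterClassName=PostgreSQLClobAsStringAdapter
// in the persistence properties so DataNucleus instantiates this adapter.
{code}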

> PostgreSQL metastore handling of CLOB types for COLUMNS_V2.TYPE_NAME and 
> other field is incorrect
> -
>
> Key: HIVE-16667
> URL: https://issues.apache.org/jira/browse/HIVE-16667
> Project: Hive
>  Issue Type: Bug
>Reporter: Remus Rusanu
>Assignee: Naveen Gangam
> Attachments: HiveCLIOutput.txt, PostgresDBOutput.txt
>
>
> The CLOB JDO type introduced with HIVE-12274 does not work correctly with 
> PostgreSQL. The value is written out-of-band and the LOB handle is written, as 
> an INT, into the table. SELECTs return the INT value, which should have been 
> read via the {{lo_get}} PG built-in and then cast to a string.
> Furthermore, the behavior is different between fields upgraded from earlier 
> metastore versions (they retain their string storage) vs. values inserted 
> after the upgrade (inserted as LOB roots).
> The code in 
> {{MetaStoreDirectSql.getPartitionsFromPartitionIds/extractSqlClob}} expects 
> the underlying JDO/Datanucleus to map the column to a {{Clob}} but that does 
> not happen, the value is a Java String containing the int which is the LOB 
> root saved by PG.
> This manifests at runtime with errors like:
> {code}
> hive> select * from srcpart;
> Failed with exception java.io.IOException:java.lang.IllegalArgumentException: 
> Error: type expected at the position 0 of '24030:24031' but '24030' is found.
> {code}
> the 24030:24031 should be 'string:string'.
> repro:
> {code}
> CREATE TABLE srcpart (key STRING COMMENT 'default', value STRING COMMENT 
> 'default') PARTITIONED BY (ds STRING, hr STRING) STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt" OVERWRITE INTO 
> TABLE srcpart PARTITION (ds="2008-04-09", hr="11");
> select * from srcpart;
> {code}
> I did not see the issue being hit by non-partitioned/textfile tables, but 
> that is just the luck of the path taken by the code. Inspection of my PG 
> metastore shows all the CLOB fields suffering from this issue.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16667) PostgreSQL metastore handling of CLOB types for COLUMNS_V2.TYPE_NAME and other field is incorrect

2017-05-18 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015775#comment-16015775
 ] 

Remus Rusanu commented on HIVE-16667:
-

Can you confirm with a debugger that that line is being executed? I want to 
figure out whether a different Hive setting causes this code not to run in your 
case, or whether the line is executed but behaves differently (loads the lobs as 
strings), which would point toward a difference in Env/PG/JDO/Driver.

> PostgreSQL metastore handling of CLOB types for COLUMNS_V2.TYPE_NAME and 
> other field is incorrect
> -
>
> Key: HIVE-16667
> URL: https://issues.apache.org/jira/browse/HIVE-16667
> Project: Hive
>  Issue Type: Bug
>Reporter: Remus Rusanu
>Assignee: Naveen Gangam
> Attachments: HiveCLIOutput.txt, PostgresDBOutput.txt
>
>
> The CLOB JDO type introduced with HIVE-12274 does not work correctly with 
> PostgreSQL. The value is written out-of-band and the LOB handle is written, as 
> an INT, into the table. SELECTs return the INT value, which should have been 
> read via the {{lo_get}} PG built-in, and then cast into string.
> Furthermore, the behavior is different between fields upgraded from earlier 
> metastore versions (they retain their string storage) vs. values inserted 
> after the upgrade (inserted as LOB roots).
> The code in 
> {{MetaStoreDirectSql.getPartitionsFromPartitionIds/extractSqlClob}} expects 
> the underlying JDO/Datanucleus to map the column to a {{Clob}} but that does 
> not happen, the value is a Java String containing the int which is the LOB 
> root saved by PG.
> This manifests at runtime with errors like:
> {code}
> hive> select * from srcpart;
> Failed with exception java.io.IOException:java.lang.IllegalArgumentException: 
> Error: type expected at the position 0 of '24030:24031' but '24030' is found.
> {code}
> the 24030:24031 should be 'string:string'.
> repro:
> {code}
> CREATE TABLE srcpart (key STRING COMMENT 'default', value STRING COMMENT 
> 'default') PARTITIONED BY (ds STRING, hr STRING) STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt" OVERWRITE INTO 
> TABLE srcpart PARTITION (ds="2008-04-09", hr="11");
> select * from srcpart;
> {code}
> I did not see the issue being hit by non-partitioned/textfile tables, but 
> that is just the luck of the path taken by the code. Inspection of my PG 
> metastore shows all the CLOB fields suffering from this issue.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-16667) PostgreSQL metastore handling of CLOB types for COLUMNS_V2.TYPE_NAME and other field is incorrect

2017-05-18 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015291#comment-16015291
 ] 

Remus Rusanu edited comment on HIVE-16667 at 5/18/17 6:39 AM:
--

[~ngangam] Thanks! Can you please run the repro I attached originally?
{code}
CREATE TABLE srcpart (key STRING COMMENT 'default', value STRING COMMENT 
'default') PARTITIONED BY (ds STRING, hr STRING) STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt" OVERWRITE INTO TABLE 
srcpart PARTITION (ds="2008-04-09", hr="11");
select * from srcpart;
{code}
There is a reason for this specific repro. If you simply look at any CLOB field, 
like {{TABLE_PARAMS.PARAM_VALUE}}, then that field may well be loaded by JDO, 
via the ObjectStore. JDO knows how to handle this field appropriately. But my 
repro triggers a code path which goes through the 
[MetaStoreDirectSql.getPartitionsFromPartitionIds|https://github.com/apache/hive/blob/master/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L787]:
{code}
// Get FieldSchema stuff if any.
if (!colss.isEmpty()) {
  // We are skipping the CDS table here, as it seems to be totally useless.
  queryText = "select \"CD_ID\", \"COMMENT\", \"COLUMN_NAME\", \"TYPE_NAME\""
      + " from \"COLUMNS_V2\" where \"CD_ID\" in (" + colIds + ") and \"INTEGER_IDX\" >= 0"
      + " order by \"CD_ID\" asc, \"INTEGER_IDX\" asc";
  loopJoinOrderedResult(colss, queryText, 0, new ApplyFunc<List<FieldSchema>>() {
    @Override
    public void apply(List<FieldSchema> t, Object[] fields) {
      t.add(new FieldSchema((String)fields[2], extractSqlClob(fields[3]), (String)fields[1]));
    }});
}
{code}
This particular code is the one I'm reporting the problem on. For me, this does 
not handle Clobs appropriately and reads the lob handle value instead of the 
lob content.




was (Author: rusanu):
[~ngangam] Thanks! Can you please run the repro I attached originally?
{code}
CREATE TABLE srcpart (key STRING COMMENT 'default', value STRING COMMENT 
'default') PARTITIONED BY (ds STRING, hr STRING) STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt" OVERWRITE INTO TABLE 
srcpart PARTITION (ds="2008-04-09", hr="11");
select * from srcpart;
{code}
There is a reason for this specific repro. If you simply look at any CLOB field, 
like {{TABLE_PARAMS.PARAM_VALUE}}, then that field may well be loaded by JDO, 
via the ObjectStore. JDO knows how to handle this field appropriately. But my 
repro triggers a code path which goes through the 
[{{MetaStoreDirectSql.getPartitionsFromPartitionIds}}](https://github.com/apache/hive/blob/master/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L787):
{code}
// Get FieldSchema stuff if any.
if (!colss.isEmpty()) {
  // We are skipping the CDS table here, as it seems to be totally useless.
  queryText = "select \"CD_ID\", \"COMMENT\", \"COLUMN_NAME\", \"TYPE_NAME\""
      + " from \"COLUMNS_V2\" where \"CD_ID\" in (" + colIds + ") and \"INTEGER_IDX\" >= 0"
      + " order by \"CD_ID\" asc, \"INTEGER_IDX\" asc";
  loopJoinOrderedResult(colss, queryText, 0, new ApplyFunc<List<FieldSchema>>() {
    @Override
    public void apply(List<FieldSchema> t, Object[] fields) {
      t.add(new FieldSchema((String)fields[2], extractSqlClob(fields[3]), (String)fields[1]));
    }});
}
{code}
This particular code is the one I'm reporting the problem on. For me, this does 
not handle Clobs appropriately and reads the lob handle value instead of the 
lob content.



> PostgreSQL metastore handling of CLOB types for COLUMNS_V2.TYPE_NAME and 
> other field is incorrect
> -
>
> Key: HIVE-16667
> URL: https://issues.apache.org/jira/browse/HIVE-16667
> Project: Hive
>  Issue Type: Bug
>Reporter: Remus Rusanu
>Assignee: Naveen Gangam
> Attachments: HiveCLIOutput.txt, PostgresDBOutput.txt
>
>
> The CLOB JDO type introduced with HIVE-12274 does not work correctly with 
> PostgreSQL. The value is written out-of-band and the LOB handle is written, as 
> an INT, into the table. SELECTs return the INT value, which should have been 
> read via the {{lo_get}} PG built-in, and then cast into string.
> Furthermore, the behavior is different between fields upgraded from earlier 
> metastore versions (they retain their string storage) vs. values inserted 
> after the upgrade (inserted as LOB roots).
> The code in 
> {{MetaStoreDirectSql.getPartitionsFromPartitionIds/extractSqlClob}} expects 
> the underlying JDO/Datanucleus to map the column to a {{Clob}} but that does 
> not happen, the value is a Java String containing the int which is the LOB 
> root saved by PG.
> This manifests at runtime with errors like:
> 

[jira] [Commented] (HIVE-16667) PostgreSQL metastore handling of CLOB types for COLUMNS_V2.TYPE_NAME and other field is incorrect

2017-05-18 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16015291#comment-16015291
 ] 

Remus Rusanu commented on HIVE-16667:
-

[~ngangam] Thanks! Can you please run the repro I attached originally?
{code}
CREATE TABLE srcpart (key STRING COMMENT 'default', value STRING COMMENT 
'default') PARTITIONED BY (ds STRING, hr STRING) STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt" OVERWRITE INTO TABLE 
srcpart PARTITION (ds="2008-04-09", hr="11");
select * from srcpart;
{code}
There is a reason for this specific repro. If you simply look at any CLOB field, 
like {{TABLE_PARAMS.PARAM_VALUE}}, then that field may well be loaded by JDO, 
via the ObjectStore. JDO knows how to handle this field appropriately. But my 
repro triggers a code path which goes through the 
[{{MetaStoreDirectSql.getPartitionsFromPartitionIds}}](https://github.com/apache/hive/blob/master/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L787):
{code}
// Get FieldSchema stuff if any.
if (!colss.isEmpty()) {
  // We are skipping the CDS table here, as it seems to be totally useless.
  queryText = "select \"CD_ID\", \"COMMENT\", \"COLUMN_NAME\", \"TYPE_NAME\""
      + " from \"COLUMNS_V2\" where \"CD_ID\" in (" + colIds + ") and \"INTEGER_IDX\" >= 0"
      + " order by \"CD_ID\" asc, \"INTEGER_IDX\" asc";
  loopJoinOrderedResult(colss, queryText, 0, new ApplyFunc<List<FieldSchema>>() {
    @Override
    public void apply(List<FieldSchema> t, Object[] fields) {
      t.add(new FieldSchema((String)fields[2], extractSqlClob(fields[3]), (String)fields[1]));
    }});
}
{code}
This particular code is the one I'm reporting the problem on. For me, this does 
not handle Clobs appropriately and reads the lob handle value instead of the 
lob content.



> PostgreSQL metastore handling of CLOB types for COLUMNS_V2.TYPE_NAME and 
> other field is incorrect
> -
>
> Key: HIVE-16667
> URL: https://issues.apache.org/jira/browse/HIVE-16667
> Project: Hive
>  Issue Type: Bug
>Reporter: Remus Rusanu
>Assignee: Naveen Gangam
> Attachments: HiveCLIOutput.txt, PostgresDBOutput.txt
>
>
> The CLOB JDO type introduced with HIVE-12274 does not work correctly with 
> PostgreSQL. The value is written out-of-band and the LOB handle is written, as 
> an INT, into the table. SELECTs return the INT value, which should have been 
> read via the {{lo_get}} PG built-in, and then cast into string.
> Furthermore, the behavior is different between fields upgraded from earlier 
> metastore versions (they retain their string storage) vs. values inserted 
> after the upgrade (inserted as LOB roots).
> The code in 
> {{MetaStoreDirectSql.getPartitionsFromPartitionIds/extractSqlClob}} expects 
> the underlying JDO/Datanucleus to map the column to a {{Clob}} but that does 
> not happen, the value is a Java String containing the int which is the LOB 
> root saved by PG.
> This manifests at runtime with errors like:
> {code}
> hive> select * from srcpart;
> Failed with exception java.io.IOException:java.lang.IllegalArgumentException: 
> Error: type expected at the position 0 of '24030:24031' but '24030' is found.
> {code}
> the 24030:24031 should be 'string:string'.
> repro:
> {code}
> CREATE TABLE srcpart (key STRING COMMENT 'default', value STRING COMMENT 
> 'default') PARTITIONED BY (ds STRING, hr STRING) STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt" OVERWRITE INTO 
> TABLE srcpart PARTITION (ds="2008-04-09", hr="11");
> select * from srcpart;
> {code}
> I did not see the issue being hit by non-partitioned/textfile tables, but 
> that is just the luck of the path taken by the code. Inspection of my PG 
> metastore shows all the CLOB fields suffering from this issue.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16667) PostgreSQL metastore handling of CLOB types for COLUMNS_V2.TYPE_NAME and other field is incorrect

2017-05-17 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16014383#comment-16014383
 ] 

Remus Rusanu commented on HIVE-16667:
-

BTW, to show the {{PARAM_VALUE}} value in PG:
{code}
metastore=# select "PARAM_VALUE", lo_get(cast("PARAM_VALUE" as INT)) from 
"TABLE_PARAMS" limit 3;
 PARAM_VALUE |         lo_get
-------------+------------------------
 21665       | \x31343934333230383636
 21742       | \x30
 21743       | \x30
(3 rows)
{code}

> PostgreSQL metastore handling of CLOB types for COLUMNS_V2.TYPE_NAME and 
> other field is incorrect
> -
>
> Key: HIVE-16667
> URL: https://issues.apache.org/jira/browse/HIVE-16667
> Project: Hive
>  Issue Type: Bug
>Reporter: Remus Rusanu
>Assignee: Naveen Gangam
>
> The CLOB JDO type introduced with HIVE-12274 does not work correctly with 
> PostgreSQL. The value is written out-of-band and the LOB handle is written, as 
> an INT, into the table. SELECTs return the INT value, which should have been 
> read via the {{lo_get}} PG built-in, and then cast into string.
> Furthermore, the behavior is different between fields upgraded from earlier 
> metastore versions (they retain their string storage) vs. values inserted 
> after the upgrade (inserted as LOB roots).
> The code in 
> {{MetaStoreDirectSql.getPartitionsFromPartitionIds/extractSqlClob}} expects 
> the underlying JDO/Datanucleus to map the column to a {{Clob}} but that does 
> not happen, the value is a Java String containing the int which is the LOB 
> root saved by PG.
> This manifests at runtime with errors like:
> {code}
> hive> select * from srcpart;
> Failed with exception java.io.IOException:java.lang.IllegalArgumentException: 
> Error: type expected at the position 0 of '24030:24031' but '24030' is found.
> {code}
> the 24030:24031 should be 'string:string'.
> repro:
> {code}
> CREATE TABLE srcpart (key STRING COMMENT 'default', value STRING COMMENT 
> 'default') PARTITIONED BY (ds STRING, hr STRING) STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt" OVERWRITE INTO 
> TABLE srcpart PARTITION (ds="2008-04-09", hr="11");
> select * from srcpart;
> {code}
> I did not see the issue being hit by non-partitioned/textfile tables, but 
> that is just the luck of the path taken by the code. Inspection of my PG 
> metastore shows all the CLOB fields suffering from this issue.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16667) PostgreSQL metastore handling of CLOB types for COLUMNS_V2.TYPE_NAME and other field is incorrect

2017-05-17 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16014370#comment-16014370
 ] 

Remus Rusanu commented on HIVE-16667:
-

For me it happens even with tiny strings, like {{INT}}:
{code}
metastore=# select *, CAST(lo_get(CAST("TYPE_NAME" as bigint)) as TEXT) from 
"COLUMNS_V2"  LIMIT 1;
 CD_ID | COMMENT | COLUMN_NAME | TYPE_NAME | INTEGER_IDX |  lo_get
-------+---------+-------------+-----------+-------------+----------
     2 |         | customer    | 21664     |           0 | \x696e74
{code}

Can you tell me which JDBC driver you use? My settings are:
{code}

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:postgresql://localhost:5432/metastore</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.postgresql.Driver</value>
</property>

{code}
I think the classpath resolves the driver to {{postgresql-9.3-1102-jdbc3.jar}}. 
The PG server itself is 9.6.2:
{code}
rrusanu=# select version();
version

 PostgreSQL 9.6.2 on x86_64-apple-darwin15.6.0, compiled by Apple LLVM version 
8.0.0 (clang-800.0.42.1), 64-bit
(1 row)
{code}



> PostgreSQL metastore handling of CLOB types for COLUMNS_V2.TYPE_NAME and 
> other field is incorrect
> -
>
> Key: HIVE-16667
> URL: https://issues.apache.org/jira/browse/HIVE-16667
> Project: Hive
>  Issue Type: Bug
>Reporter: Remus Rusanu
>Assignee: Naveen Gangam
>
> The CLOB JDO type introduced with HIVE-12274 does not work correctly with 
> PostgreSQL. The value is written out-of-band and the LOB handle is written, as 
> an INT, into the table. SELECTs return the INT value, which should have been 
> read via the {{lo_get}} PG built-in, and then cast into string.
> Furthermore, the behavior is different between fields upgraded from earlier 
> metastore versions (they retain their string storage) vs. values inserted 
> after the upgrade (inserted as LOB roots).
> The code in 
> {{MetaStoreDirectSql.getPartitionsFromPartitionIds/extractSqlClob}} expects 
> the underlying JDO/Datanucleus to map the column to a {{Clob}} but that does 
> not happen, the value is a Java String containing the int which is the LOB 
> root saved by PG.
> This manifests at runtime with errors like:
> {code}
> hive> select * from srcpart;
> Failed with exception java.io.IOException:java.lang.IllegalArgumentException: 
> Error: type expected at the position 0 of '24030:24031' but '24030' is found.
> {code}
> the 24030:24031 should be 'string:string'.
> repro:
> {code}
> CREATE TABLE srcpart (key STRING COMMENT 'default', value STRING COMMENT 
> 'default') PARTITIONED BY (ds STRING, hr STRING) STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt" OVERWRITE INTO 
> TABLE srcpart PARTITION (ds="2008-04-09", hr="11");
> select * from srcpart;
> {code}
> I did not see the issue being hit by non-partitioned/textfile tables, but 
> that is just the luck of the path taken by the code. Inspection of my PG 
> metastore shows all the CLOB fields suffering from this issue.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16667) PostgreSQL metastore handling of CLOB types for COLUMNS_V2.TYPE_NAME and other field is incorrect

2017-05-17 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16014204#comment-16014204
 ] 

Remus Rusanu commented on HIVE-16667:
-

What PG and PG client version do you have?

> PostgreSQL metastore handling of CLOB types for COLUMNS_V2.TYPE_NAME and 
> other field is incorrect
> -
>
> Key: HIVE-16667
> URL: https://issues.apache.org/jira/browse/HIVE-16667
> Project: Hive
>  Issue Type: Bug
>Reporter: Remus Rusanu
>Assignee: Naveen Gangam
>
> The CLOB JDO type introduced with HIVE-12274 does not work correctly with 
> PostgreSQL. The value is written out-of-band and the LOB handle is written, as 
> an INT, into the table. SELECTs return the INT value, which should have been 
> read via the {{lo_get}} PG built-in, and then cast into string.
> Furthermore, the behavior is different between fields upgraded from earlier 
> metastore versions (they retain their string storage) vs. values inserted 
> after the upgrade (inserted as LOB roots).
> The code in 
> {{MetaStoreDirectSql.getPartitionsFromPartitionIds/extractSqlClob}} expects 
> the underlying JDO/Datanucleus to map the column to a {{Clob}} but that does 
> not happen, the value is a Java String containing the int which is the LOB 
> root saved by PG.
> This manifests at runtime with errors like:
> {code}
> hive> select * from srcpart;
> Failed with exception java.io.IOException:java.lang.IllegalArgumentException: 
> Error: type expected at the position 0 of '24030:24031' but '24030' is found.
> {code}
> the 24030:24031 should be 'string:string'.
> repro:
> {code}
> CREATE TABLE srcpart (key STRING COMMENT 'default', value STRING COMMENT 
> 'default') PARTITIONED BY (ds STRING, hr STRING) STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt" OVERWRITE INTO 
> TABLE srcpart PARTITION (ds="2008-04-09", hr="11");
> select * from srcpart;
> {code}
> I did not see the issue being hit by non-partitioned/textfile tables, but 
> that is just the luck of the path taken by the code. Inspection of my PG 
> metastore shows all the CLOB fields suffering from this issue.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16667) PostgreSQL metastore handling of CLOB types for COLUMNS_V2.TYPE_NAME and other field is incorrect

2017-05-16 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012594#comment-16012594
 ] 

Remus Rusanu commented on HIVE-16667:
-

[~ngangam] I think the easiest way is to set up a local PG metastore, initialize 
it with schematool, then run the repro I attached.

Like you, I would have expected the value to come back as a Clob, but it is a 
String in the debugger.

> PostgreSQL metastore handling of CLOB types for COLUMNS_V2.TYPE_NAME and 
> other field is incorrect
> -
>
> Key: HIVE-16667
> URL: https://issues.apache.org/jira/browse/HIVE-16667
> Project: Hive
>  Issue Type: Bug
>Reporter: Remus Rusanu
>Assignee: Naveen Gangam
>
> The CLOB JDO type introduced with HIVE-12274 does not work correctly with 
> PostgreSQL. The value is written out-of-band and the LOB handle is written, as 
> an INT, into the table. SELECTs return the INT value, which should have been 
> read via the {{lo_get}} PG built-in, and then cast into string.
> Furthermore, the behavior is different between fields upgraded from earlier 
> metastore versions (they retain their string storage) vs. values inserted 
> after the upgrade (inserted as LOB roots).
> The code in 
> {{MetaStoreDirectSql.getPartitionsFromPartitionIds/extractSqlClob}} expects 
> the underlying JDO/Datanucleus to map the column to a {{Clob}} but that does 
> not happen, the value is a Java String containing the int which is the LOB 
> root saved by PG.
> This manifests at runtime with errors like:
> {code}
> hive> select * from srcpart;
> Failed with exception java.io.IOException:java.lang.IllegalArgumentException: 
> Error: type expected at the position 0 of '24030:24031' but '24030' is found.
> {code}
> the 24030:24031 should be 'string:string'.
> repro:
> {code}
> CREATE TABLE srcpart (key STRING COMMENT 'default', value STRING COMMENT 
> 'default') PARTITIONED BY (ds STRING, hr STRING) STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt" OVERWRITE INTO 
> TABLE srcpart PARTITION (ds="2008-04-09", hr="11");
> select * from srcpart;
> {code}
> I did not see the issue being hit by non-partitioned/textfile tables, but 
> that is just the luck of the path taken by the code. Inspection of my PG 
> metastore shows all the CLOB fields suffering from this issue.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16667) PostgreSQL metastore handling of CLOB types for COLUMNS_V2.TYPE_NAME and other field is incorrect

2017-05-16 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012469#comment-16012469
 ] 

Remus Rusanu commented on HIVE-16667:
-

The problem will not reproduce on Hive tables that were already present at the 
time of the metastore upgrade. It must be a (partitioned?) table created after the upgrade.

> PostgreSQL metastore handling of CLOB types for COLUMNS_V2.TYPE_NAME and 
> other field is incorrect
> -
>
> Key: HIVE-16667
> URL: https://issues.apache.org/jira/browse/HIVE-16667
> Project: Hive
>  Issue Type: Bug
>Reporter: Remus Rusanu
>Assignee: Naveen Gangam
>
> The CLOB JDO type introduced with HIVE-12274 does not work correctly with 
> PostgreSQL. The value is written out-of-band and the LOB handle is written, as 
> an INT, into the table. SELECTs return the INT value, which should have been 
> read via the {{lo_get}} PG built-in, and then cast into string.
> Furthermore, the behavior is different between fields upgraded from earlier 
> metastore versions (they retain their string storage) vs. values inserted 
> after the upgrade (inserted as LOB roots).
> The code in 
> {{MetaStoreDirectSql.getPartitionsFromPartitionIds/extractSqlClob}} expects 
> the underlying JDO/Datanucleus to map the column to a {{Clob}} but that does 
> not happen, the value is a Java String containing the int which is the LOB 
> root saved by PG.
> This manifests at runtime with errors like:
> {code}
> hive> select * from srcpart;
> Failed with exception java.io.IOException:java.lang.IllegalArgumentException: 
> Error: type expected at the position 0 of '24030:24031' but '24030' is found.
> {code}
> the 24030:24031 should be 'string:string'.
> repro:
> {code}
> CREATE TABLE srcpart (key STRING COMMENT 'default', value STRING COMMENT 
> 'default') PARTITIONED BY (ds STRING, hr STRING) STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH "${hiveconf:test.data.dir}/kv1.txt" OVERWRITE INTO 
> TABLE srcpart PARTITION (ds="2008-04-09", hr="11");
> select * from srcpart;
> {code}
> I did not see the issue being hit by non-partitioned/textfile tables, but 
> that is just the luck of the path taken by the code. Inspection of my PG 
> metastore shows all the CLOB fields suffering from this issue.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-12274) Increase width of columns used for general configuration in the metastore.

2017-05-13 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16009478#comment-16009478
 ] 

Remus Rusanu commented on HIVE-12274:
-

BTW, there is a {{DatastoreAdapter.CLOB_SET_USING_SETSTRING}} option which would 
change the behavior, but I'm not sure whether it can be configured from the environment.

> Increase width of columns used for general configuration in the metastore.
> --
>
> Key: HIVE-12274
> URL: https://issues.apache.org/jira/browse/HIVE-12274
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 2.0.0
>Reporter: Elliot West
>Assignee: Naveen Gangam
>  Labels: metastore
> Fix For: 2.3.0, 3.0.0
>
> Attachments: HIVE-12274.2.patch, HIVE-12274.3.patch, 
> HIVE-12274.4.patch, HIVE-12274.5.patch, HIVE-12274.example.ddl.hql, 
> HIVE-12274.patch
>
>
> h2. Overview
> This issue is very similar in principle to HIVE-1364. We are hitting a limit 
> when processing JSON data that has a large nested schema. The struct 
> definition is truncated when inserted into the metastore database column 
> {{COLUMNS_V2.YPE_NAME}} as it is greater than 4000 characters in length.
> Given that the purpose of these columns is to hold very loosely defined 
> configuration values it seems rather limiting to impose such a relatively low 
> length bound. One can imagine that valid use cases will arise where 
> reasonable parameter/property values exceed the current limit. 
> h2. Context
> These limitations were put in by the [patch 
> attributed|https://github.com/apache/hive/commit/c21a526b0a752df2a51d20a2729cc8493c228799]
>  to HIVE-1364 which mentions the _"max length on Oracle 9i/10g/11g"_ as the 
> reason. However, nowadays the limit can be increased because:
> * Oracle DB's {{varchar2}} supports 32767 bytes now, by setting the 
> configuration parameter {{MAX_STRING_SIZE}} to {{EXTENDED}}. 
> ([source|http://docs.oracle.com/database/121/SQLRF/sql_elements001.htm#SQLRF55623])
> * Postgres supports a max of 1GB for {{character}} datatype. 
> ([source|http://www.postgresql.org/docs/8.3/static/datatype-character.html])
> * MySQL can support upto 65535 bytes for the entire row. So long as the 
> {{PARAM_KEY}} value + {{PARAM_VALUE}} is less than 65535, we should be good. 
> ([source|http://dev.mysql.com/doc/refman/5.0/en/char.html])
> * SQL Server's {{varchar}} max length is 8000 and can go beyond using 
> "varchar(max)" with the same limitation as MySQL being 65535 bytes for the 
> entire row. ([source|http://dev.mysql.com/doc/refman/5.0/en/char.html])
> * Derby's {{varchar}} can be upto 32672 bytes. 
> ([source|https://db.apache.org/derby/docs/10.7/ref/rrefsqlj41207.html])
> h2. Proposal
> Can these columns not use CLOB-like types as for example as used by 
> {{TBLS.VIEW_EXPANDED_TEXT}}? It would seem that suitable type equivalents 
> exist for all targeted database platforms:
> * MySQL: {{mediumtext}}
> * Postgres: {{text}}
> * Oracle: {{CLOB}}
> * Derby: {{LONG VARCHAR}}
> I'd suggest that the candidates for type change are:
> * {{COLUMNS_V2.TYPE_NAME}}
> * {{TABLE_PARAMS.PARAM_VALUE}}
> * {{SERDE_PARAMS.PARAM_VALUE}}
> * {{SD_PARAMS.PARAM_VALUE}}
> After updating the maximum length the metastore database needs to be 
> configured and restarted with the new settings. Altering {{MAX_STRING_SIZE}} 
> will update database objects and possibly invalidate them, as follows:
> * Tables with virtual columns will be updated with new data type metadata for 
> virtual columns of {{VARCHAR2(4000)}}, 4000-byte {{NVARCHAR2}}, or 
> {{RAW(2000)}} type.
> * Functional indexes will become unusable if a change to their associated 
> virtual columns causes the index key to exceed index key length limits. 
> Attempts to rebuild such indexes will fail with {{ORA-01450: maximum key 
> length exceeded}}.
> * Views will be invalidated if they contain {{VARCHAR2(4000)}}, 4000-byte 
> {{NVARCHAR2}}, or {{RAW(2000)}} typed expression columns.
> * Materialized views will be updated with new metadata {{VARCHAR2(4000)}}, 
> 4000-byte {{NVARCHAR2}}, and {{RAW(2000)}} typed expression columns
> * So the limitation could be raised to 32672 bytes, with the caveat that 
> MySQL and SQL Server limit the row length to 65535 bytes, so that should also 
> be validated to provide consistency.
> Finally, will this limitation persist in the work resulting from HIVE-9452?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-12274) Increase width of columns used for general configuration in the metastore.

2017-05-13 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16009447#comment-16009447
 ] 

Remus Rusanu commented on HIVE-12274:
-

[~ngangam] I'm having problems with the PostgreSQL metastore after these changes. 
The CLOB fields are saved by the PG driver as an int handle (see 
https://github.com/pgjdbc/pgjdbc/blob/master/pgjdbc/src/main/java/org/postgresql/jdbc/PgPreparedStatement.java#L1239):

{code}
metastore=# select *, CAST(lo_get(CAST("TYPE_NAME" as bigint)) as TEXT) from 
"COLUMNS_V2"  LIMIT 1;
 CD_ID | COMMENT | COLUMN_NAME | TYPE_NAME | INTEGER_IDX |  lo_get
-------+---------+-------------+-----------+-------------+----------
     2 |         | customer    | 21664     |           0 | \x696e74
(1 row)

metastore=# select version();
version

 PostgreSQL 9.6.2 on x86_64-apple-darwin15.6.0, compiled by Apple LLVM version 
8.0.0 (clang-800.0.42.1), 64-bit
{code}

This causes runtime failures on a partitioned table:

{code}
hive> select * from srcpart;
Failed with exception java.io.IOException:java.lang.IllegalArgumentException: 
Error: type expected at the position 0 of '24030:24031' but '24030' is found.
{code}

the {{24030:24031}} should be {{string:string}}, but it is not translated through 
lo_get by {{MetaStoreDirectSql.getPartitionsFromPartitionIds}}:

{code}
// Get FieldSchema stuff if any.
if (!colss.isEmpty()) {
  // We are skipping the CDS table here, as it seems to be totally useless.
  queryText = "select \"CD_ID\", \"COMMENT\", \"COLUMN_NAME\", \"TYPE_NAME\""
      + " from \"COLUMNS_V2\" where \"CD_ID\" in (" + colIds + ") and \"INTEGER_IDX\" >= 0"
      + " order by \"CD_ID\" asc, \"INTEGER_IDX\" asc";
  loopJoinOrderedResult(colss, queryText, 0, new ApplyFunc<List<FieldSchema>>() {
    @Override
    public void apply(List<FieldSchema> t, Object[] fields) {
      t.add(new FieldSchema((String)fields[2], extractSqlClob(fields[3]), (String)fields[1]));
    }});
}
{code}

> Increase width of columns used for general configuration in the metastore.
> --
>
> Key: HIVE-12274
> URL: https://issues.apache.org/jira/browse/HIVE-12274
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 2.0.0
>Reporter: Elliot West
>Assignee: Naveen Gangam
>  Labels: metastore
> Fix For: 2.3.0, 3.0.0
>
> Attachments: HIVE-12274.2.patch, HIVE-12274.3.patch, 
> HIVE-12274.4.patch, HIVE-12274.5.patch, HIVE-12274.example.ddl.hql, 
> HIVE-12274.patch
>
>
> h2. Overview
> This issue is very similar in principle to HIVE-1364. We are hitting a limit 
> when processing JSON data that has a large nested schema. The struct 
> definition is truncated when inserted into the metastore database column 
> {{COLUMNS_V2.TYPE_NAME}} as it is greater than 4000 characters in length.
> Given that the purpose of these columns is to hold very loosely defined 
> configuration values it seems rather limiting to impose such a relatively low 
> length bound. One can imagine that valid use cases will arise where 
> reasonable parameter/property values exceed the current limit. 
> h2. Context
> These limitations were put in by the [patch 
> attributed|https://github.com/apache/hive/commit/c21a526b0a752df2a51d20a2729cc8493c228799]
>  to HIVE-1364 which mentions the _"max length on Oracle 9i/10g/11g"_ as the 
> reason. However, nowadays the limit can be increased because:
> * Oracle DB's {{varchar2}} supports 32767 bytes now, by setting the 
> configuration parameter {{MAX_STRING_SIZE}} to {{EXTENDED}}. 
> ([source|http://docs.oracle.com/database/121/SQLRF/sql_elements001.htm#SQLRF55623])
> * Postgres supports a max of 1GB for {{character}} datatype. 
> ([source|http://www.postgresql.org/docs/8.3/static/datatype-character.html])
> * MySQL can support upto 65535 bytes for the entire row. So long as the 
> {{PARAM_KEY}} value + {{PARAM_VALUE}} is less than 65535, we should be good. 
> ([source|http://dev.mysql.com/doc/refman/5.0/en/char.html])
> * SQL Server's {{varchar}} max length is 8000 and can go beyond using 
> "varchar(max)" with the same limitation as MySQL being 65535 bytes for the 
> entire row. ([source|http://dev.mysql.com/doc/refman/5.0/en/char.html])
> * Derby's {{varchar}} can be upto 32672 bytes. 
> ([source|https://db.apache.org/derby/docs/10.7/ref/rrefsqlj41207.html])
> h2. Proposal
> Can these columns not use CLOB-like types as for example as used by 
> {{TBLS.VIEW_EXPANDED_TEXT}}? It would seem that suitable type equivalents 
> exist for all targeted database platforms:
> * MySQL: {{mediumtext}}
> * Postgres: {{text}}
> * Oracle: {{CLOB}}

[jira] [Updated] (HIVE-16113) PartitionPruner::removeNonPartCols needs to handle AND/OR cases

2017-05-12 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-16113:

Attachment: HIVE-16113.2.patch

Patch 2 makes any function that has a NULL argument transform into NULL, for 
the purpose of the partition pruning logic. Only {{partcol AND NULL}} is replaced 
with {{partcol AND TRUE}}.

> PartitionPruner::removeNonPartCols needs to handle AND/OR cases
> ---
>
> Key: HIVE-16113
> URL: https://issues.apache.org/jira/browse/HIVE-16113
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 1.2.1, 2.1.1, 2.2.0
>Reporter: Gopal V
>Assignee: Remus Rusanu
> Attachments: HIVE-16113.1.patch, HIVE-16113.2.patch
>
>
> {code}
> create table daysales (customer int) partitioned by (dt string);
> insert into daysales partition(dt='2001-01-01') values(1);
> select * from daysales where nvl(dt='2001-01-01' and customer=1, false);
> 0 ROWS
> {code}
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java#L384
> {code}
> 2017-03-05T12:37:47,153  WARN [6f053d71-6ad6-4ad0-833d-337f2d499c82 main] 
> ppr.PartitionPruner: The expr = NVL(((dt = '2001-01-01') and null),false)
> {code}
> Because {{true and null => null}}, this turns into {{NVL(null, false)}} 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16612) PerfLogger is configurable, but not extensible

2017-05-10 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-16612:

Attachment: HIVE-16612.02.patch

Patch 02 adds query text and plan capture, needed for saving to the DB

> PerfLogger is configurable, but not extensible
> --
>
> Key: HIVE-16612
> URL: https://issues.apache.org/jira/browse/HIVE-16612
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning, Query Processor
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HIVE-16612.01.patch, HIVE-16612.02.patch
>
>
> {code}
>   result = (PerfLogger) 
> ReflectionUtils.newInstance(conf.getClassByName(
> conf.getVar(HiveConf.ConfVars.HIVE_PERF_LOGGER)), conf);
> {code}
> The PerfLogger instance is configurable via {{hive.exec.perf.logger}} 
> (HIVE-11891) but the requirement to extend from {{PerfLogger}} cannot be met 
> since HIVE-11149 as the ctor is private. Also useful methods in PerfLogger 
> are also private. I tried to extend PerfLogger for my needs and realized 
> that, as is, the configurability is not usable. At the very least the 
> PerfLogger should make all private members {{protected}}, better the 
> requirement should be an interface not a class.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16612) PerfLogger is configurable, but not extensible

2017-05-10 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-16612:

Attachment: HIVE-16612.01.patch

Patch 01 amended to change the default hive conf for the perf logger class and to 
catch RuntimeException ('NoSuchMethod')

> PerfLogger is configurable, but not extensible
> --
>
> Key: HIVE-16612
> URL: https://issues.apache.org/jira/browse/HIVE-16612
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning, Query Processor
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HIVE-16612.01.patch
>
>
> {code}
>   result = (PerfLogger) 
> ReflectionUtils.newInstance(conf.getClassByName(
> conf.getVar(HiveConf.ConfVars.HIVE_PERF_LOGGER)), conf);
> {code}
> The PerfLogger instance is configurable via {{hive.exec.perf.logger}} 
> (HIVE-11891) but the requirement to extend from {{PerfLogger}} cannot be met 
> since HIVE-11149 as the ctor is private. Also useful methods in PerfLogger 
> are also private. I tried to extend PerfLogger for my needs and realized 
> that, as is, the configurability is not usable. At the very least the 
> PerfLogger should make all private members {{protected}}, better the 
> requirement should be an interface not a class.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16612) PerfLogger is configurable, but not extensible

2017-05-10 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-16612:

Attachment: (was: HIVE-16612.01.patch)

> PerfLogger is configurable, but not extensible
> --
>
> Key: HIVE-16612
> URL: https://issues.apache.org/jira/browse/HIVE-16612
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning, Query Processor
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
>Priority: Minor
> Fix For: 3.0.0
>
>
> {code}
>   result = (PerfLogger) 
> ReflectionUtils.newInstance(conf.getClassByName(
> conf.getVar(HiveConf.ConfVars.HIVE_PERF_LOGGER)), conf);
> {code}
> The PerfLogger instance is configurable via {{hive.exec.perf.logger}} 
> (HIVE-11891) but the requirement to extend from {{PerfLogger}} cannot be met 
> since HIVE-11149 as the ctor is private. Also useful methods in PerfLogger 
> are also private. I tried to extend PerfLogger for my needs and realized 
> that, as is, the configurability is not usable. At the very least the 
> PerfLogger should make all private members {{protected}}, better the 
> requirement should be an interface not a class.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16612) PerfLogger is configurable, but not extensible

2017-05-10 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-16612:

Attachment: HIVE-16612.01.patch

Patch 01 refactors the existing code:

 - {{PerfLogger}} is the interface, exposing all the methods used by various 
logger clients. Replaced references to use the interface. 
{{SessionState.getPerfLogger}} returns the interface etc.
 - {{PerfLoggerFactory}} is the factory for returning the current logger
 - {{PerfLoggerImpl}} is the existing implementation. Only referenced by Factory
 - {{PerfLoggerTokens}} is a static class for all the various strings used by 
logging clients: COMPILE, OPTIMIZER etc

Most changes are the cosmetic changes needed to import/reference 
{{PerfLoggerTokens}}.
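
A rough sketch of what a typical call site looks like with this split (method names follow the existing PerfLogger API; the snippet is illustrative only, see the patch for the actual diff):
{code}
// CLASS_NAME is the usual per-caller constant, e.g. Driver.class.getName()
PerfLogger perfLogger = SessionState.getPerfLogger();           // now returns the interface
perfLogger.PerfLogBegin(CLASS_NAME, PerfLoggerTokens.COMPILE);  // token moved to PerfLoggerTokens
try {
  // ... work being measured ...
} finally {
  perfLogger.PerfLogEnd(CLASS_NAME, PerfLoggerTokens.COMPILE);
}
{code}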

> PerfLogger is configurable, but not extensible
> --
>
> Key: HIVE-16612
> URL: https://issues.apache.org/jira/browse/HIVE-16612
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning, Query Processor
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HIVE-16612.01.patch
>
>
> {code}
>   result = (PerfLogger) 
> ReflectionUtils.newInstance(conf.getClassByName(
> conf.getVar(HiveConf.ConfVars.HIVE_PERF_LOGGER)), conf);
> {code}
> The PerfLogger instance is configurable via {{hive.exec.perf.logger}} 
> (HIVE-11891) but the requirement to extend from {{PerfLogger}} cannot be met 
> since HIVE-11149 as the ctor is private. Also useful methods in PerfLogger 
> are also private. I tried to extend PerfLogger for my needs and realized 
> that, as is, the configurability is not usable. At the very least the 
> PerfLogger should make all private members {{protected}}, better the 
> requirement should be an interface not a class.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16612) PerfLogger is configurable, but not extensible

2017-05-10 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-16612:

Fix Version/s: 3.0.0
   Status: Patch Available  (was: Open)

> PerfLogger is configurable, but not extensible
> --
>
> Key: HIVE-16612
> URL: https://issues.apache.org/jira/browse/HIVE-16612
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning, Query Processor
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HIVE-16612.01.patch
>
>
> {code}
>   result = (PerfLogger) 
> ReflectionUtils.newInstance(conf.getClassByName(
> conf.getVar(HiveConf.ConfVars.HIVE_PERF_LOGGER)), conf);
> {code}
> The PerfLogger instance is configurable via {{hive.exec.perf.logger}} 
> (HIVE-11891) but the requirement to extend from {{PerfLogger}} cannot be met 
> since HIVE-11149 as the ctor is private. Also useful methods in PerfLogger 
> are also private. I tried to extend PerfLogger for my needs and realized 
> that, as is, the configurability is not usable. At the very least the 
> PerfLogger should make all private members {{protected}}, better the 
> requirement should be an interface not a class.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16113) PartitionPruner::removeNonPartCols needs to handle AND/OR cases

2017-05-09 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16003300#comment-16003300
 ] 

Remus Rusanu commented on HIVE-16113:
-

I don't think we should handle OR in removeNonPartCols; leaving the NULL to 
bubble up behaves correctly. I made that change and tested it, all good. 

But the NVL/COALESCE/CASE case is trickier. I can write a filter like
{code}
where colpart=1 or col < 5
{code}
and it works correctly, but wrap it in an NVL:
{code}
where NVL(colpart=1 or col < 5, false)
{code}
and the result is no longer correct, because the NULL is replaced with FALSE and 
this causes overaggressive partition pruning. My plan is to have these UDFs 
(NVL/COALESCE) replaced with NULL in the pruning expr tree.

> PartitionPruner::removeNonPartCols needs to handle AND/OR cases
> ---
>
> Key: HIVE-16113
> URL: https://issues.apache.org/jira/browse/HIVE-16113
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 1.2.1, 2.1.1, 2.2.0
>Reporter: Gopal V
>Assignee: Remus Rusanu
> Attachments: HIVE-16113.1.patch
>
>
> {code}
> create table daysales (customer int) partitioned by (dt string);
> insert into daysales partition(dt='2001-01-01') values(1);
> select * from daysales where nvl(dt='2001-01-01' and customer=1, false);
> 0 ROWS
> {code}
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java#L384
> {code}
> 2017-03-05T12:37:47,153  WARN [6f053d71-6ad6-4ad0-833d-337f2d499c82 main] 
> ppr.PartitionPruner: The expr = NVL(((dt = '2001-01-01') and null),false)
> {code}
> Because {{true and null => null}}, this turns into {{NVL(null, false)}} 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16113) PartitionPruner::removeNonPartCols needs to handle AND/OR cases

2017-05-09 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16003117#comment-16003117
 ] 

Remus Rusanu commented on HIVE-16113:
-

{{OR}}: the partition pruner is reducing the expression to {{partcol = 1}}, and 
this reduces the list of partitions retrieved from MD:
{code}
ppList = getPartitionsFromServer(tab, (ExprNodeGenericFuncDesc) compactExpr,
    conf, alias, partColsUsedInFilter,
    oldFilter.equals(compactExpr.getExprString()));
{code}
By now things are already broken, but later the reduced partition list causes 
{{PcrExprProcFactory.GenericFuncExprProcessor.handleUdfOr}} to see a definite 
TRUE instead of a correct DIVIDE (since the filter expression is only evaluated 
against the reduced list of partitions), and thus causes the 
{{PartitionConditionRemover}} to remove the entire Filter. 

> PartitionPruner::removeNonPartCols needs to handle AND/OR cases
> ---
>
> Key: HIVE-16113
> URL: https://issues.apache.org/jira/browse/HIVE-16113
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 1.2.1, 2.1.1, 2.2.0
>Reporter: Gopal V
>Assignee: Remus Rusanu
> Attachments: HIVE-16113.1.patch
>
>
> {code}
> create table daysales (customer int) partitioned by (dt string);
> insert into daysales partition(dt='2001-01-01') values(1);
> select * from daysales where nvl(dt='2001-01-01' and customer=1, false);
> 0 ROWS
> {code}
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java#L384
> {code}
> 2017-03-05T12:37:47,153  WARN [6f053d71-6ad6-4ad0-833d-337f2d499c82 main] 
> ppr.PartitionPruner: The expr = NVL(((dt = '2001-01-01') and null),false)
> {code}
> Because {{true and null => null}}, this turns into {{NVL(null, false)}} 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Issue Comment Deleted] (HIVE-16113) PartitionPruner::removeNonPartCols needs to handle AND/OR cases

2017-05-09 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-16113:

Comment: was deleted

(was: As for the {{OR}} case, I think that is being removed because 
{{PcrExprProcFactory.GenericFuncExprProcessor.handleUdfOr}} shortcuts the case 
{{WalkState.TRUE}} over the UNKNOWN of the {{col<5}} case. This determinism 
later makes {{PartitionConditionRemover}} eliminate the Filter operator 
completely.)

> PartitionPruner::removeNonPartCols needs to handle AND/OR cases
> ---
>
> Key: HIVE-16113
> URL: https://issues.apache.org/jira/browse/HIVE-16113
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 1.2.1, 2.1.1, 2.2.0
>Reporter: Gopal V
>Assignee: Remus Rusanu
> Attachments: HIVE-16113.1.patch
>
>
> {code}
> create table daysales (customer int) partitioned by (dt string);
> insert into daysales partition(dt='2001-01-01') values(1);
> select * from daysales where nvl(dt='2001-01-01' and customer=1, false);
> 0 ROWS
> {code}
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java#L384
> {code}
> 2017-03-05T12:37:47,153  WARN [6f053d71-6ad6-4ad0-833d-337f2d499c82 main] 
> ppr.PartitionPruner: The expr = NVL(((dt = '2001-01-01') and null),false)
> {code}
> Because {{true and null => null}}, this turns into {{NVL(null, false)}} 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16113) PartitionPruner::removeNonPartCols needs to handle AND/OR cases

2017-05-09 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16002855#comment-16002855
 ] 

Remus Rusanu commented on HIVE-16113:
-

As for the {{OR}} case, I think that is being removed because 
{{PcrExprProcFactory.GenericFuncExprProcessor.handleUdfOr}} shortcuts the case 
{{WalkState.TRUE}} over the UNKNOWN of the {{col<5}} case. This determinism 
later makes {{PartitionConditionRemover}} eliminate the Filter operator 
completely.

> PartitionPruner::removeNonPartCols needs to handle AND/OR cases
> ---
>
> Key: HIVE-16113
> URL: https://issues.apache.org/jira/browse/HIVE-16113
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 1.2.1, 2.1.1, 2.2.0
>Reporter: Gopal V
>Assignee: Remus Rusanu
> Attachments: HIVE-16113.1.patch
>
>
> {code}
> create table daysales (customer int) partitioned by (dt string);
> insert into daysales partition(dt='2001-01-01') values(1);
> select * from daysales where nvl(dt='2001-01-01' and customer=1, false);
> 0 ROWS
> {code}
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java#L384
> {code}
> 2017-03-05T12:37:47,153  WARN [6f053d71-6ad6-4ad0-833d-337f2d499c82 main] 
> ppr.PartitionPruner: The expr = NVL(((dt = '2001-01-01') and null),false)
> {code}
> Because {{true and null => null}}, this turns into {{NVL(null, false)}} 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16113) PartitionPruner::removeNonPartCols needs to handle AND/OR cases

2017-05-09 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16002677#comment-16002677
 ] 

Remus Rusanu commented on HIVE-16113:
-

I think the issue is that the partition pruning logic expects {{null}} values 
to percolate through the expression tree and trigger the {{isUnknown}} case in 
{{PartitionPruner.prunePartitionNames}}. But expressions like CASE, COALESCE 
or NVL (maybe others?) stop the null bubbling and can evaluate the tree to a 
resolute {{false}} instead, causing overaggressive partition elimination. If 
the expression uses these functions (NVL, COALESCE) to handle {{null}} values at 
a *row* level, we're evaluating them and taking a decision at the *partition* 
level (prune/not prune), and the 'prune' case is not safe. Perhaps we should 
consider functions like NVL/COALESCE 'special' and inject 'true' instead in the 
pruning expression? 
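
To illustrate the hazard, here is a tiny standalone example (plain Java, not Hive code) of the three-valued logic involved, with {{Boolean}} {{null}} standing in for UNKNOWN; the class and helper names are made up for the illustration:
{code}
public class NullBubblingDemo {
  // Three-valued AND: FALSE dominates, otherwise any unknown makes the result unknown.
  static Boolean and3(Boolean a, Boolean b) {
    if (Boolean.FALSE.equals(a) || Boolean.FALSE.equals(b)) return Boolean.FALSE;
    if (a == null || b == null) return null; // UNKNOWN
    return Boolean.TRUE;
  }

  // NVL/COALESCE collapse UNKNOWN into the default value.
  static Boolean nvl(Boolean value, Boolean dflt) {
    return value == null ? dflt : value;
  }

  public static void main(String[] args) {
    // For the matching partition, dt='2001-01-01' is TRUE; customer=1 uses a
    // non-partition column, so the pruner substitutes null (unknown) for it.
    Boolean bare = and3(Boolean.TRUE, null);
    System.out.println(bare);     // null -> unknown -> partition kept

    Boolean wrapped = nvl(and3(Boolean.TRUE, null), Boolean.FALSE);
    System.out.println(wrapped);  // false -> partition wrongly pruned
  }
}
{code}
The bare conjunction stays unknown, so the partition is kept; once NVL collapses the unknown into false, the pruner concludes the partition can never match and wrongly drops it.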

> PartitionPruner::removeNonPartCols needs to handle AND/OR cases
> ---
>
> Key: HIVE-16113
> URL: https://issues.apache.org/jira/browse/HIVE-16113
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 1.2.1, 2.1.1, 2.2.0
>Reporter: Gopal V
>Assignee: Remus Rusanu
> Attachments: HIVE-16113.1.patch
>
>
> {code}
> create table daysales (customer int) partitioned by (dt string);
> insert into daysales partition(dt='2001-01-01') values(1);
> select * from daysales where nvl(dt='2001-01-01' and customer=1, false);
> 0 ROWS
> {code}
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java#L384
> {code}
> 2017-03-05T12:37:47,153  WARN [6f053d71-6ad6-4ad0-833d-337f2d499c82 main] 
> ppr.PartitionPruner: The expr = NVL(((dt = '2001-01-01') and null),false)
> {code}
> Because {{true and null => null}}, this turns into {{NVL(null, false)}} 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16612) PerfLogger is configurable, but not extensible

2017-05-08 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu reassigned HIVE-16612:
---


> PerfLogger is configurable, but not extensible
> --
>
> Key: HIVE-16612
> URL: https://issues.apache.org/jira/browse/HIVE-16612
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning, Query Processor
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
>Priority: Minor
>
> {code}
>   result = (PerfLogger) 
> ReflectionUtils.newInstance(conf.getClassByName(
> conf.getVar(HiveConf.ConfVars.HIVE_PERF_LOGGER)), conf);
> {code}
> The PerfLogger instance is configurable via {{hive.exec.perf.logger}} 
> (HIVE-11891) but the requirement to extend from {{PerfLogger}} cannot be met 
> since HIVE-11149 as the ctor is private. Also useful methods in PerfLogger 
> are also private. I tried to extend PerfLogger for my needs and realized 
> that, as is, the configurability is not usable. At the very least the 
> PerfLogger should make all private members {{protected}}, better the 
> requirement should be an interface not a class.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16113) PartitionPruner::removeNonPartCols needs to handle AND/OR cases

2017-05-08 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu reassigned HIVE-16113:
---

Assignee: Remus Rusanu  (was: Gopal V)

> PartitionPruner::removeNonPartCols needs to handle AND/OR cases
> ---
>
> Key: HIVE-16113
> URL: https://issues.apache.org/jira/browse/HIVE-16113
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 1.2.1, 2.1.1, 2.2.0
>Reporter: Gopal V
>Assignee: Remus Rusanu
> Attachments: HIVE-16113.1.patch
>
>
> {code}
> create table daysales (customer int) partitioned by (dt string);
> insert into daysales partition(dt='2001-01-01') values(1);
> select * from daysales where nvl(dt='2001-01-01' and customer=1, false);
> 0 ROWS
> {code}
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java#L384
> {code}
> 2017-03-05T12:37:47,153  WARN [6f053d71-6ad6-4ad0-833d-337f2d499c82 main] 
> ppr.PartitionPruner: The expr = NVL(((dt = '2001-01-01') and null),false)
> {code}
> Because {{true and null => null}}, this turns into {{NVL(null, false)}} 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-14431) Recognize COALESCE as CASE

2017-05-02 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-14431:

Attachment: HIVE-14431.03.patch

> Recognize COALESCE as CASE
> --
>
> Key: HIVE-14431
> URL: https://issues.apache.org/jira/browse/HIVE-14431
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Remus Rusanu
> Attachments: HIVE-14431.01.patch, HIVE-14431.03.patch, 
> HIVE-14431.2.patch, HIVE-14431.patch
>
>
> Transform:
> {code}
> (COALESCE(a, '')  = '') OR
>(a = 'A' AND b = c)  OR
>(a = 'B' AND b = d) OR
>(a = 'C' AND b = e) OR
>(a = 'D' AND b = f) OR
>(a = 'E' AND b = g) OR
>(a = 'F' AND b = h)
> {code}
> into:
> {code}
> (a='') OR
>(a is null) OR
>(a = 'A' AND b = c)  OR
>(a = 'B' AND b = d) OR
>(a = 'C' AND b = e) OR
>(a = 'D' AND b = f) OR
>(a = 'E' AND b = g) OR
>(a = 'F' AND b = h)
> {code}
> With complex queries, this will lead us to factor more predicates that could 
> be pushed to the TS.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-14431) Recognize COALESCE as CASE

2017-05-02 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15992822#comment-15992822
 ] 

Remus Rusanu edited comment on HIVE-14431 at 5/2/17 12:48 PM:
--

Not all changes are in RexSimplify. The 3rd case you added
{code}
+  // 3) Another simplification
+  //   CASE
+  //   WHEN p1 THEN x
+  //   WHEN p2 THEN y
+  //   ELSE TRUE
+  //   END
{code} 
is not currently in Calcite. I made the changes in Hive, but the CASE is left 
as a CASE, not transformed into an OR.


was (Author: rusanu):
@jcamachorodriguez Not all changes are in RexSimplify. The 3rd case you added
{code}
+  // 3) Another simplification
+  //   CASE
+  //   WHEN p1 THEN x
+  //   WHEN p2 THEN y
+  //   ELSE TRUE
+  //   END
{code} 
is not currently in Calcite. I made the changes in Hive, but the CASE is left 
as a CASE, not transformed into an OR.

> Recognize COALESCE as CASE
> --
>
> Key: HIVE-14431
> URL: https://issues.apache.org/jira/browse/HIVE-14431
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Remus Rusanu
> Attachments: HIVE-14431.01.patch, HIVE-14431.03.patch, 
> HIVE-14431.2.patch, HIVE-14431.patch
>
>
> Transform:
> {code}
> (COALESCE(a, '')  = '') OR
>(a = 'A' AND b = c)  OR
>(a = 'B' AND b = d) OR
>(a = 'C' AND b = e) OR
>(a = 'D' AND b = f) OR
>(a = 'E' AND b = g) OR
>(a = 'F' AND b = h)
> {code}
> into:
> {code}
> (a='') OR
>(a is null) OR
>(a = 'A' AND b = c)  OR
>(a = 'B' AND b = d) OR
>(a = 'C' AND b = e) OR
>(a = 'D' AND b = f) OR
>(a = 'E' AND b = g) OR
>(a = 'F' AND b = h)
> {code}
> With complex queries, this will lead us to factor more predicates that could 
> be pushed to the TS.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-14431) Recognize COALESCE as CASE

2017-05-02 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15992822#comment-15992822
 ] 

Remus Rusanu commented on HIVE-14431:
-

@jcamachorodriguez Not all changes are in RexSimplify. The 3rd case you added
{code}
+  // 3) Another simplification
+  //   CASE
+  //   WHEN p1 THEN x
+  //   WHEN p2 THEN y
+  //   ELSE TRUE
+  //   END
{code} 
is not currently in Calcite. I made the changes in Hive, but the CASE is left 
as a CASE, not transformed into an OR.

> Recognize COALESCE as CASE
> --
>
> Key: HIVE-14431
> URL: https://issues.apache.org/jira/browse/HIVE-14431
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Remus Rusanu
> Attachments: HIVE-14431.01.patch, HIVE-14431.2.patch, HIVE-14431.patch
>
>
> Transform:
> {code}
> (COALESCE(a, '')  = '') OR
>(a = 'A' AND b = c)  OR
>(a = 'B' AND b = d) OR
>(a = 'C' AND b = e) OR
>(a = 'D' AND b = f) OR
>(a = 'E' AND b = g) OR
>(a = 'F' AND b = h)
> {code}
> into:
> {code}
> (a='') OR
>(a is null) OR
>(a = 'A' AND b = c)  OR
>(a = 'B' AND b = d) OR
>(a = 'C' AND b = e) OR
>(a = 'D' AND b = f) OR
>(a = 'E' AND b = g) OR
>(a = 'F' AND b = h)
> {code}
> With complex queries, this will lead us to factor more predicates that could 
> be pushed to the TS.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-13811) Constant not removed in index_auto_unused.q.out

2017-05-01 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15990336#comment-15990336
 ] 

Remus Rusanu edited comment on HIVE-13811 at 5/1/17 11:34 AM:
--

Original plan (before HIVE-13068):
{noformat}
   Stage: Stage-0
 Fetch Operator
   limit: -1
   Processor Tree:
-TableScan
-  alias: srcpart
-  filterExpr: (UDFToDouble(key) < 10.0) (type: boolean)
-  Statistics: Num rows: 500 Data size: 5312 Basic stats: COMPLETE 
Column stats: NONE
-  Filter Operator
-predicate: (UDFToDouble(key) < 10.0) (type: boolean)
-Statistics: Num rows: 166 Data size: 1763 Basic stats: COMPLETE 
Column stats: NONE
-Select Operator
-  expressions: key (type: string), value (type: string), 
'2008-04-09' (type: string), hr (type: string)
-  outputColumnNames: _col0, _col1, _col2, _col3
-  Statistics: Num rows: 166 Data size: 1763 Basic stats: COMPLETE 
Column stats: NONE
-  ListSink
{noformat}


was (Author: rusanu):
Original plan (before HIVE-13608):
{noformat}
   Stage: Stage-0
 Fetch Operator
   limit: -1
   Processor Tree:
-TableScan
-  alias: srcpart
-  filterExpr: (UDFToDouble(key) < 10.0) (type: boolean)
-  Statistics: Num rows: 500 Data size: 5312 Basic stats: COMPLETE 
Column stats: NONE
-  Filter Operator
-predicate: (UDFToDouble(key) < 10.0) (type: boolean)
-Statistics: Num rows: 166 Data size: 1763 Basic stats: COMPLETE 
Column stats: NONE
-Select Operator
-  expressions: key (type: string), value (type: string), 
'2008-04-09' (type: string), hr (type: string)
-  outputColumnNames: _col0, _col1, _col2, _col3
-  Statistics: Num rows: 166 Data size: 1763 Basic stats: COMPLETE 
Column stats: NONE
-  ListSink
{noformat}

> Constant not removed in index_auto_unused.q.out
> ---
>
> Key: HIVE-13811
> URL: https://issues.apache.org/jira/browse/HIVE-13811
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Affects Versions: 2.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Remus Rusanu
>
> Follow-up on HIVE-13068.
> In test file ql/src/test/results/clientpositive/index_auto_unused.q.out.
> After HIVE-13068 goes in, the following filter is not folded after 
> PartitionPruning is done:
> {{filterExpr: ((ds = '2008-04-09') and (12.0 = 12.0) and (UDFToDouble(key) < 
> 10.0)) (type: boolean)}}
> Further, SimpleFetchOptimizer got disabled.
> All this needs further investigation.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16527) Support outer and mixed reference aggregates in windowed functions

2017-05-01 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15990655#comment-15990655
 ] 

Remus Rusanu commented on HIVE-16527:
-

I think it needs to be documented. The full feature should cover ANSI SQL:2011 
Feature F442, “Mixed column references in set functions”, but this fix 
addresses only a part of the problem.
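To make the supported shape concrete, the nested aggregates amount to 
computing the GROUP BY aggregate first and then windowing over its output; a 
rough equivalent (illustrative only, not the rewrite the patch performs) for 
{{select sum(sum(c1)) over(partition by c2 order by c1) from e011_01 group by c1, c2}} 
would be:
{code}
-- illustrative equivalent: aggregate first, then window over the grouped output
select sum(s) over (partition by c2 order by c1)
from (select c1, c2, sum(c1) as s
      from e011_01
      group by c1, c2) t;
{code}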

> Support outer and mixed reference aggregates in windowed functions
> --
>
> Key: HIVE-16527
> URL: https://issues.apache.org/jira/browse/HIVE-16527
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Fix For: 3.0.0
>
> Attachments: HIVE-16527.00.patch, HIVE-16527.02.patch, 
> HIVE-16527.03.patch
>
>
> {noformat}
> select sum(sum(c1)) over() from e011_01;
> select sum(sum(c1)) over(partition by c2 order by c1) from e011_01 group by 
> e011_01.c1, e011_01.c2;
> select sum(sum(e011_01.c1)) over(partition by e011_01.c2 order by e011_01.c1) 
> from e011_01 join e011_03 on e011_01.c1 = e011_03.c1 group by e011_01.c1, 
> e011_01.c2;
> select sum(sum(e011_01.c1)) over(partition by e011_03.c2 order by e011_03.c1) 
> from e011_01 join e011_03 on e011_01.c1 = e011_03.c1 group by e011_03.c1, 
> e011_03.c2;
> select sum(corr(e011_01.c1, e011_03.c1)) over(partition by e011_01.c2 order 
> by e011_03.c2) from e011_01 join e011_03 on e011_01.c1 = e011_03.c1 group by 
> e011_03.c2, e011_01.c2;
> {noformat}
> We fail to generate a plan for any of the above. The issue is that in 
> {{SemanticAnalyzer.doPhase1GetAllAggregations}}, for {{TOK_WINDOWSPEC}} we 
> ignore all children except the last (the window spec child). Additionally the 
> typecheck processor is not prepared to encounter UDAF expressions 
> ({{TypeCheckProcFactory.DefaultExpreProcessor.validateUDF}}, 
> {{getXpathOrFuncExprNodeDesc}}). 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-13811) Constant not removed in index_auto_unused.q.out

2017-04-30 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15990336#comment-15990336
 ] 

Remus Rusanu commented on HIVE-13811:
-

Original plan (before HIVE-13608):
{noformat}
   Stage: Stage-0
 Fetch Operator
   limit: -1
   Processor Tree:
-TableScan
-  alias: srcpart
-  filterExpr: (UDFToDouble(key) < 10.0) (type: boolean)
-  Statistics: Num rows: 500 Data size: 5312 Basic stats: COMPLETE 
Column stats: NONE
-  Filter Operator
-predicate: (UDFToDouble(key) < 10.0) (type: boolean)
-Statistics: Num rows: 166 Data size: 1763 Basic stats: COMPLETE 
Column stats: NONE
-Select Operator
-  expressions: key (type: string), value (type: string), 
'2008-04-09' (type: string), hr (type: string)
-  outputColumnNames: _col0, _col1, _col2, _col3
-  Statistics: Num rows: 166 Data size: 1763 Basic stats: COMPLETE 
Column stats: NONE
-  ListSink
{noformat}
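For context (an illustration, not taken from the patch), the expectation is 
that once the partition pruner has pinned {{ds}}, the residual constant 
comparison folds away and only the key predicate remains, which is what the 
plan above shows:
{code}
-- expected folding of the residual filter (illustrative):
--   ((ds = '2008-04-09') and (12.0 = 12.0) and (UDFToDouble(key) < 10.0))
--   =>  (UDFToDouble(key) < 10.0)
{code}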

> Constant not removed in index_auto_unused.q.out
> ---
>
> Key: HIVE-13811
> URL: https://issues.apache.org/jira/browse/HIVE-13811
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Affects Versions: 2.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Remus Rusanu
>
> Follow-up on HIVE-13068.
> In test file ql/src/test/results/clientpositive/index_auto_unused.q.out.
> After HIVE-13068 goes in, the following filter is not folded after 
> PartitionPruning is done:
> {{filterExpr: ((ds = '2008-04-09') and (12.0 = 12.0) and (UDFToDouble(key) < 
> 10.0)) (type: boolean)}}
> Further, SimpleFetchOptimizer got disabled.
> All this needs further investigation.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16527) Support outer and mixed reference aggregates in windowed functions

2017-04-30 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-16527:

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

https://git1-us-west.apache.org/repos/asf?p=hive.git;a=commit;h=dac3786d86462e4d08d62d23115e6b7a3e534f5d

> Support outer and mixed reference aggregates in windowed functions
> --
>
> Key: HIVE-16527
> URL: https://issues.apache.org/jira/browse/HIVE-16527
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Fix For: 3.0.0
>
> Attachments: HIVE-16527.00.patch, HIVE-16527.02.patch, 
> HIVE-16527.03.patch
>
>
> {noformat}
> select sum(sum(c1)) over() from e011_01;
> select sum(sum(c1)) over(partition by c2 order by c1) from e011_01 group by 
> e011_01.c1, e011_01.c2;
> select sum(sum(e011_01.c1)) over(partition by e011_01.c2 order by e011_01.c1) 
> from e011_01 join e011_03 on e011_01.c1 = e011_03.c1 group by e011_01.c1, 
> e011_01.c2;
> select sum(sum(e011_01.c1)) over(partition by e011_03.c2 order by e011_03.c1) 
> from e011_01 join e011_03 on e011_01.c1 = e011_03.c1 group by e011_03.c1, 
> e011_03.c2;
> select sum(corr(e011_01.c1, e011_03.c1)) over(partition by e011_01.c2 order 
> by e011_03.c2) from e011_01 join e011_03 on e011_01.c1 = e011_03.c1 group by 
> e011_03.c2, e011_01.c2;
> {noformat}
> We fail to generate a plan for any of the above. The issue is that in 
> {{SemanticAnalyzer.doPhase1GetAllAggregations}}, for {{TOK_WINDOWSPEC}} we 
> ignore all children except the last (the window spec child). Additionally the 
> typecheck processor is not prepared to encounter UDAF expressions 
> ({{TypeCheckProcFactory.DefaultExpreProcessor.validateUDF}}, 
> {{getXpathOrFuncExprNodeDesc}}). 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-14431) Recognize COALESCE as CASE

2017-04-30 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-14431:

Status: Open  (was: Patch Available)

> Recognize COALESCE as CASE
> --
>
> Key: HIVE-14431
> URL: https://issues.apache.org/jira/browse/HIVE-14431
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Remus Rusanu
> Attachments: HIVE-14431.01.patch, HIVE-14431.2.patch, HIVE-14431.patch
>
>
> Transform:
> {code}
> (COALESCE(a, '')  = '') OR
>(a = 'A' AND b = c)  OR
>(a = 'B' AND b = d) OR
>(a = 'C' AND b = e) OR
>(a = 'D' AND b = f) OR
>(a = 'E' AND b = g) OR
>(a = 'F' AND b = h)
> {code}
> into:
> {code}
> (a='') OR
>(a is null) OR
>(a = 'A' AND b = c)  OR
>(a = 'B' AND b = d) OR
>(a = 'C' AND b = e) OR
>(a = 'D' AND b = f) OR
>(a = 'E' AND b = g) OR
>(a = 'F' AND b = h)
> {code}
> With complex queries, this will lead us to factor more predicates that could 
> be pushed to the TS.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-14431) Recognize COALESCE as CASE

2017-04-30 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15990261#comment-15990261
 ] 

Remus Rusanu commented on HIVE-14431:
-

Since the original patch from Jesus, the simplification rules have moved into 
Calcite (RexSimplify). We have to follow up with a Calcite issue and see 
whether they accept this rule.
An alternative is to do something similar to 
{{HivePointLookupOptimizerRule}}, i.e. have a RelOptRule for Filter/Join 
operators and inspect the condition for the COALESCE => CASE simplification. 
This would be entirely in Hive space.
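For reference, the core equivalence such a rule would exploit (a sketch of the 
idea, not the patch itself) is the CASE expansion of COALESCE followed by 
simplification:
{code}
-- COALESCE(a, '') = ''   is by definition
--   (case when a is not null then a else '' end) = ''
-- pushing the comparison into the branches gives
--   case when a is not null then (a = '') else true end
-- which simplifies to the factorable disjunction
--   (a = '') or (a is null)
{code}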

> Recognize COALESCE as CASE
> --
>
> Key: HIVE-14431
> URL: https://issues.apache.org/jira/browse/HIVE-14431
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Remus Rusanu
> Attachments: HIVE-14431.01.patch, HIVE-14431.2.patch, HIVE-14431.patch
>
>
> Transform:
> {code}
> (COALESCE(a, '')  = '') OR
>(a = 'A' AND b = c)  OR
>(a = 'B' AND b = d) OR
>(a = 'C' AND b = e) OR
>(a = 'D' AND b = f) OR
>(a = 'E' AND b = g) OR
>(a = 'F' AND b = h)
> {code}
> into:
> {code}
> (a='') OR
>(a is null) OR
>(a = 'A' AND b = c)  OR
>(a = 'B' AND b = d) OR
>(a = 'C' AND b = e) OR
>(a = 'D' AND b = f) OR
>(a = 'E' AND b = g) OR
>(a = 'F' AND b = h)
> {code}
> With complex queries, this will lead us to factor more predicates that could 
> be pushed to the TS.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16527) Support outer and mixed reference aggregates in windowed functions

2017-04-28 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-16527:

Attachment: HIVE-16527.03.patch

Patch .03 adds the values file and the non-explain selects to the .q test.

> Support outer and mixed reference aggregates in windowed functions
> --
>
> Key: HIVE-16527
> URL: https://issues.apache.org/jira/browse/HIVE-16527
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16527.00.patch, HIVE-16527.02.patch, 
> HIVE-16527.03.patch
>
>
> {noformat}
> select sum(sum(c1)) over() from e011_01;
> select sum(sum(c1)) over(partition by c2 order by c1) from e011_01 group by 
> e011_01.c1, e011_01.c2;
> select sum(sum(e011_01.c1)) over(partition by e011_01.c2 order by e011_01.c1) 
> from e011_01 join e011_03 on e011_01.c1 = e011_03.c1 group by e011_01.c1, 
> e011_01.c2;
> select sum(sum(e011_01.c1)) over(partition by e011_03.c2 order by e011_03.c1) 
> from e011_01 join e011_03 on e011_01.c1 = e011_03.c1 group by e011_03.c1, 
> e011_03.c2;
> select sum(corr(e011_01.c1, e011_03.c1)) over(partition by e011_01.c2 order 
> by e011_03.c2) from e011_01 join e011_03 on e011_01.c1 = e011_03.c1 group by 
> e011_03.c2, e011_01.c2;
> {noformat}
> We fail to generate a plan for any of the above. The issue is that in 
> {{SemanticAnalyzer.doPhase1GetAllAggregations}}, for {{TOK_WINDOWSPEC}} we 
> ignore all children except the last (the window spec child). Additionally the 
> typecheck processor is not prepared to encounter UDAF expressions 
> ({{TypeCheckProcFactory.DefaultExpreProcessor.validateUDF}}, 
> {{getXpathOrFuncExprNodeDesc}}). 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16527) Support outer and mixed reference aggregates in windowed functions

2017-04-26 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-16527:

Status: Patch Available  (was: Open)

> Support outer and mixed reference aggregates in windowed functions
> --
>
> Key: HIVE-16527
> URL: https://issues.apache.org/jira/browse/HIVE-16527
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16527.00.patch, HIVE-16527.02.patch
>
>
> {noformat}
> select sum(sum(c1)) over() from e011_01;
> select sum(sum(c1)) over(partition by c2 order by c1) from e011_01 group by 
> e011_01.c1, e011_01.c2;
> select sum(sum(e011_01.c1)) over(partition by e011_01.c2 order by e011_01.c1) 
> from e011_01 join e011_03 on e011_01.c1 = e011_03.c1 group by e011_01.c1, 
> e011_01.c2;
> select sum(sum(e011_01.c1)) over(partition by e011_03.c2 order by e011_03.c1) 
> from e011_01 join e011_03 on e011_01.c1 = e011_03.c1 group by e011_03.c1, 
> e011_03.c2;
> select sum(corr(e011_01.c1, e011_03.c1)) over(partition by e011_01.c2 order 
> by e011_03.c2) from e011_01 join e011_03 on e011_01.c1 = e011_03.c1 group by 
> e011_03.c2, e011_01.c2;
> {noformat}
> We fail to generate a plan for any of the above. The issue is that in 
> {{SemanticAnalyzer.doPhase1GetAllAggregations}}, for {{TOK_WINDOWSPEC}} we 
> ignore all children except the last (the window spec child). Additionally the 
> typecheck processor is not prepared to encounter UDAF expressions 
> ({{TypeCheckProcFactory.DefaultExpreProcessor.validateUDF}}, 
> {{getXpathOrFuncExprNodeDesc}}). 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16527) Support outer and mixed reference aggregates in windowed functions

2017-04-26 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-16527:

Status: Open  (was: Patch Available)

> Support outer and mixed reference aggregates in windowed functions
> --
>
> Key: HIVE-16527
> URL: https://issues.apache.org/jira/browse/HIVE-16527
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16527.00.patch, HIVE-16527.02.patch
>
>
> {noformat}
> select sum(sum(c1)) over() from e011_01;
> select sum(sum(c1)) over(partition by c2 order by c1) from e011_01 group by 
> e011_01.c1, e011_01.c2;
> select sum(sum(e011_01.c1)) over(partition by e011_01.c2 order by e011_01.c1) 
> from e011_01 join e011_03 on e011_01.c1 = e011_03.c1 group by e011_01.c1, 
> e011_01.c2;
> select sum(sum(e011_01.c1)) over(partition by e011_03.c2 order by e011_03.c1) 
> from e011_01 join e011_03 on e011_01.c1 = e011_03.c1 group by e011_03.c1, 
> e011_03.c2;
> select sum(corr(e011_01.c1, e011_03.c1)) over(partition by e011_01.c2 order 
> by e011_03.c2) from e011_01 join e011_03 on e011_01.c1 = e011_03.c1 group by 
> e011_03.c2, e011_01.c2;
> {noformat}
> We fail to generate a plan for any of the above. The issue is that in 
> {{SemanticAnalyzer.doPhase1GetAllAggregations}}, for {{TOK_WINDOWSPEC}} we 
> ignore all children except the last (the window spec child). Additionally the 
> typecheck processor is not prepared to encounter UDAF expressions 
> ({{TypeCheckProcFactory.DefaultExpreProcessor.validateUDF}}, 
> {{getXpathOrFuncExprNodeDesc}}). 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16527) Support outer and mixed reference aggregates in windowed functions

2017-04-26 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-16527:

Attachment: HIVE-16527.02.patch

.01 renamed to .02 for Jenkins.

> Support outer and mixed reference aggregates in windowed functions
> --
>
> Key: HIVE-16527
> URL: https://issues.apache.org/jira/browse/HIVE-16527
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16527.00.patch, HIVE-16527.02.patch
>
>
> {noformat}
> select sum(sum(c1)) over() from e011_01;
> select sum(sum(c1)) over(partition by c2 order by c1) from e011_01 group by 
> e011_01.c1, e011_01.c2;
> select sum(sum(e011_01.c1)) over(partition by e011_01.c2 order by e011_01.c1) 
> from e011_01 join e011_03 on e011_01.c1 = e011_03.c1 group by e011_01.c1, 
> e011_01.c2;
> select sum(sum(e011_01.c1)) over(partition by e011_03.c2 order by e011_03.c1) 
> from e011_01 join e011_03 on e011_01.c1 = e011_03.c1 group by e011_03.c1, 
> e011_03.c2;
> select sum(corr(e011_01.c1, e011_03.c1)) over(partition by e011_01.c2 order 
> by e011_03.c2) from e011_01 join e011_03 on e011_01.c1 = e011_03.c1 group by 
> e011_03.c2, e011_01.c2;
> {noformat}
> We fail to generate a plan for any of the above. The issue is that in 
> {{SemanticAnalyzer.doPhase1GetAllAggregations}}, for {{TOK_WINDOWSPEC}} we 
> ignore all children except the last (the window spec child). Additionally the 
> typecheck processor is not prepared to encounter UDAF expressions 
> ({{TypeCheckProcFactory.DefaultExpreProcessor.validateUDF}}, 
> {{getXpathOrFuncExprNodeDesc}}). 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16527) Support outer and mixed reference aggregates in windowed functions

2017-04-26 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-16527:

Attachment: (was: HIVE-16527.01.patch)

> Support outer and mixed reference aggregates in windowed functions
> --
>
> Key: HIVE-16527
> URL: https://issues.apache.org/jira/browse/HIVE-16527
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16527.00.patch
>
>
> {noformat}
> select sum(sum(c1)) over() from e011_01;
> select sum(sum(c1)) over(partition by c2 order by c1) from e011_01 group by 
> e011_01.c1, e011_01.c2;
> select sum(sum(e011_01.c1)) over(partition by e011_01.c2 order by e011_01.c1) 
> from e011_01 join e011_03 on e011_01.c1 = e011_03.c1 group by e011_01.c1, 
> e011_01.c2;
> select sum(sum(e011_01.c1)) over(partition by e011_03.c2 order by e011_03.c1) 
> from e011_01 join e011_03 on e011_01.c1 = e011_03.c1 group by e011_03.c1, 
> e011_03.c2;
> select sum(corr(e011_01.c1, e011_03.c1)) over(partition by e011_01.c2 order 
> by e011_03.c2) from e011_01 join e011_03 on e011_01.c1 = e011_03.c1 group by 
> e011_03.c2, e011_01.c2;
> {noformat}
> We fail to generate a plan for any of the above. The issue is that in 
> {{SemanticAnalyzer.doPhase1GetAllAggregations}}, for {{TOK_WINDOWSPEC}} we 
> ignore all children except the last (the window spec child). Additionally the 
> typecheck processor is not prepared to encounter UDAF expressions 
> ({{TypeCheckProcFactory.DefaultExpreProcessor.validateUDF}}, 
> {{getXpathOrFuncExprNodeDesc}}). 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16527) Support outer and mixed reference aggregates in windowed functions

2017-04-26 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-16527:

Attachment: HIVE-16527.01.patch

Patch .01 fixes the test failures and adds context to 
{{doPhase1GetAllAggregations}} so we know we're inside a windowed function.

> Support outer and mixed reference aggregates in windowed functions
> --
>
> Key: HIVE-16527
> URL: https://issues.apache.org/jira/browse/HIVE-16527
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16527.00.patch, HIVE-16527.01.patch
>
>
> {noformat}
> select sum(sum(c1)) over() from e011_01;
> select sum(sum(c1)) over(partition by c2 order by c1) from e011_01 group by 
> e011_01.c1, e011_01.c2;
> select sum(sum(e011_01.c1)) over(partition by e011_01.c2 order by e011_01.c1) 
> from e011_01 join e011_03 on e011_01.c1 = e011_03.c1 group by e011_01.c1, 
> e011_01.c2;
> select sum(sum(e011_01.c1)) over(partition by e011_03.c2 order by e011_03.c1) 
> from e011_01 join e011_03 on e011_01.c1 = e011_03.c1 group by e011_03.c1, 
> e011_03.c2;
> select sum(corr(e011_01.c1, e011_03.c1)) over(partition by e011_01.c2 order 
> by e011_03.c2) from e011_01 join e011_03 on e011_01.c1 = e011_03.c1 group by 
> e011_03.c2, e011_01.c2;
> {noformat}
> We fail to generate a plan for any of the above. The issue is that in 
> {{SemanticAnalyzer.doPhase1GetAllAggregations}}, for {{TOK_WINDOWSPEC}} we 
> ignore all children except the last (the window spec child). Additionally the 
> typecheck processor is not prepared to encounter UDAF expressions 
> ({{TypeCheckProcFactory.DefaultExpreProcessor.validateUDF}}, 
> {{getXpathOrFuncExprNodeDesc}}). 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16527) Support outer and mixed reference aggregates in windowed functions

2017-04-25 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-16527:

Attachment: HIVE-16527.00.patch

> Support outer and mixed reference aggregates in windowed functions
> --
>
> Key: HIVE-16527
> URL: https://issues.apache.org/jira/browse/HIVE-16527
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16527.00.patch
>
>
> {noformat}
> select sum(sum(c1)) over() from e011_01;
> select sum(sum(c1)) over(partition by c2 order by c1) from e011_01 group by 
> e011_01.c1, e011_01.c2;
> select sum(sum(e011_01.c1)) over(partition by e011_01.c2 order by e011_01.c1) 
> from e011_01 join e011_03 on e011_01.c1 = e011_03.c1 group by e011_01.c1, 
> e011_01.c2;
> select sum(sum(e011_01.c1)) over(partition by e011_03.c2 order by e011_03.c1) 
> from e011_01 join e011_03 on e011_01.c1 = e011_03.c1 group by e011_03.c1, 
> e011_03.c2;
> select sum(corr(e011_01.c1, e011_03.c1)) over(partition by e011_01.c2 order 
> by e011_03.c2) from e011_01 join e011_03 on e011_01.c1 = e011_03.c1 group by 
> e011_03.c2, e011_01.c2;
> {noformat}
> We fail to generate a plan for any of the above. The issue is that in 
> {{SemanticAnalyzer.doPhase1GetAllAggregations}}, for {{TOK_WINDOWSPEC}} we 
> ignore all children except the last (the window spec child). Additionally the 
> typecheck processor is not prepared to encounter UDAF expressions 
> ({{TypeCheckProcFactory.DefaultExpreProcessor.validateUDF}}, 
> {{getXpathOrFuncExprNodeDesc}}). 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

