[jira] [Commented] (HIVE-15848) count or sum distinct incorrect when hive.optimize.reducededuplication set to true

2017-02-27 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15887340#comment-15887340
 ] 

Ashutosh Chauhan commented on HIVE-15848:
-

It would be ideal to migrate these optimizations to the Calcite tree by writing 
a Calcite rule, where implementing them is more straightforward. 
In the meantime, it's all right to turn off this optimization in this particular 
case. 
+1

> count or sum distinct incorrect when hive.optimize.reducededuplication set to true
> --
>
> Key: HIVE-15848
> URL: https://issues.apache.org/jira/browse/HIVE-15848
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.13.0
> Reporter: Biao Wu
> Assignee: Zoltan Haindrich
> Priority: Critical
> Attachments: HIVE-15848.1.patch, HIVE-15848.2.patch
>
>
> Test Table:
> {code:sql}
> create table count_distinct_test(id int,key int,name int);
> {code}
> Data:
> ||id||key||name||
> |1|1|2|
> |1|2|3|
> |1|3|2|
> |1|4|2|
> |1|5|3|
> Test SQL1:
> {code:sql}
> select id,count(Distinct key),count(Distinct name)
> from (select id,key,name from count_distinct_test group by id,key,name)m
> group by id;
> {code}
> result:
> |1|5|4|
> expect:
> |1|5|2|
> Test SQL2:
> {code:sql}
> select id,count(Distinct name),count(Distinct key)
> from (select id,key,name from count_distinct_test group by id,name,key)m
> group by id;
> {code}
> result:
> |1|2|5|
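
For anyone reproducing this, a minimal sketch of the setup and a session-level workaround follows. It assumes the table is named count_distinct_test (the name the test queries use) and a Hive version that supports INSERT ... VALUES (0.14 or later); the property name is taken from the issue title. The last query is only a cross-check: the inner GROUP BY lists every column, so it only removes duplicate rows and should not change either distinct count.

{code:sql}
-- Setup, using the table name referenced by the test queries (assumed).
create table count_distinct_test(id int, key int, name int);
insert into table count_distinct_test values
  (1,1,2), (1,2,3), (1,3,2), (1,4,2), (1,5,3);

-- Workaround sketch: disable the optimization for this session only.
set hive.optimize.reducededuplication=false;

-- Test SQL1 from the description; the description expects 1, 5, 2.
select id, count(distinct key), count(distinct name)
from (select id, key, name from count_distinct_test group by id, key, name) m
group by id;

-- Cross-check without the inner GROUP BY; the distinct counts should match.
select id, count(distinct key), count(distinct name)
from count_distinct_test
group by id;
{code}

Disabling the property for the session affects every query in that session, so it is best treated as a stopgap until a patched build is available.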



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15848) count or sum distinct incorrect when hive.optimize.reducededuplication set to true

2017-02-27 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886704#comment-15886704
 ] 

Hive QA commented on HIVE-15848:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12854961/HIVE-15848.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10274 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_table] (batchId=147)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=223)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=223)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3819/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3819/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3819/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12854961 - PreCommit-HIVE-Build



[jira] [Commented] (HIVE-15848) count or sum distinct incorrect when hive.optimize.reducededuplication set to true

2017-02-27 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886454#comment-15886454
 ] 

Zoltan Haindrich commented on HIVE-15848:
-

[~bill] seems like you've come to a similar conclusion :)
I've adjusted the patch. 



[jira] [Commented] (HIVE-15848) count or sum distinct incorrect when hive.optimize.reducededuplication set to true

2017-02-26 Thread Biao Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15885037#comment-15885037
 ] 

Biao Wu commented on HIVE-15848:


Thanks [~kgyrtkirk]. I think childDistinctColumnIndices should have fewer than 
2 entries, so that the optimization is enabled only when 
childDistinctColumnIndices has a single key.
PR: https://github.com/apache/hive/pull/150/commits/a4fc3af4c77beafe11e3e4188571177862d64e4e
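
To illustrate the condition described above: on this reading, a query with only one distinct column would still be eligible for the optimization, while the two-column queries in the issue description would not. A sketch of the single-column shape, assuming the count_distinct_test table and sample data from the issue description (the query is illustrative and is not taken from the patch):

{code:sql}
-- Only one distinct column (key) in the outer aggregation.
-- For the sample data there are 5 distinct keys for id = 1.
select id, count(distinct key)
from (select id, key, name from count_distinct_test group by id, key, name) m
group by id;
{code}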







[jira] [Commented] (HIVE-15848) count or sum distinct incorrect when hive.optimize.reducededuplication set to true

2017-02-25 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15884357#comment-15884357
 ] 

Hive QA commented on HIVE-15848:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12854665/HIVE-15848.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 10256 tests executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=235)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[multi_insert_gby3] (batchId=69)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[explainuser_2] (batchId=137)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[unionDistinct_1] (batchId=136)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[tez_union_multiinsert] (batchId=145)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=94)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=223)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=223)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[multi_insert_gby3] (batchId=127)
org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver.org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver (batchId=230)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3789/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3789/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3789/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12854665 - PreCommit-HIVE-Build



[jira] [Commented] (HIVE-15848) count or sum distinct incorrect when hive.optimize.reducededuplication set to true

2017-02-23 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15882170#comment-15882170
 ] 

Zoltan Haindrich commented on HIVE-15848:
-

This bug is present on the current master branch.



