
[jira] [Commented] (HIVE-4952) When hive.join.emit.interval is small, queries optimized by Correlation Optimizer may generate wrong results

2013-08-02 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13728042#comment-13728042
 ] 

Hudson commented on HIVE-4952:
--

ABORTED: Integrated in Hive-trunk-hadoop2 #322 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/322/])
HIVE-4952 : When hive.join.emit.interval is small, queries optimized by 
Correlation Optimizer may generate wrong results (Yin Huai via Ashutosh 
Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1509542)
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/QueryPlanTreeTransformation.java
* /hive/trunk/ql/src/test/queries/clientpositive/correlationoptimizer15.q
* /hive/trunk/ql/src/test/results/clientpositive/correlationoptimizer15.q.out


> When hive.join.emit.interval is small, queries optimized by Correlation 
> Optimizer may generate wrong results
> 
>
> Key: HIVE-4952
> URL: https://issues.apache.org/jira/browse/HIVE-4952
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.12.0
> Reporter: Yin Huai
> Assignee: Yin Huai
> Fix For: 0.12.0
>
> Attachments: HIVE-4952.D11889.1.patch, HIVE-4952.D11889.2.patch, 
> replay.txt
>
>
> If we have a query like this ...
> {code:sql}
> SELECT xx.key, xx.cnt, yy.key
> FROM
> (SELECT x.key as key, count(1) as cnt FROM src1 x JOIN src1 y ON (x.key = 
> y.key) group by x.key) xx
> JOIN src yy
> ON xx.key=yy.key;
> {code}
> After the Correlation Optimizer runs, the operator tree in the reducer will be:
> {code}
>        JOIN2
>          |
>          |
>         MUX
>        /   \
>       /     \
>     GBY      |
>      |       |
>    JOIN1     |
>       \     /
>        \   /
>        DEMUX
> {code}
> For JOIN2, the right table will arrive at this operator first. If 
> hive.join.emit.interval is small, e.g. 1, JOIN2 will output results even if 
> it has not yet received any row from the left table. The logic related to 
> hive.join.emit.interval in JoinOperator assumes that inputs are ordered 
> by tag. However, if a query has been optimized by the Correlation Optimizer, 
> this assumption may not hold for the JoinOperators inside the reducer.
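
The failure mode described above can be illustrated with a small toy model. This is only a sketch under simplifying assumptions, not Hive's JoinOperator code: the function name `streaming_join`, the `(tag, value)` row shape, and the exact early-emit condition are hypothetical. It models an inner join over one key group that emits early once `emit_interval` rows of the last tag are buffered, trusting that all rows with smaller tags have already arrived.

```python
from collections import defaultdict


def streaming_join(rows, emit_interval):
    """Toy inner join over one key group of (tag, value) rows.

    Tag 0 is the left input and tag 1 is the right (last) input.
    Once emit_interval rows of tag 1 are buffered, the join emits early
    and drops those rows, assuming every tag-0 row has already been seen.
    """
    buffers = defaultdict(list)
    out = []

    def emit():
        # Cross product of what is buffered so far, then drop the tag-1 rows.
        out.extend((left, right) for left in buffers[0] for right in buffers[1])
        buffers[1].clear()

    for tag, value in rows:
        buffers[tag].append(value)
        if tag == 1 and len(buffers[1]) >= emit_interval:
            emit()  # early emit: only safe if input is ordered by tag
    emit()  # end of the key group
    return out


ordered = [(0, "a"), (1, "x"), (1, "y")]
unordered = [(1, "x"), (1, "y"), (0, "a")]
print(streaming_join(ordered, 1))
print(streaming_join(unordered, 1))
print(streaming_join(unordered, 1000))
```

In this model, tag-ordered input with an interval of 1 yields both joined pairs. When the right rows arrive first, the same interval yields no rows at all, because each right row is emitted against an empty left buffer and then discarded. A large interval hides the problem, since nothing is emitted until the end of the key group.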

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-4952) When hive.join.emit.interval is small, queries optimized by Correlation Optimizer may generate wrong results

2013-08-02 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13727696#comment-13727696
 ] 

Hudson commented on HIVE-4952:
--

SUCCESS: Integrated in Hive-trunk-h0.21 #2239 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/2239/])
HIVE-4952 : When hive.join.emit.interval is small, queries optimized by 
Correlation Optimizer may generate wrong results (Yin Huai via Ashutosh 
Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1509542)
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/QueryPlanTreeTransformation.java
* /hive/trunk/ql/src/test/queries/clientpositive/correlationoptimizer15.q
* /hive/trunk/ql/src/test/results/clientpositive/correlationoptimizer15.q.out



[jira] [Commented] (HIVE-4952) When hive.join.emit.interval is small, queries optimized by Correlation Optimizer may generate wrong results

2013-08-02 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13727541#comment-13727541
 ] 

Hudson commented on HIVE-4952:
--

SUCCESS: Integrated in Hive-trunk-hadoop1-ptest #113 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/113/])
HIVE-4952 : When hive.join.emit.interval is small, queries optimized by 
Correlation Optimizer may generate wrong results (Yin Huai via Ashutosh 
Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1509542)
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/QueryPlanTreeTransformation.java
* /hive/trunk/ql/src/test/queries/clientpositive/correlationoptimizer15.q
* /hive/trunk/ql/src/test/results/clientpositive/correlationoptimizer15.q.out



[jira] [Commented] (HIVE-4952) When hive.join.emit.interval is small, queries optimized by Correlation Optimizer may generate wrong results

2013-08-02 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13727502#comment-13727502
 ] 

Hudson commented on HIVE-4952:
--

FAILURE: Integrated in Hive-trunk-hadoop2-ptest #41 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/41/])
HIVE-4952 : When hive.join.emit.interval is small, queries optimized by 
Correlation Optimizer may generate wrong results (Yin Huai via Ashutosh 
Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1509542)
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/QueryPlanTreeTransformation.java
* /hive/trunk/ql/src/test/queries/clientpositive/correlationoptimizer15.q
* /hive/trunk/ql/src/test/results/clientpositive/correlationoptimizer15.q.out



[jira] [Commented] (HIVE-4952) When hive.join.emit.interval is small, queries optimized by Correlation Optimizer may generate wrong results

2013-08-01 Thread Phabricator (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13726488#comment-13726488
 ] 

Phabricator commented on HIVE-4952:
---

ashutoshc has accepted the revision "HIVE-4952 [jira] When 
hive.join.emit.interval is small, queries optimized by Correlation Optimizer 
may generate wrong results".

  +1

REVISION DETAIL
  https://reviews.facebook.net/D11889

BRANCH
  HIVE-4952

ARCANIST PROJECT
  hive

To: JIRA, ashutoshc, yhuai



[jira] [Commented] (HIVE-4952) When hive.join.emit.interval is small, queries optimized by Correlation Optimizer may generate wrong results

2013-07-31 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725513#comment-13725513
 ] 

Hive QA commented on HIVE-4952:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12595206/HIVE-4952.D11889.2.patch

{color:green}SUCCESS:{color} +1 2749 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/259/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/259/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.


[jira] [Commented] (HIVE-4952) When hive.join.emit.interval is small, queries optimized by Correlation Optimizer may generate wrong results

2013-07-30 Thread Yin Huai (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724317#comment-13724317
 ] 

Yin Huai commented on HIVE-4952:


I meant the failed test was caused by HIVE-2906. It has been fixed by HIVE-4955.


[jira] [Commented] (HIVE-4952) When hive.join.emit.interval is small, queries optimized by Correlation Optimizer may generate wrong results

2013-07-30 Thread Yin Huai (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724316#comment-13724316
 ] 

Yin Huai commented on HIVE-4952:


It seems the failed test is caused by HIVE-4955.


[jira] [Commented] (HIVE-4952) When hive.join.emit.interval is small, queries optimized by Correlation Optimizer may generate wrong results

2013-07-30 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723726#comment-13723726
 ] 

Hive QA commented on HIVE-4952:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12594874/HIVE-4952.D11889.1.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 2737 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_serde_user_properties
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/236/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/236/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.


[jira] [Commented] (HIVE-4952) When hive.join.emit.interval is small, queries optimized by Correlation Optimizer may generate wrong results

2013-07-29 Thread Yin Huai (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723407#comment-13723407
 ] 

Yin Huai commented on HIVE-4952:


To fix this bug, Demux will be modified to be aware that rows associated with a 
key are ordered by tag. When Demux sees a row with a new tag, it knows that 
all rows with tags less than the incoming tag can be processed.

Taking the example in the description: with this fix, the inputs of JOIN2 will 
be ordered by tag. When Demux sees a row with tag 1, it will ask GBY to process 
its buffer, and then GBY will ask JOIN1 to process its buffer. Before Demux 
forwards a new row with tag 1 to JOIN2, all rows with tag 0 will have been 
forwarded to JOIN2.
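
The ordering idea in this comment can be sketched with a toy model. It is a simplified illustration, not the DemuxOperator patch: the function `demux`, the `flush_on_new_tag` switch, and the single counting branch that stands in for JOIN1 and GBY are all hypothetical. Tags here are the tags that JOIN2 sees, so tag 0 feeds the buffering branch and tag 1 is forwarded directly.

```python
def demux(rows, flush_on_new_tag):
    """Return the (tag, value) sequence a downstream join would observe.

    rows: (tag, value) pairs for one key, sorted by tag.
    Tag 0 feeds a buffering branch (a stand-in for JOIN1 + GBY) that only
    emits a row count when flushed; tag 1 rows are forwarded directly.
    """
    downstream = []
    buffered = 0      # number of rows held by the buffering branch
    last_tag = None

    def flush():
        nonlocal buffered
        if buffered:
            downstream.append((0, buffered))
            buffered = 0

    for tag, value in rows:
        if flush_on_new_tag and last_tag is not None and tag > last_tag:
            # A larger tag means every row with a smaller tag has been seen,
            # so the buffering branch can be drained before forwarding.
            flush()
        last_tag = tag
        if tag == 0:
            buffered += 1
        else:
            downstream.append((tag, value))
    flush()  # end of the key group
    return downstream


rows = [(0, "a"), (0, "a"), (1, "x")]
print(demux(rows, flush_on_new_tag=False))
print(demux(rows, flush_on_new_tag=True))
```

Without the flush, the downstream join sees the tag-1 row before the aggregated tag-0 row, which is the out-of-order arrival described in the issue. With the flush, the tag-0 result is forwarded first, so the downstream input is ordered by tag again.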

