[jira] [Commented] (SPARK-21998) SortMergeJoinExec did not calculate its outputOrdering correctly during physical planning

2017-09-19 Thread Maryann Xue (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16172169#comment-16172169
 ] 

Maryann Xue commented on SPARK-21998:
-------------------------------------

Thanks again for your comment, [~maropu]! I changed the title and description 
of this JIRA accordingly and opened a PR at 
https://github.com/apache/spark/pull/19281. Could you please take a look?

> SortMergeJoinExec did not calculate its outputOrdering correctly during 
> physical planning
> ----------------------------------------------------------------------
>
> Key: SPARK-21998
> URL: https://issues.apache.org/jira/browse/SPARK-21998
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.2.0
> Reporter: Maryann Xue
> Priority: Minor
>
> Right now the calculation of SortMergeJoinExec's outputOrdering relies on the 
> assumption that its children have already been sorted on the join keys, but 
> this is often not true until EnsureRequirements has been applied.
> {code}
>   /**
>    * For SMJ, child's output must have been sorted on key or expressions with the same order as
>    * key, so we can get ordering for key from child's output ordering.
>    */
>   private def getKeyOrdering(keys: Seq[Expression], childOutputOrdering: Seq[SortOrder])
>     : Seq[SortOrder] = {
>     keys.zip(childOutputOrdering).map { case (key, childOrder) =>
>       SortOrder(key, Ascending, childOrder.sameOrderExpressions + childOrder.child - key)
>     }
>   }
> {code}
> Thus SortMergeJoinExec's outputOrdering is most likely incorrect during the 
> physical planning stage. As a result, physical optimizations that rely on 
> the required/output orderings, like SPARK-18591, will not work for 
> SortMergeJoinExec.
> The right behavior of {{getKeyOrdering(keys, childOutputOrdering)}} should be:
> 1. If the childOutputOrdering satisfies (is a superset of) the required child 
> ordering => childOutputOrdering
> 2. Otherwise => required child ordering
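
The two cases above can be sketched in isolation. This is only a minimal model, not the code from the PR: `SortOrder` and `Direction` here are simplified stand-ins for Spark's classes (expressions are plain strings), and `requiredOrdering` and `orderingSatisfies` are hypothetical helper names introduced for illustration. "Satisfies" is assumed to mean that the required ordering matches a prefix of the child's ordering.

```scala
object KeyOrderingSketch {
  // Simplified stand-ins for Spark's SortOrder and sort direction (assumption:
  // expressions are modelled as plain strings, not Catalyst Expressions).
  sealed trait Direction
  case object Ascending extends Direction
  case object Descending extends Direction

  final case class SortOrder(
      child: String,
      direction: Direction,
      sameOrderExpressions: Set[String] = Set.empty) {
    // True if this order sorts on the required expression, or on one known to
    // have the same order, in the same direction.
    def satisfies(required: SortOrder): Boolean =
      direction == required.direction &&
        (child == required.child || sameOrderExpressions.contains(required.child))
  }

  // Hypothetical helper: the ordering a sort-merge join requires of a child,
  // i.e. ascending on each join key.
  def requiredOrdering(keys: Seq[String]): Seq[SortOrder] =
    keys.map(key => SortOrder(key, Ascending))

  // Hypothetical helper: the child's ordering satisfies the requirement when
  // every required order is matched, position by position, by the child's.
  def orderingSatisfies(actual: Seq[SortOrder], required: Seq[SortOrder]): Boolean =
    required.length <= actual.length &&
      required.zip(actual).forall { case (req, act) => act.satisfies(req) }

  def getKeyOrdering(keys: Seq[String], childOutputOrdering: Seq[SortOrder]): Seq[SortOrder] = {
    val required = requiredOrdering(keys)
    if (orderingSatisfies(childOutputOrdering, required)) {
      // Case 1: the child is already sorted on the join keys.
      childOutputOrdering
    } else {
      // Case 2: the child is not sorted yet (e.g. before EnsureRequirements).
      required
    }
  }
}
```

With join keys `Seq("a", "b")` and a child ordered only on `c`, this sketch returns ascending orders on `a` and `b`. The quoted `keys.zip(childOutputOrdering)` version would instead return a single order, because `zip` truncates to the shorter sequence.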



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21998) SortMergeJoinExec did not calculate its outputOrdering correctly during physical planning

2017-09-19 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16172166#comment-16172166
 ] 

Apache Spark commented on SPARK-21998:
--------------------------------------

User 'maryannxue' has created a pull request for this issue:
https://github.com/apache/spark/pull/19281



