[jira] [Commented] (SPARK-23375) Optimizer should remove unneeded Sort

2019-03-18 Thread Xiaoju Wu (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16794805#comment-16794805
 ] 

Xiaoju Wu commented on SPARK-23375:
---

But one of your test cases conflicts with what I described above:

test("sort should not be removed when there is a node which doesn't guarantee any order") {
  val orderedPlan = testRelation.select('a, 'b).orderBy('a.asc)
  val groupedAndResorted = orderedPlan.groupBy('a)(sum('a)).orderBy('a.asc)
  val optimized = Optimize.execute(groupedAndResorted.analyze)
  val correctAnswer = groupedAndResorted.analyze
  comparePlans(optimized, correctAnswer)
}

Why was it designed this way? In my opinion, since Aggregate does not 
propagate its child's ordering, the Sort below it is useless.

 

> Optimizer should remove unneeded Sort
> -
>
> Key: SPARK-23375
> URL: https://issues.apache.org/jira/browse/SPARK-23375
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Marco Gaido
>Assignee: Marco Gaido
>Priority: Minor
> Fix For: 2.4.0
>
>
> As pointed out in SPARK-23368, there is currently no rule to remove the Sort 
> operator on an already sorted plan, i.e., if we have a query like:
> {code}
> SELECT b
> FROM (
> SELECT a, b
> FROM table1
> ORDER BY a
> ) t
> ORDER BY a
> {code}
> The sort is actually executed twice, even though it is not needed.
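
The idea in the description above can be sketched with a toy plan model. This is only an illustration under stated assumptions: `Plan`, `Scan`, `Project`, `Sort` and `removeRedundantSorts` are hypothetical names defined here, not Spark's Catalyst classes, and sort direction and null ordering are ignored.

```scala
object RedundantSortSketch {
  // Hypothetical, simplified plan nodes (not Spark's Catalyst classes).
  sealed trait Plan {
    // Columns by which the node's output is known to be ordered.
    def outputOrdering: Seq[String]
  }

  case class Scan(table: String) extends Plan {
    // Assumption: a plain scan guarantees no ordering.
    def outputOrdering: Seq[String] = Nil
  }

  case class Project(columns: Seq[String], child: Plan) extends Plan {
    // Keep the leading part of the child's ordering whose columns survive the projection.
    def outputOrdering: Seq[String] =
      child.outputOrdering.takeWhile(c => columns.contains(c))
  }

  case class Sort(order: Seq[String], child: Plan) extends Plan {
    def outputOrdering: Seq[String] = order
  }

  // Drop a Sort when its (already rewritten) child delivers rows in the required order.
  def removeRedundantSorts(plan: Plan): Plan = plan match {
    case Sort(order, child) =>
      val newChild = removeRedundantSorts(child)
      if (newChild.outputOrdering.startsWith(order)) newChild
      else Sort(order, newChild)
    case Project(columns, child) =>
      Project(columns, removeRedundantSorts(child))
    case scan: Scan => scan
  }
}
```

For a plan shaped like the query above, `Sort(Seq("a"), Project(Seq("a", "b"), Sort(Seq("a"), Scan("table1"))))`, this sketch keeps the inner Sort and drops the outer one, because the projection already passes up the ordering on `a`.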



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23375) Optimizer should remove unneeded Sort

2019-03-18 Thread Xiaoju Wu (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16794791#comment-16794791
 ] 

Xiaoju Wu commented on SPARK-23375:
---

I think there's another case in which sort is redundant:

A Sort directly under a non-order-preserving node is redundant, for example:

select count(*) from (select a1 from A order by a2);
+- Aggregate
  +- Sort
     +- FileScan parquet
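
A minimal sketch of this proposal, using a toy plan model rather than Spark's own classes. `Plan`, `Scan`, `Sort`, `Aggregate` and `dropSortBelowAggregate` are hypothetical names introduced here. The sketch assumes every aggregate function is insensitive to input order (such as count or sum); an order-sensitive aggregate would make the removal unsafe.

```scala
object SortUnderAggregateSketch {
  // Hypothetical, simplified plan nodes (not Spark's Catalyst classes).
  sealed trait Plan
  case class Scan(table: String) extends Plan
  case class Sort(order: Seq[String], child: Plan) extends Plan
  case class Aggregate(groupBy: Seq[String], functions: Seq[String], child: Plan)
      extends Plan

  // Assumption: all aggregate functions here ignore input order, so a Sort
  // directly below an Aggregate cannot affect the result and can be dropped.
  def dropSortBelowAggregate(plan: Plan): Plan = plan match {
    case Aggregate(groupBy, functions, Sort(_, grandChild)) =>
      // Re-check the new child, in case several Sorts are stacked.
      dropSortBelowAggregate(Aggregate(groupBy, functions, grandChild))
    case Aggregate(groupBy, functions, child) =>
      Aggregate(groupBy, functions, dropSortBelowAggregate(child))
    case Sort(order, child) =>
      // A Sort that is not directly below an Aggregate is kept.
      Sort(order, dropSortBelowAggregate(child))
    case scan: Scan => scan
  }
}
```

Applied to the plan above, `Aggregate(Nil, Seq("count(*)"), Sort(Seq("a2"), Scan("A")))` becomes `Aggregate(Nil, Seq("count(*)"), Scan("A"))`. A Sort placed above the Aggregate is left alone, since the Aggregate itself guarantees no ordering.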




[jira] [Commented] (SPARK-23375) Optimizer should remove unneeded Sort

2018-12-07 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16713516#comment-16713516
 ] 

Apache Spark commented on SPARK-23375:
--

User 'seancxmao' has created a pull request for this issue:
https://github.com/apache/spark/pull/23258




[jira] [Commented] (SPARK-23375) Optimizer should remove unneeded Sort

2018-12-07 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16713514#comment-16713514
 ] 

Apache Spark commented on SPARK-23375:
--

User 'seancxmao' has created a pull request for this issue:
https://github.com/apache/spark/pull/23258




[jira] [Commented] (SPARK-23375) Optimizer should remove unneeded Sort

2018-10-13 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16649077#comment-16649077
 ] 

Apache Spark commented on SPARK-23375:
--

User 'gatorsmile' has created a pull request for this issue:
https://github.com/apache/spark/pull/22715




[jira] [Commented] (SPARK-23375) Optimizer should remove unneeded Sort

2018-10-13 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16649076#comment-16649076
 ] 

Apache Spark commented on SPARK-23375:
--

User 'gatorsmile' has created a pull request for this issue:
https://github.com/apache/spark/pull/22715




[jira] [Commented] (SPARK-23375) Optimizer should remove unneeded Sort

2018-02-09 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358435#comment-16358435
 ] 

Apache Spark commented on SPARK-23375:
--

User 'mgaido91' has created a pull request for this issue:
https://github.com/apache/spark/pull/20560
