[ 
https://issues.apache.org/jira/browse/SPARK-19503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15856931#comment-15856931
 ] 

Sean Owen commented on SPARK-19503:
-----------------------------------

Can you improve the JIRA? The title is uninformative. See 
http://spark.apache.org/contributing.html

> Dumb Execution Plan
> -------------------
>
>                 Key: SPARK-19503
>                 URL: https://issues.apache.org/jira/browse/SPARK-19503
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer
>    Affects Versions: 2.1.0
>         Environment: Perhaps only a pyspark or databricks AWS issue
>            Reporter: R
>            Priority: Minor
>              Labels: execution, optimizer, plan, query
>
> df.sort(...).count()
> performs a shuffle and a sort before counting. This is wasteful, because 
> the sort cannot change the row count, and it makes me wonder how smart the 
> algebraic optimiser really is. The data may be partitioned with known row 
> counts (such as parquet files), so we should not shuffle just to perform a 
> count.
> This may look trivial, but if the optimiser fails to recognise this case, I 
> wonder what else it is missing, especially in more complex operations.
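
The rewrite the reporter is asking for can be sketched as a rule over a toy logical plan. This is only an illustration in plain Python, not Spark's Catalyst optimizer: the Scan, Sort and Count node classes and the drop_sort_under_count function are hypothetical names invented for this sketch. It relies on one assumption stated in the report, namely that sorting never changes the number of rows, so a sort feeding directly into a global count can be dropped.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical toy plan nodes; these are NOT Spark classes.
@dataclass(frozen=True)
class Scan:
    table: str


@dataclass(frozen=True)
class Sort:
    child: "Plan"
    key: str


@dataclass(frozen=True)
class Count:
    child: "Plan"


Plan = Union[Scan, Sort, Count]


def drop_sort_under_count(plan: Plan) -> Plan:
    """Remove Sort nodes that feed directly into a Count.

    Sorting reorders rows but never adds or removes any, so the count
    is unchanged. A Sort that is not directly under a Count is kept.
    """
    if isinstance(plan, Count):
        child = plan.child
        # Skip over any chain of sorts sitting directly below the count.
        while isinstance(child, Sort):
            child = child.child
        return Count(drop_sort_under_count(child))
    if isinstance(plan, Sort):
        return Sort(drop_sort_under_count(plan.child), plan.key)
    return plan


# Toy equivalent of df.sort("ts").count()
optimized = drop_sort_under_count(Count(Sort(Scan("events"), "ts")))
print(optimized)
```

In this toy model the optimized plan counts the scan directly, with no sort in between. A real optimizer would also have to check that nothing between the sort and the count depends on row order (a limit, for example) before applying such a rule.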



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
