Eyal Farago created SPARK-28304:
-----------------------------------

             Summary: FileFormatWriter introduces an unconditional join, even 
when all attributes are constants
                 Key: SPARK-28304
                 URL: https://issues.apache.org/jira/browse/SPARK-28304
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.3.2
            Reporter: Eyal Farago


FileFormatWriter derives a required sort order based on the partition columns, 
bucketing columns and explicitly required ordering. However, in some use cases 
some (or even all) of these fields are constant; in these cases the sort can be 
skipped.

For example, in my use case we add a GUID column identifying a specific 
(incremental) load; this can be thought of as a batch id. Since we run one 
batch at a time, this column is always constant, which means there is no need 
to sort on it. Since we also don't use bucketing or require an explicit 
ordering, the entire sort can be skipped in our case.
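
To make the use case concrete, here is a minimal sketch of the decision 
described above, written in plain Python rather than Spark's Scala. It is a toy 
model, not Spark code: WriteSpec, required_ordering and needs_sort are 
hypothetical names, and the set of constant columns is assumed to be known 
up front.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class WriteSpec:
    """Toy stand-in for the writer's inputs (hypothetical, not a Spark class)."""
    partition_columns: List[str]
    bucket_columns: List[str]
    sort_columns: List[str]


def required_ordering(spec: WriteSpec, constant_columns: Set[str]) -> List[str]:
    """Concatenate partition, bucket and explicit sort columns, skipping constants."""
    ordering: List[str] = []
    for col in spec.partition_columns + spec.bucket_columns + spec.sort_columns:
        # A constant column holds the same value in every row, so it cannot
        # change the relative order of any two rows. Repeated columns are
        # skipped as well, since a second occurrence adds nothing.
        if col not in constant_columns and col not in ordering:
            ordering.append(col)
    return ordering


def needs_sort(spec: WriteSpec, constant_columns: Set[str]) -> bool:
    """A sort is only needed when some non-constant ordering column remains."""
    return len(required_ordering(spec, constant_columns)) > 0


# Our case: partitioned by a batch id only, no bucketing, no explicit ordering.
spec = WriteSpec(partition_columns=["batch_id"], bucket_columns=[], sort_columns=[])
print(needs_sort(spec, constant_columns=set()))         # True
print(needs_sort(spec, constant_columns={"batch_id"}))  # False
```

With batch_id treated as a constant, the required ordering is empty and the 
sort can be dropped entirely.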

 

I suggest:
 # filter constant columns out of the required ordering calculated by 
FileFormatWriter.
 # generalize this to any Sort operator in a Spark plan.
 # introduce optimizer rules that remove constants from sort orderings, 
potentially eliminating the Sort operator altogether.
 # modify EnsureRequirements to be aware of constant fields when deciding 
whether to introduce a sort.
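
The optimizer-rule suggestion could look roughly like the sketch below. This is 
a toy plan tree in plain Python, not Catalyst: Literal, Column, Scan, Sort and 
prune_constant_sort_keys are hypothetical names used only for illustration, and 
"constant" is simplified to mean a literal sort key.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Literal:
    """A constant sort key (hypothetical stand-in for a foldable expression)."""
    value: object


@dataclass(frozen=True)
class Column:
    """A sort key that refers to a column by name."""
    name: str


@dataclass(frozen=True)
class Scan:
    """Leaf plan node reading a table."""
    table: str


@dataclass(frozen=True)
class Sort:
    """Plan node sorting its child by the given keys, in order."""
    order: Tuple[Union[Literal, Column], ...]
    child: "Plan"


Plan = Union[Scan, Sort]


def prune_constant_sort_keys(plan: Plan) -> Plan:
    """Drop constant keys from every Sort; remove a Sort left with no keys."""
    if isinstance(plan, Sort):
        child = prune_constant_sort_keys(plan.child)
        # Constant keys compare equal for every pair of rows, so removing
        # them leaves the resulting order unchanged.
        kept = tuple(key for key in plan.order if not isinstance(key, Literal))
        if not kept:
            return child  # nothing left to sort by: the Sort is a no-op
        return Sort(kept, child)
    return plan


print(prune_constant_sort_keys(Sort((Literal("batch-1"),), Scan("t"))))
# Scan(table='t')
```

A real rule would also need to recognize columns that are constant because of 
upstream projections or filters, not only literal sort keys.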



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
