[jira] [Created] (SPARK-20451) Filter out nested mapType datatypes from sort order in randomSplit

Sameer Agarwal (JIRA) Mon, 24 Apr 2017 14:06:30 -0700

Sameer Agarwal created SPARK-20451:
--------------------------------------

             Summary: Filter out nested mapType datatypes from sort order in randomSplit
                 Key: SPARK-20451
                 URL: https://issues.apache.org/jira/browse/SPARK-20451
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.1.0, 2.0.0, 2.2.0
            Reporter: Sameer Agarwal



In {{randomSplit}}, the underlying DataFrame may not guarantee the same ordering 
of rows within its partitions each time a split is materialized, which can 
result in overlapping splits.
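The overlap can be reproduced outside Spark with a toy model. The sketch below 
assumes each split is produced by re-reading the rows and keeping those whose 
per-row random draw (same seed for every split) falls into that split's range. 
The helper {{sample_range}} and the sample data are hypothetical and only 
illustrate the failure mode; this is not Spark's implementation.

```python
import random


def sample_range(rows, lo, hi, seed):
    """Keep rows whose random draw falls in [lo, hi).

    One draw is made per row in iteration order, so the draw a row
    receives depends on its position, not on its value.
    """
    rng = random.Random(seed)
    return [row for row in rows if lo <= rng.random() < hi]


rows = list(range(20))

# Stable order: both splits see the same sequence of draws,
# so together they cover every row exactly once.
a = sample_range(rows, 0.0, 0.5, seed=42)
b = sample_range(rows, 0.5, 1.0, seed=42)
assert sorted(a + b) == rows

# Unstable order: the second materialization yields the rows in a
# different order, so a row may receive a different draw and can
# land in both splits (or in neither).
reordered = rows[::-1]
b_unstable = sample_range(reordered, 0.5, 1.0, seed=42)
print("rows in both splits:", sorted(set(a) & set(b_unstable)))

# Sorting before sampling restores a deterministic order.
b_fixed = sample_range(sorted(reordered), 0.5, 1.0, seed=42)
assert b_fixed == b
```

Because every split replays the same sequence of draws, the splits are only 
disjoint when each materialization presents the rows in the same order.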

To prevent this, we explicitly sort each input partition to make the ordering 
deterministic. Because MapType columns (including columns that nest a MapType) 
cannot be sorted, they should be explicitly pruned from the sort order. 
Additionally, if the resulting sort order is empty, we materialize the dataset 
instead to guarantee determinism.
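The pruning rule and the fallback can be sketched as follows. This is a minimal 
model, assuming a column is unsortable whenever a map type appears anywhere in 
its type. The classes {{Atomic}}, {{ArrayT}}, {{MapT}} and {{StructT}} and the 
functions {{contains_map}}, {{sortable_columns}} and {{plan_split}} are 
hypothetical stand-ins, not Spark's API.

```python
from dataclasses import dataclass
from typing import Tuple


# Hypothetical stand-ins for SQL data types (not Spark classes).
@dataclass(frozen=True)
class Atomic:
    name: str


@dataclass(frozen=True)
class ArrayT:
    element: object


@dataclass(frozen=True)
class MapT:
    key: object
    value: object


@dataclass(frozen=True)
class StructT:
    fields: Tuple[Tuple[str, object], ...]


def contains_map(dt):
    """Return True if a map type appears anywhere inside dt."""
    if isinstance(dt, MapT):
        return True
    if isinstance(dt, ArrayT):
        return contains_map(dt.element)
    if isinstance(dt, StructT):
        return any(contains_map(t) for _, t in dt.fields)
    return False


def sortable_columns(schema):
    """Prune columns whose type contains a map from the sort order."""
    return [name for name, dt in schema if not contains_map(dt)]


def plan_split(schema):
    """Sort by the remaining columns, or fall back to materializing."""
    order = sortable_columns(schema)
    if order:
        return ("sort", order)
    return ("materialize", [])


string_map = MapT(Atomic("string"), Atomic("string"))
schema = [
    ("id", Atomic("long")),
    ("tags", string_map),                                   # top-level map
    ("events", ArrayT(StructT((("meta", string_map),)))),   # nested map
    ("name", Atomic("string")),
]
only_maps = [("tags", string_map)]

print(plan_split(schema))     # ('sort', ['id', 'name'])
print(plan_split(only_maps))  # ('materialize', [])
```

The recursive check matters because a map nested inside an array or struct 
makes the enclosing column unsortable as well, which a top-level type check 
alone would miss.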



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
