[jira] [Commented] (SPARK-20451) Filter out nested mapType datatypes from sort order in randomSplit

Apache Spark (JIRA) Mon, 24 Apr 2017 14:11:29 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-20451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981926#comment-15981926
 ]


Apache Spark commented on SPARK-20451:
--------------------------------------

User 'sameeragarwal' has created a pull request for this issue:
https://github.com/apache/spark/pull/17751

> Filter out nested mapType datatypes from sort order in randomSplit
> ------------------------------------------------------------------
>
>                 Key: SPARK-20451
>                 URL: https://issues.apache.org/jira/browse/SPARK-20451
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0, 2.1.0, 2.2.0
>            Reporter: Sameer Agarwal
>
> In {{randomSplit}}, the underlying DataFrame may not guarantee the ordering 
> of rows in its constituent partitions each time a split is materialized, 
> which could result in overlapping splits.
> To prevent this, we explicitly sort each input partition to make the ordering 
> deterministic. Because MapTypes cannot be sorted, they should be explicitly 
> pruned from the sort order. Additionally, if the resulting sort order is 
> empty, we materialize the dataset to guarantee determinism.
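
The pruning rule described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code from the pull request: the classes Atomic, ArrayOf, MapOf and StructOf are hypothetical stand-ins for Spark SQL data types, and is_orderable and plan_split are names invented for this example. The sketch keeps only columns whose type contains no map at any nesting depth, and signals that the dataset should be materialized when no sortable column remains.

```python
from dataclasses import dataclass

# Hypothetical, simplified stand-ins for Spark SQL data types.
# They are NOT the real Spark API; they exist only to illustrate the rule.
@dataclass(frozen=True)
class Atomic:
    name: str            # e.g. "long", "string"

@dataclass(frozen=True)
class ArrayOf:
    element: object      # element data type

@dataclass(frozen=True)
class MapOf:
    key: object          # key data type
    value: object        # value data type

@dataclass(frozen=True)
class StructOf:
    fields: tuple        # tuple of (field_name, data_type) pairs


def is_orderable(dt) -> bool:
    """Return False if dt is a map or contains a map at any nesting depth."""
    if isinstance(dt, MapOf):
        return False
    if isinstance(dt, ArrayOf):
        return is_orderable(dt.element)
    if isinstance(dt, StructOf):
        return all(is_orderable(field_type) for _, field_type in dt.fields)
    return True


def plan_split(schema):
    """Decide how to make row order deterministic before splitting.

    schema is a list of (column_name, data_type) pairs.
    Returns (sort_columns, must_materialize): the columns that are safe to
    sort on, and a flag that is True when no such column exists, in which
    case the caller would materialize the dataset instead of sorting.
    """
    sort_columns = [name for name, dt in schema if is_orderable(dt)]
    return sort_columns, len(sort_columns) == 0


if __name__ == "__main__":
    string_map = MapOf(Atomic("string"), Atomic("string"))
    schema = [
        ("id", Atomic("long")),
        ("tags", string_map),                    # top-level map: pruned
        ("info", StructOf((("m", string_map),))),  # nested map: pruned
        ("xs", ArrayOf(Atomic("int"))),
    ]
    print(plan_split(schema))
    print(plan_split([("tags", string_map)]))
```

The recursive check matters because a map hidden inside a struct or array makes the enclosing column just as unsortable as a top-level map column.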



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
