[ https://issues.apache.org/jira/browse/SPARK-20451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981926#comment-15981926 ]
Apache Spark commented on SPARK-20451:
--------------------------------------

User 'sameeragarwal' has created a pull request for this issue:
https://github.com/apache/spark/pull/17751

> Filter out nested mapType datatypes from sort order in randomSplit
> ------------------------------------------------------------------
>
>                 Key: SPARK-20451
>                 URL: https://issues.apache.org/jira/browse/SPARK-20451
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0, 2.1.0, 2.2.0
>            Reporter: Sameer Agarwal
>
> In {{randomSplit}}, the underlying DataFrame does not necessarily return the
> rows of each partition in the same order every time a split is materialized,
> which can result in overlapping splits.
> To prevent this, we explicitly sort each input partition to make the ordering
> deterministic. Given that MapTypes cannot be sorted, any column whose type is
> or contains a MapType should be explicitly pruned from the sort order.
> Additionally, if the resulting sort order is empty, we materialize the dataset
> instead to guarantee determinism.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
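The strategy described in the issue can be illustrated with a small, self-contained Python sketch. This is not Spark's code. The type classes, `is_orderable`, `deterministic_input`, `random_split` and the `unstable` input are all hypothetical stand-ins. The sketch also assumes, as a simplification of how randomSplit samples, that each split is computed in its own pass over the input. In every pass, one RNG per partition is re-seeded, and a row is kept for split i when its draw falls in [lower_i, upper_i). Under that assumption, a row only ends up in exactly one split if it receives the same draw in every pass, which requires a stable row order. Hence the sort over orderable (map-free) columns, with a one-time materialization as the fallback.

```python
import random
from dataclasses import dataclass
from typing import Tuple

# Hypothetical, heavily simplified stand-ins for Spark SQL data types.
@dataclass(frozen=True)
class AtomicType:
    name: str

@dataclass(frozen=True)
class ArrayType:
    element: object

@dataclass(frozen=True)
class StructType:
    fields: Tuple[object, ...]

@dataclass(frozen=True)
class MapType:
    key: object
    value: object

def is_orderable(dt) -> bool:
    """False for a MapType or any type containing one, however deeply nested."""
    if isinstance(dt, MapType):
        return False
    if isinstance(dt, ArrayType):
        return is_orderable(dt.element)
    if isinstance(dt, StructType):
        return all(is_orderable(f) for f in dt.fields)
    return True

def deterministic_input(schema, partitions):
    """Return (partitions, sort_key). Sort on the orderable columns or, if
    none remain, materialize ("cache") the input once and reuse it."""
    idx = [j for j, (_, dt) in enumerate(schema) if is_orderable(dt)]
    if idx:
        return partitions, (lambda row: tuple(row[j] for j in idx))
    snapshot = [list(p) for p in partitions(0)]
    return (lambda _pass: snapshot), None

def random_split(partitions, weights, seed, order_key=None):
    """Compute each split in its own pass; partitions(i) yields the input
    as seen by pass i. Each pass re-seeds one RNG per partition."""
    total = sum(weights)
    bounds = [0.0]
    for w in weights:
        bounds.append(bounds[-1] + w / total)
    bounds[-1] = 1.0  # guard against floating-point drift in the last bound
    splits = []
    for i in range(len(weights)):
        lower, upper = bounds[i], bounds[i + 1]
        out = []
        for p_idx, part in enumerate(partitions(i)):
            rows = sorted(part, key=order_key) if order_key is not None else list(part)
            rng = random.Random(seed + p_idx)
            for row in rows:  # exactly one draw per row, in row order
                if lower <= rng.random() < upper:
                    out.append(row)
        splits.append(out)
    return splits

def unstable(base):
    """Hypothetical input whose within-partition row order flips every pass."""
    return lambda i: [p[::-1] if i % 2 else list(p) for p in base]

INT, STR = AtomicType("int"), AtomicType("string")

# A sortable column plus a top-level map and a map nested in an array.
schema = [("id", INT), ("tags", MapType(STR, INT)),
          ("nested", ArrayType(MapType(STR, INT)))]
rows = [(n, {"k": n}, [{"k": n}]) for n in range(20)]
parts, key = deterministic_input(schema, unstable([rows[:10], rows[10:]]))
left, right = random_split(parts, [0.5, 0.5], seed=42, order_key=key)

# Only map columns: the sort order is empty, so the input is materialized.
map_schema = [("tags", MapType(STR, INT))]
map_rows = [({"k": n},) for n in range(20)]
parts2, key2 = deterministic_input(map_schema, unstable([map_rows[:10], map_rows[10:]]))
left2, right2 = random_split(parts2, [0.5, 0.5], seed=7, order_key=key2)
```

In both cases every row lands in exactly one split. If the `unstable` input is passed to `random_split` directly with `order_key=None`, a row can receive different draws in different passes. It can then land in both splits or in neither, which is the overlap the issue describes.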