[GitHub] spark pull request #22079: [SPARK-23207][SQL][BACKPORT-2.2] Shuffle+Repartit...

2018-08-13 Thread bersprockets
GitHub user bersprockets commented on a diff in the pull request:
https://github.com/apache/spark/pull/22079#discussion_r209736691
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -144,7 +144,7 @@ object ChiSqSelectorModel extends

[GitHub] spark pull request #22079: [SPARK-23207][SQL][BACKPORT-2.2] Shuffle+Repartit...

2018-08-13 Thread dongjoon-hyun
GitHub user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22079#discussion_r209731698
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -144,7 +144,7 @@ object ChiSqSelectorModel extends

[GitHub] spark pull request #22079: [SPARK-23207][SQL][BACKPORT-2.2] Shuffle+Repartit...

2018-08-13 Thread mridulm
GitHub user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/22079#discussion_r209705612
--- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/execution/RecordBinaryComparator.java ---
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the

[GitHub] spark pull request #22079: [SPARK-23207][SQL][BACKPORT-2.2] Shuffle+Repartit...

2018-08-11 Thread bersprockets
GitHub user bersprockets opened a pull request:
https://github.com/apache/spark/pull/22079
[SPARK-23207][SQL][BACKPORT-2.2] Shuffle+Repartition on a DataFrame could lead to incorrect answers
## What changes were proposed in this pull request?
Currently shuffle repartition