[GitHub] spark issue #22079: [SPARK-23207][SQL][BACKPORT-2.2] Shuffle+Repartition on ...

bersprockets Sat, 11 Aug 2018 14:23:07 -0700

Github user bersprockets commented on the issue:

    https://github.com/apache/spark/pull/22079
  
    The test "model load / save" in ChiSqSelectorSuite fails because of this line in
    [ChiSqSelector.scala](https://github.com/apache/spark/blob/branch-2.2/mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala#L147):
    
    <pre>
    spark.createDataFrame(dataArray).repartition(1).write.parquet(Loader.dataPath(path))
    </pre>
    
    In 2.4, the line is:
    
    <pre>
    spark.createDataFrame(sc.makeRDD(dataArray, 1)).write.parquet(Loader.dataPath(path))
    </pre>
    
    If you change 2.4 to use the 2.2 version of that line (the one with
    repartition(1)), and also revert the follow-up PR (#20426) that avoids
    sorting when there is only one partition, this test fails on 2.4 in the
    same way.
    
    So I am not sure which way to go: update ChiSqSelector.scala to match 2.4
    (a one-line change), or make the test accept the new order.
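
For the second option, a minimal sketch of what "accept the new order" could look like, assuming the test compares arrays of selected feature indices and that their order is not significant (an assumption reviewers would need to confirm). The object `OrderInsensitiveCheck`, the helper `sameFeatures`, and the sample values are hypothetical and not part of Spark or of ChiSqSelectorSuite:

```scala
// Hypothetical helper, not part of Spark: compares two arrays of feature
// indices while ignoring the order in which the rows were written or read.
object OrderInsensitiveCheck {
  def sameFeatures(expected: Array[Int], actual: Array[Int]): Boolean =
    expected.sorted.sameElements(actual.sorted)

  def main(args: Array[String]): Unit = {
    val saved = Array(1, 2, 3, 4)  // made-up indices, in the order they were saved
    val loaded = Array(3, 1, 4, 2) // same indices, in a different order after loading

    assert(!saved.sameElements(loaded)) // an order-sensitive comparison fails
    assert(sameFeatures(saved, loaded)) // the order-insensitive comparison passes
  }
}
```

The first option (switching to the `makeRDD(dataArray, 1)` form) avoids the question entirely, since the test would not need to change.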


