GitHub user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22079#discussion_r209731698
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -144,7 +144,7 @@ object ChiSqSelectorModel extends Loader[ChiSqSelectorModel] {
       val dataArray = Array.tabulate(model.selectedFeatures.length) { i =>
         Data(model.selectedFeatures(i))
       }
-      spark.createDataFrame(dataArray).repartition(1).write.parquet(Loader.dataPath(path))
+      spark.createDataFrame(sc.makeRDD(dataArray, 1)).write.parquet(Loader.dataPath(path))
--- End diff --
@jiangxb1987 and @bersprockets, SPARK-22905 consists of two commits:
- ChiSqSelector (https://github.com/apache/spark/pull/20088)
- GaussianMixtureModel (https://github.com/apache/spark/pull/20113)
If we want to include SPARK-22905 here, it had better be explicit and
complete: put `[SPARK-22905]` in the PR title and include both patches.