zhengruifeng commented on issue #25776: [SPARK-28985][PYTHON][ML] Add common classes (JavaPredictor/JavaClassificationModel/JavaProbabilisticClassifier) in PYTHON
URL: https://github.com/apache/spark/pull/25776#issuecomment-532566063

@srowen Most models in PySpark do not have any setters or getters (one exception is OneVsRest), and no model has a prediction function.

A main complaint about PySpark ML that I have heard from the users of JD's big data platform is that they cannot set the input/output column names of models. It is inconvenient to rename columns just to avoid column conflicts.

Suppose we are working on a classification task in an interactive mode (like Jupyter). We have trained several classification models with default column names, we evaluate them one by one, and then we want to ensemble some of the good ones. Right now we must rename the `predictionCol` output of some models after transformation, since all models write to the same column name. Otherwise, we need to re-train them with modified column names. Similar cases come up easily when we deal with a DataFrame with tens of columns and try several algorithms. So we want column setters like those on the Scala side.

The goal is to bring the Python side in sync with the Scala side. This has two benefits:

1. It will be easier to maintain the codebase: when we change the Scala side, it is easy to sync the change to the Python side.
2. Function parity: methods such as the models' getters are still missing on the Python side.

I have tried to divide the goal into several subtasks in [SPARK-28958](https://issues.apache.org/jira/browse/SPARK-28958); after this PR we need to resolve the others.
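To make the column-conflict scenario concrete, here is a minimal pure-Python sketch. It is not the PySpark API: `ToyModel`, `set_prediction_col`, `rename_column`, the `score` field, and the thresholds are all hypothetical stand-ins, and a "DataFrame" is just a list of dicts. It only illustrates why two fitted models sharing a default output column force a rename, and how a model-side setter would avoid both the rename and the re-training.

```python
# Toy stand-in for a fitted classification model; NOT the PySpark API.
# A "DataFrame" here is a list of dict rows, and all names are hypothetical.

class ToyModel:
    def __init__(self, threshold, prediction_col="prediction"):
        self.threshold = threshold
        self.prediction_col = prediction_col

    def set_prediction_col(self, name):
        # The kind of model-side setter the comment asks for.
        self.prediction_col = name
        return self

    def transform(self, rows):
        # Assumption for this sketch: writing to an existing column is an error.
        if rows and self.prediction_col in rows[0]:
            raise ValueError(f"Column {self.prediction_col} already exists")
        return [
            {**r, self.prediction_col: float(r["score"] >= self.threshold)}
            for r in rows
        ]


def rename_column(rows, old, new):
    # Return new rows with column `old` renamed to `new`.
    return [{(new if k == old else k): v for k, v in r.items()} for r in rows]


data = [{"score": 0.2}, {"score": 0.6}, {"score": 0.9}]
model_a = ToyModel(threshold=0.5)  # both models default to "prediction"
model_b = ToyModel(threshold=0.8)

# Workaround without setters: rename the output after each transform.
step1 = rename_column(model_a.transform(data), "prediction", "pred_a")
workaround = rename_column(model_b.transform(step1), "prediction", "pred_b")

# With setters: point each fitted model at its own column, no re-training.
model_a.set_prediction_col("pred_a")
model_b.set_prediction_col("pred_b")
with_setters = model_b.transform(model_a.transform(data))
```

Both paths produce the same rows, but the second keeps the column choice on the model itself, which is the behavior being requested for the Python side.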
