zero323 commented on issue #25776: [SPARK-28985][PYTHON][ML] Add common classes (JavaPredictor/JavaClassificationModel/JavaProbabilisticClassifier) in PYTHON
URL: https://github.com/apache/spark/pull/25776#issuecomment-534039351

> As to your proposal, I personally think it is reasonable, although Pyspark.ML is
> mostly there to wrap the Scala side.

Even if that was the original intention, I'd say that ship has sailed.

- First of all, nothing in the original API indicated this. On the contrary, the original API clearly suggests that a non-Java path is supported, by providing base classes (`Params`, [`Transformer`](https://github.com/apache/spark/blob/eb037a8180be4ab7570eda1fa9cbf3c84b92c3f7/python/pyspark/ml/base.py#L138), [`Estimator`](https://github.com/apache/spark/blob/eb037a8180be4ab7570eda1fa9cbf3c84b92c3f7/python/pyspark/ml/base.py#L70), [`Model`](https://github.com/apache/spark/blob/eb037a8180be4ab7570eda1fa9cbf3c84b92c3f7/python/pyspark/ml/base.py#L178), [`ML{Reader,Writer}`](https://github.com/apache/spark/blob/f725d472f51fb80c6ce1882ec283ff69bafb0de4/python/pyspark/ml/util.py#L106), `ML{Readable,Writable}`) as well as Java-specific implementations (`JavaParams`, [`JavaTransformer`](https://github.com/apache/spark/blob/f725d472f51fb80c6ce1882ec283ff69bafb0de4/python/pyspark/ml/wrapper.py#L319), [`JavaEstimator`](https://github.com/apache/spark/blob/f725d472f51fb80c6ce1882ec283ff69bafb0de4/python/pyspark/ml/wrapper.py#L285), [`JavaModel`](https://github.com/apache/spark/blob/f725d472f51fb80c6ce1882ec283ff69bafb0de4/python/pyspark/ml/wrapper.py#L405), [`JavaML{Reader,Writer}`](https://github.com/apache/spark/blob/f725d472f51fb80c6ce1882ec283ff69bafb0de4/python/pyspark/ml/util.py#L161), `JavaML{Readable,Writable}`).
- Furthermore, authoritative (IMHO) and open Python ML extensions exist ([spark-sklearn](https://github.com/databricks/spark-sklearn) is one of these, and if I recall correctly [spark-deep-learning](https://github.com/databricks/spark-deep-learning) provides some pure-Python utilities). Personally I've seen quite a lot of private implementations, but that's just anecdotal evidence.

Let us assume, for the sake of argument, that the above observations are irrelevant. I will argue that having a complete, public type hierarchy is still desirable:

- Every use case I described can be narrowed down to the Java implementation only, though the cases are less compelling if we do that.
- More importantly, a public type hierarchy with Java-specific extensions is the `pyspark.ml` standard. There is no reason to treat this specific case as an exception, especially when the implementation is far from utilitarian (for example, the implementation-free `JavaClassifierParams` and `JavaProbabilisticClassifierParams` serve, as of now, no practical purpose whatsoever).

> This PR will not directly affect your own implementation

That might be the case, though I think it is better to be cautious here. Multiple inheritance with ABCs and a deep type hierarchy can be somewhat tricky, especially if you need both Python 2 and 3 compatibility. Additionally, some subtypes can simply become obsolete if a public hierarchy is provided (the aforementioned `JavaClassifierParams` and `JavaProbabilisticClassifierParams` are good examples). It means that the proposed complete and public hierarchy might require some API changes, which can be harder to justify in a minor release.
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
