zero323 commented on issue #25776: [SPARK-28985][PYTHON][ML] Add common classes 
(JavaPredictor/JavaClassificationModel/JavaProbabilisticClassifier) in PYTHON
URL: https://github.com/apache/spark/pull/25776#issuecomment-534039351
 
 
   > As to your proposal, I personally think it is reasonable, although Pyspark.ML is mostly there to wrap the Scala side. 
   
   Even if that was the original intention, I'd say that ship has sailed.
   
   - First of all, nothing in the original API indicated this. On the contrary, 
the original API clearly suggests that a non-Java path is supported, by providing 
base classes (`Params`, 
[`Transformer`](https://github.com/apache/spark/blob/eb037a8180be4ab7570eda1fa9cbf3c84b92c3f7/python/pyspark/ml/base.py#L138),
 
[`Estimator`](https://github.com/apache/spark/blob/eb037a8180be4ab7570eda1fa9cbf3c84b92c3f7/python/pyspark/ml/base.py#L70),
 
[`Model`](https://github.com/apache/spark/blob/eb037a8180be4ab7570eda1fa9cbf3c84b92c3f7/python/pyspark/ml/base.py#L178),
 
[`ML{Reader,Writer}`](https://github.com/apache/spark/blob/f725d472f51fb80c6ce1882ec283ff69bafb0de4/python/pyspark/ml/util.py#L106),
 `ML{Readable,Writable}`) as well as Java-specific implementations 
(`JavaParams`, 
[`JavaTransformer`](https://github.com/apache/spark/blob/f725d472f51fb80c6ce1882ec283ff69bafb0de4/python/pyspark/ml/wrapper.py#L319),
 
[`JavaEstimator`](https://github.com/apache/spark/blob/f725d472f51fb80c6ce1882ec283ff69bafb0de4/python/pyspark/ml/wrapper.py#L285),
 
[`JavaModel`](https://github.com/apache/spark/blob/f725d472f51fb80c6ce1882ec283ff69bafb0de4/python/pyspark/ml/wrapper.py#L405),
 
[`JavaML{Reader,Writer}`](https://github.com/apache/spark/blob/f725d472f51fb80c6ce1882ec283ff69bafb0de4/python/pyspark/ml/util.py#L161),
 `JavaML{Readable,Writable}`). 
   
   - Furthermore, authoritative (IMHO) and open Python ML extensions exist 
([spark-sklearn](https://github.com/databricks/spark-sklearn) is one of these, 
and if I recall correctly 
[spark-deep-learning](https://github.com/databricks/spark-deep-learning) 
provides some pure-Python utilities as well). Personally I've seen quite a lot 
of private implementations, but that's just anecdotal evidence.
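
   To make the first point concrete, the contract those base classes establish can 
be sketched without any JVM involvement. Below is a schematic, pyspark-free 
analogue of the `Estimator`/`Model`/`Transformer` pattern; all concrete class 
names here (`ColumnScaler`, `ColumnScalerModel`) are hypothetical illustrations, 
not part of `pyspark.ml`:

   ```python
   # Schematic analogue of the pyspark.ml base-class contract, using plain
   # ABCs instead of the real pyspark.ml.base classes. ColumnScaler and
   # ColumnScalerModel are made-up examples of a pure-Python implementation.
   from abc import ABC, abstractmethod


   class Transformer(ABC):
       """Analogue of pyspark.ml.base.Transformer: maps a dataset to a dataset."""

       @abstractmethod
       def transform(self, dataset):
           ...


   class Model(Transformer):
       """Analogue of pyspark.ml.base.Model: a Transformer produced by fitting."""


   class Estimator(ABC):
       """Analogue of pyspark.ml.base.Estimator: fits a Model to a dataset."""

       @abstractmethod
       def fit(self, dataset):
           ...


   class ColumnScalerModel(Model):
       """Hypothetical fitted model: scales values by a learned maximum."""

       def __init__(self, max_value):
           self.max_value = max_value

       def transform(self, dataset):
           return [x / self.max_value for x in dataset]


   class ColumnScaler(Estimator):
       """Hypothetical estimator: learns the maximum of the training data."""

       def fit(self, dataset):
           return ColumnScalerModel(max(dataset))


   model = ColumnScaler().fit([1.0, 2.0, 4.0])
   print(model.transform([2.0]))  # [0.5]
   ```

   Nothing in this pattern requires a `Java*` wrapper; that is precisely why 
third parties can (and do) build on the non-Java path.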
   
   Let us assume, for the sake of argument, that the above observations are 
irrelevant. I will argue that having a complete, public type hierarchy is still 
desirable:
   
   - Every use case I described can be narrowed down to Java implementations 
only, though they become less compelling if we do that.
   - More importantly, a public type hierarchy with Java-specific extensions is 
the `pyspark.ml` standard. There is no reason to treat this specific case as an 
exception, especially when the implementation is far from utilitarian (for 
example, the implementation-free `JavaClassifierParams` and 
`JavaProbabilisticClassifierParams` serve, as of now, no practical purpose 
whatsoever).
   
   > This PR will not directly affect your own implementation
   
   That might be the case, though I think it is better to be cautious here. 
Multiple inheritance with ABCs and a deep type hierarchy can be somewhat tricky, 
especially if you need both Python 2 and 3 compatibility.
   
   Additionally, some subtypes can simply become obsolete if a public hierarchy 
is provided (the aforementioned `JavaClassifierParams` and 
`JavaProbabilisticClassifierParams` are good examples).
   
   This means that the proposed complete and public hierarchy might require some 
API changes, and could be harder to justify in a minor release.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]