GitHub user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/11266#issuecomment-194540106
My thoughts on the pros/cons of having Python's Pipeline be a wrapper for
the Java Pipeline:
Pros:
* Less code duplication. This would matter even more for CrossValidator,
which is tedious to implement twice.
Cons:
* This would break code for anyone who has written a custom PipelineStage in
Python. We have not explicitly supported this so far, but I think it's
something we should support eventually.
I'd propose:
* Short term (for 2.0): We do not make Pipeline a Java wrapper, but we
implement save/load by transferring the stages to Java (as you did in this
PR).
* Long term: We can eventually consider better ways to support Python users
who want to write their own PipelineStages in Python. If it helps unify the
code paths, we could also consider making Pipeline a Java wrapper, as long as
we find a good way to give a PipelineStage defined in Python a Java wrapper
(analogous to a Python UDF).