Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/6300#issuecomment-119774036
@BryanCutler Sorry for the delay! I like the general idea, but I think it
could be simpler. What if:
* Predictor.transform still handled everything, except the actual
prediction. For that, it would call transformImpl(dataset). I'll note what I
mean inline.
* Predictor.transformImpl would by default use predict(), as before.
* Subclasses like RandomForestClassifier could override transformImpl to
broadcast the model and then use that broadcast variable in a map (which would
use predict()).
That should give you the same behavior, but subclasses would no longer need
to override transform(), and predictImpl could be eliminated. (Also, the
subclasses currently skip schema validation in transform, which is a problem.)
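
The proposal above could be sketched roughly as follows. This is a toy model in plain Scala, not the actual Spark ML code: `PredictionModel`, `LinearModel`, `ForestModel`, the `Broadcast` wrapper, and the `Seq`-based dataset are all hypothetical stand-ins, and the `require` check only stands in for real schema validation. The point is just the shape: `transform` stays final and does the shared work, while `transformImpl` is the single hook a subclass overrides.

```scala
// Toy sketch of the proposed structure; NOT the real Spark ML API.
// A dataset is modeled as Seq[Vector[Double]] instead of a DataFrame.
object TransformImplSketch {
  type Features = Vector[Double]
  type Dataset = Seq[Features]

  // Hypothetical stand-in for a Spark broadcast variable.
  final case class Broadcast[T](value: T)

  abstract class PredictionModel {
    def numFeatures: Int
    def predict(features: Features): Double

    // Shared work lives here and is never overridden, so the
    // validation step cannot be skipped by a subclass.
    final def transform(dataset: Dataset): Seq[(Features, Double)] = {
      require(dataset.forall(_.size == numFeatures),
        s"expected $numFeatures features per row")
      transformImpl(dataset)
    }

    // Default behavior: call predict() on each row, as before.
    protected def transformImpl(dataset: Dataset): Seq[(Features, Double)] =
      dataset.map(row => (row, predict(row)))
  }

  // Small model: the default transformImpl is good enough.
  class LinearModel(weights: Features) extends PredictionModel {
    val numFeatures: Int = weights.size
    def predict(features: Features): Double =
      weights.zip(features).map { case (w, x) => w * x }.sum
  }

  // Large model: override only transformImpl to "broadcast" the model
  // once and use the broadcast value inside the map.
  class ForestModel(val numFeatures: Int, trees: Seq[Features => Double])
      extends PredictionModel {
    def predict(features: Features): Double =
      trees.map(tree => tree(features)).sum / trees.size

    override protected def transformImpl(dataset: Dataset): Seq[(Features, Double)] = {
      val bcastModel = Broadcast(this)
      dataset.map(row => (row, bcastModel.value.predict(row)))
    }
  }
}
```

In this sketch neither subclass touches `transform`, so both get the validation step, and there is no separate `predictImpl` hook.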