[GitHub] spark pull request: [SPARK-7127] [MLLIB] [WIP] Adding broadcast of...

jkbradley Wed, 08 Jul 2015 17:59:49 -0700

Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/6300#issuecomment-119774036
  
    @BryanCutler Sorry for the delay!  I like the general idea, but I think it
    could be simpler.  What if:
    * Predictor.transform still handled everything except the actual
      prediction.  For that, it would call transformImpl(dataset).  I'll note
      what I mean inline.
    * Predictor.transformImpl would by default use predict(), as before.
    * Subclasses like RandomForestClassifier could override transformImpl to
      broadcast the model and then use that broadcast variable in a map (which
      would use predict()).

    That should allow you to do the same thing, but subclasses would not need
    to override transform(), and predictImpl could be eliminated.  (Also, the
    subclasses currently skip schema validation in transform, which is a
    problem.)
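
The structure proposed in the comment can be illustrated with a minimal, dependency-free sketch. It is not the actual Spark code: SimpleRow, SimpleDataset, Broadcast, SumModel and BroadcastSumModel are hypothetical stand-ins invented for illustration, and a real implementation would presumably operate on a DataFrame and create the broadcast variable with SparkContext.broadcast. Only the names Predictor, transform, transformImpl and predict come from the comment.

```scala
// Hypothetical stand-ins for illustration only; NOT the real Spark types.
final case class SimpleRow(features: Vector[Double])
final case class SimpleDataset(columns: Set[String], rows: Seq[SimpleRow])

// Stand-in for a broadcast variable: a read-only handle to a shared value.
final class Broadcast[T](val value: T)

abstract class Predictor {
  val featuresCol: String = "features"

  def predict(features: Vector[Double]): Double

  // transform keeps the shared work (here, a toy schema check) and delegates
  // only the prediction step to transformImpl. It is final, so subclasses
  // cannot skip the validation.
  final def transform(dataset: SimpleDataset): Seq[Double] = {
    require(dataset.columns.contains(featuresCol), s"missing column: $featuresCol")
    transformImpl(dataset)
  }

  // Default behaviour: apply predict() to every row, as before.
  protected def transformImpl(dataset: SimpleDataset): Seq[Double] =
    dataset.rows.map(row => predict(row.features))
}

// A toy model that relies on the default transformImpl.
class SumModel extends Predictor {
  def predict(features: Vector[Double]): Double = features.sum
}

// A toy model that overrides only transformImpl to go through a broadcast handle.
class BroadcastSumModel extends SumModel {
  var broadcastCount = 0 // counts how often the model was "broadcast"

  override protected def transformImpl(dataset: SimpleDataset): Seq[Double] = {
    // Assumption: real code would create this via SparkContext.broadcast.
    val bcastModel = new Broadcast(this)
    broadcastCount += 1
    dataset.rows.map(row => bcastModel.value.predict(row.features))
  }
}
```

In this sketch the subclass never touches transform(), so the schema check always runs, and the model is wrapped once per transform call rather than once per row.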



