[GitHub] spark pull request: [SPARK-3272][MLLib]Calculate prediction for no...

jkbradley Thu, 28 Aug 2014 09:13:52 -0700

Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/2180#issuecomment-53746339
  
    @chouqin Thanks for observing that we can sometimes avoid calculating the 
prediction and/or the info gain.  I'm worried that this won't change the 
scaling of the algorithm much, since calculating the prediction is a low-cost 
operation.  (This computation is done on the master node, and for any 
reasonably sized dataset, the time spent on the master node is negligible 
compared to the time spent on the treeAggregate() call.)
    
    I'm also worried about this PR clashing with the current DecisionTree PR: 
[https://github.com/apache/spark/pull/2125], which moves the calculation of 
predictions into separate Impurity* classes.  Would it be possible to update 
this once [https://github.com/apache/spark/pull/2125] has gone through?
    
    At that time, I think this PR could be simplified a bit by removing the 
Predict class.  InformationGainStats.predict already holds the prediction, and 
InformationGainStats.gain can be computed or ignored as needed.
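    
    To illustrate the idea of one stats object carrying the prediction, with 
the gain filled in only when needed, here is a minimal sketch.  It is not the 
MLlib code: GainStatsSketch, GainNotComputed, predictOnly and withGain are 
hypothetical names, the real InformationGainStats has more fields, and the 
sketch assumes regression with the mean label as the prediction and variance 
as the impurity.

```scala
// Hypothetical, simplified stand-in for InformationGainStats (not the MLlib class).
final case class GainStatsSketch(predict: Double, gain: Double)

object GainStatsSketch {
  // Sentinel meaning "gain was skipped for this node" (an assumption of this sketch).
  val GainNotComputed: Double = Double.MinValue

  private def mean(xs: Seq[Double]): Double = xs.sum / xs.size

  // Population variance, used here as the impurity measure.
  private def variance(xs: Seq[Double]): Double = {
    val m = mean(xs)
    xs.map(x => (x - m) * (x - m)).sum / xs.size
  }

  // For nodes that will not be split: compute the prediction, skip the gain.
  def predictOnly(labels: Seq[Double]): GainStatsSketch = {
    require(labels.nonEmpty, "labels must be non-empty")
    GainStatsSketch(mean(labels), GainNotComputed)
  }

  // For candidate splits: compute both the prediction and the gain,
  // where gain = impurity(parent) - weighted impurity of the two children.
  def withGain(left: Seq[Double], right: Seq[Double]): GainStatsSketch = {
    require(left.nonEmpty && right.nonEmpty, "both children must be non-empty")
    val all = left ++ right
    val wLeft = left.size.toDouble / all.size
    val wRight = right.size.toDouble / all.size
    val gain = variance(all) - wLeft * variance(left) - wRight * variance(right)
    GainStatsSketch(mean(all), gain)
  }
}
```

    With this shape, a caller reads .predict in every case and checks .gain 
against the sentinel only where a split is being considered, so a separate 
class for the prediction is not required.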



