[GitHub] spark pull request: [SPARK-7131] [ml] Copy Decision Tree, Random F...

jkbradley Thu, 16 Jul 2015 18:25:41 -0700

Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/7294#issuecomment-122143399
  
    @manishamde Thanks for taking a look!  I do want to run some performance 
tests to make sure ml and mllib perform similarly.  For accuracy, the current 
tests run all of the unit tests and check that ml and mllib output exactly the 
same models.  I hope that's sufficient (but I do think we should continue to 
improve unit tests).
    
    @mengxr Thanks for reviewing!  If I had felt at liberty to do more of these 
cleanups, I would have.  : )  I'm going ahead with the cleanups, except for 
unpersisting the last prevNodeIdsForInstances.
    
    W.r.t. making the changes discoverable, I agree it's a hard problem.  I 
think removing bins was worth the trouble to avoid creating even more work for 
ourselves (a new tree representation with bins, using bins internally, and 
removing them eventually).  I was relying on unit tests to give confidence.  
It'd be good to come up with ideas for the next time this happens.
    
    Let me know if there is a problem with storing the FileSystem instance.  (I 
assume it's fine but don't know much about HDFS...)




