Github user codedeft commented on the pull request:

    https://github.com/apache/spark/pull/2868#issuecomment-61026533
  
    I've been testing the node Id cache on a larger dataset (8 million rows 
with 784 features), and I don't think the node Id cache will do much for 
shallow trees. I'm trying to find the 'sweet spot', but it may have to be 
well beyond depth 10 for the node Id cache to be useful.
    
    Anyhow, these big trees take an extremely long time to train to begin 
with on only around 20 executors. I actually gave up on training to depth 30 
because it was taking upwards of 8 hours to train 100 trees. So local 
sub-tree training is really essential here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
