GitHub user codedeft commented on the pull request:
https://github.com/apache/spark/pull/2868#issuecomment-61189986
OK, my performance test on the small mnist dataset (100 trees, depth limit 30)
still gives consistent results: the node id cache helps. I think the main
reason is that when the job actually runs on a cluster (as opposed to
locally), the trees have to be transferred, and that transfer time becomes
significant once the model gets large.
This time the runs took the following (there was another heavy workload on
the cluster, so everything was slower overall):
  without node id cache: 30 min 57 s
  with node id cache, checkpointing every 10 iterations: 19 min 10 s
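To put the two wall-clock times in relative terms, here is a small Python calculation. It is plain arithmetic on the two reported durations, nothing Spark-specific; the helper name `to_seconds` is just for illustration.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a minutes/seconds pair into total seconds."""
    return minutes * 60 + seconds

# Wall-clock times reported above (100 trees, depth limit 30, small mnist).
without_cache = to_seconds(30, 57)  # 1857 s
with_cache = to_seconds(19, 10)     # 1150 s, node id cache + checkpoint every 10 iterations

speedup = without_cache / with_cache                      # how many times faster
reduction = (without_cache - with_cache) / without_cache  # fraction of time saved

print(f"speedup: {speedup:.2f}x, time saved: {reduction:.0%}")
# prints "speedup: 1.61x, time saved: 38%"
```

So on this run the cache cut the total time by a bit more than a third.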
The difference shows up in the time between the 'collectAsMap' and
'mapPartitions' stages. Near the end of the run without the node id cache, we
see entries like these:
  Stage  Description                              Submitted            Duration
  213    mapPartitions at DecisionTree.scala:618  2014/10/30 16:19:56  6 s
  212    collectAsMap at DecisionTree.scala:647   2014/10/30 16:19:43  0.2 s
As you can see, although collectAsMap (submitted at 16:19:43) took only 0.2
seconds, mapPartitions doesn't start until 13 seconds later! So although the
mapPartitions work itself took only 6 seconds, the overall time was 19
seconds.
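The 13 s and 19 s figures can be re-derived directly from the submission timestamps and durations in the late-run, no-cache stage listing. A minimal Python sketch using only the standard `datetime` module; the variable names are illustrative, not from Spark.

```python
from datetime import datetime, timedelta

FMT = "%Y/%m/%d %H:%M:%S"  # timestamp format shown in the Spark UI stage listing

# Values copied from the no-cache stage listing near the end of the run.
collect_submitted = datetime.strptime("2014/10/30 16:19:43", FMT)
map_submitted = datetime.strptime("2014/10/30 16:19:56", FMT)
map_duration = timedelta(seconds=6)

# Time from collectAsMap being submitted until mapPartitions is submitted.
gap = map_submitted - collect_submitted
# Wall-clock time from collectAsMap submission until mapPartitions finishes.
overall = gap + map_duration

print(gap.total_seconds(), overall.total_seconds())  # 13.0 19.0
```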
Early in the run, the time in between is much smaller:
  Stage  Description                              Submitted            Duration
  45     mapPartitions at DecisionTree.scala:618  2014/10/30 15:56:09  5 s
  44     collectAsMap at DecisionTree.scala:647   2014/10/30 15:56:05  3 s
In contrast, with the node id cache there's very little time between these
two steps, either early or late in the run, although in general mapPartitions
seems to take a little longer:
  Stage  Description                              Submitted            Duration
  44     mapPartitions at DecisionTree.scala:600  2014/10/30 16:28:36  6 s
  43     collectAsMap at DecisionTree.scala:647   2014/10/30 16:28:33  3 s
  212    mapPartitions at DecisionTree.scala:600  2014/10/30 16:41:49  7 s
  211    collectAsMap at DecisionTree.scala:647   2014/10/30 16:41:46  2 s
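A quick way to compare all four snapshots is to compute the idle gap: mapPartitions submission time minus the moment collectAsMap finished (its submission time plus its duration). This Python sketch assumes the UI durations are accurate, which they are only to the rounding the UI shows, so the gaps are approximate; the `idle_gap` helper and labels are illustrative.

```python
from datetime import datetime, timedelta

FMT = "%Y/%m/%d %H:%M:%S"

# (label, collectAsMap submitted, collectAsMap duration in s, next mapPartitions submitted),
# copied from the stage listings above.
SNAPSHOTS = [
    ("no cache, early", "2014/10/30 15:56:05", 3.0, "2014/10/30 15:56:09"),
    ("no cache, late", "2014/10/30 16:19:43", 0.2, "2014/10/30 16:19:56"),
    ("cache, early", "2014/10/30 16:28:33", 3.0, "2014/10/30 16:28:36"),
    ("cache, late", "2014/10/30 16:41:46", 2.0, "2014/10/30 16:41:49"),
]

def idle_gap(collect_start: str, collect_secs: float, map_start: str) -> float:
    """Seconds between collectAsMap finishing and mapPartitions being submitted."""
    collect_end = datetime.strptime(collect_start, FMT) + timedelta(seconds=collect_secs)
    return (datetime.strptime(map_start, FMT) - collect_end).total_seconds()

gaps = {label: idle_gap(c, d, m) for label, c, d, m in SNAPSHOTS}
for label, secs in gaps.items():
    print(f"{label:16s} {secs:5.1f} s")
```

Only the late no-cache snapshot shows a large idle gap (about 12.8 s); the other three are at most about a second.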
I guess the reason we don't see as much improvement with larger datasets is
that mapPartitions takes much longer there, so the additional time spent
transferring the model becomes a smaller percentage of the total.
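That argument can be written as a simple ratio. Assuming (as a simplification) that each iteration pays a fixed model-transfer overhead on top of the mapPartitions work, the share of time the node id cache can remove is overhead / (overhead + compute). In this Python sketch the 13 s and 6 s figures come from the late no-cache mnist listing; the 60 s compute time is a hypothetical stand-in for a larger dataset, not a measurement, and the model ignores the slightly longer mapPartitions seen with the cache.

```python
def overhead_share(transfer_secs: float, compute_secs: float) -> float:
    """Fraction of an iteration's wall-clock time spent on model transfer,
    assuming transfer and compute time simply add up (a simplification)."""
    return transfer_secs / (transfer_secs + compute_secs)

# Late iteration on small mnist without the cache: ~13 s gap, ~6 s of mapPartitions.
mnist_share = overhead_share(13.0, 6.0)
# Hypothetical larger dataset: same transfer overhead, 60 s of mapPartitions work.
large_share = overhead_share(13.0, 60.0)

print(f"{mnist_share:.0%} vs {large_share:.0%}")  # 68% vs 18%
```

With the same transfer cost, a longer mapPartitions stage shrinks the fraction the cache can save.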
I'm still running the mnist8m test with 10 trees and depth limit 30; it has
been running for 5+ hours.