Re: RandomForest - subsamplingRate parameter

2015-06-17 Thread Xiangrui Meng
Because we don't have random access to the records, sampling still needs to go through them sequentially. It does save some computation, which is perhaps noticeable only if you have the data cached in memory. Different random seeds are used for different trees.
-Xiangrui
On Wed, Jun 3, 2015 at 4:40 PM,
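
To illustrate the point about sequential sampling, here is a minimal sketch in plain Python. It is not Spark's implementation. The function names (bernoulli_sample, sample_for_trees), the base seed, and the toy dataset are all hypothetical, and the one-seed-per-tree scheme is only an assumption about how "different random seeds" might look. The sketch shows that without random access every record is still visited once, even when only a fraction is kept.

```python
import random


def bernoulli_sample(records, rate, seed):
    """Keep each record with probability `rate`.

    Every record is visited exactly once, because deciding whether to
    keep a record requires reaching it in the sequential scan.
    """
    rng = random.Random(seed)
    visited = 0
    kept = []
    for rec in records:  # no random access: one full sequential pass
        visited += 1
        if rng.random() < rate:
            kept.append(rec)
    return kept, visited


def sample_for_trees(records, rate, num_trees, base_seed=42):
    """Draw one subsample per tree, each with its own seed (assumed scheme)."""
    return [
        bernoulli_sample(records, rate, base_seed + tree_index)
        for tree_index in range(num_trees)
    ]


# Toy stand-in for a dataset; the real data would be distributed.
records = list(range(10_000))
results = sample_for_trees(records, rate=0.1, num_trees=3)

for tree_index, (kept, visited) in enumerate(results):
    print(f"tree {tree_index}: visited {visited} records, kept {len(kept)}")
```

In this sketch the number of records visited equals the dataset size regardless of the rate, so only the work done after sampling shrinks. That is consistent with the runtime changing little when reading the data dominates the cost.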

RandomForest - subsamplingRate parameter

2015-06-03 Thread Andrew Leverentz
When training a RandomForest model, the Strategy class (in mllib.tree.configuration) provides a subsamplingRate parameter. I was hoping to use this to cut down on processing time for large datasets (more than 2 million rows and 9,000 predictors), but I've found that the runtime stays approximately