Re: Reduce the memory usage if we do same first in GradientBoostedTrees if subsamplingRate< 1.0

2016-11-15 Thread WangJianfei
With predError.zip(input) we get an RDD, so we could just sample either predError or input first. But if we do that, we can't use zip, because zip requires the same number of elements in each partition. Thank you!
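
A minimal sketch of the constraint described above, in plain Python rather than Spark. Partitions are modelled as lists of lists, and zip_partitions and sample_partitions are hypothetical helpers, not Spark APIs: they only imitate the per-partition pairing of RDD zip and a per-element Bernoulli sample. The data values are made up.

```python
import random

def zip_partitions(left, right):
    """Pair elements partition by partition (hypothetical stand-in for RDD zip)."""
    if len(left) != len(right):
        raise ValueError("different number of partitions")
    zipped = []
    for lp, rp in zip(left, right):
        if len(lp) != len(rp):
            raise ValueError("partitions have different element counts")
        zipped.append(list(zip(lp, rp)))
    return zipped

def sample_partitions(parts, rate, seed):
    """Keep each element independently with probability `rate`."""
    rng = random.Random(seed)
    return [[x for x in p if rng.random() < rate] for p in parts]

# Two partitions of made-up data: per-point errors and the matching inputs.
pred_error = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]
inputs = [["a", "b", "c", "d"], ["e", "f", "g", "h"]]

# Sampling after the zip keeps every (error, input) pair aligned.
pairs_sampled = sample_partitions(zip_partitions(pred_error, inputs), rate=0.5, seed=42)

def zip_after_sampling_one_side(seed):
    """Sample only pred_error, then try to zip it with the full inputs."""
    sampled_error = sample_partitions(pred_error, rate=0.5, seed=seed)
    return zip_partitions(sampled_error, inputs)
```

Calling zip_after_sampling_one_side raises ValueError whenever the sample drops at least one element, which is the failure the message describes. Sampling the zipped pairs avoids it, at the cost of building the zip over the full data first.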

Re: Reduce the memory usage if we do same first in GradientBoostedTrees if subsamplingRate< 1.0

2016-11-15 Thread Joseph Bradley
Thanks for the suggestion. That would be faster, but less accurate in most cases. It's generally better to use a new random sample on each iteration, based on literature and results I've seen.

Joseph

On Fri, Nov 11, 2016 at 5:13 AM, WangJianfei <wangjianfe...@otcaix.iscas.ac.cn> wrote:
> when