Hi Yanbo,

Thank you very much. You are totally correct! I just looked up the Spark 2.0.1 documentation. It says: "Maximum memory in MB allocated to histogram aggregation. If too small, then 1 node will be split per iteration, and its aggregates may exceed this size. (default = 256 MB)"
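For anyone finding this thread later, here is a minimal sketch of raising that parameter. It assumes PySpark with an already running SparkSession; the column names, the tree count, and the value 512 are illustrative assumptions, not values from my job.

```python
# Sketch only: assumes PySpark is installed and a SparkSession is active.
DEFAULT_MAX_MEMORY_MB = 256  # default quoted in the Spark 2.0.1 documentation

# Hypothetical settings; 512 is an arbitrary value above the default.
RF_PARAMS = {
    "labelCol": "label",        # assumed column name
    "featuresCol": "features",  # assumed column name
    "numTrees": 100,            # assumed tree count
    "maxMemoryInMB": 512,       # raised histogram-aggregation budget
}


def build_classifier(params=None):
    """Create a RandomForestClassifier with a larger maxMemoryInMB."""
    # Imported lazily so this file can be loaded without Spark on the path.
    from pyspark.ml.classification import RandomForestClassifier

    return RandomForestClassifier(**dict(params or RF_PARAMS))
```

Calling `build_classifier().fit(training_df)` would then train with the larger budget, where `training_df` stands for your own DataFrame. The same parameter exists on RandomForestRegressor.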
Although this default was not changed in Spark 2.0, the warning did not occur with the same ML code on Spark 1.6.1. It seems that the random forest implementation in Spark 2.0 uses more memory and has changed the threshold that triggers this warning, even though the default value of maxMemoryInMB is unchanged.

From: Yanbo Liang <yblia...@gmail.com>
Date: Tuesday, October 18, 2016, 11:55 AM
To: zhangjianxin <zhangjian...@didichuxing.com>
Cc: Xi Shen <davidshe...@gmail.com>, "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: Did anybody come across this random-forest issue with spark 2.0.1.

Please increase the value of "maxMemoryInMB" on your RandomForestClassifier or RandomForestRegressor. It is only a warning: it will not affect the result, but it may make your training slower.

Thanks
Yanbo

On Mon, Oct 17, 2016 at 8:21 PM, 张建鑫(市场部) <zhangjian...@didichuxing.com> wrote:

Hi Xi Shen,

The warning message was still there after I upgraded Java to version 8, but I appreciate your kind help anyway. Since it is just a WARN, I suppose I can live with it and nothing bad will really happen. Am I right?

16/10/18 11:12:42 WARN RandomForest: Tree learning is using approximately 268437864 bytes per iteration, which exceeds requested limit maxMemoryUsage=268435456. This allows splitting 80088 nodes in this iteration.
16/10/18 11:13:07 WARN RandomForest: Tree learning is using approximately 268436304 bytes per iteration, which exceeds requested limit maxMemoryUsage=268435456. This allows splitting 80132 nodes in this iteration.
16/10/18 11:13:32 WARN RandomForest: Tree learning is using approximately 268437816 bytes per iteration, which exceeds requested limit maxMemoryUsage=268435456. This allows splitting 80082 nodes in this iteration.
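As a quick check on the three WARN lines quoted above, the limit they report is exactly the 256 MB default expressed in bytes, and each estimate exceeds it by only a few kilobytes. A small sketch in plain Python, using only the figures from the log (the variable names are my own):

```python
# The default maxMemoryInMB of 256, converted to bytes.
LIMIT_BYTES = 256 * 1024 * 1024  # 268435456, the maxMemoryUsage in the log

# Per-iteration estimates copied from the three WARN lines.
reported = [268437864, 268436304, 268437816]

# How far each estimate is above the limit, in bytes.
overshoot = [b - LIMIT_BYTES for b in reported]

for used, extra in zip(reported, overshoot):
    print(f"{used} bytes used, {extra} bytes over the limit")
```

The overshoot is 2408, 848, and 2360 bytes respectively, so the estimates sit just above the threshold rather than far beyond it.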
From: zhangjianxin <zhangjian...@didichuxing.com>
Date: Monday, October 17, 2016, 8:16 PM
To: Xi Shen <davidshe...@gmail.com>
Cc: "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: Did anybody come across this random-forest issue with spark 2.0.1.

Hi Xi Shen,

Not yet. For the moment my JDK for Spark is still version 7. Thanks for the reminder, I will try upgrading Java.

From: Xi Shen <davidshe...@gmail.com>
Date: Monday, October 17, 2016, 8:00 PM
To: zhangjianxin <zhangjian...@didichuxing.com>, "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: Did anybody come across this random-forest issue with spark 2.0.1.

Did you also upgrade Java from v7 to v8?

On Mon, Oct 17, 2016 at 7:19 PM 张建鑫(市场部) <zhangjian...@didichuxing.com> wrote:

Has anybody encountered this problem before? Why does it happen, and how can it be solved? The same training data and the same source code work in 1.6.1, but behave badly in 2.0.1.

--
Thanks,
David S.