Re: Did anybody come across this random-forest issue with spark 2.0.1.

2016-10-18 Thread 市场部
Hi Yanbo,
Thank you very much.
You are totally correct!

I just looked up the Spark 2.0.1 documentation. It says: "Maximum memory in MB
allocated to histogram aggregation. If too small, then 1 node will be split per
iteration, and its aggregates may exceed this size. (default = 256 MB)"
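(For reference, 256 MB is 256 × 1024 × 1024 = 268,435,456 bytes, which matches the
maxMemoryUsage=268435456 limit printed in the warning quoted below.)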

Although the default for this setting is unchanged in Spark 2.0, the warning never
appeared with the same ML source code on Spark 1.6.1. It seems the random-forest
implementation in Spark 2.0 uses more memory per iteration, so it now crosses the
threshold that triggers this warning even though the default value of maxMemoryInMB
has not changed.
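
In case it helps anyone else who lands on this thread, here is a minimal sketch (in
Scala) of raising the limit on the estimator. The column names, tree count, and the
trainingData DataFrame are placeholders for illustration, not my actual code:

import org.apache.spark.ml.classification.RandomForestClassifier

// trainingData is assumed to be a DataFrame with "features" and "label" columns.
val rf = new RandomForestClassifier()
  .setLabelCol("label")
  .setFeaturesCol("features")
  .setNumTrees(100)        // illustrative value
  .setMaxMemoryInMB(1024)  // raise the histogram-aggregation limit above the 256 MB default
val model = rf.fit(trainingData)

With a larger maxMemoryInMB, more nodes can be aggregated per iteration, so the
warning should disappear and training may need fewer passes over the data.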



From: Yanbo Liang <yblia...@gmail.com>
Date: Tuesday, October 18, 2016 at 11:55 AM
To: zhangjianxin <zhangjian...@didichuxing.com>
Cc: Xi Shen <davidshe...@gmail.com>, "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: Did anybody come across this random-forest issue with spark 2.0.1.

Please increase the value of "maxMemoryInMB" on your RandomForestClassifier or
RandomForestRegressor.
It's only a warning; it will not affect the result, but it may make your training
slower.

Thanks
Yanbo

On Mon, Oct 17, 2016 at 8:21 PM, 张建鑫(市场部) <zhangjian...@didichuxing.com> wrote:
Hi Xi Shen

The warning message wasn't removed after I upgraded my Java to v8, but I
appreciate your kind help anyway.

Since it's just a WARN, I suppose I can live with it and nothing bad will
really happen. Am I right?


16/10/18 11:12:42 WARN RandomForest: Tree learning is using approximately
268437864 bytes per iteration, which exceeds requested limit 
maxMemoryUsage=268435456. This allows splitting 80088 nodes in this iteration.
16/10/18 11:13:07 WARN RandomForest: Tree learning is using approximately 
268436304 bytes per iteration, which exceeds requested limit 
maxMemoryUsage=268435456. This allows splitting 80132 nodes in this iteration.
16/10/18 11:13:32 WARN RandomForest: Tree learning is using approximately 
268437816 bytes per iteration, which exceeds requested limit 
maxMemoryUsage=268435456. This allows splitting 80082 nodes in this iteration.



From: zhangjianxin <zhangjian...@didichuxing.com>
Date: Monday, October 17, 2016 at 8:16 PM
To: Xi Shen <davidshe...@gmail.com>
Cc: "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: Did anybody come across this random-forest issue with spark 2.0.1.

Hi Xi Shen

Not yet. For the moment my JDK for Spark is still v7. Thanks for the reminder;
I will try it out by upgrading Java.

From: Xi Shen <davidshe...@gmail.com>
Date: Monday, October 17, 2016 at 8:00 PM
To: zhangjianxin <zhangjian...@didichuxing.com>, "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: Did anybody come across this random-forest issue with spark 2.0.1.

Did you also upgrade Java from v7 to v8?

On Mon, Oct 17, 2016 at 7:19 PM 张建鑫(市场部) <zhangjian...@didichuxing.com> wrote:

Did anybody encounter this problem before? Why does it happen, and how can it
be solved? The same training data and the same source code work in 1.6.1, but
things turn lousy in 2.0.1.

--

Thanks,
David S.


