Re: WARN from Similarity Calculation
I am still debugging it, but I believe that if m% of users have an unusually large number of columns and the RDD partitioner on RowMatrix is a HashPartitioner, then, because the basic algorithm runs without sampling, some partitions can end up with an unusually large number of keys...

If my debugging confirms that, I will add a custom partitioner for RowMatrix (it will be useful for sparse vectors; for dense vectors it does not matter)...

Of course, on the feature engineering side, we will see if we can cut off the users with a large number of columns...

On Tue, Feb 17, 2015 at 1:58 PM, Xiangrui Meng men...@gmail.com wrote:
> It may be caused by GC pause. Did you check the GC time in the Spark UI?
>
> -Xiangrui
>
> On Sun, Feb 15, 2015 at 8:10 PM, Debasish Das debasish.da...@gmail.com wrote:
>> Hi,
>>
>> I am sometimes getting WARN from running Similarity calculation:
>>
>> 15/02/15 23:07:55 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(7, abc.com, 48419, 0) with no recent heart beats: 66435ms exceeds 45000ms
>>
>> Do I need to increase the default 45 s to larger values for cases where we are doing blocked operation or long compute in the mapPartitions ?
>>
>> Thanks.
>> Deb
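To make the skew argument above concrete, here is a minimal sketch in plain Python (no Spark required). It assumes that, without sampling, a row with nnz nonzero columns emits one key per unordered pair of distinct columns, so the work per row grows quadratically. The data, the round-robin placement (a stand-in for any partitioner that ignores row size), and the greedy size-aware placement are all hypothetical illustrations, not the RowMatrix implementation or the custom partitioner mentioned above.

```python
from collections import Counter


def pair_count(nnz):
    """Unordered pairs of distinct nonzero columns in one row (assumed cost model)."""
    return nnz * (nnz - 1) // 2


def work_per_partition(row_nnz, num_partitions):
    """Sum emitted pairs per partition when rows are placed by index only,
    i.e. without looking at how many nonzeros each row has."""
    work = Counter()
    for row_id, nnz in enumerate(row_nnz):
        work[row_id % num_partitions] += pair_count(nnz)
    return [work[p] for p in range(num_partitions)]


def balanced_partitions(row_nnz, num_partitions):
    """Greedy size-aware placement: heaviest rows first, each one going to
    the partition with the least accumulated work so far."""
    loads = [0] * num_partitions
    for nnz in sorted(row_nnz, reverse=True):
        target = loads.index(min(loads))
        loads[target] += pair_count(nnz)
    return loads


# Hypothetical data: 1000 users with 10 nonzero columns each, except for
# 8 heavy users with 2000 nonzeros. The heavy rows sit at indices 0, 4, 8, ...
# so index-based placement puts all of them on partition 0 of 4.
rows = [10] * 1000
for i in range(8):
    rows[i * 4] = 2000

naive = work_per_partition(rows, 4)
balanced = balanced_partitions(rows, 4)
print("size-unaware placement:", naive)
print("size-aware placement:  ", balanced)
```

In this toy setup fewer than 1% of the rows account for almost all emitted pairs, so a single partition does nearly all the work unless placement takes row size into account. This is the kind of imbalance that would keep one executor busy far longer than the others.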
Re: WARN from Similarity Calculation
It may be caused by GC pause. Did you check the GC time in the Spark UI?

-Xiangrui

On Sun, Feb 15, 2015 at 8:10 PM, Debasish Das debasish.da...@gmail.com wrote:
> Hi,
>
> I am sometimes getting WARN from running Similarity calculation:
>
> 15/02/15 23:07:55 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(7, abc.com, 48419, 0) with no recent heart beats: 66435ms exceeds 45000ms
>
> Do I need to increase the default 45 s to larger values for cases where we are doing blocked operation or long compute in the mapPartitions ?
>
> Thanks.
> Deb
WARN from Similarity Calculation
Hi,

I am sometimes getting a WARN when running the similarity calculation:

15/02/15 23:07:55 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(7, abc.com, 48419, 0) with no recent heart beats: 66435ms exceeds 45000ms

Do I need to increase the default 45 s to a larger value for cases where we are doing blocking operations or long computations in mapPartitions?

Thanks.
Deb
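Before raising the timeout, it can help to measure how far past the limit the heartbeats actually arrive and compare that with the GC time shown in the Spark UI. The following is a small sketch in plain Python that pulls the gap and the limit out of warnings shaped like the one above. The regular expression is derived only from that single log line, so the assumption is that other occurrences have the same layout. As far as I know, the 45000 ms limit in Spark 1.x came from `spark.storage.blockManagerSlaveTimeoutMs`, but treat that key as an assumption and check the configuration docs for your version.

```python
import re

# Pattern derived from the one warning quoted in this thread; other Spark
# versions may word the message differently.
HEARTBEAT_WARN = re.compile(
    r"BlockManagerId\((?P<executor>[^,]+),\s*(?P<host>[^,]+),.*?\)"
    r" with no recent heart beats: (?P<gap>\d+)ms exceeds (?P<limit>\d+)ms"
)


def heartbeat_overruns(lines):
    """Yield (executor, host, gap_ms, limit_ms) for each heartbeat-timeout warning."""
    for line in lines:
        m = HEARTBEAT_WARN.search(line)
        if m:
            yield m["executor"], m["host"], int(m["gap"]), int(m["limit"])


# The log line from the message above, split only for readability.
sample = [
    "15/02/15 23:07:55 WARN BlockManagerMasterActor: Removing BlockManager "
    "BlockManagerId(7, abc.com, 48419, 0) with no recent heart beats: "
    "66435ms exceeds 45000ms",
]

overruns = list(heartbeat_overruns(sample))
for executor, host, gap, limit in overruns:
    print(f"executor {executor} on {host}: {gap - limit} ms over the {limit} ms limit")
```

If the overruns line up with long GC pauses on the same executor, tuning memory or reducing per-partition work is probably a better fix than a larger timeout. If they line up with long-running mapPartitions tasks on skewed partitions, rebalancing the input is the more direct remedy.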