Re: WARN from Similarity Calculation

2015-02-18 Thread Debasish Das
I am still debugging it, but I believe that if m% of users have an
unusually large number of columns and the RDD partitioner on RowMatrix is
HashPartitioner, then with the basic algorithm (without sampling) some
partitions can end up generating an unusually large number of keys...

If my debugging confirms that, I will add a custom partitioner for RowMatrix
(it will be useful for sparse vectors; for dense vectors it does not matter)...

Of course, on the feature engineering side, we will see if we can cut off
the users with a large number of columns...
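
The skew described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not RowMatrix code: it assumes a row with nnz nonzero columns emits one key per unordered column pair (nnz * (nnz - 1) / 2 keys), it uses row id modulo partition count as a stand-in for hash partitioning, and `greedy_assign` is a hypothetical cost-aware placement, not an existing Spark partitioner. The row counts and the 1% heavy fraction are made-up illustration values.

```python
import random


def pairs_emitted(nnz):
    # Assumed cost model: one key per unordered pair of nonzero columns.
    return nnz * (nnz - 1) // 2


def hash_assign(row_nnz, num_partitions):
    # Stand-in for hash partitioning: placement ignores how heavy a row is.
    load = [0] * num_partitions
    for row_id, nnz in enumerate(row_nnz):
        load[row_id % num_partitions] += pairs_emitted(nnz)
    return load


def greedy_assign(row_nnz, num_partitions):
    # Hypothetical cost-aware placement: heaviest rows first, each one
    # going to the partition with the smallest load so far.
    load = [0] * num_partitions
    for nnz in sorted(row_nnz, reverse=True):
        lightest = min(range(num_partitions), key=load.__getitem__)
        load[lightest] += pairs_emitted(nnz)
    return load


def make_rows(num_rows=10_000, heavy_fraction=0.01, seed=0):
    # Synthetic data: most rows are small, a small fraction are very wide.
    rng = random.Random(seed)
    return [
        rng.randint(2_000, 5_000) if rng.random() < heavy_fraction
        else rng.randint(5, 50)
        for _ in range(num_rows)
    ]


def skew(load):
    # Ratio of the busiest partition to the mean partition load.
    return max(load) / (sum(load) / len(load))


if __name__ == "__main__":
    rows = make_rows()
    for name, assign in (("hash", hash_assign), ("greedy", greedy_assign)):
        load = assign(rows, 64)
        print(f"{name:>6}: max/mean emitted pairs = {skew(load):.2f}")
```

Because the assumed cost grows quadratically with nnz, a handful of wide rows landing in the same partition dominates that partition's work, which is why the effect would show up for sparse vectors with uneven row widths and not for dense ones.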

On Tue, Feb 17, 2015 at 1:58 PM, Xiangrui Meng men...@gmail.com wrote:

 It may be caused by GC pause. Did you check the GC time in the Spark
 UI? -Xiangrui

 On Sun, Feb 15, 2015 at 8:10 PM, Debasish Das debasish.da...@gmail.com
 wrote:
  Hi,
 
  I am sometimes getting WARN from running Similarity calculation:
 
  15/02/15 23:07:55 WARN BlockManagerMasterActor: Removing BlockManager
  BlockManagerId(7, abc.com, 48419, 0) with no recent heart beats: 66435ms
  exceeds 45000ms
 
  Do I need to increase the default 45 s to a larger value for cases where
  we are doing blocking operations or long computations in mapPartitions?
 
  Thanks.
  Deb



Re: WARN from Similarity Calculation

2015-02-17 Thread Xiangrui Meng
It may be caused by GC pause. Did you check the GC time in the Spark
UI? -Xiangrui

On Sun, Feb 15, 2015 at 8:10 PM, Debasish Das debasish.da...@gmail.com wrote:
 Hi,

 I am sometimes getting WARN from running Similarity calculation:

 15/02/15 23:07:55 WARN BlockManagerMasterActor: Removing BlockManager
 BlockManagerId(7, abc.com, 48419, 0) with no recent heart beats: 66435ms
 exceeds 45000ms

 Do I need to increase the default 45 s to a larger value for cases where we
 are doing blocking operations or long computations in mapPartitions?

 Thanks.
 Deb




WARN from Similarity Calculation

2015-02-15 Thread Debasish Das
Hi,

I am sometimes getting WARN from running Similarity calculation:

15/02/15 23:07:55 WARN BlockManagerMasterActor: Removing BlockManager
BlockManagerId(7, abc.com, 48419, 0) with no recent heart beats: 66435ms
exceeds 45000ms

Do I need to increase the default 45 s to a larger value for cases where we
are doing blocking operations or long computations in mapPartitions?

Thanks.
Deb
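
Before raising the limit, it may help to measure how far past it the heartbeat gaps actually go. The sketch below pulls the observed gap and the limit out of warnings shaped like the one quoted above; the regular expression is an assumption based only on that one log line, and `heartbeat_gaps` is a hypothetical helper name. The thread does not name the property behind the 45000 ms limit; in Spark 1.x it is believed to be `spark.storage.blockManagerSlaveTimeoutMs`, which should be checked against the configuration documentation for the Spark version in use.

```python
import re

# Pattern assumed from the single warning quoted in this thread.
# \s+ tolerates the line wrapping introduced by the mail client.
HEARTBEAT_WARN = re.compile(
    r"no recent heart beats: (?P<gap>\d+)ms\s+exceeds (?P<limit>\d+)ms"
)


def heartbeat_gaps(log_text):
    """Return (gap_ms, limit_ms) for every heartbeat-timeout warning found."""
    return [
        (int(m.group("gap")), int(m.group("limit")))
        for m in HEARTBEAT_WARN.finditer(log_text)
    ]


sample = """15/02/15 23:07:55 WARN BlockManagerMasterActor: Removing BlockManager
BlockManagerId(7, abc.com, 48419, 0) with no recent heart beats: 66435ms
exceeds 45000ms"""

for gap, limit in heartbeat_gaps(sample):
    # prints: gap 66435 ms is 21435 ms over the 45000 ms limit
    print(f"gap {gap} ms is {gap - limit} ms over the {limit} ms limit")
```

The largest gap seen in the driver log gives a lower bound for any raised timeout. If the gaps line up with long GC pauses, as the reply in this thread suggests checking, a larger timeout only hides the pauses and does not remove them.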