How to avoid being killed by the YARN node manager?

2015-03-24 Thread Yuichiro Sakamoto
k. The machine conditions and Spark settings are as follows. 1) six machines, each with 32GB of physical memory. 2) Spark settings: - spark.executor.memory=16g - spark.closure.serializer=org.apache.spark.serializer.KryoSerializer - spark.rdd.compress=true - spark.shuffle.memoryFraction=
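
A minimal sketch of how such a configuration could be expressed in code. The values simply echo the settings quoted in the mail; the memory-overhead figure (2048 MB) and the app name are illustrative assumptions, not recommendations from the thread. On YARN, a container is killed when executor heap plus overhead exceeds the container limit, so leaving explicit headroom is the usual workaround.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: values mirror the settings from the original mail,
// except the overhead and app name, which are illustrative assumptions.
val conf = new SparkConf()
  .setAppName("ALSRecommendation")
  .set("spark.executor.memory", "16g")
  // Headroom reserved on top of the executor heap so the YARN container
  // limit is not exceeded (Spark 1.x property name, value in MB).
  .set("spark.yarn.executor.memoryOverhead", "2048")
  .set("spark.closure.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.rdd.compress", "true")

val sc = new SparkContext(conf)
```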

Re: Can't cache RDD of collaborative filtering on MLlib

2015-03-12 Thread Yuichiro Sakamoto
I got an answer from a mail posted to the ML. --- Summary --- cache() is lazy, so you can call `RDD.count()` explicitly to load the RDD into memory. --- I tried this: the two RDDs were cached and the speed improved. Thank you.
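
A short sketch of the fix summarized above, assuming `model` is the MatrixFactorizationModel from this thread. cache() only marks the RDDs for caching; an action such as count() is what actually materializes them in executor memory.

```scala
import org.apache.spark.mllib.recommendation.MatrixFactorizationModel

// cache() is lazy: it only marks the RDDs. The count() calls are actions
// that force evaluation, so both feature RDDs are held in memory now
// instead of being recomputed or re-read on every prediction.
def warmCaches(model: MatrixFactorizationModel): Unit = {
  model.userFeatures.cache()
  model.productFeatures.cache()
  model.userFeatures.count()
  model.productFeatures.count()
}
```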

Re: Can't cache RDD of collaborative filtering on MLlib

2015-03-10 Thread Yuichiro Sakamoto
such > as `model.userFeatures.getStorageLevel()`. I printed the return value of getStorageLevel() for "userFeatures" and "productFeatures"; both were "Memory Deserialized 1x Replicated". I think the two variables were configured to cache, but didn't cach
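
A small sketch of the check discussed above, with names taken from the thread. "Memory Deserialized 1x Replicated" is how StorageLevel.MEMORY_ONLY prints, so the RDDs are only marked for caching; they do not become resident until an action runs (the count() trick mentioned in the reply above).

```scala
import org.apache.spark.mllib.recommendation.MatrixFactorizationModel
import org.apache.spark.storage.StorageLevel

// Print the configured storage level of both feature RDDs.
// MEMORY_ONLY here means "will be cached once computed", not "already cached".
def printStorageLevels(model: MatrixFactorizationModel): Unit = {
  println(s"userFeatures:    ${model.userFeatures.getStorageLevel.description}")
  println(s"productFeatures: ${model.productFeatures.getStorageLevel.description}")
  println(s"is MEMORY_ONLY:  ${model.userFeatures.getStorageLevel == StorageLevel.MEMORY_ONLY}")
}
```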

Can't cache RDD of collaborative filtering on MLlib

2015-03-08 Thread Yuichiro Sakamoto
Hello. I created a collaborative filtering program using Spark, but I am having trouble with computation speed. I want to implement a recommendation program using ALS (MLlib) that runs as a separate process from Spark. However, access to the MatrixFactorizationModel object on HDFS is slow, so I want to cache it, b
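
A hedged sketch of the setup described above: train an ALS model in one job, persist it to HDFS, and reload it in the separate serving process while caching its feature RDDs. The path and ALS parameters (rank 10, 20 iterations, lambda 0.01) are hypothetical, and the save/load API assumes Spark 1.3+; the original thread may have used a different persistence mechanism.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.recommendation.{ALS, MatrixFactorizationModel, Rating}

// Training job: fit the model and persist it to HDFS (illustrative path).
def trainAndSave(sc: SparkContext, ratings: RDD[Rating]): Unit = {
  val model = ALS.train(ratings, 10, 20, 0.01)
  model.save(sc, "hdfs:///models/als") // hypothetical location
}

// Serving process: reload the model and cache its feature RDDs so repeated
// predict() calls do not go back to HDFS each time.
def loadForServing(sc: SparkContext): MatrixFactorizationModel = {
  val model = MatrixFactorizationModel.load(sc, "hdfs:///models/als")
  model.userFeatures.cache().count()    // count() forces materialization
  model.productFeatures.cache().count()
  model
}
```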