How do I set these partitions? Is this done in the call to ALS, i.e. model = ALS.trainImplicit(ratings, rank, numIterations)?
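From the Python API docs it looks like trainImplicit takes a blocks argument, so here is a minimal sketch of what I would try (assuming the pyspark.mllib ALS API; the value 64 is just an example block count, and the repartition() variant is an alternative I am not sure is needed):

    from pyspark.mllib.recommendation import ALS

    # Sketch: pass blocks explicitly instead of the default -1 (auto-configured).
    model = ALS.trainImplicit(ratings, rank, iterations=numIterations, blocks=64)

    # Or repartition the input RDD first so ALS derives its blocks from fewer partitions.
    ratings64 = ratings.repartition(64)
    model = ALS.trainImplicit(ratings64, rank, iterations=numIterations)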
On Jun 26, 2015, at 12:33 PM, Xiangrui Meng <men...@gmail.com> wrote:

> So you have 100 partitions (blocks). This might be too many for your dataset.
> Try setting a smaller number of blocks, e.g., 32 or 64. When ALS starts
> iterations, you can see the shuffle read/write size from the "Stages" tab of
> the Spark WebUI. Vary the number of blocks and check the numbers there. The
> Kryo serializer doesn't help much here. You can try disabling it (though I
> don't think it caused the failure). -Xiangrui
>
> On Fri, Jun 26, 2015 at 11:00 AM, Ayman Farahat <ayman.fara...@yahoo.com> wrote:
> > Hello;
> > I checked on my partitions/storage and here is what I have.
> >
> > I have 80 executors with 5 GB per executor.
> >
> > Do I need to set additional params, say cores?
> >
> > spark.serializer                    org.apache.spark.serializer.KryoSerializer
> > # spark.driver.memory               5g
> > # spark.executor.extraJavaOptions   -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
> > spark.shuffle.memoryFraction        0.3
> > spark.storage.memoryFraction        0.65
> >
> > RDD Name        Storage Level                      Cached Partitions  Fraction Cached  Size in Memory  Size in Tachyon  Size on Disk
> > ratingBlocks    Memory Deserialized 1x Replicated  257                129%             4.1 GB          0.0 B            0.0 B
> > itemOutBlocks   Memory Deserialized 1x Replicated  100                100%             7.3 MB          0.0 B            0.0 B
> > 38              Memory Serialized 1x Replicated    193                97%              5.6 GB          0.0 B            0.0 B
> > userInBlocks    Memory Deserialized 1x Replicated  100                100%             2.8 GB          0.0 B            0.0 B
> > itemFactors-1   Memory Deserialized 1x Replicated  69                 69%              8.4 MB          0.0 B            0.0 B
> > itemInBlocks    Memory Deserialized 1x Replicated  69                 69%              1455.3 MB       0.0 B            0.0 B
> > userFactors-1   Memory Deserialized 1x Replicated  100                100%             35.0 GB         0.0 B            0.0 B
> > userOutBlocks   Memory Deserialized 1x Replicated  100                100%             1062.7 MB       0.0 B            0.0 B
> >
> > On Jun 26, 2015, at 8:26 AM, Xiangrui Meng <men...@gmail.com> wrote:
> >
> > > number of CPU cores or less.
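Following up on the Kryo suggestion above: here is how I read it as a SparkConf sketch (assuming my current memory-fraction settings carry over; leaving spark.serializer unset should fall back to the default Java serializer):

    from pyspark import SparkConf, SparkContext

    # Keep the current memory fractions but leave spark.serializer unset so the
    # default Java serializer is used instead of Kryo.
    conf = (SparkConf()
            .set("spark.shuffle.memoryFraction", "0.3")
            .set("spark.storage.memoryFraction", "0.65"))
    sc = SparkContext(conf=conf)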