Did you try using a smaller number of partitions (user/product blocks)?
Did you use implicit feedback? In the current implementation, we only
do checkpointing with implicit feedback. For ALS we should adopt the
checkpointing strategy implemented in LDA:
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/impl/PeriodicGraphCheckpointer.scala
Could you try the latest branch-1.3 or master and see whether
it helps? -Xiangrui
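
For concreteness, a rough sketch of such a setup in Scala (the input path,
record layout, lambda, and alpha are placeholders, not values from the job
described below):

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.mllib.recommendation.{ALS, Rating}

  val sc = new SparkContext(new SparkConf().setAppName("als-implicit-sketch"))

  // A checkpoint directory (ideally on HDFS) is needed for checkpointing to
  // have any effect; the path here is only an example.
  sc.setCheckpointDir("hdfs:///tmp/als-checkpoints")

  // Assumed input layout: one "user,product,rating" triple per line.
  val ratings = sc.textFile("hdfs:///data/ratings").map { line =>
    val Array(user, product, rating) = line.split(',')
    Rating(user.toInt, product.toInt, rating.toDouble)
  }

  // Fewer user/product blocks than 512, and the implicit-feedback variant,
  // which is the code path that currently checkpoints.
  val model = ALS.trainImplicit(ratings, 100 /* rank */, 15 /* iterations */,
    0.01 /* lambda */, 128 /* blocks */, 40.0 /* alpha */)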

On Mon, Feb 23, 2015 at 6:21 AM, Antony Mayi
<antonym...@yahoo.com.invalid> wrote:
> Hi,
>
> This has already been briefly discussed here in the past, but there seem to
> be more questions...
>
> I am running a bigger ALS task with ~40 GB of input data (~3 billion ratings).
> The data is partitioned into 512 partitions and I am also using default
> parallelism set to 512. The ALS job runs with rank=100, iters=15, on Spark
> 1.2.0.
>
> The issue is the volume of temporary data written to disk during
> processing. You can see the effect here:
> http://picpaste.com/disk-UKGFOlte.png It accumulates 12 TB(!) of data until
> it reaches the 90% disk-usage threshold, at which point YARN kills the job.
>
> I have the checkpoint directory set, so supposedly it should be clearing the
> temp data, but I am not sure that is happening (although one drop is visible).
>
> Is there any solution for this? 12 TB of temp data not getting cleaned up
> seems wrong.
>
> Thanks,
> Antony.
>
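
For context, the setup described in the quoted message corresponds roughly to
the sketch below (the input path, parsing, lambda, and the explicit repartition
call are placeholders, not the actual job). Note that this is the
explicit-feedback ALS.train path, which, per the note above, is not the code
path that currently checkpoints:

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.mllib.recommendation.{ALS, Rating}

  val conf = new SparkConf()
    .setAppName("als-explicit")
    .set("spark.default.parallelism", "512")  // "default parallelism set to 512"
  val sc = new SparkContext(conf)
  sc.setCheckpointDir("hdfs:///tmp/als-checkpoints")  // "checkpoint directory set"

  // Assumed input layout: one "user,product,rating" triple per line.
  val ratings = sc.textFile("hdfs:///data/ratings")
    .map { line =>
      val Array(user, product, rating) = line.split(',')
      Rating(user.toInt, product.toInt, rating.toDouble)
    }
    .repartition(512)  // "512 partitions"

  // Explicit-feedback ALS with rank = 100, 15 iterations, 512 blocks.
  val model = ALS.train(ratings, 100, 15, 0.01 /* lambda */, 512 /* blocks */)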
