No, they use the same implementation.

On Fri, Jun 26, 2015 at 8:05 AM, Ayman Farahat <ayman.fara...@yahoo.com> wrote:

> I use the mllib, not the ML. Does that make a difference?
>
> Sent from my iPhone
>
> On Jun 26, 2015, at 7:19 AM, Ravi Mody <rmody...@gmail.com> wrote:
>
> Forgot to mention: a rank of 100 usually works OK; 120 consistently cannot finish.
>
> On Fri, Jun 26, 2015 at 10:18 AM, Ravi Mody <rmody...@gmail.com> wrote:
>>
>> 1. These are my settings:
>> rank = 100
>> iterations = 12
>> users = ~20M
>> items = ~2M
>> training examples = ~500M-1B (I'm running into the issue even with 500M training examples)
>>
>> 2. The memory storage never seems to go too high. The user blocks may go up to ~10 GB, and each executor has only a few GB used out of 30 free GB. Everything seems small compared to the amount of memory I'm using.
>>
>> 3. I think I have a lot of disk space - is this on the executors or the driver? Is there a way to tell whether the error is coming from a lack of disk space?
>>
>> 4. I'm not changing the checkpointing settings, but I think checkpointing defaults to every 10 iterations. One notable thing is that the crashes often start on or after the 9th iteration, so it may be related to checkpointing - but this could just be a coincidence.
>>
>> Thanks!
>>
>> On Fri, Jun 26, 2015 at 1:08 AM, Ayman Farahat <ayman.fara...@yahoo.com> wrote:
>>>
>>> Was there any resolution to that problem? I am also having it with PySpark 1.4:
>>> 380 million observations
>>> 100 factors and 5 iterations
>>> Thanks
>>> Ayman
>>>
>>> On Jun 23, 2015, at 6:20 PM, Xiangrui Meng <men...@gmail.com> wrote:
>>>
>>> > It shouldn't be hard to handle 1 billion ratings in 1.3. I just need
>>> > more information to guess what happened:
>>> >
>>> > 1. Could you share the ALS settings, e.g., the number of blocks, the rank, and
>>> > the number of iterations, as well as the number of users/items in your dataset?
>>> > 2. If you monitor the progress in the web UI, how much data is stored
>>> > in memory and how much data is shuffled per iteration?
>>> > 3. Do you have enough disk space for the shuffle files?
>>> > 4. Did you set checkpointDir in SparkContext and checkpointInterval in ALS?
>>> >
>>> > Best,
>>> > Xiangrui
>>> >
>>> > On Fri, Jun 19, 2015 at 11:43 AM, Ravi Mody <rmody...@gmail.com> wrote:
>>> >> Hi, I'm running implicit matrix factorization/ALS in Spark 1.3.1 on fairly
>>> >> large datasets (1+ billion input records). As I grow my dataset I often run
>>> >> into issues with a lot of failed stages and dropped executors, ultimately
>>> >> leading to the whole application failing. The errors look like
>>> >> "org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output
>>> >> location for shuffle 19" and "org.apache.spark.shuffle.FetchFailedException:
>>> >> Failed to connect to...". These occur during flatMap, mapPartitions, and
>>> >> aggregate stages. I know that increasing memory fixes this issue, but most
>>> >> of the time my executors are only using a tiny portion of their allocated
>>> >> memory (<10%). Often the stages run fine until the last iteration or two
>>> >> of ALS, but this could just be a coincidence.
>>> >>
>>> >> I've tried tweaking a lot of settings, but it's time-consuming to do this
>>> >> through guess-and-check. Right now I have these set:
>>> >> spark.shuffle.memoryFraction = 0.3
>>> >> spark.storage.memoryFraction = 0.65
>>> >> spark.executor.heartbeatInterval = 600000
>>> >>
>>> >> I'm sure these settings aren't optimal - any idea what could be causing
>>> >> my errors, and in what direction I can push these settings to get more
>>> >> out of my memory? I'm currently using 240 GB of memory (across 7 executors)
>>> >> for a 1-billion-record dataset, which seems like too much. Thanks!
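[Editor's note: the checkpointing setup Xiangrui asks about in point 4 can be sketched in PySpark roughly as below. This is a sketch against the Spark 1.3/1.4-era MLlib API, not the posters' actual code; the checkpoint directory path and the toy ratings are hypothetical, and the configuration values simply mirror the settings quoted in the thread rather than being recommendations.]

```python
from pyspark import SparkConf, SparkContext
from pyspark.mllib.recommendation import ALS, Rating

# Memory fractions as quoted in the thread (illustrative, not tuned values).
conf = (SparkConf()
        .set("spark.shuffle.memoryFraction", "0.3")
        .set("spark.storage.memoryFraction", "0.65"))
sc = SparkContext(conf=conf)

# Setting a checkpoint directory lets ALS truncate the RDD lineage that grows
# with each iteration; as the thread notes, checkpointing appears to kick in
# every 10 iterations by default once a directory is set.
sc.setCheckpointDir("hdfs:///tmp/als-checkpoints")  # hypothetical path

# Toy data standing in for the ~500M-1B implicit-feedback records discussed.
ratings = sc.parallelize([Rating(0, 0, 1.0), Rating(1, 1, 2.0)])

# Rank/iterations as reported by Ravi; blocks=-1 lets Spark pick the number
# of user/item blocks automatically.
model = ALS.trainImplicit(ratings, rank=100, iterations=12,
                          blocks=-1, alpha=0.01)
```

Without a checkpoint directory, a long ALS lineage makes late iterations depend on a deep chain of shuffle files, which is consistent with crashes appearing around the 9th iteration or later.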
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org