Hi Akhil,

1) How could I see how much time it is spending on stage 1? And what if, as above, it never gets past stage 1?
2) How could I check if it's GC time? And where would I increase the parallelism for the model? I have a Spark master and 2 workers running on CDH 5.3... what would the default spark-shell level of parallelism be? I thought it would be 3.

Thank you for the help!

-Su

On Thu, Mar 19, 2015 at 12:32 AM, Akhil Das <ak...@sigmoidanalytics.com> wrote:

> Can you see where exactly it is spending time? Like you said, it goes to
> Stage 2, so you will be able to see how much time it spent on Stage 1.
> See if it's GC time; if so, try increasing the level of parallelism, or
> repartition it, e.g. sc.defaultParallelism * 3.
>
> Thanks
> Best Regards
>
> On Thu, Mar 19, 2015 at 12:15 PM, Su She <suhsheka...@gmail.com> wrote:
>
>> Hello Everyone,
>>
>> I am trying to run this MLlib example from Learning Spark:
>>
>> https://github.com/databricks/learning-spark/blob/master/src/main/scala/com/oreilly/learningsparkexamples/scala/MLlib.scala#L48
>>
>> Things I'm doing differently:
>>
>> 1) Using the spark shell instead of an application
>>
>> 2) Instead of their spam.txt and normal.txt, I have text files with 3700
>> and 2700 words... nothing huge at all, just plain text
>>
>> 3) I've used numFeatures = 100, 1000 and 10,000
>>
>> *Error:* I keep getting stuck when I try to run the model:
>>
>> val model = new LogisticRegressionWithSGD().run(trainingData)
>>
>> It will freeze on something like this:
>>
>> [Stage 1:==============> (1 + 0) / 4]
>>
>> Sometimes it's Stage 1, 2 or 3.
>>
>> I am not sure what I am doing wrong... any help is much appreciated,
>> thank you!
>>
>> -Su
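
For reference, a minimal sketch of the repartitioning Akhil suggests, assuming the spark-shell session from the example above (with `sc` and the `trainingData` RDD of LabeledPoints already in scope). The `.cache()` call is an addition beyond Akhil's suggestion: SGD is iterative, so an uncached input RDD may be recomputed on every iteration. This is not run here, only a sketch:

```scala
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD

// Print the actual default parallelism for this cluster (in standalone
// mode it is typically the total number of cores, not the worker count).
println(sc.defaultParallelism)

// Akhil's suggestion; note the method is sc.defaultParallelism, not
// sc.getDefaultParallelism.
val numPartitions = sc.defaultParallelism * 3

// Repartition for more parallelism and cache before training, since SGD
// makes many passes over the data.
val repartitioned = trainingData.repartition(numPartitions).cache()
repartitioned.count()  // force materialization once, before training

val model = new LogisticRegressionWithSGD().run(repartitioned)
```

While a stage appears stuck, the easiest place to see per-stage timings and GC time is the Spark web UI on the driver (port 4040 by default): the stage detail page shows task duration and "GC Time" columns.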