Thank you for updating the files, Holden! I was actually using that
same text in my files located on HDFS. Could the files being located
on HDFS be the reason why the example gets stuck? I copy/pasted the code
provided on GitHub; the only things I changed were:
a) file paths to: val spam =
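To illustrate the kind of change, reading the input from HDFS generally looks like this (the namenode host, port, and paths below are placeholders, not my actual ones):

// Placeholder HDFS URIs -- substitute your own namenode host/port and paths.
val spam = sc.textFile("hdfs://namenode:8020/user/su/spam.txt")
val normal = sc.textFile("hdfs://namenode:8020/user/su/normal.txt")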
Hello Xiangrui,
I use Spark 1.2.0 on CDH 5.3. Thanks!
-Su
On Fri, Mar 20, 2015 at 2:27 PM Xiangrui Meng men...@gmail.com wrote:
Su, which Spark version did you use? -Xiangrui
Hello Everyone,
I am trying to run this MLlib example from Learning Spark:
https://github.com/databricks/learning-spark/blob/master/src/main/scala/com/oreilly/learningsparkexamples/scala/MLlib.scala#L48
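For context, the example at that link is roughly the spam-classification snippet below. This is a sketch from memory of the Learning Spark repo, so the exact file names and parameter values may differ:

import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.mllib.feature.HashingTF
import org.apache.spark.mllib.regression.LabeledPoint

// Load the spam and non-spam example emails (paths are assumptions).
val spam = sc.textFile("files/spam.txt")
val normal = sc.textFile("files/normal.txt")

// Map each email to a 10,000-dimensional term-frequency vector.
val tf = new HashingTF(numFeatures = 10000)
val spamFeatures = spam.map(email => tf.transform(email.split(" ")))
val normalFeatures = normal.map(email => tf.transform(email.split(" ")))

// Label spam 1 and non-spam 0, then train a logistic regression model.
val positiveExamples = spamFeatures.map(features => LabeledPoint(1, features))
val negativeExamples = normalFeatures.map(features => LabeledPoint(0, features))
val trainingData = positiveExamples.union(negativeExamples)
trainingData.cache()
val model = LogisticRegressionWithSGD.train(trainingData, 100)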
Things I'm doing differently:
1) Using the Spark shell instead of a standalone application
2) instead of
Can you see where exactly it is spending time? Since you said it gets to
Stage 2, you should be able to see how much time it spent on Stage 1.
Check whether it is GC time; if so, try increasing the level of parallelism,
or repartition the data to something like sc.defaultParallelism * 3 (see the
sketch below).
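A minimal sketch of that repartitioning, assuming trainingData is the RDD being fed to the model (the variable name is an assumption):

// Spread the data over more partitions so each task processes less data,
// which can reduce per-task GC pressure. 3x the default parallelism is a
// starting point, not a tuned value.
val repartitionedData = trainingData.repartition(sc.defaultParallelism * 3)
repartitionedData.cache()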
Thanks
Best Regards
On Thu, Mar
Hi Akhil,
1) How can I see how much time it is spending on Stage 1? And what if,
like above, it doesn't get past Stage 1?
2) How can I check whether it is GC time? And where would I increase the
parallelism for the model? I have a Spark master and 2 workers running on
CDH 5.3...what would the
To get these metrics out, you need to open the driver UI running on port
4040. In there you will see the Stages information, and for each stage you
can see how much time it is spending on GC etc. In your case the parallelism
seems to be 4; the higher the parallelism, the more tasks you will see.
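For example, you can raise the parallelism either when loading the files or when launching the shell (the values below are illustrative):

// Ask for more input partitions up front (second argument is minPartitions).
val spam = sc.textFile("files/spam.txt", 12)

// Or set a higher default for shuffles when starting the shell:
//   spark-shell --conf spark.default.parallelism=12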