You can set the log level to INFO; it looks like Spark logs application errors at INFO. When I have errors that I can only reproduce on live data, I run a Spark shell with my job on its classpath, then debug and tweak things to find out what happens.
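For example (a minimal sketch along the lines of the conf/log4j.properties.template that ships with Spark; adjust paths and levels for your deployment), copy the template to conf/log4j.properties, or put an equivalent file on your job's classpath:

    # log4j.properties - send everything at INFO and above to the console
    log4j.rootCategory=INFO, console
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

Once log4j actually finds a configuration file, the "log4j:WARN No appenders could be found" messages you are seeing should go away as well.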
2014/1/5 Nan Zhu <[email protected]>

> Yes, but my problem only appears when on a large dataset, anyway, Thanks for the reply
>
> Best,
>
> --
> Nan Zhu
>
> On Sunday, January 5, 2014 at 11:09 AM, Archit Thakur wrote:
>
> You can run your spark application locally by setting SPARK_MASTER="local" and then debug the launched jvm in your IDE.
>
> On Sun, Jan 5, 2014 at 9:04 PM, Nan Zhu <[email protected]> wrote:
>
> Ah, yes, I think application logs really help
>
> Thank you
>
> --
> Nan Zhu
>
> On Sunday, January 5, 2014 at 10:13 AM, Sriram Ramachandrasekaran wrote:
>
> Did you get to look at the spark worker logs? They would be at SPARK_HOME/logs/
> Also, you should look at the application logs itself. They would be under SPARK_HOME/work/APP_ID
>
> On Sun, Jan 5, 2014 at 8:36 PM, Nan Zhu <[email protected]> wrote:
>
> Hi, all
>
> I’m trying to run a standalone job in a Spark cluster on EC2,
>
> obviously there is some bug in my code, after the job runs for several minutes, it failed with an exception
>
> Loading /usr/share/sbt/bin/sbt-launch-lib.bash
> [info] Set current project to rec_system (in build file:/home/ubuntu/rec_sys/)
> [info] Running general.NetflixRecommender algorithm.SparkALS -b 20 -i 20 -l 0.005 -m spark://172.31.32.76:7077 --moviepath s3n://trainingset/netflix/training_set/* -o s3n://training_set/netflix/training_set/output.txt --rank 20 -r s3n://trainingset/netflix/training_set/mv_*
> log4j:WARN No appenders could be found for logger (akka.event.slf4j.Slf4jEventHandler).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
> failed to init the engine class
> org.apache.spark.SparkException: Job aborted: Task 43.0:9 failed more than 4 times
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:827)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:825)
>     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>     at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:825)
>     at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:440)
>     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:502)
>     at org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:157)
>
> However, this information does not mean anything to me, how can I print out the detailed log information in console
>
> I’m not sure about the reasons of those WARNs from log4j, I received the same WARNING when I run spark-shell, while in there, I can see detailed information like which task is running, etc.
>
> Best,
>
> --
> Nan Zhu
>
> --
> It's just about how deep your longing is!
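To expand on Archit's suggestion in the quoted thread, one way to make the job debuggable in an IDE is to pick the master at runtime instead of hard-coding the cluster URL. A rough sketch only; the environment variable name and app name here are illustrative, not taken from your code:

    import org.apache.spark.SparkContext

    // Illustrative sketch: fall back to a local, in-process master when no
    // cluster URL is supplied, so the whole job runs in one JVM and
    // breakpoints in your code are hit directly in the IDE.
    val master = sys.env.getOrElse("SPARK_MASTER", "local[2]")
    val sc = new SparkContext(master, "NetflixRecommender")

Running with "local[2]" won't reproduce problems that only show up on the full dataset or on the cluster, but it usually shrinks the search space before you go back to the worker and application logs under SPARK_HOME/logs/ and SPARK_HOME/work/APP_ID.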
