Corrupted Exception while deserializing task

2014-12-25 Thread WangTaoTheTonic
Hi Guys, 

I hit an exception while running an application on a 1.2.0-SNAPSHOT build.
It looks like this:

2014-12-23 07:45:36,333 | ERROR | [Executor task launch worker-0] | Exception in task 0.0 in stage 0.0 (TID 0) | org.apache.spark.Logging$class.logError(Logging.scala:96)
java.io.StreamCorruptedException: invalid stream header: 00546864
    at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:804)
    at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299)
    at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.<init>(JavaSerializer.scala:57)
    at org.apache.spark.serializer.JavaDeserializationStream.<init>(JavaSerializer.scala:57)
    at org.apache.spark.serializer.JavaSerializerInstance.deserializeStream(JavaSerializer.scala:99)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:86)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:163)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
2014-12-23 07:45:36,357 | INFO  | [sparkExecutor-akka.actor.default-dispatcher-3] | Got assigned task 1 | org.apache.spark.Logging$class.logInfo(Logging.scala:59)
2014-12-23 07:45:36,358 | INFO  | [Executor task launch worker-0] | Running task 1.0 in stage 0.0 (TID 1) | org.apache.spark.Logging$class.logInfo(Logging.scala:59)
2014-12-23 07:45:36,414 | ERROR | [Executor task launch worker-0] | Exception in task 1.0 in stage 0.0 (TID 1) | org.apache.spark.Logging$class.logError(Logging.scala:96)
java.io.StreamCorruptedException: invalid stream header: 00546864
    (same stack trace as above)

I know it happened while the executor was deserializing the task. But after
checking the Spark code, I found that what makes up one serialized task is
very simple: its files, its jars, and a Task object containing the stageId
and partitionId.
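
One detail that may help narrow this down: a well-formed Java serialization
stream always begins with the magic bytes AC ED 00 05, so "invalid stream
header: 00546864" means the first four bytes the ObjectInputStream saw were
0x00, 'T', 'h', 'd'. That is printable data, not serialization magic, which
looks more like the deserializer starting at the wrong offset inside the task
buffer (for instance, inside one of the length-prefixed strings that
DataOutputStream.writeUTF produces for the file/jar names, if memory of the
1.x code serves) than like a buffer full of garbage. Here is a minimal,
self-contained sketch of the same failure mode; the object and string names
are made up for illustration:

import java.io._

object StreamHeaderDemo {
  def main(args: Array[String]): Unit = {
    // A well-formed Java serialization stream always begins with the
    // magic bytes AC ED 00 05.
    val ok = new ByteArrayOutputStream()
    new ObjectOutputStream(ok).writeObject("payload")
    println(ok.toByteArray.take(4).map(b => f"$b%02X").mkString) // ACED0005

    // Bytes written by DataOutputStream.writeUTF (a 2-byte length followed
    // by the characters) start with something else entirely.
    val raw = new ByteArrayOutputStream()
    new DataOutputStream(raw).writeUTF("ThdExample") // hypothetical name
    try {
      new ObjectInputStream(new ByteArrayInputStream(raw.toByteArray))
    } catch {
      // prints: java.io.StreamCorruptedException: invalid stream header: 000A5468
      case e: StreamCorruptedException => println(e)
    }
  }
}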

I can't confirm what causes this issue, and it is hard to reproduce.

But I don't think the application code makes a difference, as this code path
is transparent to users.

Does anyone have any ideas? Thanks for the help.

P.S. This error occurred on every executor of this application.






Re: Who manages the log4j appender when running Spark on YARN?

2014-12-22 Thread WangTaoTheTonic
After some discussions with the Hadoop folks, I understand how the mechanism
works now. If we don't add -Dlog4j.configuration to the Java options of the
container (AM or executors), it will use the log4j.properties (if any) found
on the container's classpath (extraClasspath plus yarn.application.classpath).
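
And if nothing is found on the classpath either, log4j falls back to the
defaults file Spark bundles in its jar, which sends everything to a console
appender targeting stderr; YARN then captures the container's stderr into the
NodeManager log directory. From memory of the 1.x releases, the bundled file
looks roughly like this:

# org/apache/spark/log4j-defaults.properties (bundled in the Spark jar)
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n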

If we want to customize the log4j configuration, we should add
spark.executor.extraJavaOptions=-Dlog4j.configuration=/path/to/log4j.properties
or
spark.yarn.am.extraJavaOptions=-Dlog4j.configuration=/path/to/log4j.properties
to the spark-defaults.conf file.
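
One caveat (this is my reading of log4j 1.x behavior rather than anything
Spark-specific): log4j.configuration is tried as a URL first and only falls
back to a classpath resource, so for a local file path the file: prefix is
the safer form, and the file must exist at that path on every node. For
example, with placeholder paths:

# spark-defaults.conf (paths are placeholders, adjust for your cluster)
spark.executor.extraJavaOptions  -Dlog4j.configuration=file:/etc/spark/conf/log4j-executor.properties
spark.yarn.am.extraJavaOptions   -Dlog4j.configuration=file:/etc/spark/conf/log4j-am.properties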






Who manages the log4j appender when running Spark on YARN?

2014-12-19 Thread WangTaoTheTonic
Hi guys, 

I recently ran Spark on YARN and noticed that Spark doesn't set any log4j
properties file in its configuration or code, yet the log4j output was being
written to the stderr file under ${yarn.nodemanager.log-dirs}/application_${appid}.
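
As a side note, once the application finishes the same files can be fetched
with the standard YARN CLI (assuming log aggregation is enabled; the
application id here is just a placeholder):

    yarn logs -applicationId application_1419300000000_0001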

I'd like to know which side (Spark or Hadoop) controls the appender. I found
a related discussion here:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-logging-strategy-on-YARN-td8751.html,
but I think the Spark code has changed a lot since then.

Could anyone offer some guidance? Thanks.




