Solved. It looks like it was an incompatibility in the build when using -Phadoop-2.4; I made the distribution with -Phadoop-provided instead and that fixed the issue.
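For anyone hitting the same thing, this is roughly the build invocation that worked for me. It is a sketch, not the one true recipe: the --name value is arbitrary, and you should adjust the Hadoop version and profiles to your own cluster.

```shell
# Build a Spark 1.3.1 distribution that does NOT bundle its own Hadoop
# classes, so the cluster's Hadoop 2.6.0 jars are used at runtime instead.
# Run from the root of the Spark source tree.
./make-distribution.sh --name hadoop-provided --tgz \
  -Pyarn -Phadoop-provided -Dhadoop.version=2.6.0
```

With -Phadoop-provided the resulting assembly expects the Hadoop jars to already be on the classpath of every node, which is what you want when the cluster ships its own Hadoop.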
On Tue, Apr 21, 2015 at 2:03 PM, Fernando O. <fot...@gmail.com> wrote:
> Hi all,
>
> I'm wondering whether SparkPi works with Hadoop HA (I assume it should).
>
> Hadoop's pi example works great on my cluster, so after getting that done I
> installed Spark, and in the worker log I'm seeing two problems that might be
> related.
>
> Versions: Hadoop 2.6.0, Spark 1.3.1
>
> I'm running:
>
> ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
>   --master yarn-cluster --num-executors 2 --driver-memory 2g \
>   --executor-memory 1g --executor-cores 1 lib/spark-examples*.jar 10
>
> It seems like it can't see the ResourceManager, but I can ping and telnet
> to it from the node, and Hadoop's own pi example works, so I'm not sure why
> Spark is not seeing the RM:
>
> 15/04/21 15:56:21 INFO yarn.YarnRMClient: Registering the ApplicationMaster
> 15/04/21 15:56:22 INFO ipc.Client: Retrying connect to server:
>   0.0.0.0/0.0.0.0:8030. Already tried 0 time(s); retry policy is
>   RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> ...
> 15/04/21 15:56:31 INFO ipc.Client: Retrying connect to server:
>   0.0.0.0/0.0.0.0:8030. Already tried 9 time(s); retry policy is
>   RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 15/04/21 15:56:51 INFO cluster.YarnClusterSchedulerBackend: SchedulerBackend
>   is ready for scheduling beginning after waiting
>   maxRegisteredResourcesWaitingTime: 30000(ms)
> 15/04/21 15:56:51 INFO cluster.YarnClusterScheduler: YarnClusterScheduler.postStartHook done
> 15/04/21 15:56:51 INFO spark.SparkContext: Starting job: reduce at SparkPi.scala:35
> 15/04/21 15:56:51 INFO scheduler.DAGScheduler: Got job 0 (reduce at
>   SparkPi.scala:35) with 10 output partitions (allowLocal=false)
> 15/04/21 15:56:51 INFO scheduler.DAGScheduler: Final stage: Stage 0(reduce at SparkPi.scala:35)
> 15/04/21 15:56:51 INFO scheduler.DAGScheduler: Parents of final stage: List()
> 15/04/21 15:56:51 INFO scheduler.DAGScheduler: Missing parents: List()
> 15/04/21 15:56:51 INFO scheduler.DAGScheduler: Submitting Stage 0
>   (MapPartitionsRDD[1] at map at SparkPi.scala:31), which has no missing parents
> 15/04/21 15:56:51 INFO cluster.YarnClusterScheduler: Cancelling stage 0
> 15/04/21 15:56:51 INFO scheduler.DAGScheduler: Stage 0 (reduce at SparkPi.scala:35) failed in Unknown s
> 15/04/21 15:56:51 INFO scheduler.DAGScheduler: Job 0 failed: reduce at SparkPi.scala:35, took 0.121974 s
> 15/04/21 15:56:51 ERROR yarn.ApplicationMaster: User class threw exception:
>   Job aborted due to stage failure: Task serialization failed:
>   java.lang.reflect.InvocationTargetException
>     sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>     sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>     org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:68)
>     org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:60)
>     org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$setConf(TorrentBroadcast.scala:73)
>     org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:79)
>     org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
>     org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)
>     org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
>     org.apache.spark.SparkContext.broadcast(SparkContext.scala:1051)
>     org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:839)
>     org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:778)
>     org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:762)
>     org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1362)
>     org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
>     org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>
> org.apache.spark.SparkException: Job aborted due to stage failure: Task
>   serialization failed: java.lang.reflect.InvocationTargetException
>     ... same frames as above ...
>     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1204)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1193)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
>     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>     at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1192)
>     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:847)
>     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:778)
>     at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:762)
>     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1362)
>     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
>     at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>
> 15/04/21 15:56:51 INFO yarn.ApplicationMaster: Final app status: FAILED,
>   exitCode: 15, (reason: User class threw exception: Job aborted due to stage
>   failure: Task serialization failed: java.lang.reflect.InvocationTargetException
>   ... same stack trace as above ...)
> 15/04/21 15:57:02 INFO ipc.Client: Retrying connect to server:
>   0.0.0.0/0.0.0.0:8030. Already tried 0 time(s); retry policy is
>   RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 15/04/21 15:57:03 INFO ipc.Client: Retrying connect to server:
>   0.0.0.0/0.0.0.0:8030. Already tried 1 time(s); retry policy is
>   RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
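A side note on the 0.0.0.0/0.0.0.0:8030 retries in the quoted log: 8030 is the default port of yarn.resourcemanager.scheduler.address, and connecting to 0.0.0.0 usually means the ApplicationMaster fell back to the default because it could not resolve the scheduler address from the yarn-site.xml visible to its container. With RM HA the per-RM addresses are keyed by rm-id, so the node running the AM needs entries along these lines in its yarn-site.xml (rm1/rm2 and the hostnames below are placeholders for your actual rm-ids and machines):

```xml
<!-- Sketch of the HA-related entries the AM host must be able to read. -->
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>rm1.example.com</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>rm2.example.com</value>
</property>
```

When these keys are present, the scheduler address for each RM is derived from its hostname; retries against 0.0.0.0 are the typical symptom when they are missing from the config the AM actually sees.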