Re: Mesos Spark Tasks - Lost

2015-05-20 Thread Tim Chen
Can you share your exact spark-submit command line?

Also, cluster mode is not released yet (it lands in 1.4) and doesn't support
spark-shell, so I think you're just using client mode unless you're running
the latest master.
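
For reference, a minimal client-mode invocation against a Mesos master looks
roughly like this (host name and jar path are placeholders, not taken from
your setup):

  ./bin/spark-submit \
    --master mesos://<mesos-master>:5050 \
    --class org.apache.spark.examples.SparkPi \
    lib/spark-examples-*.jar 100

Cluster mode, once 1.4 is out, additionally needs --deploy-mode cluster and a
running Mesos cluster dispatcher; without that flag spark-submit stays in
client mode.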

Tim

On Tue, May 19, 2015 at 8:57 AM, Panagiotis Garefalakis panga...@gmail.com
wrote:

 Hello all,

 I have been facing a weird issue for the last couple of days running Spark on
 top of Mesos and I need your help. I am running Mesos in a private cluster and
 managed to successfully deploy HDFS, Cassandra, Marathon and Play, but
 Spark is not working for some reason. So far I have tried:
 different Java versions (1.6 and 1.7, Oracle and OpenJDK), different
 spark-env configurations, different Spark versions (from 0.8.8 to 1.3.1),
 different HDFS versions (Hadoop 5.1 and 4.6), and updating pom dependencies.

 More specifically, while local tasks complete fine, in cluster mode all the
 tasks get lost (with both spark-shell and spark-submit).
 From the worker log I see something like this:

 ---
 I0519 02:36:30.475064 12863 fetcher.cpp:214] Fetching URI
 'hdfs:/:8020/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz'
 I0519 02:36:30.747372 12863 fetcher.cpp:99] Fetching URI
 'hdfs://X:8020/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz' using Hadoop
 Client
 I0519 02:36:30.747546 12863 fetcher.cpp:109] Downloading resource from
 'hdfs://:8020/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz' to
 '/tmp/mesos/slaves/20150515-164602-2877535122-5050-32131-S2/frameworks/20150517-162701-2877535122-5050-28705-0084/executors/20150515-164602-2877535122-5050-32131-S2/runs/660d78ec-e2f4-4d38-881b-7209cbd3c5c3/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz'
 I0519 02:36:34.205878 12863 fetcher.cpp:78] Extracted resource
 '/tmp/mesos/slaves/20150515-164602-2877535122-5050-32131-S2/frameworks/20150517-162701-2877535122-5050-28705-0084/executors/20150515-164602-2877535122-5050-32131-S2/runs/660d78ec-e2f4-4d38-881b-7209cbd3c5c3/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz'
 into
 '/tmp/mesos/slaves/20150515-164602-2877535122-5050-32131-S2/frameworks/20150517-162701-2877535122-5050-28705-0084/executors/20150515-164602-2877535122-5050-32131-S2/runs/660d78ec-e2f4-4d38-881b-7209cbd3c5c3'
 *Error: Could not find or load main class two*

 ---

 And from the Spark Terminal:

 ---
 15/05/19 02:36:39 INFO scheduler.TaskSchedulerImpl: Cancelling stage 0
 15/05/19 02:36:39 INFO scheduler.TaskSchedulerImpl: Stage 0 was cancelled
 15/05/19 02:36:39 INFO scheduler.DAGScheduler: Failed to run reduce at
 SparkPi.scala:35
 15/05/19 02:36:39 INFO scheduler.DAGScheduler: Failed to run reduce at
 SparkPi.scala:35
 Exception in thread "main" org.apache.spark.SparkException: Job aborted
 due to stage failure: Task 7 in stage 0.0 failed 4 times, most recent
 failure: Lost task 7.3 in stage 0.0 (TID 26, ): ExecutorLostFailure
 (executor lost)
 Driver stacktrace:
 at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
 at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
 at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
 at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
 ..
 at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
 at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

 ---

 Any help will be greatly appreciated!

 Regards,
 Panagiotis



Re: Mesos Spark Tasks - Lost

2015-05-20 Thread Panagiotis Garefalakis
Tim, thanks for your reply,

I am following this quite clear mesos-spark tutorial:
https://docs.mesosphere.com/tutorials/run-spark-on-mesos/
Mainly I tried running spark-shell, which works fine locally, but when the
jobs are submitted through Mesos something goes wrong!

My question is: is there some extra configuration needed for the workers
that is not mentioned in the tutorial?
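
For context, the Mesos-specific settings that tutorial relies on boil down to
roughly the following (a sketch with placeholder host names and paths, not my
actual configuration):

  # conf/spark-env.sh on the node running the driver:
  export MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos.so   # MESOS_NATIVE_JAVA_LIBRARY on newer setups
  export SPARK_EXECUTOR_URI=hdfs://<namenode>:8020/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz

  # or, equivalently, in conf/spark-defaults.conf:
  spark.master        mesos://<mesos-master>:5050
  spark.executor.uri  hdfs://<namenode>:8020/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz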

The "Executor Lost" message I get is really generic, so I don't know what's
going on.
Please check the attached Mesos execution event log.

Thanks again,
Panagiotis




Mesos Spark Tasks - Lost

2015-05-19 Thread Panagiotis Garefalakis
Hello all,

I have been facing a weird issue for the last couple of days running Spark on
top of Mesos and I need your help. I am running Mesos in a private cluster and
managed to successfully deploy HDFS, Cassandra, Marathon and Play, but
Spark is not working for some reason. So far I have tried:
different Java versions (1.6 and 1.7, Oracle and OpenJDK), different
spark-env configurations, different Spark versions (from 0.8.8 to 1.3.1),
different HDFS versions (Hadoop 5.1 and 4.6), and updating pom dependencies.

More specifically, while local tasks complete fine, in cluster mode all the
tasks get lost (with both spark-shell and spark-submit).
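
Concretely, the local and Mesos runs differ only in the master URL handed to
spark-shell / spark-submit, roughly like this (the host name is a placeholder,
not my actual master):

  # local run, which completes fine:
  ./bin/spark-shell --master local[4]

  # the same against the Mesos cluster, where every task is lost:
  ./bin/spark-shell --master mesos://<mesos-master>:5050
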
From the worker log I see something like this:

---
I0519 02:36:30.475064 12863 fetcher.cpp:214] Fetching URI
'hdfs:/:8020/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz'
I0519 02:36:30.747372 12863 fetcher.cpp:99] Fetching URI
'hdfs://X:8020/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz' using Hadoop
Client
I0519 02:36:30.747546 12863 fetcher.cpp:109] Downloading resource from
'hdfs://:8020/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz' to
'/tmp/mesos/slaves/20150515-164602-2877535122-5050-32131-S2/frameworks/20150517-162701-2877535122-5050-28705-0084/executors/20150515-164602-2877535122-5050-32131-S2/runs/660d78ec-e2f4-4d38-881b-7209cbd3c5c3/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz'
I0519 02:36:34.205878 12863 fetcher.cpp:78] Extracted resource
'/tmp/mesos/slaves/20150515-164602-2877535122-5050-32131-S2/frameworks/20150517-162701-2877535122-5050-28705-0084/executors/20150515-164602-2877535122-5050-32131-S2/runs/660d78ec-e2f4-4d38-881b-7209cbd3c5c3/spark-1.1.0-bin-2.0.0-cdh4.7.0.tgz'
into
'/tmp/mesos/slaves/20150515-164602-2877535122-5050-32131-S2/frameworks/20150517-162701-2877535122-5050-28705-0084/executors/20150515-164602-2877535122-5050-32131-S2/runs/660d78ec-e2f4-4d38-881b-7209cbd3c5c3'
*Error: Could not find or load main class two*

---

And from the Spark Terminal:

---
15/05/19 02:36:39 INFO scheduler.TaskSchedulerImpl: Cancelling stage 0
15/05/19 02:36:39 INFO scheduler.TaskSchedulerImpl: Stage 0 was cancelled
15/05/19 02:36:39 INFO scheduler.DAGScheduler: Failed to run reduce at
SparkPi.scala:35
15/05/19 02:36:39 INFO scheduler.DAGScheduler: Failed to run reduce at
SparkPi.scala:35
Exception in thread "main" org.apache.spark.SparkException: Job aborted
due to stage failure: Task 7 in stage 0.0 failed 4 times, most recent
failure: Lost task 7.3 in stage 0.0 (TID 26, ): ExecutorLostFailure
(executor lost)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
..
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

---

Any help will be greatly appreciated!

Regards,
Panagiotis