I wonder if it’s trying to write an empty RDD to a text file. Can you give the
CLI options and a snippet of data?
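
If it is the empty-RDD case, a quick way to confirm is to guard the write
yourself. Rough sketch below, not the driver’s actual code — the object name,
paths and the way the RDD is built are made up for illustration:

import org.apache.spark.{SparkConf, SparkContext}

object WriteGuardSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("write-guard-sketch"))

    // Stand-in for the RDD of output lines the real job produces.
    val lines = sc.textFile("/tmp/similarity-input.txt")

    // RDD.isEmpty() isn't available in Spark 1.0.x, so take(1) is the portable check.
    if (lines.take(1).isEmpty)
      println("Result RDD is empty -- skipping saveAsTextFile")
    else
      lines.saveAsTextFile("/tmp/similarity-output")

    sc.stop()
  }
}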

Also, have you successfully run this on the toy data in the resource dir? There
is a script to run it locally that you can adapt for running on a cluster; that
will rule out any cluster problem.
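
On the frameSize point from your earlier mail: in case it helps anyone
following the thread, this is roughly how that property gets bumped when you
build the SparkContext yourself rather than going through the CLI driver. The
object name and the 128 MB value are just examples, not what the driver does:

import org.apache.spark.{SparkConf, SparkContext}

object FrameSizeSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("item-similarity-run")
      // Value is in MB; 128 is only an example, the default is much smaller.
      .set("spark.akka.frameSize", "128")
    val sc = new SparkContext(conf)

    // ... run the job against sc here ...

    sc.stop()
  }
}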


On Sep 13, 2014, at 1:13 PM, Phil Wills <[email protected]> wrote:

Here's the master log from the line with the stack trace to termination:

14/09/12 15:54:55 INFO scheduler.DAGScheduler: Failed to run saveAsTextFile at TextDelimitedReaderWriter.scala:288
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 8.0:3 failed 4 times, most recent failure: TID 448 on host ip-10-105-176-77.eu-west-1.compute.internal failed for unknown reason
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1049)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1033)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1031)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1031)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:635)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1234)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
    at akka.actor.ActorCell.invoke(ActorCell.scala:456)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
    at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
14/09/12 15:54:55 INFO scheduler.DAGScheduler: Executor lost: 8 (epoch 20)
14/09/12 15:54:55 INFO storage.BlockManagerMasterActor: Trying to remove executor 8 from BlockManagerMaster.
14/09/12 15:54:55 INFO storage.BlockManagerMaster: Removed 8 successfully in removeExecutor
14/09/12 15:54:55 INFO storage.BlockManagerInfo: Registering block manager ip-10-105-176-77.eu-west-1.compute.internal:58803 with 3.4 GB RAM
14/09/12 15:54:55 INFO cluster.SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://[email protected]:56590/user/Executor#1456047585] with ID 9

On Sat, Sep 13, 2014 at 4:21 PM, Pat Ferrel <[email protected]> wrote:

> It’s not an error I’ve seen, but these errors tend to be pretty cryptic.
> Could you post more of the stack trace?
> 
> On Sep 12, 2014, at 2:55 PM, Phil Wills <[email protected]> wrote:
> 
> I've tried on 1.0.1 and 1.0.2, updating the pom to 1.0.2 when running on
> that.  I used the spark-ec2 scripts to set up the cluster.
> 
> I might be able to share the data. I'll mull it over the weekend to make
> sure there's nothing sensitive, or see if there's a way I can transform it
> so it can be shared.
> 
> Phil
> 
> 
> On Fri, Sep 12, 2014 at 6:30 PM, Pat Ferrel <[email protected]> wrote:
> 
>> The Mahout pom says 1.0.1, but I’m running fine on 1.0.2
>> 
>> 
>> On Sep 12, 2014, at 10:08 AM, Pat Ferrel <[email protected]> wrote:
>> 
>> Is it a mature Spark cluster? What version of Spark?
>> 
>> If you can share the data I can try it on mine.
>> 
>> On Sep 12, 2014, at 9:42 AM, Phil Wills <[email protected]> wrote:
>> 
>> I've been experimenting with the fairly new ItemSimilarityDriver, which is
>> working fine up until the point it tries to write out its results.
>> Initially I was getting an issue with the Akka frameSize being too small,
>> but after expanding that I'm now getting a much more cryptic error:
>> 
>> 14/09/12 15:54:55 INFO scheduler.DAGScheduler: Failed to run saveAsTextFile at TextDelimitedReaderWriter.scala:288
>> Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 8.0:3 failed 4 times, most recent failure: TID 448 on host ip-10-105-176-77.eu-west-1.compute.internal failed for unknown reason
>> 
>> This is from the master node, but there doesn't seem to be anything more
>> intelligible in the slave node logs.
>> 
>> I've tried writing to the local file system as well as s3n and can see it's
>> not an access problem, as I am seeing a zero-length file appear.
>> 
>> Thanks for any pointers, and apologies if this would be better to ask on
>> the Spark list,
>> 
>> Phil
>> 
>> 
>> 
> 
> 
