What are you trying to do? Can you paste the whole code? I used to see this sort of exception when I closed the fs object inside map/mapPartitions, etc. Close the output stream you created, but not the FileSystem itself.
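For what it's worth, the reason closing fs is dangerous is that Hadoop's FileSystem.get returns a cached instance shared by every task in the same executor JVM, so one task closing it breaks everyone else's writes. Here is a minimal stand-alone sketch of that failure mode; SharedFs and FsCache are hypothetical stand-ins for FileSystem and its internal cache, not real Hadoop classes:

```scala
import java.io.Closeable
import scala.collection.mutable

// Hypothetical stand-in for Hadoop's FileSystem. In real Hadoop,
// FileSystem.get(conf) hands back a cached, shared instance.
class SharedFs extends Closeable {
  private var closed = false
  def create(path: String): String = {
    if (closed) throw new IllegalStateException("filesystem already closed")
    s"stream for $path" // pretend this is an open output stream
  }
  override def close(): Unit = closed = true
}

// Hypothetical cache, mimicking the scheme-keyed FileSystem cache:
// repeated get() calls return the SAME instance.
class FsCache {
  private val cache = mutable.Map.empty[String, SharedFs]
  def get(scheme: String): SharedFs = cache.getOrElseUpdate(scheme, new SharedFs)
}

object Demo extends App {
  // Simulates two partition tasks in one executor: the first closes the
  // "filesystem" it got from the cache, and the second task's write then
  // fails even though it called get() itself.
  def secondTaskFails(): Boolean = {
    val cache = new FsCache
    val fs1 = cache.get("hdfs")     // task for partition 0
    fs1.create("/out/part-00000")
    fs1.close()                     // closes the cached instance for everyone

    val fs2 = cache.get("hdfs")     // task for partition 1: same instance
    try { fs2.create("/out/part-00001"); false }
    catch { case _: IllegalStateException => true }
  }

  println(secondTaskFails()) // prints true
}

Demo.main(Array.empty)
```

So inside mapPartitionsWithIndex, close the stream (strm.close() in a finally block is fine) but leave the FileSystem alone, or use FileSystem.newInstance if you really need a private instance you can close.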
Thanks
Best Regards

On Mon, Jan 5, 2015 at 6:43 AM, Michael Albert <m_albert...@yahoo.com.invalid> wrote:

> Greetings!
>
> So, I think I have data saved so that each partition (part-r-00000, etc.)
> is exactly what I want to translate into an output file of a format not
> related to hadoop.
>
> I believe I've figured out how to tell Spark to read the data set without
> re-partitioning (I mentioned this in another post -- I have a
> non-splittable InputFormat).
>
> I do something like
>
>     mapPartitionsWithIndex( (partId, iter) => {
>       val conf = new Configuration()
>       val fs = FileSystem.get(conf)
>       val strm = fs.create(new Path(...))
>       // write data to stream
>       strm.close() // in finally block
>     }
>
> This runs for a few hundred input files (so each executor sees 10's of
> files), and it chugs along nicely, then suddenly everything shuts down.
> I can restart (telling it to skip the partIds which it has already
> completed), and it chugs along again for a while (going past the previous
> stopping point) and again dies.
>
> I am at a loss. This works for the first 10's of files (so it runs for
> about 1 hr), then quits, and I see no useful error information (no
> exceptions except the stuff below). I'm not shutting it down.
>
> Any idea what I might check? I've bumped up the memory multiple times
> (16 GB currently) and fiddled with increasing other parameters.
>
> Thanks.
> Exception in thread "main" org.apache.spark.SparkException: Job cancelled
> because SparkContext was shut down
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:694)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:693)
>     at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>     at org.apache.spark.scheduler.DAGScheduler.cleanUpAfterSchedulerStop(DAGScheduler.scala:693)
>     at org.apache.spark.scheduler.DAGSchedulerEventProcessActor.postStop(DAGScheduler.scala:1399)
>     at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:201)
>     at akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:163)
>     at akka.actor.ActorCell.terminate(ActorCell.scala:338)
>     at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:431)
>     at akka.actor.ActorCell.systemInvoke(ActorCell.scala:447)
>     at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:262)