By local I assume you are talking about your dev machine, not one of the cluster machines.

Excuse me if I'm stating the obvious, but you are using two completely different Spark and Hadoop installations, one local and one remote. They could be completely different codebases. Just because you have configured Spark and Hadoop to execute locally doesn't mean they work remotely. It sounds like you are using the CLI on your dev machine, which is set to run locally, and passing a remote Spark master URI and an S3 URI to the local Mahout script. I would install and set up Mahout on your cluster master, make sure MAHOUT_LOCAL is not set there since you will be using a cluster, and execute the CLI from there.

Furthermore, are you sure that the remote Spark cluster can see the S3 data? Ssh to the Spark master and do something like "hadoop fs -ls" with the URI to verify that the Hadoop config on the remote cluster, which is what the remote Spark will use, can get to the data.
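Something along these lines from your dev box is what I mean (untested; the host name is just taken from your --master URI, and I'm assuming root ssh the way spark-ec2 sets it up):

  ssh root@ec2-54-75-13-36.eu-west-1.compute.amazonaws.com    # the Spark master from your --master URI
  unset MAHOUT_LOCAL                                          # so Mahout submits to the cluster instead of running locally
  hadoop fs -ls s3n://recommendation-logs/2014/09/06          # can the cluster's Hadoop config reach the input?
  cd /path/to/mahout && bin/mahout spark-itemsimilarity ...   # then run the CLI from the master itself (path is wherever you put Mahout)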
On Sep 15, 2014, at 2:28 PM, Phil Wills <[email protected]> wrote:

The data and the s3n file system are OK: when I run 'locally' (that's just without a master specified, but otherwise identical) it works fine. I've been using the spark-ec2 scripts to retrieve Spark and Hadoop, so I had assumed that meant they were operating compatible versions, but I'm not specifying which Hadoop to use explicitly, so I don't know if that has an effect.

Phil

On Mon, Sep 15, 2014 at 7:25 PM, Pat Ferrel <[email protected]> wrote:

> It should handle this input, no surprise.
>
> Spark must be compiled for the correct version of Hadoop that you are using (Mahout also). I'd make sure Spark is working properly with your HDFS by trying one of their examples, if you haven't already. Running locally may not be using the same version of Hadoop; have you checked that?
>
> A filenamePattern of '.*' will get all files in s3n://recommendation-logs/2014/09/06, and you have it set to search recursively. Check to make sure this is what you want. Did you use the same dir structure as you have on s3n when you ran locally? Since this driver reads text files, it can think it is working on data whenever it finds "[\t, ]" (a tab, comma, or space) in a line, even when it's reading garbage, so be sure it is working on only the files you want. Tell it to look for only a tab if that's what you are using, or use a regex that matches the entire filename, like "^part.*" or ".*log".
>
> I have not tested with s3n:// URIs. I assume you can read all these with the hadoop tools, like "hadoop fs -ls s3n://recommendation-logs/2014/09/06"?
>
> Off-list I'll send a link to the epinions data formatted for Mahout. You can try putting that in HDFS via s3n and running it, because I have tested that on a cluster. It is all in one file though, so if there is a problem in file discovery it won't show up.
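> Roughly, on the master, something like this should tell you (untested; the assembly jar location under $SPARK_HOME/lib is from memory and may differ on your install):
>
>   hadoop version                                       # the Hadoop the cluster actually runs
>   ls $SPARK_HOME/lib/spark-assembly-*.jar              # the bundled Hadoop version is usually part of the assembly jar name
>   hadoop fs -ls s3n://recommendation-logs/2014/09/06   # exactly which files a '.*' pattern will pick up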
> On Sep 15, 2014, at 9:10 AM, Phil Wills <[email protected]> wrote:
>
> Tried running locally on a reasonably beefy machine and it worked fine. Which is the toy data you're referring to?
>
> JAVA_HOME=/usr/lib/jvm/jre-1.7.0-openjdk.x86_64 SPARK_HOME=/root/spark MAHOUT_HOME=. bin/mahout spark-itemsimilarity --input s3n://recommendation-logs/2014/09/06 --output s3n://recommendation-outputs/2014/09/06 --filenamePattern '.*' --recursive --master spark://ec2-54-75-13-36.eu-west-1.compute.amazonaws.com:7077 --sparkExecutorMem 6g
>
> and the working version running locally on a beefier box:
>
> JAVA_HOME=/usr/lib/jvm/jre-1.7.0-openjdk.x86_64 SPARK_HOME=/root/spark MAHOUT_HOME=. MAHOUT_HEAPSIZE=16000 bin/mahout spark-itemsimilarity --input s3n://ophan-recommendation-logs/2014/09/06 --output s3n://ophan-recommendation-outputs/2014/09/06 --filenamePattern '.*' --recursive --sparkExecutorMem 16g
>
> Sample input:
>
> nnS1dIIBBtTnehVD79lgYeBw http://www.example.com/world/2014/sep/05/malaysia-airlines-mh370-six-months-chinese-families-lack-answers
> ikFSk14vHrTPqjSISvMihDUg http://www.example.com/world/2014/sep/05/obama-core-coalition-10-countries-to-fight-isis
> edqu8kfgsFSg2w3MhV5rUwuQ http://www.example.com/lifeandstyle/wordofmouth/2014/sep/05/food-and-drink2?CMP=fb_gu
> pfnmfONG1DQWG_EOOIxUASow http://www.example.com/world/live/2014/sep/05/unresponsive-plane-f15-jets-aircraft-live-updates
> pfUil_W0s2TZSqojMQrVcxVw http://www.example.com/football/blog/2014/sep/05/jose-mourinho-bargain-loic-remy-chelsea-france
> nxTJnpyenFSP-tqWSLHQdW8w http://www.example.com/books/2014/sep/05/were-we-happier-in-the-stone-age
> lba37jwJVQS5GbiSuus1i6tA http://www.example.com/stage/2014/sep/05/titus-andronicus-review-visually-striking-but-flawed
> bEHaOzZPbtQz-X2K1wortBQQ http://www.example.com/cities/2014/sep/05/death-america-suburban-dream-ferguson-missouri-resegregation
> gjTGzDXiDOT5W2SThhm0tUmg http://www.example.com/world/2014/sep/05/man-jailed-phoning-texting-ex-21807-times
> pfFbQ5ddvBRhm0XLZbN6Xd2A http://www.example.com/sport/2014/sep/05/gloucester-northampton-premiership-rugby
>
> On Sun, Sep 14, 2014 at 4:06 PM, Pat Ferrel <[email protected]> wrote:
>
>> I wonder if it's trying to write an empty RDD to a text file. Can you give the CLI options and a snippet of data?
>>
>> Also, have you successfully run this on the toy data in the resource dir? There is a script to run it locally that you can adapt for running on a cluster. This will eliminate any cluster problem.
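>> If you want to check the empty-RDD theory after a failed run, look at what actually landed under your --output URI, something like this (untested; the bucket name is just a placeholder):
>>
>>   hadoop fs -ls s3n://your-output-bucket/2014/09/06   # zero-length part-00000 files would mean an empty RDD was written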
>> On Sep 13, 2014, at 1:13 PM, Phil Wills <[email protected]> wrote:
>>
>> Here's the master log from the line with the stack trace to termination:
>>
>> 14/09/12 15:54:55 INFO scheduler.DAGScheduler: Failed to run saveAsTextFile at TextDelimitedReaderWriter.scala:288
>> Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 8.0:3 failed 4 times, most recent failure: TID 448 on host ip-10-105-176-77.eu-west-1.compute.internal failed for unknown reason
>> Driver stacktrace:
>>     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1049)
>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1033)
>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1031)
>>     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>     at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1031)
>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
>>     at scala.Option.foreach(Option.scala:236)
>>     at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:635)
>>     at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1234)
>>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>>     at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>>     at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>>     at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>>     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>> 14/09/12 15:54:55 INFO scheduler.DAGScheduler: Executor lost: 8 (epoch 20)
>> 14/09/12 15:54:55 INFO storage.BlockManagerMasterActor: Trying to remove executor 8 from BlockManagerMaster.
>> 14/09/12 15:54:55 INFO storage.BlockManagerMaster: Removed 8 successfully in removeExecutor
>> 14/09/12 15:54:55 INFO storage.BlockManagerInfo: Registering block manager ip-10-105-176-77.eu-west-1.compute.internal:58803 with 3.4 GB RAM
>> 14/09/12 15:54:55 INFO cluster.SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://[email protected]:56590/user/Executor#1456047585] with ID 9
>>
>> On Sat, Sep 13, 2014 at 4:21 PM, Pat Ferrel <[email protected]> wrote:
>>
>>> It's not an error I've seen but they can tend to be pretty cryptic. Could you post more of the stack trace?
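>>> The driver often only reports "failed for unknown reason"; the executor's own stderr on the worker is usually more telling. On a spark-ec2 standalone cluster I believe it ends up under the worker's work dir, something like this (paths from memory; the app and executor ids will differ):
>>>
>>>   ssh ip-10-105-176-77.eu-west-1.compute.internal   # the worker named in the failure, reachable from the master
>>>   tail -100 /root/spark/work/app-*/*/stderr         # per-executor stderr; look for an OutOfMemoryError or a kill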
>>> On Sep 12, 2014, at 2:55 PM, Phil Wills <[email protected]> wrote:
>>>
>>> I've tried on 1.0.1 and 1.0.2, updating the pom to 1.0.2 when running on that. I used the spark-ec2 scripts to set up the cluster.
>>>
>>> I might be able to share the data; I'll mull it over the weekend to make sure there's nothing sensitive, or see if there's a way I can transform it to that point.
>>>
>>> Phil
>>>
>>> On Fri, Sep 12, 2014 at 6:30 PM, Pat Ferrel <[email protected]> wrote:
>>>
>>>> The mahout pom says 1.0.1 but I'm running fine on 1.0.2.
>>>>
>>>> On Sep 12, 2014, at 10:08 AM, Pat Ferrel <[email protected]> wrote:
>>>>
>>>> Is it a mature Spark cluster? What version of Spark?
>>>>
>>>> If you can share the data I can try it on mine.
>>>>
>>>> On Sep 12, 2014, at 9:42 AM, Phil Wills <[email protected]> wrote:
>>>>
>>>> I've been experimenting with the fairly new ItemSimilarityDriver, which is working fine up until the point it tries to write out its results. Initially I was getting an issue with the akka frameSize being too small, but after expanding that I'm now getting a much more cryptic error:
>>>>
>>>> 14/09/12 15:54:55 INFO scheduler.DAGScheduler: Failed to run saveAsTextFile at TextDelimitedReaderWriter.scala:288
>>>> Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 8.0:3 failed 4 times, most recent failure: TID 448 on host ip-10-105-176-77.eu-west-1.compute.internal failed for unknown reason
>>>>
>>>> This is from the master node, but there doesn't seem to be anything more intelligible in the slave node logs.
>>>>
>>>> I've tried writing to the local file system as well as s3n and can see it's not an access problem, as I am seeing a zero-length file appear.
>>>>
>>>> Thanks for any pointers, and apologies if this would be better to ask on the Spark list,
>>>>
>>>> Phil
