The data and the s3n file system are OK: when I run 'locally', which just means without a master specified but otherwise identically, it works fine. I've been using the spark-ec2 scripts to retrieve Spark and Hadoop, so I had assumed they were running compatible versions, but I'm not explicitly specifying which Hadoop version to use, so I don't know whether that has an effect.
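
On the file-selection point raised in the quoted message below, here is a minimal Python sketch of how the suggested patterns would pick files and how the delimiter check could be fooled by non-data lines. It assumes the pattern is matched against the whole file name, as the advice to "match the entire filename" suggests; the directory listing and the helper names `select_files` and `looks_delimited` are hypothetical and are not part of Mahout.

```python
import re


def select_files(names, pattern):
    """Return the names that the regex matches in full (assumed semantics)."""
    regex = re.compile(pattern)
    return [name for name in names if regex.fullmatch(name)]


def looks_delimited(line, delimiter=r"[\t, ]"):
    """Return True if the line contains at least one delimiter character."""
    return re.search(delimiter, line) is not None


# Hypothetical directory listing, not the real bucket contents.
names = ["part-00000", "part-00001", "_SUCCESS", "access.log", "notes.bin"]

all_files = select_files(names, ".*")        # every file, including _SUCCESS
part_files = select_files(names, "^part.*")  # only the part-* files
log_files = select_files(names, ".*log")     # only names ending in "log"

# A non-data line containing a comma or space passes the default delimiter
# check, but not a tab-only check.
junk = "binary junk, more junk"
passes_default = looks_delimited(junk)
passes_tab_only = looks_delimited(junk, r"\t")
```

With a listing like this, '.*' also picks up marker files such as _SUCCESS, which is why a narrower pattern or a tab-only delimiter may be the safer choice.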
Phil

On Mon, Sep 15, 2014 at 7:25 PM, Pat Ferrel <[email protected]> wrote:

> It should handle this input, no surprise.
>
> Spark must be compiled for the correct version of Hadoop that you are using (Mahout also). I’d make sure Spark is working properly with your HDFS by trying one of their examples if you haven’t already. Running locally may not be using the same version of Hadoop, have you checked that?
>
> A filenamePattern of ‘.*’ will get all files in s3n://recommendation-logs/2014/09/06 and you have it set to search recursively. Check to make sure this is what you want. Did you use the same dir structure as you have on s3n when you ran locally? Since this driver looks at text files, it can think it is working on data if it finds “[\t, ]” (a tab, comma, or space) in the line when it’s reading garbage, so you should be sure it is working on only the files you want. Tell it to look for only a tab if that’s what you are using, or use a regex to match the entire filename like “^part.*” or “.*log”.
>
> I have not tested with s3n:// URIs. I assume you can read all these with the hadoop tools like “hadoop fs -ls s3n://recommendation-logs/2014/09/06”?
>
> Off-list I’ll send a link to epinions data formatted for Mahout. You can try putting that in HDFS via s3n and running it because I have tested that on a cluster. It is all in one file though, so if there is a problem in file discovery it won’t show up.
>
> On Sep 15, 2014, at 9:10 AM, Phil Wills <[email protected]> wrote:
>
> Tried running locally on a reasonably beefy machine and it worked fine. Which is the toy data you're referring to?
>
> JAVA_HOME=/usr/lib/jvm/jre-1.7.0-openjdk.x86_64 SPARK_HOME=/root/spark MAHOUT_HOME=. bin/mahout spark-itemsimilarity --input s3n://recommendation-logs/2014/09/06 --output s3n://recommendation-outputs/2014/09/06 --filenamePattern '.*' --recursive --master spark://ec2-54-75-13-36.eu-west-1.compute.amazonaws.com:7077 --sparkExecutorMem 6g
>
> and the working version running locally on a beefier box:
>
> JAVA_HOME=/usr/lib/jvm/jre-1.7.0-openjdk.x86_64 SPARK_HOME=/root/spark MAHOUT_HOME=. MAHOUT_HEAPSIZE=16000 bin/mahout spark-itemsimilarity --input s3n://ophan-recommendation-logs/2014/09/06 --output s3n://ophan-recommendation-outputs/2014/09/06 --filenamePattern '.*' --recursive --sparkExecutorMem 16g
>
> Sample input:
>
> nnS1dIIBBtTnehVD79lgYeBw http://www.example.com/world/2014/sep/05/malaysia-airlines-mh370-six-months-chinese-families-lack-answers
> ikFSk14vHrTPqjSISvMihDUg http://www.example.com/world/2014/sep/05/obama-core-coalition-10-countries-to-fight-isis
> edqu8kfgsFSg2w3MhV5rUwuQ http://www.example.com/lifeandstyle/wordofmouth/2014/sep/05/food-and-drink2?CMP=fb_gu
> pfnmfONG1DQWG_EOOIxUASow http://www.example.com/world/live/2014/sep/05/unresponsive-plane-f15-jets-aircraft-live-updates
> pfUil_W0s2TZSqojMQrVcxVw http://www.example.com/football/blog/2014/sep/05/jose-mourinho-bargain-loic-remy-chelsea-france
> nxTJnpyenFSP-tqWSLHQdW8w http://www.example.com/books/2014/sep/05/were-we-happier-in-the-stone-age
> lba37jwJVQS5GbiSuus1i6tA http://www.example.com/stage/2014/sep/05/titus-andronicus-review-visually-striking-but-flawed
> bEHaOzZPbtQz-X2K1wortBQQ http://www.example.com/cities/2014/sep/05/death-america-suburban-dream-ferguson-missouri-resegregation
> gjTGzDXiDOT5W2SThhm0tUmg http://www.example.com/world/2014/sep/05/man-jailed-phoning-texting-ex-21807-times
> pfFbQ5ddvBRhm0XLZbN6Xd2A http://www.example.com/sport/2014/sep/05/gloucester-northampton-premiership-rugby
>
> On Sun, Sep 14, 2014 at 4:06 PM, Pat Ferrel <[email protected]> wrote:
>
> > I wonder if it’s trying to write an empty RDD to a text file. Can you give the CLI options and a snippet of data?
> >
> > Also have you successfully run this on the toy data in the resource dir? There is a script to run it locally that you can adapt for running on a cluster. This will eliminate any cluster problem.
> >
> > On Sep 13, 2014, at 1:13 PM, Phil Wills <[email protected]> wrote:
> >
> > Here's the master log from the line with the stack trace to termination:
> >
> > 14/09/12 15:54:55 INFO scheduler.DAGScheduler: Failed to run saveAsTextFile at TextDelimitedReaderWriter.scala:288
> > Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 8.0:3 failed 4 times, most recent failure: TID 448 on host ip-10-105-176-77.eu-west-1.compute.internal failed for unknown reason
> > Driver stacktrace:
> >   at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1049)
> >   at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1033)
> >   at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1031)
> >   at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> >   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> >   at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1031)
> >   at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
> >   at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
> >   at scala.Option.foreach(Option.scala:236)
> >   at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:635)
> >   at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1234)
> >   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
> >   at akka.actor.ActorCell.invoke(ActorCell.scala:456)
> >   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
> >   at akka.dispatch.Mailbox.run(Mailbox.scala:219)
> >   at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
> >   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> >   at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> >   at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> >   at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> > 14/09/12 15:54:55 INFO scheduler.DAGScheduler: Executor lost: 8 (epoch 20)
> > 14/09/12 15:54:55 INFO storage.BlockManagerMasterActor: Trying to remove executor 8 from BlockManagerMaster.
> > 14/09/12 15:54:55 INFO storage.BlockManagerMaster: Removed 8 successfully in removeExecutor
> > 14/09/12 15:54:55 INFO storage.BlockManagerInfo: Registering block manager ip-10-105-176-77.eu-west-1.compute.internal:58803 with 3.4 GB RAM
> > 14/09/12 15:54:55 INFO cluster.SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://[email protected]:56590/user/Executor#1456047585] with ID 9
> >
> > On Sat, Sep 13, 2014 at 4:21 PM, Pat Ferrel <[email protected]> wrote:
> >
> >> It’s not an error I’ve seen but they can tend to be pretty cryptic. Could you post more of the stack trace?
> >>
> >> On Sep 12, 2014, at 2:55 PM, Phil Wills <[email protected]> wrote:
> >>
> >> I've tried on 1.0.1 and 1.0.2, updating the pom to 1.0.2 when running on that. I used the spark-ec2 scripts to set up the cluster.
> >>
> >> I might be able to share the data I'll mull it over the weekend to make sure there's nothing sensitive, or if there's a way I can transform it to that point.
> >>
> >> Phil
> >>
> >> On Fri, Sep 12, 2014 at 6:30 PM, Pat Ferrel <[email protected]> wrote:
> >>
> >>> The mahout pom says 1.0.1 but I’m running fine on 1.0.2
> >>>
> >>> On Sep 12, 2014, at 10:08 AM, Pat Ferrel <[email protected]> wrote:
> >>>
> >>> Is it a mature Spark cluster, what version of Spark?
> >>>
> >>> If you can share the data I can try it on mine.
> >>>
> >>> On Sep 12, 2014, at 9:42 AM, Phil Wills <[email protected]> wrote:
> >>>
> >>> I've been experimenting with the fairly new ItemSimilarityDriver, which is working fine up until the point it tries to write out its results. Initially I was getting an issue with the akka frameSize being too small, but after expanding that I'm now getting a much more cryptic error:
> >>>
> >>> 14/09/12 15:54:55 INFO scheduler.DAGScheduler: Failed to run saveAsTextFile at TextDelimitedReaderWriter.scala:288
> >>> Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 8.0:3 failed 4 times, most recent failure: TID 448 on host ip-10-105-176-77.eu-west-1.compute.internal failed for unknown reason
> >>>
> >>> This is from the master node, but there doesn't seem to be anything more intelligible in the slave node logs.
> >>>
> >>> I've tried writing to the local file system as well as s3n and can see it's not an access problem, as I am seeing a zero length file appear.
> >>>
> >>> Thanks for any pointers and apologies if this would be better to ask on the Spark list,
> >>>
> >>> Phil
