I forgot to add that I tried your program with the same input file path, and it worked fine. (I used local[2] rather than local[4], however.)
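
One way to test the temp-directory theory from the message quoted below is a small probe like the following. It is only a sketch: `TempDirCheck` and `probe` are names made up for illustration, and which directory Spark actually uses on your machine (the JVM's `java.io.tmpdir`, a `spark.local.dir` setting, or something else) is an assumption you would need to confirm. The probe simply makes the same `java.io.File.createTempFile` call that shows up in the stack trace and reports whether it succeeds.

```scala
import java.io.{File, IOException}

object TempDirCheck {
  // Try to create and then delete a temp file in `dir`, using the same
  // java.io.File.createTempFile call that appears in the quoted stack trace.
  // Returns Right(()) on success, or Left(reason) if the file cannot be created.
  def probe(dir: File): Either[String, Unit] =
    try {
      val f = File.createTempFile("probe", ".tmp", dir)
      f.delete()
      Right(())
    } catch {
      case e: IOException => Left(dir.getPath + ": " + e.getMessage)
    }

  def main(args: Array[String]): Unit = {
    // Directories to test: the command-line arguments if any were given,
    // otherwise the JVM's default temp directory.
    val dirs =
      if (args.nonEmpty) args.toSeq
      else Seq(System.getProperty("java.io.tmpdir"))

    dirs.foreach { d =>
      probe(new File(d)) match {
        case Right(_)  => println("OK   " + d)
        case Left(msg) => println("FAIL " + msg)
      }
    }
  }
}
```

Running it with no arguments checks the JVM default. Passing the directory that any `spark.local.dir` setting in your configuration points at should show whether that directory is missing or not writable.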

Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition
<http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
Typesafe <http://typesafe.com>
@deanwampler <http://twitter.com/deanwampler>
http://polyglotprogramming.com

On Thu, Jul 31, 2014 at 5:07 PM, Dean Wampler <[email protected]> wrote:

> The stack trace suggests it was trying to create a temporary file, not
> read your file. Of course, it doesn't say what file it couldn't create.
>
> Could there be a configuration file, like a Hadoop config file, that was
> read with a temp dir setting that's invalid for your machine?
>
> dean
>
> On Thu, Jul 31, 2014 at 4:04 PM, Ryan Tabora <[email protected]> wrote:
>
>> Hey all,
>>
>> I was able to spawn up a cluster, but when I'm trying to submit a simple
>> jar via spark-submit it fails to run. I am trying to run the simple
>> "Standalone Application" from the quickstart.
>>
>> Oddly enough, I could get another application running through the
>> spark-shell. What am I doing wrong here?
>> :(
>>
>> http://spark.apache.org/docs/latest/quick-start.html
>>
>> *Here's my setup:*
>>
>> $ ls
>> project  simple.sbt  src  target
>>
>> $ ls -R src
>> src:
>> main
>>
>> src/main:
>> scala
>>
>> src/main/scala:
>> SimpleApp.scala
>>
>> $ cat src/main/scala/SimpleApp.scala
>> package main.scala
>>
>> /* SimpleApp.scala */
>> import org.apache.spark.SparkContext
>> import org.apache.spark.SparkContext._
>> import org.apache.spark.SparkConf
>>
>> object SimpleApp {
>>   def main(args: Array[String]) {
>>     val logFile = "/tmp/README.md"
>>     val conf = new SparkConf().setAppName("Simple Application")
>>     val sc = new SparkContext(conf)
>>     val logData = sc.textFile(logFile, 2).cache()
>>     val numAs = logData.filter(line => line.contains("a")).count()
>>     val numBs = logData.filter(line => line.contains("b")).count()
>>     println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
>>   }
>> }
>>
>> $ cat simple.sbt
>> name := "Simple Project"
>>
>> version := "1.0"
>>
>> scalaVersion := "2.10.4"
>>
>> libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.1"
>>
>> resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
>>
>> *Here's how I run the job:*
>>
>> $ /root/spark/bin/spark-submit --class "main.scala.SimpleApp" --master local[4] ./target/scala-2.10/simple-project_2.10-1.0.jar
>>
>> *Here is the error:*
>>
>> 14/07/31 16:23:56 INFO scheduler.DAGScheduler: Failed to run count at SimpleApp.scala:14
>> Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:1 failed 1 times, most recent failure: Exception failure in TID 1 on host localhost: java.io.IOException: No such file or directory
>>         java.io.UnixFileSystem.createFileExclusively(Native Method)
>>         java.io.File.createNewFile(File.java:1006)
>>         java.io.File.createTempFile(File.java:1989)
>>         org.apache.spark.util.Utils$.fetchFile(Utils.scala:326)
>>         org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$6.apply(Executor.scala:332)
>>         org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$6.apply(Executor.scala:330)
>>         scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
>>         scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
>>         scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
>>         scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
>>         scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
>>         scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
>>         scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
>>         org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:330)
>>         org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:168)
>>         java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>         java.lang.Thread.run(Thread.java:745)
>> Driver stacktrace:
>>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1033)
>>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1017)
>>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1015)
>>         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>         at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1015)
>>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633)
>>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633)
>>         at scala.Option.foreach(Option.scala:236)
>>         at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:633)
>>         at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1207)
>>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>>         at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>>         at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>>         at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>>         at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>         at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>         at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>> 14/07/31 16:23:56 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
>>
>> *Another note:*
>>
>> I do not believe the issue is because the input file is missing. It is
>> most certainly there, and I got a different error when I tried pointing it
>> to a non-existent file.
>>
>> $ /root/ephemeral-hdfs/bin/hadoop fs -lsr /
>> Warning: $HADOOP_HOME is deprecated.
>>
>> drwxr-xr-x   - root supergroup          0 2014-07-31 00:10 /tachyon
>> drwxr-xr-x   - root supergroup          0 2014-07-31 00:10 /tachyon/data
>> drwxr-xr-x   - root supergroup          0 2014-07-31 00:10 /tachyon/workers
>> drwxr-xr-x   - root supergroup          0 2014-07-31 01:01 /tmp
>> -rw-r--r--   3 root supergroup     281471 2014-07-31 00:17 /tmp/CHANGES.txt
>> -rw-r--r--   3 root supergroup       4221 2014-07-31 01:01 /tmp/README.md
>>
>> Regards,
>> Ryan Tabora
>> http://ryantabora.com
