This should be a really simple problem, but you haven't shared enough code to determine what's going on here.
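That said, one pattern in the log below is suggestive: the `first` job runs locally on the driver and succeeds, while the distributed `count` fails with a `FileNotFoundException` on the executor hosts. That usually means the `file:/` path exists only on the driver machine — with a `file:/` URL, every worker must have the file at the same local path. A minimal sketch of the usual fix (loading from shared storage instead); the master and HDFS URLs here are illustrative, and the parsing matches the 0.9-era `LabeledPoint(Double, Array[Double])` API:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.classification.SVMWithSGD
import org.apache.spark.mllib.regression.LabeledPoint

// Hypothetical master URL and app name
val sc = new SparkContext("spark://master:7077", "svm-example")

// Load from storage visible to ALL executors (e.g. HDFS),
// not a driver-local file:/ path
val data = sc.textFile("hdfs://namenode:9000/user/root/sample_svm_data.txt")

// Parse "label f1 f2 ..." lines into labeled points
val parsedData = data.map { line =>
  val parts = line.split(' ').map(_.toDouble)
  LabeledPoint(parts.head, parts.tail)
}

val numIterations = 100
val model = SVMWithSGD.train(parsedData, numIterations)
```

Alternatively, copy `sample_svm_data.txt` to `/root/test/` on every worker node and keep the `file:/root/test/sample_svm_data.txt` URL as-is.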
On Tue, Sep 16, 2014 at 8:08 AM, Hui Li <littleleave...@gmail.com> wrote:
> Hi,
>
> I am new to Spark. I just set up a small cluster and wanted to run some
> simple MLlib examples. Following the instructions at
> https://spark.apache.org/docs/0.9.0/mllib-guide.html#binary-classification-1,
> I could successfully run everything until the SVMWithSGD step, where I got
> the error message below. I don't know why
> file:/root/test/sample_svm_data.txt does not exist, since I already read it
> in, printed it, converted it into labeled data, and passed the parsed data
> to SVMWithSGD.
>
> Has anyone seen the same issue?
>
> Thanks,
>
> Emily
>
> val model = SVMWithSGD.train(parsedData, numIterations)
>
> 14/09/16 10:55:21 INFO SparkContext: Starting job: first at GeneralizedLinearAlgorithm.scala:121
> 14/09/16 10:55:21 INFO DAGScheduler: Got job 11 (first at GeneralizedLinearAlgorithm.scala:121) with 1 output partitions (allowLocal=true)
> 14/09/16 10:55:21 INFO DAGScheduler: Final stage: Stage 11 (first at GeneralizedLinearAlgorithm.scala:121)
> 14/09/16 10:55:21 INFO DAGScheduler: Parents of final stage: List()
> 14/09/16 10:55:21 INFO DAGScheduler: Missing parents: List()
> 14/09/16 10:55:21 INFO DAGScheduler: Computing the requested partition locally
> 14/09/16 10:55:21 INFO HadoopRDD: Input split: file:/root/test/sample_svm_data.txt:0+19737
> 14/09/16 10:55:21 INFO SparkContext: Job finished: first at GeneralizedLinearAlgorithm.scala:121, took 0.002697478 s
> 14/09/16 10:55:21 INFO SparkContext: Starting job: count at DataValidators.scala:37
> 14/09/16 10:55:21 INFO DAGScheduler: Got job 12 (count at DataValidators.scala:37) with 2 output partitions (allowLocal=false)
> 14/09/16 10:55:21 INFO DAGScheduler: Final stage: Stage 12 (count at DataValidators.scala:37)
> 14/09/16 10:55:21 INFO DAGScheduler: Parents of final stage: List()
> 14/09/16 10:55:21 INFO DAGScheduler: Missing parents: List()
> 14/09/16 10:55:21 INFO DAGScheduler: Submitting Stage 12 (FilteredRDD[26] at filter at DataValidators.scala:37), which has no missing parents
> 14/09/16 10:55:21 INFO DAGScheduler: Submitting 2 missing tasks from Stage 12 (FilteredRDD[26] at filter at DataValidators.scala:37)
> 14/09/16 10:55:21 INFO TaskSchedulerImpl: Adding task set 12.0 with 2 tasks
> 14/09/16 10:55:21 INFO TaskSetManager: Starting task 12.0:0 as TID 24 on executor 2: eecvm0206.demo.sas.com (PROCESS_LOCAL)
> 14/09/16 10:55:21 INFO TaskSetManager: Serialized task 12.0:0 as 1733 bytes in 0 ms
> 14/09/16 10:55:21 INFO TaskSetManager: Starting task 12.0:1 as TID 25 on executor 5: eecvm0203.demo.sas.com (PROCESS_LOCAL)
> 14/09/16 10:55:21 INFO TaskSetManager: Serialized task 12.0:1 as 1733 bytes in 0 ms
> 14/09/16 10:55:21 WARN TaskSetManager: Lost TID 24 (task 12.0:0)
> 14/09/16 10:55:21 WARN TaskSetManager: Loss was due to java.io.FileNotFoundException
> java.io.FileNotFoundException: File file:/root/test/sample_svm_data.txt does not exist
>         at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:511)
>         at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:724)
>         at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:501)
>         at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:402)
>         at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:137)
>         at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339)
>         at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:764)
>         at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:108)
>         at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
>         at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:156)
>         at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:149)
>         at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:64)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
>         at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
>         at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
>         at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:33)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:109)
>         at org.apache.spark.scheduler.Task.run(Task.scala:53)
>         at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
>         at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:42)
>         at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:41)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>         at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:41)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:744)