org.apache.spark.SparkException: java.io.FileNotFoundException: does not exist)

2014-09-16 Thread Hui Li
Hi,

I am new to Spark. I just set up a small cluster and wanted to run some
simple MLlib examples. Following the instructions at
https://spark.apache.org/docs/0.9.0/mllib-guide.html#binary-classification-1,
I could run everything successfully until the SVMWithSGD step, where I
got the error message below. I don't know why
file:/root/test/sample_svm_data.txt does not exist, since I already read
it in, printed it, converted it into labeled data, and passed the parsed
data to SVMWithSGD.

Has anyone run into the same issue?

Thanks,

Emily

 val model = SVMWithSGD.train(parsedData, numIterations)
14/09/16 10:55:21 INFO SparkContext: Starting job: first at GeneralizedLinearAlgorithm.scala:121
14/09/16 10:55:21 INFO DAGScheduler: Got job 11 (first at GeneralizedLinearAlgorithm.scala:121) with 1 output partitions (allowLocal=true)
14/09/16 10:55:21 INFO DAGScheduler: Final stage: Stage 11 (first at GeneralizedLinearAlgorithm.scala:121)
14/09/16 10:55:21 INFO DAGScheduler: Parents of final stage: List()
14/09/16 10:55:21 INFO DAGScheduler: Missing parents: List()
14/09/16 10:55:21 INFO DAGScheduler: Computing the requested partition locally
14/09/16 10:55:21 INFO HadoopRDD: Input split: file:/root/test/sample_svm_data.txt:0+19737
14/09/16 10:55:21 INFO SparkContext: Job finished: first at GeneralizedLinearAlgorithm.scala:121, took 0.002697478 s
14/09/16 10:55:21 INFO SparkContext: Starting job: count at DataValidators.scala:37
14/09/16 10:55:21 INFO DAGScheduler: Got job 12 (count at DataValidators.scala:37) with 2 output partitions (allowLocal=false)
14/09/16 10:55:21 INFO DAGScheduler: Final stage: Stage 12 (count at DataValidators.scala:37)
14/09/16 10:55:21 INFO DAGScheduler: Parents of final stage: List()
14/09/16 10:55:21 INFO DAGScheduler: Missing parents: List()
14/09/16 10:55:21 INFO DAGScheduler: Submitting Stage 12 (FilteredRDD[26] at filter at DataValidators.scala:37), which has no missing parents
14/09/16 10:55:21 INFO DAGScheduler: Submitting 2 missing tasks from Stage 12 (FilteredRDD[26] at filter at DataValidators.scala:37)
14/09/16 10:55:21 INFO TaskSchedulerImpl: Adding task set 12.0 with 2 tasks
14/09/16 10:55:21 INFO TaskSetManager: Starting task 12.0:0 as TID 24 on executor 2: eecvm0206.demo.sas.com (PROCESS_LOCAL)
14/09/16 10:55:21 INFO TaskSetManager: Serialized task 12.0:0 as 1733 bytes in 0 ms
14/09/16 10:55:21 INFO TaskSetManager: Starting task 12.0:1 as TID 25 on executor 5: eecvm0203.demo.sas.com (PROCESS_LOCAL)
14/09/16 10:55:21 INFO TaskSetManager: Serialized task 12.0:1 as 1733 bytes in 0 ms
14/09/16 10:55:21 WARN TaskSetManager: Lost TID 24 (task 12.0:0)
14/09/16 10:55:21 WARN TaskSetManager: Loss was due to java.io.FileNotFoundException
java.io.FileNotFoundException: File file:/root/test/sample_svm_data.txt does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:511)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:724)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:501)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:402)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.init(ChecksumFileSystem.java:137)
at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:764)
at org.apache.hadoop.mapred.LineRecordReader.init(LineRecordReader.java:108)
at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
at org.apache.spark.rdd.HadoopRDD$$anon$1.init(HadoopRDD.scala:156)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:149)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:64)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:33)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:109)
at org.apache.spark.scheduler.Task.run(Task.scala:53)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:42)
at

Re: org.apache.spark.SparkException: java.io.FileNotFoundException: does not exist)

2014-09-16 Thread Aris
This should be a really simple problem, but you haven't shared enough code
to determine what's going on here.
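That said, the log you pasted is suggestive. The failing tasks run on executors 2
and 5 (eecvm0206 and eecvm0203), and they are trying to open
file:/root/test/sample_svm_data.txt. With a file: URL on a cluster, the file must
exist at that same path on every worker node, not just on the driver. Your earlier
reads and prints probably succeeded because those jobs ran on the driver (note the
"Computing the requested partition locally" line), where the file does exist. Based
on the 0.9 guide you linked, I'd guess your code looks roughly like this sketch
(the path and variable names are my assumptions, not something you shared):

  import org.apache.spark.mllib.classification.SVMWithSGD
  import org.apache.spark.mllib.regression.LabeledPoint

  // Assumed path: with a file: URL, this file must be present at the
  // same path on EVERY worker node, or the executors will throw
  // FileNotFoundException exactly as in your log.
  val data = sc.textFile("file:/root/test/sample_svm_data.txt")

  // Parse each line into a LabeledPoint: a label followed by
  // space-separated features, as in the guide's sample data.
  val parsedData = data.map { line =>
    val parts = line.split(' ')
    LabeledPoint(parts(0).toDouble, parts.tail.map(_.toDouble))
  }

  val numIterations = 20
  val model = SVMWithSGD.train(parsedData, numIterations)

If that matches what you ran, the fix is to make the file visible to the workers:
copy it to /root/test/ on every node, or put it in HDFS and point sc.textFile at
the hdfs:// path instead.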

On Tue, Sep 16, 2014 at 8:08 AM, Hui Li littleleave...@gmail.com wrote:

 Hi,

 I am new to Spark. I just set up a small cluster and wanted to run some
 simple MLlib examples. Following the instructions at
 https://spark.apache.org/docs/0.9.0/mllib-guide.html#binary-classification-1,
 I could run everything successfully until the SVMWithSGD step, where I
 got the error message below. I don't know why
 file:/root/test/sample_svm_data.txt does not exist, since I already read
 it in, printed it, converted it into labeled data, and passed the parsed
 data to SVMWithSGD.

 [...]