Hi, I am new to Spark. I just set up a small cluster and wanted to run some simple MLlib examples. Following the instructions at https://spark.apache.org/docs/0.9.0/mllib-guide.html#binary-classification-1, everything ran successfully until the SVMWithSGD step, where I got the error message below. I don't understand why file:/root/test/sample_svm_data.txt "does not exist", since I had already read the file, printed its contents, converted it into labeled points, and passed the parsed data to SVMWithSGD.
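For context, the steps I followed from the guide look roughly like this (a sketch; the input path and iteration count are my own, and the space-separated parse with the label in the first column follows the guide's sample_svm_data.txt format):

```scala
import org.apache.spark.mllib.classification.SVMWithSGD
import org.apache.spark.mllib.regression.LabeledPoint

// Load the data file from the driver's local filesystem
val data = sc.textFile("file:/root/test/sample_svm_data.txt")

// Parse each line: first value is the label, the rest are features
val parsedData = data.map { line =>
  val parts = line.split(' ').map(_.toDouble)
  LabeledPoint(parts(0), parts.tail)
}

// Train the SVM -- this is the call that fails
val numIterations = 20
val model = SVMWithSGD.train(parsedData, numIterations)
```

Printing a few parsed records on the driver works fine; the failure only appears once train() launches tasks on the executors.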
Has anyone else run into this issue? Thanks, Emily

val model = SVMWithSGD.train(parsedData, numIterations)

14/09/16 10:55:21 INFO SparkContext: Starting job: first at GeneralizedLinearAlgorithm.scala:121
14/09/16 10:55:21 INFO DAGScheduler: Got job 11 (first at GeneralizedLinearAlgorithm.scala:121) with 1 output partitions (allowLocal=true)
14/09/16 10:55:21 INFO DAGScheduler: Final stage: Stage 11 (first at GeneralizedLinearAlgorithm.scala:121)
14/09/16 10:55:21 INFO DAGScheduler: Parents of final stage: List()
14/09/16 10:55:21 INFO DAGScheduler: Missing parents: List()
14/09/16 10:55:21 INFO DAGScheduler: Computing the requested partition locally
14/09/16 10:55:21 INFO HadoopRDD: Input split: file:/root/test/sample_svm_data.txt:0+19737
14/09/16 10:55:21 INFO SparkContext: Job finished: first at GeneralizedLinearAlgorithm.scala:121, took 0.002697478 s
14/09/16 10:55:21 INFO SparkContext: Starting job: count at DataValidators.scala:37
14/09/16 10:55:21 INFO DAGScheduler: Got job 12 (count at DataValidators.scala:37) with 2 output partitions (allowLocal=false)
14/09/16 10:55:21 INFO DAGScheduler: Final stage: Stage 12 (count at DataValidators.scala:37)
14/09/16 10:55:21 INFO DAGScheduler: Parents of final stage: List()
14/09/16 10:55:21 INFO DAGScheduler: Missing parents: List()
14/09/16 10:55:21 INFO DAGScheduler: Submitting Stage 12 (FilteredRDD[26] at filter at DataValidators.scala:37), which has no missing parents
14/09/16 10:55:21 INFO DAGScheduler: Submitting 2 missing tasks from Stage 12 (FilteredRDD[26] at filter at DataValidators.scala:37)
14/09/16 10:55:21 INFO TaskSchedulerImpl: Adding task set 12.0 with 2 tasks
14/09/16 10:55:21 INFO TaskSetManager: Starting task 12.0:0 as TID 24 on executor 2: eecvm0206.demo.sas.com (PROCESS_LOCAL)
14/09/16 10:55:21 INFO TaskSetManager: Serialized task 12.0:0 as 1733 bytes in 0 ms
14/09/16 10:55:21 INFO TaskSetManager: Starting task 12.0:1 as TID 25 on executor 5: eecvm0203.demo.sas.com (PROCESS_LOCAL)
14/09/16 10:55:21 INFO TaskSetManager: Serialized task 12.0:1 as 1733 bytes in 0 ms
14/09/16 10:55:21 WARN TaskSetManager: Lost TID 24 (task 12.0:0)
14/09/16 10:55:21 WARN TaskSetManager: Loss was due to java.io.FileNotFoundException
java.io.FileNotFoundException: File file:/root/test/sample_svm_data.txt does not exist
    at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:511)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:724)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:501)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:402)
    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:137)
    at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:764)
    at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:108)
    at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:156)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:149)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:64)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
    at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
    at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
    at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:33)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:109)
    at org.apache.spark.scheduler.Task.run(Task.scala:53)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:42)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:41)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:41)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)