This should be a really simple problem, but you haven't shared enough code to determine what's going on here.
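That said, one pattern in the log below is suggestive: the `first` job runs locally on the driver and succeeds, while the distributed `count` fails with a `FileNotFoundException` on the executor hosts. That usually means the `file:/` path exists only on the driver machine — with a `file:/` URL, every worker must have the file at the same local path. A minimal sketch of the usual fix (loading from shared storage instead); the master and HDFS URLs here are illustrative, and the parsing matches the 0.9-era `LabeledPoint(Double, Array[Double])` API:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.classification.SVMWithSGD
import org.apache.spark.mllib.regression.LabeledPoint

// Hypothetical master URL and app name
val sc = new SparkContext("spark://master:7077", "svm-example")

// Load from storage visible to ALL executors (e.g. HDFS),
// not a driver-local file:/ path
val data = sc.textFile("hdfs://namenode:9000/user/root/sample_svm_data.txt")

// Parse "label f1 f2 ..." lines into labeled points
val parsedData = data.map { line =>
  val parts = line.split(' ').map(_.toDouble)
  LabeledPoint(parts.head, parts.tail)
}

val numIterations = 100
val model = SVMWithSGD.train(parsedData, numIterations)
```

Alternatively, copy `sample_svm_data.txt` to `/root/test/` on every worker node and keep the `file:/root/test/sample_svm_data.txt` URL as-is.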
On Tue, Sep 16, 2014 at 8:08 AM, Hui Li <littleleave...@gmail.com> wrote:
> Hi,
>
> I am new to Spark. I just set up a small cluster and wanted to run some
> simple MLlib examples. Following the instructions at
> https://spark.apache.org/docs/0.9.0/mllib-guide.html#binary-classification-1,
> I could successfully run everything until the SVMWithSGD step, where I got
> the error message below. I don't know why
> file:/root/test/sample_svm_data.txt does not exist, since I already read it
> in, printed it, converted it into labeled data, and passed the parsed data
> to SVMWithSGD.
>
> Has anyone seen the same issue?
>
> Thanks,
>
> Emily
>
> val model = SVMWithSGD.train(parsedData, numIterations)
>
> 14/09/16 10:55:21 INFO SparkContext: Starting job: first at GeneralizedLinearAlgorithm.scala:121
> 14/09/16 10:55:21 INFO DAGScheduler: Got job 11 (first at GeneralizedLinearAlgorithm.scala:121) with 1 output partitions (allowLocal=true)
> 14/09/16 10:55:21 INFO DAGScheduler: Final stage: Stage 11 (first at GeneralizedLinearAlgorithm.scala:121)
> 14/09/16 10:55:21 INFO DAGScheduler: Parents of final stage: List()
> 14/09/16 10:55:21 INFO DAGScheduler: Missing parents: List()
> 14/09/16 10:55:21 INFO DAGScheduler: Computing the requested partition locally
> 14/09/16 10:55:21 INFO HadoopRDD: Input split: file:/root/test/sample_svm_data.txt:0+19737
> 14/09/16 10:55:21 INFO SparkContext: Job finished: first at GeneralizedLinearAlgorithm.scala:121, took 0.002697478 s
> 14/09/16 10:55:21 INFO SparkContext: Starting job: count at DataValidators.scala:37
> 14/09/16 10:55:21 INFO DAGScheduler: Got job 12 (count at DataValidators.scala:37) with 2 output partitions (allowLocal=false)
> 14/09/16 10:55:21 INFO DAGScheduler: Final stage: Stage 12 (count at DataValidators.scala:37)
> 14/09/16 10:55:21 INFO DAGScheduler: Parents of final stage: List()
> 14/09/16 10:55:21 INFO DAGScheduler: Missing parents: List()
> 14/09/16 10:55:21 INFO DAGScheduler: Submitting Stage 12 (FilteredRDD[26] at filter at DataValidators.scala:37), which has no missing parents
> 14/09/16 10:55:21 INFO DAGScheduler: Submitting 2 missing tasks from Stage 12 (FilteredRDD[26] at filter at DataValidators.scala:37)
> 14/09/16 10:55:21 INFO TaskSchedulerImpl: Adding task set 12.0 with 2 tasks
> 14/09/16 10:55:21 INFO TaskSetManager: Starting task 12.0:0 as TID 24 on executor 2: eecvm0206.demo.sas.com (PROCESS_LOCAL)
> 14/09/16 10:55:21 INFO TaskSetManager: Serialized task 12.0:0 as 1733 bytes in 0 ms
> 14/09/16 10:55:21 INFO TaskSetManager: Starting task 12.0:1 as TID 25 on executor 5: eecvm0203.demo.sas.com (PROCESS_LOCAL)
> 14/09/16 10:55:21 INFO TaskSetManager: Serialized task 12.0:1 as 1733 bytes in 0 ms
> 14/09/16 10:55:21 WARN TaskSetManager: Lost TID 24 (task 12.0:0)
> 14/09/16 10:55:21 WARN TaskSetManager: Loss was due to java.io.FileNotFoundException
> java.io.FileNotFoundException: File file:/root/test/sample_svm_data.txt does not exist
>         at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:511)
>         at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:724)
>         at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:501)
>         at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:402)
>         at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:137)
>         at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339)
>         at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:764)
>         at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:108)
>         at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
>         at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:156)
>         at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:149)
>         at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:64)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
>         at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
>         at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
>         at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:33)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:109)
>         at org.apache.spark.scheduler.Task.run(Task.scala:53)
>         at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
>         at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:42)
>         at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:41)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>         at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:41)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:744)