Re: how to access local file from Spark sc.textFile("file:///path to/myfile")
General note: /root is a protected local directory, so if your program runs as a non-root user it will never be able to access the file.

On Sat, Dec 12, 2015 at 12:21 AM, Zhan Zhang wrote:
> As Sean mentioned, you cannot refer to a local file on your remote machines (executors). One workaround is to copy the file to all machines under the same directory.
>
> Thanks.
>
> Zhan Zhang
>
> On Dec 11, 2015, at 10:26 AM, Lin, Hao wrote:
> > of the master node
Re: how to access local file from Spark sc.textFile("file:///path to/myfile")
As Sean mentioned, you cannot refer to a local file on your remote machines (executors). One workaround is to copy the file to all machines under the same directory.

Thanks.

Zhan Zhang

On Dec 11, 2015, at 10:26 AM, Lin, Hao wrote:
> of the master node
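If copying the file to every node is impractical and the file is small, another option is to read it on the driver (the only machine where it exists) and parallelize the lines, so the executors never touch the local filesystem. A minimal sketch for spark-shell, using the path from this thread; note this is not suitable for a file as large as 658M, where shared storage is the better fix:

```scala
// Read the file on the driver, where /root/2008.csv actually exists...
val lines = scala.io.Source.fromFile("/root/2008.csv").getLines().toVector

// ...then distribute the already-read lines to the executors.
val rdd = sc.parallelize(lines)
println(rdd.count())
```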
RE: how to access local file from Spark sc.textFile("file:///path to/myfile")
I logged into the master of my cluster and referenced a local file on the master node machine. And yes, that file only resides on the master node, not on any of the remote workers.

-----Original Message-----
From: Sean Owen [mailto:so...@cloudera.com]
Sent: Friday, December 11, 2015 1:00 PM
To: Lin, Hao
Cc: user@spark.apache.org
Subject: Re: how to access local file from Spark sc.textFile("file:///path to/myfile")

Hm, are you referencing a local file from your remote workers? That won't work, as the file only exists on one machine (I presume).

On Fri, Dec 11, 2015 at 5:19 PM, Lin, Hao wrote:
> Hi,
>
> I have a problem accessing a local file, with this example:
>
> sc.textFile("file:///root/2008.csv").count()
>
> with error: File file:/root/2008.csv does not exist.
>
> The file clearly exists, since if I mistype the file name to a non-existing one, it will show:
>
> Error: Input path does not exist
>
> Please help!
>
> The following is the error message:
>
> scala> sc.textFile("file:///root/2008.csv").count()
>
> 15/12/11 17:12:08 WARN TaskSetManager: Lost task 15.0 in stage 8.0 (TID 498, 10.162.167.24): java.io.FileNotFoundException: File file:/root/2008.csv does not exist
>         at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:511)
>         at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:724)
>         at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:501)
>         at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:397)
>         at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:137)
>         at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339)
>         at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:764)
>         at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:108)
>         at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
>         at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:239)
>         at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:216)
>         at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>         at org.apache.spark.scheduler.Task.run(Task.scala:88)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
>
> 15/12/11 17:12:08 ERROR TaskSetManager: Task 9 in stage 8.0 failed 4 times; aborting job
>
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 9 in stage 8.0 failed 4 times, most recent failure: Lost task 9.3 in stage 8.0 (TID 547, 10.162.167.23): java.io.FileNotFoundException: File file:/root/2008.csv does not exist
RE: how to access local file from Spark sc.textFile("file:///path to/myfile")
Yes to your question. I have spun up a cluster, logged into the master as the root user, run spark-shell, and referenced a local file on the master machine.

From: Vijay Gharge [mailto:vijay.gha...@gmail.com]
Sent: Friday, December 11, 2015 12:50 PM
To: Lin, Hao
Cc: user@spark.apache.org
Subject: Re: how to access local file from Spark sc.textFile("file:///path to/myfile")

One more question. Are you also running the Spark commands as the root user? Meanwhile I am trying to simulate this locally.

On Friday 11 December 2015, Lin, Hao wrote:
> Here you go, thanks.
>
> -rw-r--r-- 1 root root 658M Dec 9 2014 /root/2008.csv
>
> From: Vijay Gharge [mailto:vijay.gha...@gmail.com]
> Sent: Friday, December 11, 2015 12:31 PM
> To: Lin, Hao
> Cc: user@spark.apache.org
> Subject: Re: how to access local file from Spark sc.textFile("file:///path to/myfile")
>
> Can you provide the output of "ls -lh /root/2008.csv"?
>
> On Friday 11 December 2015, Lin, Hao wrote:
> > Hi,
> >
> > I have a problem accessing a local file, with this example:
> >
> > sc.textFile("file:///root/2008.csv").count()
> >
> > with error: File file:/root/2008.csv does not exist.
> >
> > The file clearly exists, since if I mistype the file name to a non-existing one, it will show:
> >
> > Error: Input path does not exist
> >
> > Please help!
Re: how to access local file from Spark sc.textFile("file:///path to/myfile")
Please ignore the typo. I meant root "permissions".

Regards,
Vijay Gharge

On Fri, Dec 11, 2015 at 11:30 PM, Vijay Gharge wrote:
> This issue is due to a file permission issue. You need to execute spark operations using root command only.
>
> Regards,
> Vijay Gharge
>
> On Fri, Dec 11, 2015 at 11:20 PM, Vijay Gharge wrote:
>> One more question. Are you also running the Spark commands as the root user? Meanwhile I am trying to simulate this locally.
>>
>> On Friday 11 December 2015, Lin, Hao wrote:
>>> Here you go, thanks.
>>>
>>> -rw-r--r-- 1 root root 658M Dec 9 2014 /root/2008.csv
>>>
>>> From: Vijay Gharge [mailto:vijay.gha...@gmail.com]
>>> Sent: Friday, December 11, 2015 12:31 PM
>>> To: Lin, Hao
>>> Cc: user@spark.apache.org
>>> Subject: Re: how to access local file from Spark sc.textFile("file:///path to/myfile")
>>>
>>> Can you provide the output of "ls -lh /root/2008.csv"?
>>>
>>> On Friday 11 December 2015, Lin, Hao wrote:
>>> Hi,
>>>
>>> I have a problem accessing a local file, with this example:
>>>
>>> sc.textFile("file:///root/2008.csv").count()
>>>
>>> with error: File file:/root/2008.csv does not exist.
>>>
>>> The file clearly exists, since if I mistype the file name to a non-existing one, it will show:
>>>
>>> Error: Input path does not exist
>>>
>>> Please help!
Re: how to access local file from Spark sc.textFile("file:///path to/myfile")
Hm, are you referencing a local file from your remote workers? That won't work, as the file only exists on one machine (I presume).

On Fri, Dec 11, 2015 at 5:19 PM, Lin, Hao wrote:
> Hi,
>
> I have a problem accessing a local file, with this example:
>
> sc.textFile("file:///root/2008.csv").count()
>
> with error: File file:/root/2008.csv does not exist.
>
> The file clearly exists, since if I mistype the file name to a non-existing one, it will show:
>
> Error: Input path does not exist
>
> Please help!
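The usual fixes follow directly from this: either place the file at the same local path on every worker so a `file://` URL resolves everywhere, or put it on storage that all nodes share. A sketch of the shared-storage route, assuming the cluster has HDFS available; the `/data/2008.csv` destination path is only illustrative:

```scala
// One-time copy from the master into HDFS, run from a shell:
//   hdfs dfs -put /root/2008.csv /data/2008.csv

// Every executor then reads the same shared copy:
val rdd = sc.textFile("hdfs:///data/2008.csv")
println(rdd.count())
```

An S3 or NFS path works the same way; the key point is that the URL must be resolvable identically from every executor, not just from the driver.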
Re: how to access local file from Spark sc.textFile("file:///path to/myfile")
This issue is due to a file permission issue. You need to execute spark operations using root command only.

Regards,
Vijay Gharge

On Fri, Dec 11, 2015 at 11:20 PM, Vijay Gharge wrote:
> One more question. Are you also running the Spark commands as the root user? Meanwhile I am trying to simulate this locally.
>
> On Friday 11 December 2015, Lin, Hao wrote:
>> Here you go, thanks.
>>
>> -rw-r--r-- 1 root root 658M Dec 9 2014 /root/2008.csv
>>
>> From: Vijay Gharge [mailto:vijay.gha...@gmail.com]
>> Sent: Friday, December 11, 2015 12:31 PM
>> To: Lin, Hao
>> Cc: user@spark.apache.org
>> Subject: Re: how to access local file from Spark sc.textFile("file:///path to/myfile")
>>
>> Can you provide the output of "ls -lh /root/2008.csv"?
>>
>> On Friday 11 December 2015, Lin, Hao wrote:
>> Hi,
>>
>> I have a problem accessing a local file, with this example:
>>
>> sc.textFile("file:///root/2008.csv").count()
>>
>> with error: File file:/root/2008.csv does not exist.
>>
>> The file clearly exists, since if I mistype the file name to a non-existing one, it will show:
>>
>> Error: Input path does not exist
>>
>> Please help!
Re: how to access local file from Spark sc.textFile("file:///path to/myfile")
One more question. Are you also running the Spark commands as the root user? Meanwhile I am trying to simulate this locally.

On Friday 11 December 2015, Lin, Hao wrote:
> Here you go, thanks.
>
> -rw-r--r-- 1 root root 658M Dec 9 2014 /root/2008.csv
>
> From: Vijay Gharge [mailto:vijay.gha...@gmail.com]
> Sent: Friday, December 11, 2015 12:31 PM
> To: Lin, Hao
> Cc: user@spark.apache.org
> Subject: Re: how to access local file from Spark sc.textFile("file:///path to/myfile")
>
> Can you provide the output of "ls -lh /root/2008.csv"?
>
> On Friday 11 December 2015, Lin, Hao wrote:
> Hi,
>
> I have a problem accessing a local file, with this example:
>
> sc.textFile("file:///root/2008.csv").count()
>
> with error: File file:/root/2008.csv does not exist.
>
> The file clearly exists, since if I mistype the file name to a non-existing one, it will show:
>
> Error: Input path does not exist
>
> Please help!
RE: how to access local file from Spark sc.textFile("file:///path to/myfile")
Here you go, thanks.

-rw-r--r-- 1 root root 658M Dec 9 2014 /root/2008.csv

From: Vijay Gharge [mailto:vijay.gha...@gmail.com]
Sent: Friday, December 11, 2015 12:31 PM
To: Lin, Hao
Cc: user@spark.apache.org
Subject: Re: how to access local file from Spark sc.textFile("file:///path to/myfile")

Can you provide the output of "ls -lh /root/2008.csv"?

On Friday 11 December 2015, Lin, Hao wrote:
> Hi,
>
> I have a problem accessing a local file, with this example:
>
> sc.textFile("file:///root/2008.csv").count()
>
> with error: File file:/root/2008.csv does not exist.
>
> The file clearly exists, since if I mistype the file name to a non-existing one, it will show:
>
> Error: Input path does not exist
>
> Please help!
Re: how to access local file from Spark sc.textFile("file:///path to/myfile")
Can you provide the output of "ls -lh /root/2008.csv"?

On Friday 11 December 2015, Lin, Hao wrote:

> Hi,
>
> I have a problem accessing a local file, for example:
>
> sc.textFile("file:///root/2008.csv").count()
>
> fails with: File file:/root/2008.csv does not exist.
>
> [remainder of the quoted message and stack trace snipped; see the original
> message below]
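Beyond checking that the file exists, it is worth confirming it is readable by the user the Spark process actually runs as (as noted elsewhere in this thread, /root is typically unreadable to non-root users). A small sketch of such a check; the helper name is my own, not part of any Spark tooling:

```shell
#!/bin/sh
# Print "ok" if the path exists and is readable by the current user;
# print a diagnostic and return non-zero otherwise.
check_readable() {
  if [ ! -e "$1" ]; then
    echo "missing: $1"
    return 1
  elif [ ! -r "$1" ]; then
    echo "unreadable: $1"
    return 2
  fi
  echo "ok: $1"
}
```

Run it as the same user that launches the executors, on every worker node, e.g. `check_readable /root/2008.csv`.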
how to access local file from Spark sc.textFile("file:///path to/myfile")
Hi,

I have a problem accessing a local file, for example:

sc.textFile("file:///root/2008.csv").count()

fails with: File file:/root/2008.csv does not exist.

The file clearly exists, since if I mistype the file name as a non-existing
one, it shows:

Error: Input path does not exist

Please help!

The following is the error message:

scala> sc.textFile("file:///root/2008.csv").count()
15/12/11 17:12:08 WARN TaskSetManager: Lost task 15.0 in stage 8.0 (TID 498, 10.162.167.24): java.io.FileNotFoundException: File file:/root/2008.csv does not exist
        at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:511)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:724)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:501)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:397)
        at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:137)
        at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339)
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:764)
        at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:108)
        at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
        at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:239)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:216)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:88)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

15/12/11 17:12:08 ERROR TaskSetManager: Task 9 in stage 8.0 failed 4 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 9 in stage 8.0 failed 4 times, most recent failure: Lost task 9.3 in stage 8.0 (TID 547, 10.162.167.23): java.io.FileNotFoundException: File file:/root/2008.csv does not exist
        at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:511)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:724)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:501)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:397)
        at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:137)
        at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339)
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:764)
        at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:108)
        at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
        at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:239)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:216)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:88)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1283)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1271)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:12
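As the replies point out, sc.textFile with a file:// URI requires the file to exist at that same path on every executor, not just on the master. For a file that lives only on the driver machine, one alternative is to ship it with SparkContext.addFile and resolve the executor-local copy via SparkFiles.get. A minimal, untested sketch for the spark-shell (assumes the file fits comfortably in one task, since a single flatMap reads it whole):

```scala
import scala.io.Source
import org.apache.spark.SparkFiles

// Ship the driver-local file to the working directory of every executor.
sc.addFile("file:///root/2008.csv")

// Resolve the executor-local path at task run time, not on the driver;
// this reads the entire file inside a single task.
val lines = sc.parallelize(Seq(0), 1).flatMap { _ =>
  Source.fromFile(SparkFiles.get("2008.csv")).getLines()
}
println(lines.count())
```

Copying the file to the same path on all workers, or placing it in HDFS/S3 and passing that URI to sc.textFile, achieves the same end without code changes and parallelizes the read.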