Re: wholeTextFiles not working with HDFS

yinxusen Thu, 12 Jun 2014 19:59:20 -0700

Hi Sguj,

Could you give me the exception stack?


I test it on my laptop and find that it gets the wrong FileSystem. It should
be DistributedFileSystem, but it finds the RawLocalFileSystem.

If we get the same exception stack, I'll try to fix it.

Here is my exception stack:

java.io.FileNotFoundException: File /sen/reuters-out/reut2-000.sgm-0.txt
does not exist.
        at
org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:397)
        at
org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
        at
org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat$OneFileInfo.<init>(CombineFileInputFormat.java:489)
        at
org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getMoreSplits(CombineFileInputFormat.java:280)
        at
org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:240)
        at
org.apache.spark.rdd.WholeTextFileRDD.getPartitions(NewHadoopRDD.scala:173)
        at
org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
        at
org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:201)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:201)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1097)
        at org.apache.spark.rdd.RDD.collect(RDD.scala:728)

Besides, what's your hadoop version?




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/wholeTextFiles-not-working-with-HDFS-tp7490p7548.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: wholeTextFiles not working with HDFS

Reply via email to