Re: wholeTextFiles not working with HDFS

2014-08-22 Thread pierred
I had the same issue with spark-1.0.2-bin-hadoop*1*, and indeed the issue
seems related to Hadoop 1. After switching to spark-1.0.2-bin-hadoop*2*,
the issue disappears.
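
If anyone is unsure which Hadoop version their Spark binary was built
against, it can be checked from the spark-shell. A minimal sketch
(VersionInfo is Hadoop's standard version-reporting class; the exact
string printed depends on your build):

// Prints the Hadoop version bundled with this Spark build,
// e.g. a 1.x string for the hadoop1 binaries and 2.x for hadoop2.
import org.apache.hadoop.util.VersionInfo
println(VersionInfo.getVersion)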







Re: wholeTextFiles not working with HDFS

2014-07-23 Thread kmader
I have the same issue:

val a = sc.textFile("s3n://MyBucket/MyFolder/*.tif")
a.first

works perfectly fine, but

val d = sc.wholeTextFiles("s3n://MyBucket/MyFolder/*.tif")
d.first

does not work, and gives the following error message:

java.io.FileNotFoundException: File /MyBucket/MyFolder.tif does not exist.
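
For reference, when wholeTextFiles does work it returns one record per
file as a (path, content) pair, rather than one record per line as
textFile does. A minimal sketch of the expected behavior, reusing the
path above:

// Each record is (fullFilePath, entireFileContents).
val files = sc.wholeTextFiles("s3n://MyBucket/MyFolder/*.tif")
files.map { case (path, content) => (path, content.length) }.first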





Re: wholeTextFiles not working with HDFS

2014-07-23 Thread kmader
That worked for me as well. I was using Spark 1.0 compiled against Hadoop
1.0; switching to Spark 1.0.1 compiled against Hadoop 2 fixed the issue.





Re: wholeTextFiles not working with HDFS

2014-06-17 Thread Sguj
I didn't fix the issue so much as work around it. I was running my cluster
locally, so using HDFS was just a preference. The code worked with the local
file system, so that's what I'm using until I can get some help.
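
In other words, the workaround is to point Spark at an explicit local URI
instead of HDFS until the bug is fixed. A minimal sketch (the paths are
placeholders for your own):

// Reading via a file:// URI side-steps the HDFS code path entirely.
// Note that every worker must be able to see the same local files.
val local = sc.wholeTextFiles("file:///home/me/data")
local.first
// val broken = sc.wholeTextFiles("hdfs://namenode:8020/user/me/data") // fails on Hadoop-1 builds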





Re: wholeTextFiles not working with HDFS

2014-06-17 Thread Xusen Yin
Hi Sguj and littlebird,

I'll try to fix it tomorrow evening and the day after tomorrow, because I
am now busy preparing a talk (slides) for tomorrow. Sorry for the
inconvenience. Would you mind filing an issue on the Spark JIRA?



-- 
Best Regards
---
Xusen Yin (尹绪森)
Intel Labs China
Homepage: http://yinxusen.github.io/


Re: wholeTextFiles not working with HDFS

2014-06-17 Thread Sguj
I can write one if you'll point me to where I need to write it.





Re: wholeTextFiles not working with HDFS

2014-06-16 Thread littlebird
Hi, I have the same exception. Can you tell me how you fixed it? Thank you!





Re: wholeTextFiles not working with HDFS

2014-06-13 Thread Sguj
My exception stack looks about the same.

java.io.FileNotFoundException: File /user/me/target/capacity-scheduler.xml does not exist.
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:397)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat$OneFileInfo.<init>(CombineFileInputFormat.java:489)
at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getMoreSplits(CombineFileInputFormat.java:280)
at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:240)
at org.apache.spark.rdd.WholeTextFileRDD.getPartitions(NewHadoopRDD.scala:173)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1094)
at org.apache.spark.rdd.RDD.collect(RDD.scala:717)

I'm using Hadoop 1.2.1, and everything else I've tried in Spark with that
version has worked, so I doubt it's a version error.





Re: wholeTextFiles not working with HDFS

2014-06-12 Thread yinxusen
Hi Sguj,

Could you give me the exception stack?

I tested it on my laptop and found that it picks up the wrong FileSystem:
it should be DistributedFileSystem, but it finds RawLocalFileSystem instead.

If we get the same exception stack, I'll try to fix it.

Here is my exception stack:

java.io.FileNotFoundException: File /sen/reuters-out/reut2-000.sgm-0.txt does not exist.
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:397)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat$OneFileInfo.<init>(CombineFileInputFormat.java:489)
at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getMoreSplits(CombineFileInputFormat.java:280)
at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:240)
at org.apache.spark.rdd.WholeTextFileRDD.getPartitions(NewHadoopRDD.scala:173)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:201)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:201)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1097)
at org.apache.spark.rdd.RDD.collect(RDD.scala:728)

Also, what's your Hadoop version?
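
For anyone who wants to confirm the wrong-FileSystem diagnosis above,
here is a minimal sketch that asks Hadoop which FileSystem implementation
a URI resolves to (the namenode host and port are placeholders):

import java.net.URI
import org.apache.hadoop.fs.FileSystem

// For an HDFS URI this should print org.apache.hadoop.hdfs.DistributedFileSystem;
// seeing RawLocalFileSystem here matches the failure in this thread.
val fs = FileSystem.get(new URI("hdfs://localhost:9000/"), sc.hadoopConfiguration)
println(fs.getClass.getName)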



