rdd.saveAsTextFile problem

2014-01-02 Thread Philip Ogren
I have a very simple Spark application that looks like the following:

    var myRdd: RDD[Array[String]] = initMyRdd()
    println(myRdd.first.mkString(", "))
    println(myRdd.count)
    myRdd.saveAsTextFile("hdfs://myserver:8020/mydir")
    myRdd.saveAsTextFile("target/mydir/")

The println statements work as

Re: rdd.saveAsTextFile problem

2014-01-02 Thread Andrew Ash
You want to write it to a local file on the machine? Try using file:///path/to/target/mydir/ instead. I'm not sure what the behavior would be if you did this on a multi-machine cluster, though; you may get a bit of data on each machine in that local directory.
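One way to build such a file: URI from a relative path is sketched below. This is only an illustration: the object name LocalUri and the helper toFileUri are hypothetical and do not come from the thread, and the sketch uses only java.nio from the JDK. It shows how to form the URI, not whether saveAsTextFile will then succeed in a given setup.

```scala
import java.nio.file.Paths

// Hypothetical helper (not from the thread): turns a relative path such as
// "target/mydir" into an absolute file: URI string.
object LocalUri {
  def toFileUri(relative: String): String =
    Paths.get(relative).toAbsolutePath.normalize.toUri.toString
}
```

The result could then be passed along as myRdd.saveAsTextFile(LocalUri.toFileUri("target/mydir")), assuming myRdd is the RDD from the original message.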

Re: rdd.saveAsTextFile problem

2014-01-02 Thread Philip Ogren
Not really. In practice I write everything out to HDFS and that is working fine. But I write lots of unit tests and example scripts, and it is convenient to be able to test a Spark application (or sequence of Spark functions) in a very local way such that it doesn't depend on any outside

Re: rdd.saveAsTextFile problem

2014-01-02 Thread Philip Ogren
I just tried your suggestion and get the same results with the _temporary directory. Thanks though.

Re: rdd.saveAsTextFile problem

2014-01-02 Thread Andrew Ash
For testing, maybe try using .collect and doing the comparison between expected and actual in memory rather than on disk?
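A minimal sketch of that in-memory comparison follows, assuming Spark is on the classpath and a local master is acceptable. The object name CollectCheck, the sample data, and the expected values are made up for illustration and are not from the thread.

```scala
import org.apache.spark.SparkContext

// Hypothetical example: compare an RDD's contents with expected values
// in memory instead of writing them to disk first.
object CollectCheck {
  def main(args: Array[String]): Unit = {
    // "local" runs Spark in-process, so no cluster or HDFS is needed.
    val sc = new SparkContext("local", "collect-check")
    try {
      // Stand-in for the RDD[Array[String]] described in the thread.
      val rdd = sc.parallelize(Seq(Array("a", "b"), Array("c", "d")))
      // collect() brings all rows back to the driver as a local array.
      val actual = rdd.map(_.mkString(", ")).collect().toList
      val expected = List("a, b", "c, d")
      assert(actual == expected, s"unexpected contents: $actual")
    } finally {
      sc.stop()
    }
  }
}
```

Because collect() pulls every row into the driver's memory, this only suits small test datasets.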

Re: rdd.saveAsTextFile problem

2014-01-02 Thread Philip Ogren
Yep - that works great and is what I normally do. I perhaps should have framed my email as a bug report. The documentation for saveAsTextFile says you can write results out to a local file but it doesn't work for me per the described behavior. It also worked before and now it doesn't. So,

Re: rdd.saveAsTextFile problem

2014-01-02 Thread Andrew Ash
I'm guessing it's a documentation issue, but certainly something could have broken.
- what version of Spark? -- 0.8.1
- what mode are you running with? (local, standalone, mesos, YARN) -- local on Windows
- are you using the shell or an application - shell?
- what language (scala / java / Python)