I have a very simple Spark application that looks like the following:
var myRdd: RDD[Array[String]] = initMyRdd()
println(myRdd.first.mkString(", "))
println(myRdd.count)
myRdd.saveAsTextFile("hdfs://myserver:8020/mydir")
myRdd.saveAsTextFile("target/mydir/")
The println statements work as expected, and the save to HDFS succeeds, but the local saveAsTextFile only leaves behind a _temporary directory.
You want to write it to a local file on the machine? Try using
file:///path/to/target/mydir/ instead
I'm not sure what behavior would be if you did this on a multi-machine
cluster though -- you may get a bit of data on each machine in that local
directory.
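A minimal sketch of that suggestion (the context setup, app name, and data are placeholders of mine, not from the thread):

```scala
import org.apache.spark.SparkContext

// Hypothetical setup: a local-mode context just for illustration.
val sc = new SparkContext("local", "SaveLocalSketch")
val rdd = sc.parallelize(Seq("a", "b", "c"))

// An explicit file:// URI targets the local filesystem instead of the
// configured default filesystem (e.g. HDFS). On a multi-machine cluster,
// each executor would write its own partitions to its own local disk.
rdd.saveAsTextFile("file:///path/to/target/mydir/")
sc.stop()
```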
On Thu, Jan 2, 2014 at 12:22 PM,
Not really. In practice I write everything out to HDFS and that is
working fine. But I write lots of unit tests and example scripts and it
is convenient to be able to test a Spark application (or a sequence of
Spark functions) in a very local way, such that it doesn't depend on any
outside resources.
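For tests like that, one common pattern (a sketch with hypothetical names, not from the thread) is to run a local-mode SparkContext inside the test itself, so nothing external is required:

```scala
import org.apache.spark.SparkContext

// "local" runs Spark in-process in a single JVM -- no cluster, no HDFS.
val sc = new SparkContext("local", "UnitTestSketch")
try {
  val result = sc.parallelize(1 to 10).map(_ * 2).reduce(_ + _)
  assert(result == 110) // 2 + 4 + ... + 20
} finally {
  sc.stop() // stop the context so repeated tests don't leak resources
}
```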
I just tried your suggestion and get the same results with the
_temporary directory. Thanks though.
On 1/2/2014 10:28 AM, Andrew Ash wrote:
You want to write it to a local file on the machine? Try using
file:///path/to/target/mydir/ instead
I'm not sure what behavior would be if you did this on a multi-machine
cluster though.
For testing, maybe try using .collect and doing the comparison between
expected and actual in memory rather than on disk?
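That in-memory comparison might look like the following (a sketch; the transformation and expected values are made up):

```scala
import org.apache.spark.SparkContext

val sc = new SparkContext("local", "CollectSketch")
try {
  // collect() pulls the RDD's elements back to the driver as an Array,
  // so the assertion runs in memory with no files involved.
  val actual = sc.parallelize(Seq("a", "b", "c")).map(_.toUpperCase).collect()
  assert(actual.sameElements(Array("A", "B", "C")))
} finally {
  sc.stop()
}
```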
On Thu, Jan 2, 2014 at 12:54 PM, Philip Ogren philip.og...@oracle.com wrote:
I just tried your suggestion and get the same results with the _temporary
directory. Thanks though.
Yep - that works great and is what I normally do.
I perhaps should have framed my email as a bug report. The
documentation for saveAsTextFile says you can write results out to a
local file, but it doesn't work for me per the described behavior. It
also worked before and now it doesn't. So,
I'm guessing it's a documentation issue, but certainly something could have
broken.
- what version of Spark? -- 0.8.1
- what mode are you running with? (local, standalone, mesos, YARN) -- local on Windows
- are you using the shell or an application? -- shell
- what language? (Scala / Java / Python)