Re: streaming: missing data. does saveAsTextFile() append or replace?

2015-11-09 Thread Andy Davidson
:13 AM
To: Andrew Davidson <a...@santacruzintegration.com>
Cc: "user @spark" <user@spark.apache.org>
Subject: Re: streaming: missing data. does saveAsTextFile() append or replace?
> Andy,
>
> Using the rdd.saveAsTextFile(...) will overwrite the data if your target

Re: streaming: missing data. does saveAsTextFile() append or replace?

2015-11-08 Thread Gerard Maas
Andy, using rdd.saveAsTextFile(...) will overwrite the data if your target is the same file. If you want to save to HDFS, DStream offers dstream.saveAsTextFiles(prefix, suffix), where a new file will be written at each streaming interval. Note that this will result in a saved file for each
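
To illustrate the difference described above, here is a minimal sketch in plain Python (no Spark involved). It assumes the per-batch naming scheme "prefix-TIME_IN_MS[.suffix]" that the Spark Streaming documentation gives for saveAsTextFiles. The helper batch_output_path, the path hdfs:///data/raw, the start timestamp, and the 10-second batch interval are all made up for illustration and are not taken from the thread.

```python
def batch_output_path(prefix, suffix, batch_time_ms):
    """Hypothetical helper: build a per-batch path as prefix-TIME_IN_MS[.suffix]."""
    path = "{}-{}".format(prefix, batch_time_ms)
    return path + "." + suffix if suffix else path


# Simulate three batches, 10 seconds apart, from an arbitrary start time (assumption).
start_ms = 1446940800000
batch_times = [start_ms + i * 10000 for i in range(3)]

fixed_target = {}      # stands in for saving every batch to the same path
per_batch_target = {}  # stands in for one output path per batch interval

for i, t in enumerate(batch_times):
    data = "batch-{}".format(i)
    # Same key every time: each batch replaces the previous one.
    fixed_target["hdfs:///data/raw"] = data
    # Distinct key per batch: nothing is replaced.
    per_batch_target[batch_output_path("hdfs:///data/raw", "txt", t)] = data

print(len(fixed_target), "path(s) left when the target is fixed")
print(len(per_batch_target), "path(s) left when the target is per batch")
```

With a fixed target only the last batch survives, which would match the "missing data" symptom in the original question. With a time-stamped target every batch keeps its own output.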

streaming: missing data. does saveAsTextFile() append or replace?

2015-11-07 Thread Andy Davidson
Hi, I just started a new Spark Streaming project. In this phase of the system, all we want to do is save the data we receive to HDFS. After running for a couple of days, it looks like I am missing a lot of data. I wonder if saveAsTextFile("hdfs:///rawSteamingData"); is overwriting the data I