Fair enough. That rationale makes sense. I would prefer that a Spark clobber option also delete the destination files, but as long as it's a non-default option I can see the "caller beware" side of that argument as well.
Nick

On Monday, June 2, 2014, Sean Owen <so...@cloudera.com> wrote:

> I assume the idea is for Spark to "rm -r dir/", which would clean out
> everything that was there before. It's just doing this instead of the
> caller. Hadoop still won't let you write into a location that already
> exists regardless, and part of that is for this reason: you might
> end up with files mixed up from different jobs.
>
> This doesn't need a change to Hadoop and probably shouldn't; it's a
> change to semantics provided by Spark to do the delete for you if you
> set a flag. Viewed that way, meh, seems like the caller could just do
> that themselves rather than expand the Spark API (via a utility method
> if you like), but I can see it both ways. Caller beware.
>
> On Mon, Jun 2, 2014 at 10:08 PM, Nicholas Chammas
> <nicholas.cham...@gmail.com> wrote:
> > OK, thanks for confirming. Is there something we can do about that
> > leftover part- files problem in Spark, or is that for the Hadoop team?
> >
> > On Monday, June 2, 2014, Aaron Davidson <ilike...@gmail.com> wrote:
> >
> >> Yes.
> >>
> >> On Mon, Jun 2, 2014 at 1:23 PM, Nicholas Chammas
> >> <nicholas.cham...@gmail.com> wrote:
> >>
> >> So in summary:
> >>
> >> As of Spark 1.0.0, saveAsTextFile() will no longer clobber by default.
> >> There is an open JIRA issue to add an option to allow clobbering.
> >> Even when clobbering, part- files may be left over from previous saves,
> >> which is dangerous.
> >>
> >> Is this correct?
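To illustrate the "caller beware" approach Sean describes, here is a minimal sketch of the delete-then-write pattern, in plain Python standing in for the Spark/Hadoop file APIs. The function name `save_with_clobber` and the part- file naming are hypothetical; the point is that removing the whole destination directory first is what prevents a previous job's extra part- files from surviving alongside the new output.

```python
import shutil
from pathlib import Path

def save_with_clobber(chunks, output_dir):
    """Delete the destination directory, then write fresh part- files.

    Mimics the proposed clobber semantics done by the caller: without
    the rmtree, a prior job that wrote MORE partitions than this one
    would leave stale part- files mixed in with the new output.
    """
    out = Path(output_dir)
    if out.exists():
        shutil.rmtree(out)  # the caller's "rm -r dir/"
    out.mkdir(parents=True)
    for i, lines in enumerate(chunks):
        # One file per partition, named like Hadoop's part- outputs.
        (out / f"part-{i:05d}").write_text("\n".join(lines) + "\n")
```

For example, if a previous save left part-00000 through part-00002 behind and the new job only produces two partitions, the rmtree guarantees part-00002 does not linger to corrupt later reads of the directory.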