Thanks Patrick for your detailed explanation.

BR
Jerry

-----Original Message-----
From: Patrick Wendell [mailto:pwend...@gmail.com] 
Sent: Thursday, December 25, 2014 3:43 PM
To: Cheng, Hao
Cc: Shao, Saisai; user@spark.apache.org; d...@spark.apache.org
Subject: Re: Question on saveAsTextFile with overwrite option

So the behavior of overwriting existing directories IMO is something we don't 
want to encourage. The reason why the Hadoop client has these checks is that 
it's very easy for users to do unsafe things without them. For instance, a user 
could overwrite an RDD that had 100 partitions with an RDD that has 10 
partitions... and if they read back the RDD they would get a corrupted RDD that 
has a combination of data from the old and new RDD.

If users want to circumvent these safety checks, we need to make them 
explicitly disable them. Given this, I think a config option is as reasonable 
as any alternatives. This is already pretty easy IMO.
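
To make the failure mode above concrete, here is a minimal sketch in plain Python, assuming a toy model of partitioned output (one part-NNNNN file per partition in a directory). The helpers save_as_text_file and read_text_file and the validate_output_specs flag are hypothetical stand-ins for illustration, not Spark or Hadoop APIs. It only mimics the idea of an output check that must be explicitly disabled.

```python
import os
import tempfile


def save_as_text_file(partitions, out_dir, validate_output_specs=True):
    """Toy partitioned save: writes one part-NNNNN file per partition.

    Hypothetical helper, not Spark's implementation. With the check on,
    it refuses to write into a directory that already exists.
    """
    if validate_output_specs and os.path.exists(out_dir):
        raise FileExistsError(f"Output directory {out_dir} already exists")
    os.makedirs(out_dir, exist_ok=True)
    for i, rows in enumerate(partitions):
        # Opening with "w" replaces a part file of the same name only.
        with open(os.path.join(out_dir, f"part-{i:05d}"), "w") as f:
            f.writelines(f"{row}\n" for row in rows)


def read_text_file(out_dir):
    """Read back every part file found in the directory."""
    rows = []
    for name in sorted(os.listdir(out_dir)):
        if name.startswith("part-"):
            with open(os.path.join(out_dir, name)) as f:
                rows.extend(line.rstrip("\n") for line in f)
    return rows


def demo():
    out_dir = os.path.join(tempfile.mkdtemp(), "output")
    old = [[f"old-{i}"] for i in range(100)]  # 100 partitions
    new = [[f"new-{i}"] for i in range(10)]   # 10 partitions

    save_as_text_file(old, out_dir)

    # Default behavior: the second save is rejected.
    try:
        save_as_text_file(new, out_dir)
        check_blocked = False
    except FileExistsError:
        check_blocked = True

    # Explicitly disabling the check lets the write through, but the
    # part files beyond the new partition count are left in place.
    save_as_text_file(new, out_dir, validate_output_specs=False)
    rows = read_text_file(out_dir)
    stale = [r for r in rows if r.startswith("old-")]
    return check_blocked, len(rows), len(stale)
```

Under these assumptions, demo() reports that the second save is rejected by default, and that once the check is disabled the directory reads back as 100 rows, 90 of which are stale rows from the old output.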

- Patrick

On Wed, Dec 24, 2014 at 11:28 PM, Cheng, Hao <hao.ch...@intel.com> wrote:
> I am wondering if we can provide a friendlier API for this purpose, rather 
> than a configuration option. What do you think, Patrick?
>
> Cheng Hao
>
> -----Original Message-----
> From: Patrick Wendell [mailto:pwend...@gmail.com]
> Sent: Thursday, December 25, 2014 3:22 PM
> To: Shao, Saisai
> Cc: user@spark.apache.org; d...@spark.apache.org
> Subject: Re: Question on saveAsTextFile with overwrite option
>
> Is it sufficient to set "spark.hadoop.validateOutputSpecs" to false?
>
> http://spark.apache.org/docs/latest/configuration.html
>
> - Patrick
>
> On Wed, Dec 24, 2014 at 10:52 PM, Shao, Saisai <saisai.s...@intel.com> wrote:
>> Hi,
>>
>>
>>
>> We need to save RDD output to HDFS with a saveAsTextFile-like API, 
>> but overwrite the data if it already exists.
>> I'm not sure whether Spark currently supports this kind of operation, or 
>> whether I need to check for existing output manually.
>>
>>
>>
>> There's a thread on the mailing list that discussed this 
>> (http://apache-spark-user-list.1001560.n3.nabble.com/How-can-I-make-Spark-1-0-saveAsTextFile-to-overwrite-existing-file-td6696.html),
>> but I'm not sure whether this feature is enabled, or whether it requires some configuration.
>>
>>
>>
>> Appreciate your suggestions.
>>
>>
>>
>> Thanks a lot
>>
>> Jerry
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>

