Thanks Steve!
I will study the links you mentioned!
--
Sorry, I had not noticed this follow-up; I have been busy with other issues.
On 3 Apr 2018, at 11:19, cane <zhoukang199...@gmail.com> wrote:
Now, if we use saveAsNewAPIHadoopDataset with speculation enabled, it may cause
data loss.
I checked the comment on this API:
We should make sure our tasks are idemp
I observed the following.
The job commit is done on the driver and the task commit is done on the executors.
With speculation enabled, this may cause data loss,
since the job commit calls listStatus, while the task commit deletes the output
file if it already exists and then renames its own file to the final output.
When listStatus is called after the delete and before the re
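
To make the window described above concrete, here is a minimal sketch in
Python. It is a toy model only: a set of path strings stands in for the file
system, and the function names and paths are hypothetical. It is not Hadoop's
or Spark's committer code; it only shows that a listing taken between the
delete and the rename does not see the output file.

```python
import posixpath

# Toy model: a set of path strings stands in for the file system.
# All names and paths below are hypothetical, for illustration only.


def list_status(fs, directory):
    """Return the sorted files that are direct children of `directory`."""
    return sorted(p for p in fs if posixpath.dirname(p) == directory)


def commit_task_steps(fs, temp_path, final_path):
    """Delete-then-rename task commit, pausing after each step."""
    if final_path in fs:
        fs.discard(final_path)  # step 1: delete the existing output
    yield "deleted"
    fs.discard(temp_path)       # step 2: rename temp file to final path
    fs.add(final_path)
    yield "renamed"


def simulate():
    final = "out/part-00000"
    temp = "tmp/attempt_1/part-00000"
    # The first attempt has already committed; a speculative attempt
    # is about to commit the same partition again.
    fs = {final, temp}
    steps = commit_task_steps(fs, temp, final)

    next(steps)                            # speculative attempt deletes the file
    seen_by_job = list_status(fs, "out")   # job commit lists inside the gap
    next(steps)                            # the rename completes afterwards
    return seen_by_job, list_status(fs, "out")


if __name__ == "__main__":
    seen, after = simulate()
    print("listed during the gap:", seen)
    print("listed after the rename:", after)
```

In this model the listing taken in the gap is empty even though the file is
present again once the rename finishes, which is the kind of loss reported
above if the job commit acts on that listing.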
> On 3 Apr 2018, at 11:19, cane wrote:
>
> Now, if we use saveAsNewAPIHadoopDataset with speculation enabled, it may cause
> data loss.
> I checked the comment on this API:
>
> We should make sure our tasks are idempotent when speculation is enabled,
> i.e. do
> * not use output committer that w
Now, if we use saveAsNewAPIHadoopDataset with speculation enabled, it may cause
data loss.
I checked the comment on this API:
We should make sure our tasks are idempotent when speculation is enabled,
i.e. do
* not use output committer that writes data directly.
* There is an example in
https://