Github user squito commented on the issue:
https://github.com/apache/spark/pull/19848
I have one concern about this -- there is a case where you are not giving a
unique id to the hadoop committers: you could save one RDD twice, and even
have both of those save operations running concurrently. I suppose it's weird
enough that we don't need to worry about it?
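Just to make that scenario concrete, here's roughly what I mean (a minimal
sketch only -- the app name, local master, and output paths are placeholders I
made up, not anything from this PR):

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

import org.apache.spark.{SparkConf, SparkContext}

object ConcurrentSaveSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("concurrent-save-sketch").setMaster("local[4]"))
    val rdd = sc.parallelize(1 to 1000)

    // Two independent save jobs over the *same* RDD, kicked off concurrently.
    // If the id handed to the hadoop committers is derived only from the RDD
    // (and not from something unique per write), both writers end up
    // presenting the same id to their committers.
    val saveA = Future { rdd.saveAsTextFile("/tmp/out-a") }
    val saveB = Future { rdd.saveAsTextFile("/tmp/out-b") }

    Await.result(saveA, Duration.Inf)
    Await.result(saveB, Duration.Inf)
    sc.stop()
  }
}
```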
I don't think there are any problems w/ stage retry -- that only applies to
shuffle map stages, and the hadoop writer is only for result stages.