Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/19269
People may know that I'm busy with some S3 committers which work with
Hadoop MapReduce & Spark, with an import of Ryan's committer into the Hadoop
codebase. This includes changes to s3a to support that, an alternative design
which relies on a consistent S3, a Spark committer (not in the Spark codebase
itself) to handle things there, plus the tests & documentation of how commits
actually work. For that, the conclusion I've reached is: nobody really
knows what's going on, and it's a miracle things work at all.
FWIW, I think I now have what is the [closest thing to
documentation](https://github.com/steveloughran/hadoop/blob/s3guard/HADOOP-13786-committer/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/committer_architecture.md)
of what goes on at the Hadoop FS Committer layer, based on code, tests,
stepping through operations in an IDE, and other archaeology.
Based on that experience, I think a key deliverable ought to be some
specification of what a committer does. I know there are bits in the javadocs
like "Implementations should handle this case correctly", but without a
definition of "correct" it's hard to point at an implementation and say "you've
got it wrong".
I would propose supplementing the code with
1. A rigorous specification, including the possible workflows. Scala & RFC2119
keywords, maybe (a sketch of what that might look like follows this list).
1. Tests against that specification, including attempts to simulate the failure
modes, all the orders of execution which aren't expected to be valid, etc. (see
the test sketch after the question list below).
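As a very rough illustration of point 1: a fragment of such a spec could be a Scala trait whose scaladoc carries the RFC2119 keywords. The trait and method names below are made up for illustration; the wording is the point, not the API.
```scala
// Illustration only: the trait and method names here are hypothetical,
// not the real DataSourceV2 interfaces. The RFC2119 wording is the point.
trait TaskCommitContract {

  /** MUST be invoked at most once per writer instance.
    * MUST NOT make output visible to job-level readers before the job
    * commit completes, unless the writer explicitly declares that its
    * committed task output is observable. */
  def commitTask(): Unit

  /** MUST succeed as a no-op if no other operation has been invoked on
    * this writer.
    * SHOULD be callable from a different process/host than the one which
    * created the writer, so an executor failure can be cleaned up.
    * MAY leave data behind; if so, the job commit MUST remove it. */
  def abortTask(): Unit
}
```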
Some general questions related to this:
* If a writer doesn't support speculation, can it say so? I know
speculation and failure recovery are related, but task retry after failure is
linearized, whereas speculation can happen in parallel.
  * Is it possible to instantiate a second writer and say "abort the output
of the other writer", possibly on a different host? This would allow for
cleanup of a task's work after the failure of the entire executor. If it's not
possible, then the job commit must be required to clean up. Maybe: pass
information about failing tasks to the job commit, so it has more of an idea
what to do. (Hadoop MapReduce example: the AM calls abortTask() for all failing
containers before instantiating a new one and retrying.)
* MAY the intermediate output of a task be observable to others?
  * MAY the committed output of a task be observable to others? If so, what
does this mean for readers? Is it something which a writer may wish to declare
to, or warn, callers about?
  * What if `DataWriter.commit()` just doesn't return, or the executor fails
during that commit process? Is that a failure of the entire job, or just of the
task? (FWIW, MR algorithm 1 handles this; algorithm 2 doesn't.)
  * What if `writer.abort()` raises an exception? (Not unusual if the cause of
the commit failure is a network/auth problem.)
  * What if `writer.abort()` is called before any other operation on that
writer? It had better be a no-op (see the test sketch after this list).
* What if `DataSourceV2Writer.commit()` fails? Can it be retried? (Easiest
to say no, obviously).
  * If `DataSourceV2Writer.commit()` fails, can `DataSourceV2Writer.abort()`
then be called?
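And for point 2, a minimal sketch of the kind of ordering test I mean, run against a fake in-memory writer. Again, all the names here are hypothetical; this is not Spark's `DataWriter` API, just an illustration of testing the call orderings which the spec would declare invalid.
```scala
import org.scalatest.FunSuite

// A fake writer which just records calls so the tests can assert ordering
// rules. All names are hypothetical; this is not Spark's DataWriter API.
class RecordingWriter {
  var records: Vector[String] = Vector.empty
  private var committed = false
  private var aborted = false

  def write(rec: String): Unit = {
    require(!committed && !aborted, "write() after commit()/abort() is invalid")
    records :+= rec
  }

  def commit(): Unit = {
    require(!aborted, "commit() after abort() is invalid")
    committed = true
  }

  // The contract says abort() must be safe at any point, even as the first call.
  def abort(): Unit = {
    aborted = true
    records = Vector.empty // discard uncommitted output
  }
}

class WriterOrderingSuite extends FunSuite {

  test("abort() before any other operation is a no-op") {
    val w = new RecordingWriter
    w.abort() // must not throw
    assert(w.records.isEmpty)
  }

  test("write() after abort() is rejected") {
    val w = new RecordingWriter
    w.abort()
    intercept[IllegalArgumentException] {
      w.write("row-1")
    }
  }

  test("commit() after abort() is rejected") {
    val w = new RecordingWriter
    w.abort()
    intercept[IllegalArgumentException] {
      w.commit()
    }
  }
}
```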