Github user steveloughran commented on the issue:

    https://github.com/apache/spark/pull/19269
  
    
    People may know that I'm busy with some S3 committers which work with Hadoop MapReduce & Spark, with an import of Ryan's committer into the Hadoop codebase. This includes changes to s3a to support that, an alternative design which relies on a consistent S3, a Spark committer (not in the Spark codebase itself) to wire it up there, plus the tests & documentation of how commits actually work. The conclusion I've reached is: nobody really knows what's going on, and it's a miracle things work at all.
    
    FWIW, I think I now have the [closest thing to documentation](https://github.com/steveloughran/hadoop/blob/s3guard/HADOOP-13786-committer/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/committer_architecture.md) of what goes on at the Hadoop FS committer layer, based on code, tests, stepping through operations in an IDE, and other archaeology.
    
    Based on that experience, I think a key deliverable ought to be a specification of what a committer does. I know there are bits in the javadocs like "Implementations should handle this case correctly", but without a definition of "correct" it's hard to point at an implementation and say "you've got it wrong".
    
    I would propose supplementing the code with:
    
    1. A rigorous specification, including possible workflows. Scala & RFC 2119 keywords, maybe.
    1. Tests against that, including attempts to simulate the failure modes, all the orders of execution which aren't expected to be valid, etc.; a sketch of what I mean follows.
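    
    To make item 2 concrete, here's a minimal sketch of the kind of ordering test I have in mind. The in-memory `DataWriter` stand-in and its assertions are mine, not part of this PR; the point is only that each call sequence gets an explicit, testable MUST:
    
    ```scala
    import scala.collection.mutable
    
    // Hypothetical stand-in for the DataWriter under review; exists only
    // so the ordering assertions below compile and run.
    trait WriterCommitMessage
    
    class InMemoryDataWriter {
      val output = mutable.Buffer[String]()
      private var aborted = false
    
      def write(record: String): Unit = {
        if (aborted) throw new IllegalStateException("write after abort")
        output += record
      }
    
      def commit(): WriterCommitMessage = {
        if (aborted) throw new IllegalStateException("commit after abort")
        new WriterCommitMessage {}
      }
    
      def abort(): Unit = {
        aborted = true
        output.clear()   // "abort MUST destroy all uncommitted output"
      }
    }
    
    // Enumerate call orderings and assert what MUST hold for each.
    object CommitterOrderingSpec extends App {
      // abort() before any other operation MUST be a no-op
      val w1 = new InMemoryDataWriter()
      w1.abort()
      assert(w1.output.isEmpty, "abort on a fresh writer left output behind")
    
      // commit() after abort() MUST fail rather than publish data
      val w2 = new InMemoryDataWriter()
      w2.write("row-1")
      w2.abort()
      try {
        w2.commit()
        assert(false, "commit after abort was allowed")
      } catch {
        case _: IllegalStateException => // expected
      }
      println("ordering assertions held")
    }
    ```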
    
    Some general questions related to this:
    
    * If a writer doesn't support speculation, can it say so? I know speculation and failure recovery are related, but task retry after failure is linearized, whereas speculative attempts run in parallel. (One possible shape for this is sketched after this list.)
    * Is it possible to instantiate a second writer and say "abort the output of the other writer" (on a different host)? That would allow cleanup of a task's work after the failure of an entire executor. If it's not possible, then job commit must be required to do the cleanup. Maybe: pass information about failing tasks to the job commit, so it has more of an idea of what to do. (Hadoop MapReduce example: the AM calls abortTask() for all failing containers before instantiating a new one and retrying. Also sketched after this list.)
    * MAY the intermediate output of a task be observable to others?
    * MAY the committed output of a task be observable to others? If so, what does this mean for readers? Is it something a writer may wish to declare to or warn callers about?
    * What if `DataWriter.commit()` just doesn't return, or the executor fails during that commit? Is that a failure of the task or of the entire job? (FWIW, MR algorithm 1 handles this; algorithm 2 doesn't.)
    * What if `writer.abort()` raises an exception? (Not unusual if the cause of the commit failure is a network/auth problem.)
    * What if `writer.abort()` is called before any other operation on that writer? It had better be a no-op.
    * What if `DataSourceV2Writer.commit()` fails? Can it be retried? (Easiest to say no, obviously.)
    * After `DataSourceV2Writer.commit()` fails, can `DataSourceV2Writer.abort()` be called?
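
    To illustrate the first two bullets, a sketch of what "saying so" and a failure-aware job commit might look like. All the names here are invented for illustration; none of this is a proposal for this PR, just a way of showing that both capabilities can be made declarable and testable:
    
    ```scala
    // Sketch only: invented traits illustrating two of the questions above.
    trait WriterCommitMessage
    
    trait DataSourceV2Writer {
      def commit(messages: Array[WriterCommitMessage]): Unit
      def abort(messages: Array[WriterCommitMessage]): Unit
    }
    
    // Question 1: let a writer declare whether speculation is safe.
    // Absent this marker, the scheduler MUST NOT run two attempts of the
    // same task concurrently against the destination.
    trait SupportsConcurrentAttempts { self: DataSourceV2Writer => }
    
    // Question 2: give job commit enough information to clean up after
    // failed or lost task attempts, the way the MR AM calls abortTask()
    // for failed containers before retrying.
    case class FailedAttempt(taskId: Int, attemptNumber: Int, host: String)
    
    trait SupportsFailureCleanup { self: DataSourceV2Writer =>
      /** Commit the successful attempts; MUST first abort the leftovers
       *  of every entry in `failed`. */
      def commit(messages: Array[WriterCommitMessage],
                 failed: Seq[FailedAttempt]): Unit
    }
    ```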


