mridulm commented on pull request #35076:
URL: https://github.com/apache/spark/pull/35076#issuecomment-1008449953


   
   To respond to your queries @pan3793 (pls feel free to elaborate @otterc):
   
   > 
   > 1. On the ESS side, there may be multiple streams requesting to write one 
shuffle partition. I saw some variables declared without transient; does Netty 
ensure they are handled by the same thread?
   
   `transient` applies to serialization/deserialization - did you mean 
`volatile` in this context instead?
   Having said that, all state is modified with the `AppShufflePartitionInfo` 
locked - so it would be within that critical section.
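To make the distinction concrete, here is a minimal sketch (hypothetical names, not the actual `AppShufflePartitionInfo` code) of why the fields do not need `volatile`: every access happens inside a `synchronized` block on the partition object, which provides both mutual exclusion and cross-thread visibility.

```java
// Hypothetical, simplified stand-in for the per-partition merge state.
public class PartitionInfoSketch {
    // Not volatile: all reads/writes occur while holding this object's
    // monitor, which guarantees visibility to the next thread that locks it.
    private long dataFilePos = 0;
    private long committedPos = 0;

    public synchronized void append(int numBytes) {
        dataFilePos += numBytes;      // mutate only under the lock
    }

    public synchronized void commit() {
        committedPos = dataFilePos;   // published to later lock holders
    }

    public synchronized long committedPosition() {
        return committedPos;
    }

    public static void main(String[] args) {
        PartitionInfoSketch p = new PartitionInfoSketch();
        p.append(100);
        p.commit();
        System.out.println(p.committedPosition()); // prints 100
    }
}
```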
   
   > 2. The ESS writes 3 files for a merged partition - `data`, `index`, `meta` 
- and maintains each committed file position in in-memory variables. When data 
arrives, it locks `partitionInfo` and writes the files in the order `data`, 
`index`, `meta` from the committed position. If all writes succeed, it updates 
the committed file positions; if any `IOException` occurs, the committed 
positions keep their previous values. Then it releases the `partitionInfo` 
lock. Thus, the committed state should always be consistent. Finally, it 
truncates the files at the committed positions before reporting the merged 
status to `DAGScheduler`. So if the ESS reported a merged status to 
`DAGScheduler`, the final files should always be consistent with each other 
and with the merged status, and we can trust the committed data of the files 
at any time. Do I understand it correctly?
   
   Yes. In addition, any exception during the write/etc. would trigger a 
failure and reset back to the previous 'good' state.
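The commit/rollback pattern described above can be sketched as follows. This is a hedged illustration with hypothetical names (simulated writers instead of real `FileChannel`s), not the actual ESS code: the committed positions advance only after all three writes succeed, so an `IOException` leaves them at the last consistent snapshot.

```java
import java.io.IOException;

// Illustrative commit/rollback skeleton for the data/index/meta positions.
public class CommitRollbackSketch {
    long committedData = 0, committedIndex = 0, committedMeta = 0;

    // Simulated writer: takes the committed position, returns the new one,
    // or throws IOException on failure.
    interface Writer { long write(long fromPos) throws IOException; }

    public void merge(Writer data, Writer index, Writer meta) {
        try {
            long newData  = data.write(committedData);
            long newIndex = index.write(committedIndex);
            long newMeta  = meta.write(committedMeta);
            // All three writes succeeded: advance the committed positions.
            committedData  = newData;
            committedIndex = newIndex;
            committedMeta  = newMeta;
        } catch (IOException e) {
            // Keep the previous committed positions; the partial write is
            // simply not committed and will be truncated at finalization.
        }
    }

    public static void main(String[] args) {
        CommitRollbackSketch s = new CommitRollbackSketch();
        s.merge(p -> p + 10, p -> p + 2, p -> p + 1);          // succeeds
        s.merge(p -> { throw new IOException("disk"); },
                p -> p + 2, p -> p + 1);                       // rolls back
        System.out.println(s.committedData);                   // prints 10
    }
}
```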
   
   > 3. For performance, the ESS does not call `flush` for each file write; if 
`write` does not throw an IOE, the ESS treats the write as succeeded, and 
finally calls `partition.closeAllFilesAndDeleteIfNeeded(false)` in 
`#finalizeShuffleMerge`. But `#closeAllFilesAndDeleteIfNeeded` will swallow any 
IOE, which may leave the files inconsistent with the merged status?
   
   The `IOException` thrown in that method occurs when we are unable to close 
the stream - in this case, a close of the fd.
   While possible in theory, such a failure would usually point to more severe 
issues outside of what Spark can deal with.
   But you are right: it does log and ignore the failure if close fails.
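The log-and-ignore behavior on close amounts to the familiar "close quietly" pattern; a minimal sketch (hypothetical helper, not the actual Spark code):

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical helper showing the close-and-swallow pattern discussed above.
public class CloseQuietlySketch {
    public static boolean closeQuietly(Closeable c) {
        try {
            c.close();
            return true;
        } catch (IOException e) {
            // Logged and swallowed: the committed positions are already
            // consistent at this point, so a close failure is not treated
            // as fatal - but, as noted, it is never propagated either.
            System.err.println("close failed: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        boolean ok = closeQuietly(() -> { throw new IOException("bad fd"); });
        System.out.println(ok); // prints false; no exception escapes
    }
}
```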
   
   > 4. Does `file.e.getChannel().truncate(file.getPos())` always succeed if no 
IOE is thrown? I saw it will return `null` under some conditions (NOT familiar 
with file systems)
   
   The truncate is actually a best-effort attempt to clean up excess disk space 
usage.
   If there is an ongoing write while we are finalizing, the excess data from 
that write is not relevant and won't be consumed - so we truncate.
   It also makes things clearer when debugging (the file sizes should match the 
metadata we know).
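As a standalone illustration of that truncate-at-finalization step (file names and positions are made up for the example), `FileChannel.truncate` cuts the file back to the committed position so the on-disk size matches the metadata:

```java
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative: truncate a merged file back to its committed position.
public class TruncateSketch {
    public static long truncateToCommitted(Path file, long committedPos)
            throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            // Drops any uncommitted tail beyond committedPos.
            raf.getChannel().truncate(committedPos);
        }
        return Files.size(file);
    }

    public static void main(String[] args) throws Exception {
        Path f = Files.createTempFile("merged", ".data");
        Files.write(f, new byte[100]);            // 100 bytes written...
        long size = truncateToCommitted(f, 64);   // ...but only 64 committed
        System.out.println(size);                 // prints 64
        Files.deleteIfExists(f);
    }
}
```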
   
   > 5. A basic question about the OS file system: if process A writes and 
closes a file without any IOE and gets the file length `len`, does the OS 
ensure that another process B always reads the latest file content and gets 
the same `len`?
   
   Yes, unless there are some other interleaving modifications to that file (or 
some OS/fs/driver bugs, but I am discounting those for the time being!).
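A small single-JVM illustration of that visibility point, using two independent opens of the same file (a second process on a local POSIX-compliant filesystem behaves the same way): once the writer's `close()` completes, a subsequent reader sees the full content and the same length.

```java
import java.io.FileOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Write-then-close from one stream, then read the length via a fresh open.
public class CloseThenReadSketch {
    public static long writeThenReadLength(int numBytes) throws Exception {
        Path f = Files.createTempFile("visibility", ".bin");
        try (FileOutputStream out = new FileOutputStream(f.toFile())) {
            out.write(new byte[numBytes]);
        } // close() completes before the "reader" opens the file

        long len = Files.size(f); // independent open: sees the full length
        Files.deleteIfExists(f);
        return len;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(writeThenReadLength(4096)); // prints 4096
    }
}
```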


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


