[GitHub] [spark] pan3793 edited a comment on pull request #35076: [SPARK-37793][CORE][SHUFFLE] Fallback to fetch original blocks when noLocalMergedBlockDataError

GitBox Sun, 09 Jan 2022 08:37:41 -0800


pan3793 edited a comment on pull request #35076:
URL: https://github.com/apache/spark/pull/35076#issuecomment-1008332334



   After reading and debugging the push-based shuffle code, I am not sure I 
understand it correctly and have some questions. I would appreciate any 
feedback, @mridulm @otterc.
   
   1. On the ESS side, multiple streams may request to write the same shuffle 
partition. I saw some variables declared without `transient`; does Netty ensure 
that they are handled in the same thread?
   2. The ESS writes 3 files for a merged partition, `data`, `index`, and 
`meta`, and keeps the committed position of each file in in-memory variables. 
When data arrives, it locks `partitionInfo` and writes the files in the order 
`data`, `index`, `meta`. If all writes succeed, it updates the committed 
positions; if any `IOException` occurs, the committed positions keep their 
previous values. It then releases the `partitionInfo` lock, so the committed 
state should always be consistent. Finally, it truncates the files to their 
committed positions before reporting the merged status to `DAGScheduler`. So if 
the ESS has reported a merged status to `DAGScheduler`, the final files should 
always be consistent with each other and with the merged status. Do I 
understand this correctly?
   3. For performance, the ESS does not call `flush` after each file write. If 
`write` does not throw an IOE, the ESS treats the write as successful, and it 
finally calls `partition.closeAllFilesAndDeleteIfNeeded(false)` in 
`#finalizeShuffleMerge`. But `#closeAllFilesAndDeleteIfNeeded` swallows any 
IOE; could that leave the files inconsistent with the merged status?
   4. Does `file.e.getChannel().truncate(file.getPos())` always succeed if no 
IOE is thrown? I saw that it can return `null` in some conditions (I am NOT 
familiar with the file system).
   5. A basic question about the OS file system: if process A writes and closes 
a file without any IOE and sees that the file length is `len`, does the OS 
ensure that another process B always reads the latest file content and gets the 
same `len`? 

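   For questions 3 and 5, this is the kind of check I have in mind, again only 
as a sketch with hypothetical names (`ForceThenCloseSketch`, 
`writeForceClose`, `lengthsAgree`). It calls `FileChannel.force(true)` before 
closing, so that a failure to flush surfaces as an `IOException` instead of 
being swallowed at close time, and then reads the length back through an 
independent lookup. It runs inside one process on a local file system, so it 
does not by itself answer what another process would observe.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration; not part of Spark.
final class ForceThenCloseSketch {

    // Writes all bytes, forces them to storage, and returns the length seen by the writer.
    static long writeForceClose(Path path, byte[] bytes) throws IOException {
        try (FileChannel ch = FileChannel.open(path,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(bytes);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            // Throws IOException if content or metadata cannot be flushed.
            ch.force(true);
            return ch.size();
        }
    }

    // Compares the writer's length with the length from a separate lookup after close.
    static boolean lengthsAgree() {
        try {
            Path path = Files.createTempFile("merged", ".index");
            try {
                long written = writeForceClose(path, new byte[] {1, 2, 3, 4, 5, 6, 7, 8});
                long observed = Files.size(path);
                return written == 8L && observed == written;
            } finally {
                Files.deleteIfExists(path);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(lengthsAgree());
    }
}
```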

