pan3793 edited a comment on pull request #35076: URL: https://github.com/apache/spark/pull/35076#issuecomment-1008332334
After reading and debugging the push-based shuffle code, I am not sure I understand it correctly and have some questions. I would appreciate any feedback, @mridulm @otterc.

1. On the ESS side, multiple streams may request to write the same shuffle partition. I saw some variables declared without transient; does Netty guarantee that they are handled in the same thread?

2. The ESS writes three files for a merged partition (`data`, `index`, `meta`) and keeps the committed position of each file in in-memory variables. When data arrives, it locks `partitionInfo` and writes the files in the order `data`, `index`, `meta`. If all writes succeed, it updates the committed positions; if any `IOException` occurs, the committed positions keep their previous values. It then releases the `partitionInfo` lock, so the committed state should always be consistent. Finally, it truncates the files to the committed positions before reporting the merged status to `DAGScheduler`. So if the ESS has reported a merged status to `DAGScheduler`, the final files should always be consistent with each other and with the merged status. Do I understand this correctly?

3. For performance, the ESS does not call `flush` after each file write. If `write` does not throw an IOE, the ESS treats the write as successful, and it finally calls `partition.closeAllFilesAndDeleteIfNeeded(false)` in `#finalizeShuffleMerge`. But `#closeAllFilesAndDeleteIfNeeded` swallows any IOE; could that leave the files inconsistent with the merged status?

4. Does `file.e.getChannel().truncate(file.getPos())` always succeed if no IOE is thrown? I saw that it returns `null` under some conditions (I am NOT familiar with file systems).

5. A basic question about OS file systems: if process A writes and closes a file without any IOE and sees the file length as `len`, does the OS guarantee that another process B always reads the latest file content and gets the same `len`?
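To make the pattern in question 2 concrete, here is a minimal sketch of "write first, advance the committed position only on success, truncate to the committed position at finalize". It is not the actual ESS code: the class `MergedPartitionSketch`, its methods, and the file names `part.data` and `part.index` are hypothetical, and it uses two files instead of three to stay short. The `force(true)` call at finalize is my own assumption about how one might address the concern in question 3, not something the current code is known to do.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical stand-in for a merged partition; NOT Spark's actual class. */
public class MergedPartitionSketch implements AutoCloseable {
  private final FileChannel data;
  private final FileChannel index;
  // Committed positions advance only after every write of an append succeeded.
  private long committedDataPos = 0L;
  private long committedIndexPos = 0L;

  public MergedPartitionSketch(Path dir) throws IOException {
    data = open(dir.resolve("part.data"));
    index = open(dir.resolve("part.index"));
  }

  private static FileChannel open(Path p) throws IOException {
    return FileChannel.open(p, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
  }

  // Positional write that loops until the whole buffer has been written.
  private static void writeFully(FileChannel ch, ByteBuffer buf, long pos) throws IOException {
    while (buf.hasRemaining()) {
      pos += ch.write(buf, pos);
    }
  }

  /** Appends one block: data first, then the index entry, then commit. */
  public synchronized void append(byte[] block) throws IOException {
    long newDataPos = committedDataPos + block.length;
    writeFully(data, ByteBuffer.wrap(block), committedDataPos);
    ByteBuffer offset = ByteBuffer.allocate(Long.BYTES).putLong(newDataPos);
    offset.flip();
    writeFully(index, offset, committedIndexPos);
    // Reached only if neither write threw; otherwise the old positions stay.
    committedDataPos = newDataPos;
    committedIndexPos += Long.BYTES;
  }

  /** Drops any bytes past the committed positions, then forces data to storage. */
  public synchronized void finalizeMerge() throws IOException {
    data.truncate(committedDataPos);
    index.truncate(committedIndexPos);
    data.force(true);   // assumption: surface late I/O errors before reporting
    index.force(true);
  }

  public synchronized long committedDataPos() {
    return committedDataPos;
  }

  @Override
  public synchronized void close() throws IOException {
    try {
      data.close();
    } finally {
      index.close();
    }
  }
}
```

In this sketch an `IOException` from `finalizeMerge` propagates to the caller instead of being swallowed, so the caller could decline to report the partition as merged.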
