mridulm commented on pull request #35076: URL: https://github.com/apache/spark/pull/35076#issuecomment-1008449953
To respond to your queries @pan3793 (please feel free to elaborate @otterc):

> 1. On ESS side, there may be multiple streams request to write one shuffle partition, I saw some variables declared without transient, does netty ensure to handle them in the same thread?

`transient` applies to serialization/deserialization - did you mean `volatile` in this context instead? Having said that, all state is modified with the `AppShufflePartitionInfo` locked - so it would be within that critical section.

> 2. The ESS writes 3 files for a merged partition, `data`, `index`, `meta`, and maintains each committed file position in in-memory variables. When data arrives, it locks `partitionInfo` and writes the files in the order `data`, `index`, `meta` from the committed positions. If all writes succeed, it updates the committed file positions; if any `IOException` occurs, the committed file positions keep their previous values. It then releases the `partitionInfo` lock. Thus, the committed status should always be consistent. Finally, it truncates the files to the committed positions before reporting merged status to `DAGScheduler`. So if ESS reported a merged status to `DAGScheduler`, the final files should always be consistent with each other and with the merged status, and we can trust the committed data of the files at any time. Do I understand it correctly?

Yes. In addition, any exception during a write, etc. would trigger a failure and reset back to the previous 'good' state.

> 3. For performance, ESS does not call `flush` after each file write; if `write` does not throw an IOE, ESS treats the write as succeeded, and finally calls `partition.closeAllFilesAndDeleteIfNeeded(false)` in `#finalizeShuffleMerge`. But `#closeAllFilesAndDeleteIfNeeded` will swallow any IOE, which may leave the files inconsistent with the merged status?

The `IOException` thrown in that method occurs when we are unable to close the stream - in this case, it is a close of the fd.
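The commit-position pattern described in question 2 can be sketched as follows. This is a simplified illustration, not Spark's actual code: the class, interface, and method names here are hypothetical (the real logic lives around `AppShufflePartitionInfo`), and the "writers" are stand-ins for the real `FileChannel` writes to the `data`, `index`, and `meta` files.

```java
import java.io.IOException;

// Hypothetical sketch of the commit-position pattern: committed positions
// only advance after all three writes succeed, so they always describe a
// consistent on-disk state.
class PartitionWriteSketch {
    // Committed positions for the three files of a merged partition.
    long dataPos = 0, indexPos = 0, metaPos = 0;

    // Stand-in for a real FileChannel write; returns the new position.
    interface Writer { long write(long from) throws IOException; }

    // All mutation happens while holding this object's monitor, mirroring
    // how ESS only touches state with the AppShufflePartitionInfo locked.
    synchronized void writeBlock(Writer data, Writer index, Writer meta) {
        try {
            // Write in a fixed order: data, then index, then meta.
            long newData = data.write(dataPos);
            long newIndex = index.write(indexPos);
            long newMeta = meta.write(metaPos);
            // Only after all three writes succeed do the committed
            // positions advance; an IOException above leaves them as-is.
            dataPos = newData;
            indexPos = newIndex;
            metaPos = newMeta;
        } catch (IOException e) {
            // Committed positions still point at the last good state.
            // Any bytes written past them are dead data that the
            // truncate at finalize will trim away.
        }
    }
}

public class Main {
    public static void main(String[] args) {
        PartitionWriteSketch p = new PartitionWriteSketch();
        // Successful block: all three positions advance together.
        p.writeBlock(f -> f + 100, f -> f + 8, f -> f + 4);
        // Failing block: the index write throws, so nothing is committed,
        // even though the data write already ran.
        p.writeBlock(f -> f + 100,
                     f -> { throw new IOException("disk error"); },
                     f -> f + 4);
        System.out.println(p.dataPos + "," + p.indexPos + "," + p.metaPos);
        // prints 100,8,4
    }
}
```

The key property is that the three positions are updated as a unit under the lock, so a reader of the committed state never observes a partially applied block.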
While possible in theory, it would usually point to more severe issues outside of what Spark can deal with. But you are right, it does log and ignore failures if a close fails.

> 4. Does `file.e.getChannel().truncate(file.getPos())` always succeed if no IOE is thrown? I saw it will return `null` in some conditions (NOT familiar with the file system)

The truncate is actually a best-effort attempt to clean up excess disk space usage. If there is an ongoing write while we are finalizing, the excess data from that write is not relevant and won't be consumed - so we truncate. It also makes things clearer when debugging (the file sizes should match the metadata we know).

> 5. A basic question about the OS file system. If process A writes and closes a file without any IOE, and gets the file length as `len`, does the OS ensure another process B always reads the latest file content and gets the same `len`?

Yes, unless there are some other interleaving modifications to that file (or some OS/fs/driver bugs, but I am discounting those for the time being!).

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
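The best-effort truncate from question 4 can be sketched as below: trim a file back to its committed position so the on-disk size matches the metadata reported with the merged status, and ignore a failed truncate since it only costs excess disk space. This is an illustrative sketch using a temp file, not Spark's actual finalize code; the `committedPos` value and file name are made up.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: truncate a shuffle file back to its committed
// position at finalize, treating a failed truncate as non-fatal.
public class TruncateSketch {
    public static void main(String[] args) throws IOException {
        Path data = Files.createTempFile("shuffle", ".data");
        // Simulate a partial trailing write: 120 bytes on disk,
        // but only 100 bytes committed.
        Files.write(data, new byte[120]);
        long committedPos = 100;

        try (FileChannel ch = FileChannel.open(data, StandardOpenOption.WRITE)) {
            // Bytes past the committed position belong to an interrupted
            // write and will never be consumed; drop them so the file
            // size matches the committed metadata.
            ch.truncate(committedPos);
        } catch (IOException e) {
            // Best effort: ESS logs and ignores a failure here, since the
            // committed data up to committedPos is still valid.
        }

        System.out.println(Files.size(data)); // prints 100
        Files.delete(data);
    }
}
```

Note that `FileChannel.truncate` returns the channel itself (for chaining), not a nullable result, and it is a no-op when the file is already at or below the given size.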
