gaoyajun02 commented on PR #38333: URL: https://github.com/apache/spark/pull/38333#issuecomment-1298452168
We have now located the cause of the zero-size chunk problem on the shuffle service node. The system's `dmesg -T` output contains the following:

```
[Tue Nov 1 19:40:04 2022] EXT4-fs (sde1): Delayed block allocation failed for inode 25755946 at logical offset 0 with max blocks 15 with error 117
[Tue Nov 1 20:01:04 2022] EXT4-fs (sde1): Delayed block allocation failed for inode 23266116 at logical offset 0 with max blocks 15 with error 117
[Tue Nov 1 20:01:04 2022] EXT4-fs (sde1): Delayed block allocation failed for inode 23266116 at logical offset 0 with max blocks 15 with error 117
```

Although this does not come from the software layer, and the number of bad nodes that lose data is very low, I think it makes sense to support fallback here.

cc @otterc

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
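
For anyone who wants to check other shuffle service nodes for the same symptom, here is a minimal sketch that scans `dmesg -T` text for the EXT4 message quoted in the comment and groups the affected inodes by device. It is not part of the PR. The function names `find_alloc_failures` and `inodes_by_device` are hypothetical, and the regular expression assumes the exact message wording shown in the comment, which may differ between kernel versions.

```python
import re

# Assumption: the kernel message has exactly the wording quoted in the comment.
# Other kernel versions may phrase it differently and would need another pattern.
ALLOC_FAILURE = re.compile(
    r"EXT4-fs \((?P<device>[^)]+)\): Delayed block allocation failed "
    r"for inode (?P<inode>\d+) at logical offset (?P<offset>\d+) "
    r"with max blocks (?P<max_blocks>\d+) with error (?P<error>\d+)"
)


def find_alloc_failures(dmesg_text):
    """Return one dict per matching EXT4 delayed-allocation failure message."""
    failures = []
    for match in ALLOC_FAILURE.finditer(dmesg_text):
        failures.append({
            "device": match.group("device"),
            "inode": int(match.group("inode")),
            "offset": int(match.group("offset")),
            "max_blocks": int(match.group("max_blocks")),
            "error": int(match.group("error")),
        })
    return failures


def inodes_by_device(failures):
    """Group the distinct affected inode numbers by block device name."""
    grouped = {}
    for failure in failures:
        grouped.setdefault(failure["device"], set()).add(failure["inode"])
    return grouped


if __name__ == "__main__":
    import sys

    # Example: dmesg -T | python this_script.py
    for device, inodes in inodes_by_device(find_alloc_failures(sys.stdin.read())).items():
        print(device, sorted(inodes))
```

On Linux, errno 117 is `EUCLEAN` ("Structure needs cleaning"), which ext4 uses to report on-disk corruption, so a non-empty result points at a filesystem or disk problem rather than at the shuffle code itself.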