gaoyajun02 commented on PR #38333:
URL: https://github.com/apache/spark/pull/38333#issuecomment-1298452168

   We have now located the cause of the zero-size chunk problem on the shuffle 
service node. The system log (`dmesg -T`) shows the following:
   ```
   [Tue Nov  1 19:40:04 2022] EXT4-fs (sde1): Delayed block allocation failed for inode 25755946 at logical offset 0 with max blocks 15 with error 117
   [Tue Nov  1 20:01:04 2022] EXT4-fs (sde1): Delayed block allocation failed for inode 23266116 at logical offset 0 with max blocks 15 with error 117
   [Tue Nov  1 20:01:04 2022] EXT4-fs (sde1): Delayed block allocation failed for inode 23266116 at logical offset 0 with max blocks 15 with error 117
   ```
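For anyone who wants to check other shuffle service nodes for the same symptom, here is a minimal sketch that pulls the device, inode and error code out of messages like the ones above. The function name and the regular expression are assumptions made for this illustration and are not part of the PR. On Linux, errno 117 is `EUCLEAN` ("Structure needs cleaning"), which ext4 uses to report file system corruption.

```python
import re

# Matches the ext4 message format shown in the dmesg output above.
# The group names are this sketch's own choice.
ALLOC_FAILURE = re.compile(
    r"EXT4-fs \((?P<device>[^)]+)\): Delayed block allocation failed "
    r"for inode (?P<inode>\d+) at logical offset (?P<offset>\d+) "
    r"with max blocks (?P<max_blocks>\d+) with error (?P<error>\d+)"
)


def find_alloc_failures(dmesg_text):
    """Return one dict per 'Delayed block allocation failed' message."""
    # Collapse all whitespace so that messages wrapped across lines still match.
    flat = " ".join(dmesg_text.split())
    return [
        {
            "device": m.group("device"),
            "inode": int(m.group("inode")),
            "error": int(m.group("error")),
        }
        for m in ALLOC_FAILURE.finditer(flat)
    ]


sample = """
[Tue Nov  1 19:40:04 2022] EXT4-fs (sde1): Delayed block allocation failed for
inode 25755946 at logical offset 0 with max blocks 15 with error 117
[Tue Nov  1 20:01:04 2022] EXT4-fs (sde1): Delayed block allocation failed
for inode 23266116 at logical offset 0 with max blocks 15 with error 117
"""

failures = find_alloc_failures(sample)
affected_inodes = sorted({f["inode"] for f in failures})
```

In practice the input would be the captured output of `dmesg -T` on the node rather than the inline sample.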
   Although the root cause is not in the software layer (it is a file system 
error on the disk), and very few nodes lose data this way, I think it still 
makes sense to support fallback here.
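To make the idea of a fallback concrete, here is a rough sketch. It assumes that "fallback" means reading the original, unmerged shuffle blocks whenever a merged chunk comes back with zero bytes. `MergedChunk`, `read_with_fallback` and `fetch_original` are hypothetical names for this illustration only, not Spark APIs, and the real change in the PR may be structured differently.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class MergedChunk:
    """Hypothetical stand-in for a merged shuffle chunk (not a Spark class)."""
    chunk_id: str
    data: bytes
    original_block_ids: List[str]


def read_with_fallback(
    chunk: MergedChunk,
    fetch_original: Callable[[str], bytes],
) -> List[bytes]:
    """Return the chunk's data, or the original blocks if the chunk is empty."""
    if len(chunk.data) > 0:
        return [chunk.data]
    # A zero-size chunk is treated as lost data: fetch each original block
    # that was merged into it instead of failing the read.
    return [fetch_original(block_id) for block_id in chunk.original_block_ids]


# Usage with an in-memory map standing in for the original block store.
originals = {"block_a": b"aaa", "block_b": b"bb"}

healthy = MergedChunk("chunk_0", b"aaabb", ["block_a", "block_b"])
damaged = MergedChunk("chunk_1", b"", ["block_a", "block_b"])

healthy_result = read_with_fallback(healthy, originals.__getitem__)
damaged_result = read_with_fallback(damaged, originals.__getitem__)
```

The point of the design is that a disk-level failure on one node costs an extra fetch of the original blocks rather than failing the read.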
   
   cc  @otterc


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

