erenavsarogullari opened a new pull request #34980:
URL: https://github.com/apache/spark/pull/34980


   ### What changes were proposed in this pull request?
   `java.io.IOException: Input/output error` usually points to environmental issues such as disk read/write failures caused by disk corruption, network access failures, etc. This PR adds a clear error message for this kind of environmental failure occurring in `BlockManager`, logging the `BlockManager` hostname, `blockId` and `blockPath`.
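A minimal sketch of the idea, assuming a hypothetical helper `withBlockContext` (not Spark's actual code or API): the block read runs inside a `try`, and an `IOException` whose message contains `Input/output error` is rethrown as a new `IOException` that carries the `BlockManager` id, block id and on-disk path, with the original exception kept as the cause.

```scala
import java.io.IOException

// Hypothetical helper, not Spark's implementation: wraps a block read and, when it
// fails with an "Input/output error" IOException, rethrows an IOException that adds
// the BlockManager id, block id and block path. Other exceptions pass through unchanged.
object BlockReadDiagnostics {
  private val IoErrorText = "Input/output error"

  def withBlockContext[T](blockManagerId: String, blockId: String, blockPath: String)(read: => T): T =
    try read
    catch {
      case e: IOException if Option(e.getMessage).exists(_.contains(IoErrorText)) =>
        // `$e` renders as "java.io.IOException: Input/output error"
        throw new IOException(
          s"$e usually occurs due to environmental problems " +
            "(e.g. disk corruption, network failure) so please check env status if healthy. " +
            s"$blockManagerId - blockName: $blockId - blockDiskPath: $blockPath",
          e)
    }
}
```

Keeping the original exception as the cause preserves the low-level stack trace while the new message points the reader at the affected host and file.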
   
   ### Why are the changes needed?
   These problems are usually environmental, and a clear error message can speed up their analysis and save root cause analysis (RCA) time.
   
   The following stack trace occurred on disk corruption:
   ```
   com.esotericsoftware.kryo.KryoException: java.io.IOException: Input/output error
   Serialization trace:
   buffers (org.apache.spark.sql.execution.columnar.DefaultCachedBatch)
       at com.esotericsoftware.kryo.io.Input.fill(Input.java:166)
       at com.esotericsoftware.kryo.io.Input.require(Input.java:196)
       at com.esotericsoftware.kryo.io.Input.readBytes(Input.java:346)
       at com.esotericsoftware.kryo.io.Input.readBytes(Input.java:326)
       at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:55)
       at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:38)
       at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:789)
       at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:381)
       at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:302)
       at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:789)
       at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:132)
       at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:543)
       at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:816)
       at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:296)
       at org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:168)
       at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
       at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:221)
       at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:299)
       at org.apache.spark.storage.BlockManager.maybeCacheDiskValuesInMemory(BlockManager.scala:1569)
       at org.apache.spark.storage.BlockManager.getLocalValues(BlockManager.scala:877)
       at org.apache.spark.storage.BlockManager.get(BlockManager.scala:1163)
   ...
   Caused by: java.io.IOException: Input/output error
       at java.io.FileInputStream.readBytes(Native Method)
       at java.io.FileInputStream.read(FileInputStream.java:255)
       at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
       at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
       at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
       at net.jpountz.lz4.LZ4BlockInputStream.tryReadFully(LZ4BlockInputStream.java:269)
       at net.jpountz.lz4.LZ4BlockInputStream.readFully(LZ4BlockInputStream.java:280)
       at net.jpountz.lz4.LZ4BlockInputStream.refill(LZ4BlockInputStream.java:243)
       at net.jpountz.lz4.LZ4BlockInputStream.read(LZ4BlockInputStream.java:157)
       at com.esotericsoftware.kryo.io.Input.fill(Input.java:164)
       ... 87 more
   ```
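In this trace the disk error does not surface directly: the `IOException` arrives as the cause of a `KryoException`, so checking only the top-level exception type would miss it. A minimal sketch, assuming a hypothetical helper `findInputOutputError` (not part of Spark), that walks the cause chain instead:

```scala
import java.io.IOException
import scala.annotation.tailrec

// Hypothetical helper (not Spark code): returns the first IOException in the cause
// chain whose message contains "Input/output error", e.g. one wrapped by a KryoException.
object IoErrorDetection {
  // Depth limit guards against pathological cause cycles.
  private val MaxDepth = 64

  @tailrec
  def findInputOutputError(t: Throwable, depth: Int = 0): Option[IOException] = t match {
    case null => None
    case _ if depth > MaxDepth => None
    case io: IOException if Option(io.getMessage).exists(_.contains("Input/output error")) => Some(io)
    case other => findInputOutputError(other.getCause, depth + 1)
  }
}
```

Here a plain `RuntimeException` can stand in for `KryoException` (which is itself a `RuntimeException`) when exercising the helper.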
   
   Proposed Error Message:
   ```
   java.io.IOException: Input/output error usually occurs due to environmental problems
   (e.g: disk corruption, network failure etc) so please check env status if healthy.
   BlockManagerId(driver, localhost, 54937, None) - blockName: test_my-block-id - blockDiskPath: /private/var/folders/kj/mccyycwn6mjdwnglw9g3k6pm0000gq/T/blockmgr-e86d8f67-a993-407f-ad3b-3cfb667b4ad4/11/test_my-block-id
   ```
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Added two new unit tests that reproduce the issue and log the new error message when getting data blocks from disk.
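One way to reproduce this failure without a physically corrupted disk is to feed the reader a stream that fails every read. The sketch below is illustrative only; `FailingInputStream` and `ReproduceIoError` are hypothetical names, and the PR's own tests may reproduce the issue differently.

```scala
import java.io.{BufferedInputStream, IOException, InputStream}

// Hypothetical test helper: an InputStream that fails every read the way a faulty
// disk does, so the error path can be exercised without real corruption.
final class FailingInputStream extends InputStream {
  override def read(): Int = throw new IOException("Input/output error")
  override def read(b: Array[Byte], off: Int, len: Int): Int =
    throw new IOException("Input/output error")
}

object ReproduceIoError {
  // Reads through a BufferedInputStream, mirroring the BufferedInputStream frames in
  // the "Caused by" section of the trace, and returns the exception that was raised.
  def firstReadFailure(): Option[IOException] = {
    val in = new BufferedInputStream(new FailingInputStream)
    try { in.read(new Array[Byte](16)); None }
    catch { case e: IOException => Some(e) }
    finally in.close()
  }
}
```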
   

