Hello,

I ran into a strange issue, and I don't understand why it occurred.
The error is: "On-disk size without header provided is 65347, but block
header contains -620432417. Block offset: 117193180315, data starts with:".

Call stack:
  at org.apache.hadoop.hbase.io.hfile.HFileBlock.validateOnDiskSizeWithoutHeader(HFileBlock.java:521)
  at org.apache.hadoop.hbase.io.hfile.HFileBlock.access$700(HFileBlock.java:88)
  at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1671)
  at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1538)
  at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:452)
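
For reference, this is what I understand the failing check to be doing; a
simplified, self-contained sketch paraphrased from the HBase 1.2.x source,
not a verbatim copy (the real method lives inside HFileBlock and uses its
fields):

  import java.io.IOException;

  // Paraphrase of HFileBlock.validateOnDiskSizeWithoutHeader (HBase 1.2.x):
  // the reader already knows how large the block should be (from the block
  // index or the previous block) and compares that with the size recorded
  // in the block header it just read from disk.
  class BlockSizeCheckSketch {
    static void validateOnDiskSizeWithoutHeader(int expectedSize,
        int sizeInHeader, long blockOffset) throws IOException {
      // expectedSize = 65347 in my case (what the reader expected)
      // sizeInHeader = -620432417 in my case (what the header claims)
      if (sizeInHeader != expectedSize) {
        throw new IOException("On-disk size without header provided is "
            + expectedSize + ", but block header contains " + sizeInHeader
            + ". Block offset: " + blockOffset);
      }
    }
  }

If I read it correctly, a large negative value like -620432417 means the
four size bytes in the header were garbage when they were read, i.e. the
block header itself was corrupt on that read path.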

The HBase version is 1.2.6. I know 1.2.6 has reached EOL, but migrating
to a new cluster is too hard for now, which is why I still operate this
one.

It looks like a block is corrupt. While the error was occurring, every
read request that touched the bad block failed to complete. The only way
I have found to resolve the issue is a major compaction: once the
compaction rewrites the store files (so the file containing the suspect
block is no longer referenced), read requests work fine again.
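
For completeness, this is how I trigger it; a minimal sketch against the
1.2 client API, equivalent to running major_compact 'my_table' in the
HBase shell ("my_table" is a placeholder for the affected table):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.client.Admin;
  import org.apache.hadoop.hbase.client.Connection;
  import org.apache.hadoop.hbase.client.ConnectionFactory;

  public class ForceMajorCompaction {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      try (Connection conn = ConnectionFactory.createConnection(conf);
           Admin admin = conn.getAdmin()) {
        // Asynchronous request; the RegionServers compact in the
        // background, rewriting every store file of the table.
        admin.majorCompact(TableName.valueOf("my_table"));
      }
    }
  }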

I found HBASE-20761 (https://issues.apache.org/jira/browse/HBASE-20761).
I am not sure it is related, but it mentions "This will cause further
attempts to read the block to fail since we will still retry the corrupt
replica instead of reporting the corrupt replica and trying a different
one.", which looks like a key to this issue.
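
To check whether I am hitting that pattern (one corrupt replica that
keeps being retried), my plan is to stream the suspect HFile directly
through the HDFS client with checksum verification turned on; if a
replica is corrupt at the HDFS level, the read should fail with a
ChecksumException, and if I understand the client correctly it should
report the bad replica to the NameNode and fail over to another one.
A sketch (hdfs fsck <path> -files -blocks -locations shows where the
replicas live):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataInputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class ScanHFileReplicas {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      Path path = new Path(args[0]); // pass the suspect HFile path
      try (FileSystem fs = FileSystem.get(path.toUri(), conf)) {
        fs.setVerifyChecksum(true); // force HDFS-level checksum checks
        try (FSDataInputStream in = fs.open(path)) {
          byte[] buf = new byte[1 << 20];
          long total = 0;
          int n;
          // Stream the whole file; a corrupt replica should surface
          // as a ChecksumException somewhere in this loop.
          while ((n = in.read(buf)) != -1) {
            total += n;
          }
          System.out.println("Read " + total + " bytes, no checksum error");
        }
      }
    }
  }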

Our cluster configuration:
  - hbase.regionserver.checksum.verify=true
  - dfs.client.read.shortcircuit.skip.checksum=false

Has anyone run into a similar situation?

Thanks.
