Hello, I'm trying to understand the extent of the following issue mentioned in the "WAL Compression" doc: https://hbase.apache.org/book.html#wal.compression
> A possible downside to WAL compression is that we lose more data from the
> last block in the WAL if it is ill-terminated mid-write. If entries in this
> last block were added with new dictionary entries but we failed to persist
> the amended dictionary because of an abrupt termination, a read of this
> last block may not be able to resolve last-written entries.

Does this mean there is potential data loss even if the clients of the regionserver received an ack?

The first mention of this issue I noticed is here:
https://issues.apache.org/jira/browse/HBASE-18504?focusedCommentId=16127767&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16127767

However, I couldn't find anything like that mentioned in the issue that introduced WAL compression (https://issues.apache.org/jira/browse/HBASE-4608).

I've also poked around the code for how compression is done (https://github.com/apache/hbase/blob/7877e09b6023c80e8bacd25fb8e0b9273ed7d258/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALCellCodec.java#L171) and was not able to see how the "failed to persist the amended dictionary" case can happen. It seems there is no explicit dictionary stored at all; instead, writing the data entries builds the dictionary on the fly. If data is not in the dictionary, it is written out explicitly, so it shouldn't be lost.

Could you please clarify the situation in which data loss after receiving an ack can happen when using WAL compression?

Thanks,
Andrey
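P.S. To make my reading of the code concrete, here is a minimal sketch of what I believe the scheme does. This is not HBase's actual API (the class and method names here are my own, for illustration only): the writer emits either a back-reference into its dictionary or the literal value, and the reader rebuilds an identical dictionary from the literals it encounters, so no dictionary is ever persisted separately.

```java
import java.io.*;
import java.util.*;

// Illustrative sketch only, not HBase's real WALCellCodec: shows how a
// stream codec can keep the dictionary implicit. The writer never
// serializes the dictionary; the reader reconstructs it from the literals
// in the stream, in the same order the writer added them.
public class WalDictSketch {
    // Each side (writer, reader) maintains its own copy of this.
    static class Dict {
        final Map<String, Integer> index = new HashMap<>();
        final List<String> entries = new ArrayList<>();
        Integer find(String v) { return index.get(v); }
        void add(String v) { index.put(v, entries.size()); entries.add(v); }
        String get(int i) { return entries.get(i); }
    }

    // Writer: emit a back-reference if the value was seen before,
    // otherwise emit the literal bytes and remember it for next time.
    static void write(DataOutputStream out, Dict d, String v) throws IOException {
        Integer idx = d.find(v);
        if (idx != null) {
            out.writeBoolean(true);   // marker: dictionary reference follows
            out.writeInt(idx);
        } else {
            out.writeBoolean(false);  // marker: literal value follows
            out.writeUTF(v);
            d.add(v);
        }
    }

    // Reader: mirror of the writer, rebuilding the dictionary on the fly.
    static String read(DataInputStream in, Dict d) throws IOException {
        if (in.readBoolean()) return d.get(in.readInt());
        String v = in.readUTF();
        d.add(v);
        return v;
    }

    public static void main(String[] args) throws IOException {
        String[] values = {"cf", "cf", "qualifier", "cf", "qualifier"};
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        Dict writerDict = new Dict();
        for (String v : values) write(out, writerDict, v);

        DataInputStream in =
            new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
        Dict readerDict = new Dict();
        for (String v : values) {
            String got = read(in, readerDict);
            if (!got.equals(v)) throw new AssertionError(got + " != " + v);
        }
        System.out.println("round-trip ok");
    }
}
```

Under this model the literal bytes for any new dictionary entry are part of the same record, which is why I don't see a separate "amended dictionary" that could fail to persist independently of the entry itself.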
