[ 
https://issues.apache.org/jira/browse/HBASE-28343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17814521#comment-17814521
 ] 

Andrew Kyle Purtell commented on HBASE-28343:
---------------------------------------------

We write the compression algorithm ordinal into the trailer. That used to be 
sufficient, but I then added new codecs where some of the implementation 
options have limitations, and one flavor may not be data compatible with 
another -- especially for Zstandard. The assumption was that an operator never 
changes codec configuration once data is live in the cluster, because a codec 
option is, of course, always compatible with itself.

bq. I think this problem could be solved by writing the classname of the codec 
used into the hfile. This could be used as a hint so that a regionserver can 
read hfiles compressed with any compression codec that it supports.

+1, makes sense to me.
It adds some safety and improves handling in cases where codec implementations 
of a given algorithm have been mixed, although mixing them should not be 
recommended practice. 
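The hint could be as simple as a codec classname string in the trailer that a reader tries to load first, falling back to the configured codec when the hinted class is not on the classpath. A minimal sketch of that resolution logic, with hypothetical names (this is not actual HBase code):

```java
// Hypothetical sketch of codec-hint resolution: prefer the classname
// recorded in the hfile trailer, fall back to the configured codec class
// when the hinted implementation is unavailable on this regionserver.
public final class CodecHintResolver {

  /**
   * Returns the hinted codec class if it can be loaded on this server,
   * otherwise the fallback class from the current runtime configuration.
   */
  public static Class<?> resolve(String hintedClassName, Class<?> fallback) {
    if (hintedClassName != null) {
      try {
        return Class.forName(hintedClassName);
      } catch (ClassNotFoundException e) {
        // The hint names a codec implementation that is not on this
        // server's classpath; fall back to the configured codec.
      }
    }
    return fallback;
  }
}
```

With something like this, an hfile written with zstd-jni could still be opened on a server configured for the Hadoop ZStandardCodec, as long as the zstd-jni classes remain available at read time.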

There is also HBASE-27706. The idea there is to implement a Hadoop codec 
compatible HBase side codec using zstd-jni, which I think is possible. 

> Write codec class into hfile header/trailer
> -------------------------------------------
>
>                 Key: HBASE-28343
>                 URL: https://issues.apache.org/jira/browse/HBASE-28343
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Bryan Beaudreault
>            Priority: Major
>
> We recently started playing around with the new bundled compression libraries 
> as of 2.5.0. Specifically, we are experimenting with the different zstd 
> codecs. The book says that aircompressor's zstd is not data compatible with 
> Hadoop's, but doesn't say the same about zstd-jni.
> In our experiments we ended up in a state where some hfiles were encoded with 
> zstd-jni (zstd.ZstdCodec) while others were encoded with hadoop 
> (ZStandardCodec). At this point the cluster became extremely unstable, with 
> some files unable to be read because they were encoded with a codec that 
> didn't match the current runtime configuration. Changing the runtime 
> configuration just made the other set of files unreadable instead.
> I think this problem could be solved by writing the classname of the codec 
> used into the hfile. This could be used as a hint so that a regionserver can 
> read hfiles compressed with any compression codec that it supports.
> [~apurtell] do you have any thoughts here since you brought us all of these 
> great compression options?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
