[ https://issues.apache.org/jira/browse/HBASE-28343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17814521#comment-17814521 ]
Andrew Kyle Purtell commented on HBASE-28343:
---------------------------------------------

We write the compression algorithm ordinal into the trailer. That used to be sufficient, but then I added these new codecs where some of the implementation options have limitations and one flavor might not be compatible with another (especially Zstandard!). It was assumed that an operator never changes codec configuration once data is live in the cluster, because a codec option is always compatible with itself, of course.

bq. I think this problem could be solved by writing the classname of the codec used into the hfile. This could be used as a hint so that a regionserver can read hfiles compressed with any compression codec that it supports.

+1, makes sense to me. It adds some safety and improves handling when codec implementations of a given algorithm have been mixed, although mixing them should not be recommended practice.

There is also HBASE-27706. The idea there is to implement an HBase-side codec, built on zstd-jni, that is compatible with the Hadoop codec, which I think is possible.

> Write codec class into hfile header/trailer
> -------------------------------------------
>
>                 Key: HBASE-28343
>                 URL: https://issues.apache.org/jira/browse/HBASE-28343
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Bryan Beaudreault
>            Priority: Major
>
> We recently started playing around with the new bundled compression libraries
> as of 2.5.0. Specifically, we are experimenting with the different zstd
> codecs. The book says that aircompressor's zstd is not data compatible with
> Hadoop's, but doesn't say the same about zstd-jni.
> In our experiments we ended up in a state where some hfiles were encoded with
> zstd-jni (zstd.ZstdCodec) while others were encoded with hadoop
> (ZStandardCodec). At this point the cluster became extremely unstable, with
> some files unable to be read because they were encoded with a codec that
> didn't match the current runtime configuration. Changing the runtime
> configuration caused the other files to not be readable.
> I think this problem could be solved by writing the classname of the codec
> used into the hfile. This could be used as a hint so that a regionserver can
> read hfiles compressed with any compression codec that it supports.
> [~apurtell] do you have any thoughts here since you brought us all of these
> great compression options?
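
To make the "class name as a hint" idea concrete, here is a minimal Java sketch of how a reader might choose a codec per file. It is an illustration under stated assumptions, not HBase code: `CodecHintSketch`, `TrailerInfo`, and `resolveCodec` are hypothetical names, the algorithm is modeled as a string rather than the ordinal stored in the trailer, and the two class-name constants are only sample data for the zstd-jni and Hadoop codecs mentioned in the issue.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch only: none of these types or methods are HBase APIs.
final class CodecHintSketch {

  // Sample codec class names (assumed fully qualified names, used only as data).
  static final String ZSTD_JNI = "org.apache.hadoop.hbase.io.compress.zstd.ZstdCodec";
  static final String HADOOP_ZSTD = "org.apache.hadoop.io.compress.ZStandardCodec";

  /** Hypothetical stand-in for the trailer fields that matter here. */
  static final class TrailerInfo {
    final String algorithm;      // e.g. "ZSTD"; stands in for the algorithm ordinal
    final String codecClassHint; // null for files written before the hint existed

    TrailerInfo(String algorithm, String codecClassHint) {
      this.algorithm = algorithm;
      this.codecClassHint = codecClassHint;
    }
  }

  /**
   * Picks the codec class to use for one file.
   *
   * @param trailer    fields read from the file's trailer
   * @param configured algorithm name to the codec class configured on this server
   * @param supported  codec classes this server is able to load
   */
  static String resolveCodec(TrailerInfo trailer,
                             Map<String, String> configured,
                             Set<String> supported) {
    String hint = trailer.codecClassHint;
    if (hint != null) {
      // The file records which implementation wrote it: prefer that one.
      if (supported.contains(hint)) {
        return hint;
      }
      // Fail with a clear message instead of decoding with a mismatched codec.
      throw new IllegalStateException("File was written with unsupported codec " + hint);
    }
    // No hint recorded: fall back to whatever is configured for the algorithm.
    String fallback = configured.get(trailer.algorithm);
    if (fallback == null) {
      throw new IllegalStateException("No codec configured for " + trailer.algorithm);
    }
    return fallback;
  }

  public static void main(String[] args) {
    Map<String, String> configured = Map.of("ZSTD", HADOOP_ZSTD);
    Set<String> supported = Set.of(ZSTD_JNI, HADOOP_ZSTD);
    // Hinted file: the recorded class is used even though the config says otherwise.
    System.out.println(resolveCodec(new TrailerInfo("ZSTD", ZSTD_JNI), configured, supported));
    // Unhinted file: the configured class is used.
    System.out.println(resolveCodec(new TrailerInfo("ZSTD", null), configured, supported));
  }
}
```

With a scheme like this, files written by either implementation stay readable after a configuration change, as long as the server can load both classes. A file whose hint names a class the server cannot load fails with an explicit error.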