[jira] [Resolved] (HBASE-27637) Zero length value would cause value compressor read nothing and not advance the position of the InputStream
[ https://issues.apache.org/jira/browse/HBASE-27637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Duo Zhang resolved HBASE-27637.
-------------------------------
    Fix Version/s: 2.6.0
                   3.0.0-alpha-4
                   2.5.4
     Hadoop Flags: Reviewed
       Resolution: Fixed

Pushed to branch-2.5+. Thanks all for helping and reviewing!

> Zero length value would cause value compressor read nothing and not advance
> the position of the InputStream
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-27637
>                 URL: https://issues.apache.org/jira/browse/HBASE-27637
>             Project: HBase
>          Issue Type: Bug
>          Components: dataloss, wal
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>            Priority: Critical
>             Fix For: 2.6.0, 3.0.0-alpha-4, 2.5.4
>
> This is a code snippet from the discussion of HBASE-27073:
> {code}
> public static void main(String[] args) throws Exception {
>   CompressionContext ctx =
>     new CompressionContext(LRUDictionary.class, false, false, true,
>       Compression.Algorithm.GZ);
>   ValueCompressor compressor = ctx.getValueCompressor();
>   byte[] compressed = compressor.compress(new byte[0], 0, 0);
>   System.out.println("compressed length: " + compressed.length);
>   ByteArrayInputStream bis = new ByteArrayInputStream(compressed);
>   int read = compressor.decompress(bis, compressed.length, new byte[0], 0, 0);
>   System.out.println("read length: " + read);
>   System.out.println("position: " + (compressed.length - bis.available()));
> }
> {code}
> And the output is:
> {noformat}
> compressed length: 20
> read length: 0
> position: 0
> {noformat}
> So it turns out that when compressing, an empty array will still generate
> some output bytes, but while reading, we skip reading anything if we find
> the output length is zero, so the next time we read from the stream, we
> start at a wrong position...

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
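The behavior in the snippet above can be reproduced with plain java.util.zip, which shares the gzip wire format with the GZ algorithm used here. This standalone sketch (not HBase code) shows why a zero-length value still produces compressed output bytes that a correct reader must consume:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

public class EmptyValueCompression {
    // Plain java.util.zip GZIP as a stand-in for HBase's
    // Compression.Algorithm.GZ codec used by the WAL value compressor.
    static byte[] compressEmpty() throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(new byte[0]); // zero payload bytes...
        }
        // ...yet the codec still emits a 10-byte gzip header, a 2-byte
        // empty deflate block, and an 8-byte trailer: 20 bytes total.
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // A reader that skips decompression whenever the value length is
        // zero never consumes these bytes, leaving the InputStream
        // positioned in the middle of the record: the bug in HBASE-27637.
        System.out.println("compressed length: " + compressEmpty().length);
    }
}
```

This matches the "compressed length: 20" output in the report above: the overhead exists even when there is no payload.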
Re: [DISCUSS] Deprecated the 'hbase.regionserver.hlog.reader.impl' and 'hbase.regionserver.hlog.writer.impl' configurations
Can we keep these configs and add a new one for the replication reader?

One of the things we are doing, as Andrew said, is using BookKeeper for WAL
storage, which depends on the configurations above. Although we are
developing based on branch-1, and it is too early to talk about contributing
it to the community, I'm not sure what the attitude of the community is:
whether it would be willing to accept WAL implementations based on other
storage systems in the future.

Thanks.

张铎(Duo Zhang) 于2023年2月14日周二 21:20写道:

> Thanks Andrew for the feedback.
>
> If there are no other concerns, I will start the refactoring work and
> deprecate these two configs.
>
> Thanks.
>
> Andrew Purtell 于2023年2月10日周五 23:01写道:
>
> > As you point out, these configuration settings were introduced when we
> > migrated from SequenceFile based WALs to the protobuf format. We needed
> > to give users a way to manually migrate, although, arguably, an auto
> > migration would have been better.
> >
> > In theory these settings allow users to implement their own WAL readers
> > and writers. However I do not believe users will do this. The WAL is
> > critical for performance and correctness. If anyone is contemplating
> > such wizard level changes they can patch the code themselves. It's fine
> > to document these settings as deprecated for sure, and I think ok also
> > to claim them unsupported and ignored.
> >
> > > On Feb 10, 2023, at 3:41 AM, 张铎 wrote:
> > >
> > > While discussing how to deal with the problem in HBASE-27621, we
> > > proposed to introduce two types of WAL readers, one for WAL splitting
> > > and the other for WAL replication, as replication needs to tail the
> > > WAL file which is currently being written, so the logic is much more
> > > complicated. We do not want to affect WAL splitting logic and
> > > performance while tweaking the replication related things, as all
> > > HBase users need WAL splitting but not everyone needs replication.
> > >
> > > But when reviewing the related code, I found that we have two
> > > configurations for specifying the WAL reader class and WAL writer
> > > class, which indicates that we can only have one implementation for
> > > the WAL reader. They are 'hbase.regionserver.hlog.reader.impl' and
> > > 'hbase.regionserver.hlog.writer.impl'.
> > >
> > > We mention these two configurations several times in our ref guide.
> > >
> > > HBase 2.0+ can no longer read Sequence File based WAL files:
> > >
> > >> HBase can no longer read the deprecated WAL files written in the
> > >> Apache Hadoop Sequence File format. The
> > >> hbase.regionserver.hlog.reader.impl and
> > >> hbase.regionserver.hlog.writer.impl configuration entries should be
> > >> set to use the Protobuf based WAL reader / writer classes. This
> > >> implementation has been the default since HBase 0.96, so legacy WAL
> > >> files should not be a concern for most downstream users.
> > >
> > > Configure WAL encryption:
> > >
> > >> Configure WAL encryption in every RegionServer's hbase-site.xml, by
> > >> setting the following properties. You can include these in the
> > >> HMaster's hbase-site.xml as well, but the HMaster does not have a
> > >> WAL and will not use them.
> > >>
> > >> hbase.regionserver.hlog.reader.impl
> > >> org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader
> > >>
> > >> hbase.regionserver.hlog.writer.impl
> > >> org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter
> > >>
> > >> hbase.regionserver.wal.encryption
> > >> true
> > >
> > > So in fact, even without considering encryption, the configurations
> > > are useless, as we do not support reading sequence file format WALs
> > > any more; the only valid options are the protobuf based reader and
> > > writer. And for security, I think the configuration is redundant: if
> > > encryption is enabled, we should use SecureProtobufLogWriter for
> > > writing, no matter what the configuration value is. And for readers,
> > > I do not think we should use a configuration to specify the
> > > implementation; we should detect whether the file is encrypted and
> > > choose a secure or normal reader to read the file.
> > >
> > > So here, I propose we just deprecate these two configurations because
> > > they are useless now.
> > >
> > > Thanks.
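The ref-guide excerpt quoted in this thread lost its XML markup in transit; as hbase-site.xml properties it would read roughly as follows (a reconstruction from the flattened quote, so treat it as an approximation of the guide's example rather than a verbatim copy):

```xml
<!-- Reconstructed hbase-site.xml fragment for WAL encryption, per the
     ref-guide text quoted above. Reader/writer impl settings are the two
     configurations this thread proposes to deprecate. -->
<property>
  <name>hbase.regionserver.hlog.reader.impl</name>
  <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
</property>
<property>
  <name>hbase.regionserver.hlog.writer.impl</name>
  <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
</property>
<property>
  <name>hbase.regionserver.wal.encryption</name>
  <value>true</value>
</property>
```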
[jira] [Created] (HBASE-27642) Expose master startup status via JMX
Xiaolin Ha created HBASE-27642:
----------------------------------

             Summary: Expose master startup status via JMX
                 Key: HBASE-27642
                 URL: https://issues.apache.org/jira/browse/HBASE-27642
             Project: HBase
          Issue Type: Improvement
          Components: master
            Reporter: Xiaolin Ha

As described in HBASE-21521 by [~apurtell], add an internal API to the
master for tracking startup progress. Expose this information via JMX.
[jira] [Created] (HBASE-27641) Verify replication excessive false positive bad rows
Hernan Gelaf-Romer created HBASE-27641:
------------------------------------------

             Summary: Verify replication excessive false positive bad rows
                 Key: HBASE-27641
                 URL: https://issues.apache.org/jira/browse/HBASE-27641
             Project: HBase
          Issue Type: Improvement
          Components: mapreduce, Replication
            Reporter: Hernan Gelaf-Romer
            Assignee: Hernan Gelaf-Romer

Verify replication can generate a lot of `BADROWS` results when comparing a
row that is particularly hot at the time of re-compare. This can lead to a
mismatch between the source and sink results due to replication lag. We
could add a configurable re-compare mechanism that would make verify
replication less susceptible to falsely reporting `BADROWS` under
significant write load. These re-compares can be done asynchronously so as
to not significantly slow down the execution time of the job.
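A minimal sketch of the configurable re-compare idea. The knob names are hypothetical, and the `Supplier` stands in for re-fetching the row from source and sink and comparing the results; this is not the actual VerifyReplication patch:

```java
import java.util.function.Supplier;

public class RecompareSketch {
    /**
     * Re-check a mismatched row a few times before counting it as a BADROW.
     * Under replication lag, a hot row often converges after a short delay,
     * so retrying avoids falsely reporting it.
     *
     * @param rowsMatch hypothetical stand-in for "fetch row from source and
     *                  sink and compare"; returns true when they agree
     * @return true if the row is still mismatched after all re-compares
     */
    static boolean confirmBadRow(Supplier<Boolean> rowsMatch,
                                 int maxAttempts, long delayMs) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (rowsMatch.get()) {
                return false; // rows converged: not actually a bad row
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMs); // back off, let replication catch up
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return true; // still mismatched after all re-compares: count as BADROWS
    }

    public static void main(String[] args) {
        // A row that never converges is confirmed bad after the retries.
        System.out.println(confirmBadRow(() -> false, 3, 10));
    }
}
```

In the real job each `confirmBadRow` call could be submitted to an executor so the re-compare delays do not serialize with the main map task, matching the "done asynchronously" suggestion above.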
[jira] [Created] (HBASE-27640) Optimize writes of zero length values in compressed WALs
Andrew Kyle Purtell created HBASE-27640:
-------------------------------------------

             Summary: Optimize writes of zero length values in compressed WALs
                 Key: HBASE-27640
                 URL: https://issues.apache.org/jira/browse/HBASE-27640
             Project: HBase
          Issue Type: Sub-task
            Reporter: Andrew Kyle Purtell
             Fix For: 2.6.0, 3.0.0-alpha-4

If we unconditionally use the compressor to "write" 0 bytes, then the
compression codec will still emit overheads: the Hadoop CompressionStream
header and the compression bitstream header. All of that should be skipped
so that truly no compressed value data is written when the value is empty.
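One way the skip could look, sketched with java.util.zip rather than the actual HBase ValueCompressor (it assumes, as in the WAL format, that the value length is recorded separately in the entry, so a zero length lets both the writer and the reader bypass the codec entirely):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

public class EmptyValueSkip {
    /**
     * Sketch of the proposed optimization, not the actual HBase patch.
     * When the value is empty we emit no bytes at all, avoiding the
     * roughly 20 bytes of gzip header/trailer overhead that invoking the
     * codec on zero input would produce. The reader applies the symmetric
     * rule: if the recorded value length is zero, read nothing.
     */
    static byte[] compressValue(byte[] value) throws IOException {
        if (value.length == 0) {
            return new byte[0]; // skip the codec: no stream overheads written
        }
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (OutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(value);
        }
        return bos.toByteArray();
    }
}
```

Note this also sidesteps the HBASE-27637 positioning bug for empty values, since writer and reader now agree that a zero-length value contributes zero bytes to the stream.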
Re: [DISCUSS] Deprecated the 'hbase.regionserver.hlog.reader.impl' and 'hbase.regionserver.hlog.writer.impl' configurations
Thanks Andrew for the feedback.

If there are no other concerns, I will start the refactoring work and
deprecate these two configs.

Thanks.

Andrew Purtell 于2023年2月10日周五 23:01写道:

> As you point out, these configuration settings were introduced when we
> migrated from SequenceFile based WALs to the protobuf format. We needed
> to give users a way to manually migrate, although, arguably, an auto
> migration would have been better.
>
> In theory these settings allow users to implement their own WAL readers
> and writers. However I do not believe users will do this. The WAL is
> critical for performance and correctness. If anyone is contemplating such
> wizard level changes they can patch the code themselves. It's fine to
> document these settings as deprecated for sure, and I think ok also to
> claim them unsupported and ignored.
>
> > On Feb 10, 2023, at 3:41 AM, 张铎 wrote:
> >
> > While discussing how to deal with the problem in HBASE-27621, we
> > proposed to introduce two types of WAL readers, one for WAL splitting
> > and the other for WAL replication, as replication needs to tail the WAL
> > file which is currently being written, so the logic is much more
> > complicated. We do not want to affect WAL splitting logic and
> > performance while tweaking the replication related things, as all HBase
> > users need WAL splitting but not everyone needs replication.
> >
> > But when reviewing the related code, I found that we have two
> > configurations for specifying the WAL reader class and WAL writer
> > class, which indicates that we can only have one implementation for the
> > WAL reader. They are 'hbase.regionserver.hlog.reader.impl' and
> > 'hbase.regionserver.hlog.writer.impl'.
> >
> > We mention these two configurations several times in our ref guide.
> >
> > HBase 2.0+ can no longer read Sequence File based WAL files:
> >
> >> HBase can no longer read the deprecated WAL files written in the
> >> Apache Hadoop Sequence File format. The
> >> hbase.regionserver.hlog.reader.impl and
> >> hbase.regionserver.hlog.writer.impl configuration entries should be
> >> set to use the Protobuf based WAL reader / writer classes. This
> >> implementation has been the default since HBase 0.96, so legacy WAL
> >> files should not be a concern for most downstream users.
> >
> > Configure WAL encryption:
> >
> >> Configure WAL encryption in every RegionServer's hbase-site.xml, by
> >> setting the following properties. You can include these in the
> >> HMaster's hbase-site.xml as well, but the HMaster does not have a WAL
> >> and will not use them.
> >>
> >> hbase.regionserver.hlog.reader.impl
> >> org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader
> >>
> >> hbase.regionserver.hlog.writer.impl
> >> org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter
> >>
> >> hbase.regionserver.wal.encryption
> >> true
> >
> > So in fact, even without considering encryption, the configurations are
> > useless, as we do not support reading sequence file format WALs any
> > more; the only valid options are the protobuf based reader and writer.
> > And for security, I think the configuration is redundant: if encryption
> > is enabled, we should use SecureProtobufLogWriter for writing, no
> > matter what the configuration value is. And for readers, I do not think
> > we should use a configuration to specify the implementation; we should
> > detect whether the file is encrypted and choose a secure or normal
> > reader to read the file.
> >
> > So here, I propose we just deprecate these two configurations because
> > they are useless now.
> >
> > Thanks.
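The reader-side proposal in this thread (detect whether the file is encrypted instead of consulting a config) could look roughly like the following. The header flag and the nested classes here are simplified stand-ins invented for this sketch; the real protobuf WAL header format differs:

```java
public class ReaderSelectionSketch {
    // Placeholder reader types, named after (but not implementing) the
    // actual HBase classes.
    interface WalReader {}
    static class ProtobufLogReader implements WalReader {}
    static class SecureProtobufLogReader extends ProtobufLogReader {}

    /**
     * Hypothetical selection logic: instead of honoring
     * hbase.regionserver.hlog.reader.impl, inspect the file header to
     * decide whether the WAL is encrypted and pick the matching reader.
     * Here a single made-up flag byte plays the role of the real header's
     * encryption metadata.
     */
    static WalReader chooseReader(byte[] header) {
        boolean encrypted = header.length > 0 && header[0] == 1; // assumed flag
        return encrypted ? new SecureProtobufLogReader()
                         : new ProtobufLogReader();
    }
}
```

The point of the sketch is the control flow: the choice is a property of the file being read, so no user-facing configuration is needed.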
[jira] [Resolved] (HBASE-27630) hbase-spark bulkload stage directory limited to hdfs only
[ https://issues.apache.org/jira/browse/HBASE-27630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Somogyi resolved HBASE-27630.
-----------------------------------
    Resolution: Fixed

Merged to master branch in the hbase-connectors repository. Thanks for the
patch [~sergey.soldatov]!

> hbase-spark bulkload stage directory limited to hdfs only
> ----------------------------------------------------------
>
>                 Key: HBASE-27630
>                 URL: https://issues.apache.org/jira/browse/HBASE-27630
>             Project: HBase
>          Issue Type: Bug
>          Components: spark
>    Affects Versions: connector-1.0.0
>            Reporter: Sergey Soldatov
>            Assignee: Sergey Soldatov
>            Priority: Major
>             Fix For: hbase-connectors-1.1.0
>
> It is impossible to set the staging directory for the bulkload operation
> in the spark-hbase connector to any filesystem other than HDFS. That might
> be a problem for deployments where hbase.rootdir points to cloud storage.
> In this case, an additional copy task from HDFS to cloud storage would be
> required before loading hfiles into hbase.