[jira] [Resolved] (HBASE-27650) Merging empty regions corrupts meta cache
[ https://issues.apache.org/jira/browse/HBASE-27650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Beaudreault resolved HBASE-27650.
---------------------------------------
    Fix Version/s: 2.4.17
       Resolution: Fixed

Cherry-picked to branch-2.4

> Merging empty regions corrupts meta cache
> -----------------------------------------
>
>                 Key: HBASE-27650
>                 URL: https://issues.apache.org/jira/browse/HBASE-27650
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Bryan Beaudreault
>            Assignee: Bryan Beaudreault
>            Priority: Major
>              Labels: patch-available
>             Fix For: 2.6.0, 3.0.0-alpha-4, 2.4.17, 2.5.4
>
> Let's say you have three regions with start keys A, B, C, all cached in the meta cache. Region B is empty and not getting any requests, and all 3 regions are merged together. The new merged region has start key A.
> A user submits a request for row C1, which would previously have gone to region C. That region no longer exists, so the MetaCache returns region C, the request goes out to the server, and the server throws NotServingRegionException. Region C is then removed from the cache, and meta is scanned. The meta scan returns the newly merged region A, which is cached into the MetaCache.
> So now we have a MetaCache where A has been updated with the newly merged RegionInfo, B still exists with the old/deleted RegionInfo, and C has been removed.
> A user submits a request for row C1 again. This _should_ go to region A, but we do cache.floorEntry(C1), which returns the old but still cached region B. We have checks in MetaCache which validate RegionInfo.getEndKey() against the requested row, and that validation fails because C1 is beyond the end key of the old region. The cached region B result is ignored and the cache returns null. Meta is scanned and returns the new region A, which is cached again.
> Requests to rows C1+ will still succeed, but they will always require a meta scan, because the meta cache will always return that old region B, which is invalid and doesn't contain the C1+ rows.
> Currently, the only way this will ever resolve is if a request is sent to region B, which will cause a NotServingRegionException and finally clear region B from the cache. At that point, requests for C1+ will properly resolve to region A in the cache.
> I've created a reproducible test case here: [https://gist.github.com/bbeaudreault/c82ff9f8ad0b9424eb987483ede35c12]
> This problem affects both AsyncTable and branch-2's Table.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
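The stale-floor-entry behavior described above can be sketched with a plain TreeMap standing in for the MetaCache (a minimal illustration only, not HBase's actual MetaCache implementation; the key names and the "region@" return value are hypothetical):

```java
import java.util.Map;
import java.util.TreeMap;

public class StaleFloorEntrySketch {
    // Maps region start key -> region end key, mimicking MetaCache's sorted structure.
    static TreeMap<String, String> cache = new TreeMap<>();

    // Returns the cached region whose range contains row, or null (forcing a meta scan).
    static String lookup(String row) {
        Map.Entry<String, String> e = cache.floorEntry(row);
        if (e == null) {
            return null;
        }
        // Validate the end key against the requested row, as MetaCache does.
        if (row.compareTo(e.getValue()) >= 0) {
            return null; // row is beyond the cached region's end key
        }
        return "region@" + e.getKey();
    }

    public static void main(String[] args) {
        // Three regions A, B, C cached before the merge.
        cache.put("A", "B");
        cache.put("B", "C");
        cache.put("C", "\uFFFF"); // pretend the last region's end key is "infinity"

        // The merge happens server-side: C is evicted after NotServingRegionException,
        // and the merged region [A, infinity) is cached. B is never evicted.
        cache.remove("C");
        cache.put("A", "\uFFFF");

        // floorEntry("C1") finds stale region B, the end-key validation fails,
        // so every request for C1+ falls through to a meta scan.
        System.out.println(lookup("C1")); // prints "null"
    }
}
```

Sending a request that actually lands in B's old range (e.g. row "B5") is what finally evicts the stale entry in the scenario above.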
[jira] [Reopened] (HBASE-27650) Merging empty regions corrupts meta cache
[ https://issues.apache.org/jira/browse/HBASE-27650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Beaudreault reopened HBASE-27650:
---------------------------------------

Yes, you're right. Re-opening for 2.4 cherry-pick.

> Merging empty regions corrupts meta cache
> -----------------------------------------
>
>                 Key: HBASE-27650
>                 URL: https://issues.apache.org/jira/browse/HBASE-27650
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Bryan Beaudreault
>            Assignee: Bryan Beaudreault
>            Priority: Major
>              Labels: patch-available
>             Fix For: 2.6.0, 3.0.0-alpha-4, 2.5.4
>
[jira] [Resolved] (HBASE-27668) PB's parseDelimitedFrom can successfully return when there are not enough bytes
[ https://issues.apache.org/jira/browse/HBASE-27668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Duo Zhang resolved HBASE-27668.
-------------------------------
     Hadoop Flags: Reviewed
       Resolution: Fixed

Pushed to branch-2.4+. Thanks [~vjasani] for reviewing!

> PB's parseDelimitedFrom can successfully return when there are not enough bytes
> -------------------------------------------------------------------------------
>
>                 Key: HBASE-27668
>                 URL: https://issues.apache.org/jira/browse/HBASE-27668
>             Project: HBase
>          Issue Type: Bug
>          Components: Protobufs, wal
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>            Priority: Critical
>             Fix For: 2.6.0, 3.0.0-alpha-4, 2.4.17, 2.5.4
>
> Found this when writing some UTs for parsing a partial header and trailer: WALHeader.parseDelimitedFrom can return successfully when there are only two bytes in the stream (only the length, actually).
> So now I know why in the past we had a followingKvCount == 0 check in ProtobufLogReader: we just wanted to guard against a partial PB message.
> This is a very critical problem. I think we should provide our own implementation of parseDelimitedFrom for critical usages, for example when reading WAL entries. If there is not enough data, we just throw an exception instead of returning a partial PB message.
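The failure mode is that a delimited read can treat a truncated stream as a complete message. The strict variant the ticket proposes can be sketched in plain Java: read the varint length prefix yourself, then insist on exactly that many bytes before parsing (a standalone sketch of the idea with hypothetical method names; the actual fix lives in HBase's WAL reading code and works on protobuf types):

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class StrictDelimitedRead {
    // Reads a protobuf-style unsigned varint32 length prefix.
    static int readVarint32(InputStream in) throws IOException {
        int result = 0;
        for (int shift = 0; shift < 32; shift += 7) {
            int b = in.read();
            if (b < 0) {
                throw new EOFException("stream ended inside length prefix");
            }
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IOException("malformed varint");
    }

    // Returns exactly the prefixed number of bytes, or throws instead of
    // silently yielding a partial message.
    static byte[] readDelimitedStrict(InputStream in) throws IOException {
        int len = readVarint32(in);
        byte[] buf = new byte[len];
        new DataInputStream(in).readFully(buf); // EOFException if bytes are missing
        return buf;
    }

    public static void main(String[] args) throws IOException {
        // Length prefix promises 5 bytes, but only 2 follow -- analogous to the
        // two-byte WALHeader stream described in the issue.
        byte[] truncated = {5, 1, 2};
        try {
            readDelimitedStrict(new ByteArrayInputStream(truncated));
            System.out.println("parsed a partial message (the bug)");
        } catch (EOFException e) {
            System.out.println("truncation detected"); // the desired behavior
        }
    }
}
```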
[jira] [Created] (HBASE-27675) Document zookeeper based cluster manager
Rajeshbabu Chintaguntla created HBASE-27675:
-------------------------------------------
             Summary: Document zookeeper based cluster manager
                 Key: HBASE-27675
                 URL: https://issues.apache.org/jira/browse/HBASE-27675
             Project: HBase
          Issue Type: Sub-task
            Reporter: Rajeshbabu Chintaguntla
            Assignee: Rajeshbabu Chintaguntla
[jira] [Created] (HBASE-27674) Chaos Service Improvements
Rajeshbabu Chintaguntla created HBASE-27674:
-------------------------------------------
             Summary: Chaos Service Improvements
                 Key: HBASE-27674
                 URL: https://issues.apache.org/jira/browse/HBASE-27674
             Project: HBase
          Issue Type: Task
            Reporter: Rajeshbabu Chintaguntla
            Assignee: Rajeshbabu Chintaguntla

We can improve the chaos service, which runs random operations against a real cluster to verify its stability. The following things can be done:
1) Use the hbase script in the existing chaos-daemon.sh script instead of invoking the java command directly.
2) Add a script to the chaos server runner so it can run in the background.
3) Document usage of the zookeeper based cluster manager, mainly for environments where ssh cannot be used.
4) sudo is not required to kill a process owned by the same user, so the commands need not use sudo.
[jira] [Created] (HBASE-27673) Fix mTLS client authentication
Balazs Meszaros created HBASE-27673:
-----------------------------------
             Summary: Fix mTLS client authentication
                 Key: HBASE-27673
                 URL: https://issues.apache.org/jira/browse/HBASE-27673
             Project: HBase
          Issue Type: Bug
          Components: rpc
    Affects Versions: 3.0.0-alpha-3
            Reporter: Balazs Meszaros
            Assignee: Balazs Meszaros

The exception I get:
{noformat}
23/02/22 15:18:06 ERROR tls.HBaseTrustManager: Failed to verify host address: 127.0.0.1
javax.net.ssl.SSLPeerUnverifiedException: Certificate for <127.0.0.1> doesn't match any of the subject alternative names: [***]
	at org.apache.hadoop.hbase.io.crypto.tls.HBaseHostnameVerifier.matchIPAddress(HBaseHostnameVerifier.java:144)
	at org.apache.hadoop.hbase.io.crypto.tls.HBaseHostnameVerifier.verify(HBaseHostnameVerifier.java:117)
	at org.apache.hadoop.hbase.io.crypto.tls.HBaseTrustManager.performHostVerification(HBaseTrustManager.java:143)
	at org.apache.hadoop.hbase.io.crypto.tls.HBaseTrustManager.checkClientTrusted(HBaseTrustManager.java:97)
	...
23/02/22 15:18:06 ERROR tls.HBaseTrustManager: Failed to verify hostname: localhost
javax.net.ssl.SSLPeerUnverifiedException: Certificate for <localhost> doesn't match any of the subject alternative names: [***]
	at org.apache.hadoop.hbase.io.crypto.tls.HBaseHostnameVerifier.matchDNSName(HBaseHostnameVerifier.java:159)
	at org.apache.hadoop.hbase.io.crypto.tls.HBaseHostnameVerifier.verify(HBaseHostnameVerifier.java:119)
	at org.apache.hadoop.hbase.io.crypto.tls.HBaseTrustManager.performHostVerification(HBaseTrustManager.java:171)
	at org.apache.hadoop.hbase.io.crypto.tls.HBaseTrustManager.checkClientTrusted(HBaseTrustManager.java:97)
	...
23/02/22 15:18:06 WARN ipc.NettyRpcServer: Connection /100.100.124.2:47109; caught unexpected downstream exception.
org.apache.hbase.thirdparty.io.netty.handler.codec.DecoderException: javax.net.ssl.SSLHandshakeException: Failed to verify both host address and host name
	at org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:499)
	at org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290)
	at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
	at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
	at org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
	at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
	at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
	at org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:800)
	at org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:499)
	at org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:397)
	at org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
	at org.apache.hbase.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at org.apache.hbase.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run(Thread.java:750)
Caused by: javax.net.ssl.SSLHandshakeException: Failed to verify both host address and host name
	at sun.security.ssl.Alert.createSSLException(Alert.java:131)
	at sun.security.ssl.TransportContext.fatal(TransportContext.java:324)
	at sun.security.ssl.TransportContext.fatal(TransportContext.java:267)
	at sun.security.ssl.TransportContext.fatal(TransportContext.java:262)
	at sun.security.ssl.CertificateMessage$T12CertificateConsumer.checkClientCerts(CertificateMessage.java:700)
	at sun.security.ssl.CertificateMessage$T12CertificateConsumer.onCertificate(CertificateMessage.java:411)
	at sun.security.ssl.CertificateMessage$T12CertificateConsumer.consume(CertificateMessage.java:375)
	at sun.security.ssl.SSLHandshake.consume(SSLHandshake.java:377)
	at ...
{noformat}
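The handshake above fails because neither the peer's IP address nor its hostname appears in the certificate's subject alternative names. The core of that check is conceptually simple (a heavily simplified sketch, not the actual HBaseHostnameVerifier logic; the real code also handles DNS wildcards, IP-literal parsing, and the certificate's CN):

```java
import java.util.List;

public class SanMatchSketch {
    // Returns true if either the host address or the host name matches a SAN entry.
    static boolean verifyPeer(String hostAddress, String hostName, List<String> sans) {
        boolean addressMatches = sans.contains(hostAddress);
        boolean nameMatches = sans.stream()
            .anyMatch(san -> san.equalsIgnoreCase(hostName));
        // The real verifier logs each failed check separately and only aborts the
        // handshake when both fail ("Failed to verify both host address and host name").
        return addressMatches || nameMatches;
    }

    public static void main(String[] args) {
        // Hypothetical SAN list; in the log above the real entries are redacted as [***].
        List<String> sans = List.of("hbase.example.com", "10.0.0.5");
        System.out.println(verifyPeer("127.0.0.1", "localhost", sans)); // false: handshake fails
        System.out.println(verifyPeer("10.0.0.5", "localhost", sans));  // true: address matched
    }
}
```

The log therefore suggests the client certificate simply lacks SAN entries for 127.0.0.1/localhost, which is why loopback connections are rejected.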
[jira] [Created] (HBASE-27672) Read RPC threads may BLOCK at the Configuration.get when using java compression
Xiaolin Ha created HBASE-27672:
------------------------------
             Summary: Read RPC threads may BLOCK at the Configuration.get when using java compression
                 Key: HBASE-27672
                 URL: https://issues.apache.org/jira/browse/HBASE-27672
             Project: HBase
          Issue Type: Improvement
    Affects Versions: 2.5.3
            Reporter: Xiaolin Ha
            Assignee: Xiaolin Ha
         Attachments: image-2023-02-27-19-22-52-704.png

As the jstack info shows, some RPC threads and compaction threads are BLOCKED:
!image-2023-02-27-19-22-52-704.png!
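Hadoop's Configuration lookups are synchronized internally, so re-reading a setting on every compression or decompression call can serialize all RPC handler threads on one lock. The usual remedy is to resolve the value once and reuse it (a hand-rolled sketch of the contention pattern and the fix; the class and setting names here are illustrative, not HBase's actual codec code):

```java
import java.util.HashMap;
import java.util.Map;

public class CachedConfSketch {
    // Stand-in for Hadoop's Configuration: every get() takes a shared lock.
    static class Conf {
        private final Map<String, String> props = new HashMap<>();
        synchronized String get(String key, String def) {
            return props.getOrDefault(key, def);
        }
    }

    // Anti-pattern: each call re-reads the configuration under the shared lock,
    // so concurrent readers can show up BLOCKED in a jstack dump.
    static int bufferSizeSlow(Conf conf) {
        return Integer.parseInt(conf.get("codec.buffer.size", "4096"));
    }

    // Fix: resolve once at construction, then read a plain final field lock-free.
    static class Codec {
        final int bufferSize;
        Codec(Conf conf) {
            this.bufferSize = Integer.parseInt(conf.get("codec.buffer.size", "4096"));
        }
    }

    public static void main(String[] args) {
        Conf conf = new Conf();
        Codec codec = new Codec(conf); // one synchronized read, ever
        System.out.println(codec.bufferSize); // prints 4096
    }
}
```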
[jira] [Created] (HBASE-27671) Able to restore snapshot even after TTL expired
Ashok shetty created HBASE-27671:
--------------------------------
             Summary: Able to restore snapshot even after TTL expired
                 Key: HBASE-27671
                 URL: https://issues.apache.org/jira/browse/HBASE-27671
             Project: HBase
          Issue Type: Bug
    Affects Versions: 2.5.2
         Environment: HBase 2.5.2
            Reporter: Ashok shetty

Steps:
Precondition: set hbase.master.cleaner.snapshot.interval to 5 minutes in hbase-site.xml.
1. Create a table t1 and put some data.
2. Create a snapshot 'snapt1' with a TTL of 1 minute, and let the TTL expire.
3. Disable and drop table t1.
4. Restore snapshot 'snapt1'.
Actual: the restore succeeds.
Expected: the restore operation should fail with an error that the specified snapshot's TTL has expired and it cannot be restored.
Note: this can also be considered an improvement.
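The expected behavior amounts to a guard executed before the restore proceeds (a sketch of the proposed check only, not HBase's snapshot code; the method and parameter names are illustrative, and treating a TTL of 0 as "never expires" is an assumption):

```java
public class SnapshotTtlCheck {
    // Throws if the snapshot's TTL has elapsed, mirroring the expected restore behavior.
    static void checkRestorable(long creationTimeMs, long ttlSeconds, long nowMs) {
        if (ttlSeconds > 0 && nowMs > creationTimeMs + ttlSeconds * 1000L) {
            throw new IllegalStateException(
                "Snapshot TTL of " + ttlSeconds + "s has expired; restore is not allowed");
        }
    }

    public static void main(String[] args) {
        long created = System.currentTimeMillis() - 2 * 60_000L; // created two minutes ago
        checkRestorable(created, 300, System.currentTimeMillis()); // 5-minute TTL: still valid
        try {
            checkRestorable(created, 60, System.currentTimeMillis()); // 1-minute TTL: expired
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With such a guard, step 4 above would fail instead of silently restoring an expired snapshot that the cleaner chore simply has not deleted yet.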
[jira] [Created] (HBASE-27670) The FSDataOutputStream is obtained without reflection mode
alan.zhao created HBASE-27670:
-----------------------------
             Summary: The FSDataOutputStream is obtained without reflection mode
                 Key: HBASE-27670
                 URL: https://issues.apache.org/jira/browse/HBASE-27670
             Project: HBase
          Issue Type: Improvement
         Environment: HBase version: 2.2.3
            Reporter: alan.zhao

HBase interacts with HDFS and obtains an FSDataOutputStream to generate HFiles. In order to support favoredNodes, reflection is used. DistributedFileSystem has a more direct way to get the FSDataOutputStream, for example: dfs.createFile(path).permission(perm).create()...; this API allows you to set various parameters, including favoredNodes. I think avoiding reflection can improve performance, and if you agree, I can optimize this part of the code.
Module: hbase-server
Class: FSUtils
{code:java}
public static FSDataOutputStream create(Configuration conf, FileSystem fs, Path path,
    FsPermission perm, InetSocketAddress[] favoredNodes) throws IOException {
  if (fs instanceof HFileSystem) {
    FileSystem backingFs = ((HFileSystem) fs).getBackingFs();
    if (backingFs instanceof DistributedFileSystem) {
      // Try to use the favoredNodes version via reflection to allow backwards-
      // compatibility.
      short replication = Short.parseShort(conf.get(ColumnFamilyDescriptorBuilder.DFS_REPLICATION,
        String.valueOf(ColumnFamilyDescriptorBuilder.DEFAULT_DFS_REPLICATION)));
      try {
        return (FSDataOutputStream) (DistributedFileSystem.class
          .getDeclaredMethod("create", Path.class, FsPermission.class, boolean.class, int.class,
            short.class, long.class, Progressable.class, InetSocketAddress[].class)
          .invoke(backingFs, path, perm, true, CommonFSUtils.getDefaultBufferSize(backingFs),
            replication > 0 ? replication : CommonFSUtils.getDefaultReplication(backingFs, path),
            CommonFSUtils.getDefaultBlockSize(backingFs, path), null, favoredNodes));
{code}
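The appeal of the builder-style call is that it is checked at compile time, while the reflective getDeclaredMethod lookup fails only at runtime and boxes every primitive argument. A self-contained mini-builder makes the contrast concrete (all names here are illustrative stand-ins, not Hadoop's actual createFile builder API, whose exact methods should be checked against the Hadoop version in use):

```java
import java.net.InetSocketAddress;
import java.util.Arrays;

public class CreateFileBuilderSketch {
    // A tiny stand-in for a DistributedFileSystem createFile(...) builder chain.
    static class CreateBuilder {
        private final String path;
        private short replication = 3;
        private InetSocketAddress[] favoredNodes = new InetSocketAddress[0];

        CreateBuilder(String path) { this.path = path; }

        CreateBuilder replication(short r) { this.replication = r; return this; }

        CreateBuilder favoredNodes(InetSocketAddress... nodes) {
            this.favoredNodes = nodes;
            return this;
        }

        // In the real API this would return an FSDataOutputStream; a String suffices here.
        String build() {
            return path + " repl=" + replication + " favored=" + Arrays.toString(favoredNodes);
        }
    }

    public static void main(String[] args) {
        // Compile-time checked: a typo in any method name fails the build, not the cluster.
        String out = new CreateBuilder("/hbase/data/f1")
            .replication((short) 2)
            .favoredNodes(InetSocketAddress.createUnresolved("rs1.example.com", 50010))
            .build();
        System.out.println(out);
    }
}
```

The reflection in FSUtils exists for backwards compatibility with Hadoop versions lacking the favoredNodes overload, so any direct-call replacement would need to preserve that compatibility story.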