[jira] [Resolved] (HBASE-27650) Merging empty regions corrupts meta cache

2023-02-27 Thread Bryan Beaudreault (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Beaudreault resolved HBASE-27650.
---
Fix Version/s: 2.4.17
   Resolution: Fixed

Cherry-picked to branch-2.4

> Merging empty regions corrupts meta cache
> -
>
> Key: HBASE-27650
> URL: https://issues.apache.org/jira/browse/HBASE-27650
> Project: HBase
>  Issue Type: Bug
>Reporter: Bryan Beaudreault
>Assignee: Bryan Beaudreault
>Priority: Major
>  Labels: patch-available
> Fix For: 2.6.0, 3.0.0-alpha-4, 2.4.17, 2.5.4
>
>
> Let's say you have three regions with start keys A, B, C and all are cached 
> in the meta cache. Region B is empty and not getting any requests, and all 3 
> regions are merged together. The new merged region has start key A.
> A user submits a request for row C1, which would previously have gone to 
> region C. That region no longer exists, but the MetaCache still returns it; 
> the request goes out to the server, which throws a NotServingRegionException. 
> Region C is then removed from the cache, and meta is scanned. The meta scan 
> returns the newly merged region A, which is cached into the MetaCache.
> So now we have a MetaCache where A has been updated with the newly merged 
> RegionInfo, B still exists with the old/deleted RegionInfo, and C has been 
> removed.
> A user submits a request for row C1 again. This _should_ go to region A, but 
> we do cache.floorEntry(C1), which returns the old but still cached region B. 
> We have checks in MetaCache which validate the RegionInfo.getEndKey() against 
> the requested row, and that validation fails because C1 is beyond the end key 
> of the old region. The cached region B result is ignored and the cache 
> returns null. Meta is scanned and returns the new region A, which is cached 
> again.
> Requests to rows C1+ will still succeed... but they will always require a 
> meta scan, because the meta cache will always return that old region B, 
> which is invalid and doesn't contain the C1+ rows.
> Currently, the only way this ever resolves is if a request is sent to 
> region B, which causes a NotServingRegionException that finally clears 
> region B from the cache. At that point, requests for C1+ are properly 
> resolved to region A in the cache.
> I've created a reproducible test case here: 
> [https://gist.github.com/bbeaudreault/c82ff9f8ad0b9424eb987483ede35c12]
> This problem affects both AsyncTable and branch-2's Table.
>  
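
To make the lookup failure concrete, here is a minimal, self-contained sketch 
(hypothetical class name, String keys standing in for byte[] row keys, not the 
actual MetaCache code) of the floorEntry behavior described above:

{code:java}
import java.util.Map;
import java.util.TreeMap;

public class StaleFloorEntrySketch {
  record CachedRegion(String startKey, String endKey) {}

  public static void main(String[] args) {
    // Cache state after the merge and one NotServingRegionException round
    // trip: A holds the merged region (empty end key = last region of the
    // table), while the entry for deleted region B is still cached.
    TreeMap<String, CachedRegion> cache = new TreeMap<>();
    cache.put("A", new CachedRegion("A", ""));
    cache.put("B", new CachedRegion("B", "C")); // stale entry

    String row = "C1";
    Map.Entry<String, CachedRegion> e = cache.floorEntry(row); // stale B, not A
    CachedRegion r = e.getValue();
    // The end-key validation rejects B, so the lookup behaves as a cache miss
    boolean contains = r.endKey().isEmpty() || row.compareTo(r.endKey()) < 0;
    System.out.println(contains ? "hit: " + r : "miss -> meta scan");
  }
}
{code}

Until the stale B entry is evicted, every lookup for rows at or beyond C takes 
the miss path above and forces a meta scan.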





[jira] [Reopened] (HBASE-27650) Merging empty regions corrupts meta cache

2023-02-27 Thread Bryan Beaudreault (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Beaudreault reopened HBASE-27650:
---

Yes, you're right. Re-opening for 2.4 cherry-pick.






[jira] [Resolved] (HBASE-27668) PB's parseDelimitedFrom can successfully return when there are not enough bytes

2023-02-27 Thread Duo Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang resolved HBASE-27668.
---
Hadoop Flags: Reviewed
  Resolution: Fixed

Pushed to branch-2.4+.

Thanks [~vjasani] for reviewing!

> PB's parseDelimitedFrom can successfully return when there are not enough 
> bytes
> ---
>
> Key: HBASE-27668
> URL: https://issues.apache.org/jira/browse/HBASE-27668
> Project: HBase
>  Issue Type: Bug
>  Components: Protobufs, wal
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Critical
> Fix For: 2.6.0, 3.0.0-alpha-4, 2.4.17, 2.5.4
>
>
> Found this when writing some UTs for parsing a partial header and trailer: 
> WALHeader.parseDelimitedFrom can return successfully when there are only two 
> bytes in the stream (only the length, actually).
> So now I know why in the past we had a followingKvCount == 0 check in 
> ProtobufLogReader: we just wanted to reject the partial PB message.
> This is a very critical problem. I think we should provide our own 
> implementation of parseDelimitedFrom for some critical usages, for example 
> when reading WAL entries. If there is not enough data, we should throw an 
> exception instead of returning a partial PB message.
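
As an illustration of the proposed approach, here is a minimal sketch (a 
hypothetical helper, not the actual HBase patch) of a strict delimited read 
that fails with EOFException when the stream ends before the full message has 
arrived, instead of returning a partial PB message:

{code:java}
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public final class StrictDelimitedRead {
  /** Reads one length-delimited message body, failing loudly on truncation. */
  public static byte[] readDelimited(InputStream in) throws IOException {
    int length = readVarint32(in);
    byte[] buf = new byte[length];
    int off = 0;
    while (off < length) {
      int n = in.read(buf, off, length - off);
      if (n < 0) {
        throw new EOFException("EOF after " + off + " of " + length + " message bytes");
      }
      off += n;
    }
    return buf; // the caller then parses, e.g. WALHeader.parseFrom(buf)
  }

  private static int readVarint32(InputStream in) throws IOException {
    int result = 0;
    for (int shift = 0; shift < 32; shift += 7) {
      int b = in.read();
      if (b < 0) {
        throw new EOFException("EOF while reading the length varint");
      }
      result |= (b & 0x7F) << shift;
      if ((b & 0x80) == 0) {
        return result;
      }
    }
    throw new IOException("Malformed varint32");
  }
}
{code}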





[jira] [Created] (HBASE-27675) Document zookeeper based cluster manager

2023-02-27 Thread Rajeshbabu Chintaguntla (Jira)
Rajeshbabu Chintaguntla created HBASE-27675:
---

 Summary: Document zookeeper based cluster manager
 Key: HBASE-27675
 URL: https://issues.apache.org/jira/browse/HBASE-27675
 Project: HBase
  Issue Type: Sub-task
Reporter: Rajeshbabu Chintaguntla
Assignee: Rajeshbabu Chintaguntla








[jira] [Created] (HBASE-27674) Chaos Service Improvements

2023-02-27 Thread Rajeshbabu Chintaguntla (Jira)
Rajeshbabu Chintaguntla created HBASE-27674:
---

 Summary: Chaos Service Improvements
 Key: HBASE-27674
 URL: https://issues.apache.org/jira/browse/HBASE-27674
 Project: HBase
  Issue Type: Task
Reporter: Rajeshbabu Chintaguntla
Assignee: Rajeshbabu Chintaguntla


We can improve the chaos service, which runs random operations against a real 
cluster to verify its stability. The following things can be done:

1) Make use of the hbase script in the existing chaos-daemon.sh script instead 
of directly using the java command.

2) Add a script to the chaos server runner so it can run in the background.

3) Document usage of the zookeeper based cluster manager, mainly for 
environments where ssh cannot be used.

4) sudo is not required to kill a process owned by the same user, so the 
commands need not use sudo.





[jira] [Created] (HBASE-27673) Fix mTLS client authentication

2023-02-27 Thread Balazs Meszaros (Jira)
Balazs Meszaros created HBASE-27673:
---

 Summary: Fix mTLS client authentication
 Key: HBASE-27673
 URL: https://issues.apache.org/jira/browse/HBASE-27673
 Project: HBase
  Issue Type: Bug
  Components: rpc
Affects Versions: 3.0.0-alpha-3
Reporter: Balazs Meszaros
Assignee: Balazs Meszaros


The exception that I get:

{noformat}
23/02/22 15:18:06 ERROR tls.HBaseTrustManager: Failed to verify host address: 127.0.0.1
javax.net.ssl.SSLPeerUnverifiedException: Certificate for <127.0.0.1> doesn't match any of the subject alternative names: [***]
    at org.apache.hadoop.hbase.io.crypto.tls.HBaseHostnameVerifier.matchIPAddress(HBaseHostnameVerifier.java:144)
    at org.apache.hadoop.hbase.io.crypto.tls.HBaseHostnameVerifier.verify(HBaseHostnameVerifier.java:117)
    at org.apache.hadoop.hbase.io.crypto.tls.HBaseTrustManager.performHostVerification(HBaseTrustManager.java:143)
    at org.apache.hadoop.hbase.io.crypto.tls.HBaseTrustManager.checkClientTrusted(HBaseTrustManager.java:97)
    ...
23/02/22 15:18:06 ERROR tls.HBaseTrustManager: Failed to verify hostname: localhost
javax.net.ssl.SSLPeerUnverifiedException: Certificate for <localhost> doesn't match any of the subject alternative names: [***]
    at org.apache.hadoop.hbase.io.crypto.tls.HBaseHostnameVerifier.matchDNSName(HBaseHostnameVerifier.java:159)
    at org.apache.hadoop.hbase.io.crypto.tls.HBaseHostnameVerifier.verify(HBaseHostnameVerifier.java:119)
    at org.apache.hadoop.hbase.io.crypto.tls.HBaseTrustManager.performHostVerification(HBaseTrustManager.java:171)
    at org.apache.hadoop.hbase.io.crypto.tls.HBaseTrustManager.checkClientTrusted(HBaseTrustManager.java:97)
    ...
23/02/22 15:18:06 WARN ipc.NettyRpcServer: Connection /100.100.124.2:47109; caught unexpected downstream exception.
org.apache.hbase.thirdparty.io.netty.handler.codec.DecoderException: javax.net.ssl.SSLHandshakeException: Failed to verify both host address and host name
    at org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:499)
    at org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290)
    at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
    at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
    at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
    at org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
    at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
    at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
    at org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
    at org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:800)
    at org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:499)
    at org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:397)
    at org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
    at org.apache.hbase.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at org.apache.hbase.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:750)
Caused by: javax.net.ssl.SSLHandshakeException: Failed to verify both host address and host name
    at sun.security.ssl.Alert.createSSLException(Alert.java:131)
    at sun.security.ssl.TransportContext.fatal(TransportContext.java:324)
    at sun.security.ssl.TransportContext.fatal(TransportContext.java:267)
    at sun.security.ssl.TransportContext.fatal(TransportContext.java:262)
    at sun.security.ssl.CertificateMessage$T12CertificateConsumer.checkClientCerts(CertificateMessage.java:700)
    at sun.security.ssl.CertificateMessage$T12CertificateConsumer.onCertificate(CertificateMessage.java:411)
    at sun.security.ssl.CertificateMessage$T12CertificateConsumer.consume(CertificateMessage.java:375)
    at sun.security.ssl.SSLHandshake.consume(SSLHandshake.java:377)
    ...
{noformat}

[jira] [Created] (HBASE-27672) Read RPC threads may BLOCK at the Configuration.get when using java compression

2023-02-27 Thread Xiaolin Ha (Jira)
Xiaolin Ha created HBASE-27672:
--

 Summary: Read RPC threads may BLOCK at the Configuration.get when 
using java compression
 Key: HBASE-27672
 URL: https://issues.apache.org/jira/browse/HBASE-27672
 Project: HBase
  Issue Type: Improvement
Affects Versions: 2.5.3
Reporter: Xiaolin Ha
Assignee: Xiaolin Ha
 Attachments: image-2023-02-27-19-22-52-704.png

As shown in the jstack info, we can see some RPC threads or compaction threads BLOCKED:

!image-2023-02-27-19-22-52-704.png!





[jira] [Created] (HBASE-27671) Able to restore snapshot even after TTL expired

2023-02-27 Thread Ashok shetty (Jira)
Ashok shetty created HBASE-27671:


 Summary: Able to restore snapshot even after TTL expired
 Key: HBASE-27671
 URL: https://issues.apache.org/jira/browse/HBASE-27671
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.5.2
 Environment: ENV- HBase 2.5.2
Reporter: Ashok shetty


Steps:

Precondition: set hbase.master.cleaner.snapshot.interval to 5 minutes in 
hbase-site.xml.

1. Create a table t1 and put some data.

2. Create a snapshot 'snapt1' with a TTL of 1 minute.

Let the TTL expire.

3. Disable and drop table t1.

4. Restore the snapshot 'snapt1'.

Actual: the restore succeeds.

Expected: the restore operation should fail, reporting that the specified 
snapshot's TTL has expired and it cannot be restored.

Note: this can be considered an improvement.
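
For illustration, a minimal sketch of the kind of guard the restore path could 
add (hypothetical class and method names; HBase's actual snapshot descriptor 
and restore code differ):

{code:java}
import java.io.IOException;

public final class SnapshotTtlGuard {
  /** Hypothetical guard: refuse to restore a snapshot whose TTL has expired. */
  public static void verifyNotExpired(String name, long creationTimeMillis,
      long ttlSeconds) throws IOException {
    // A TTL of 0 or less conventionally means the snapshot never expires.
    if (ttlSeconds > 0
      && System.currentTimeMillis() > creationTimeMillis + ttlSeconds * 1000L) {
      throw new IOException(
        "Snapshot '" + name + "' TTL has expired and cannot be restored");
    }
  }
}
{code}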





[jira] [Created] (HBASE-27670) The FSDataOutputStream is obtained without reflection mode

2023-02-27 Thread alan.zhao (Jira)
alan.zhao created HBASE-27670:
-

 Summary: The FSDataOutputStream is obtained without reflection mode
 Key: HBASE-27670
 URL: https://issues.apache.org/jira/browse/HBASE-27670
 Project: HBase
  Issue Type: Improvement
 Environment: HBase version: 2.2.3
Reporter: alan.zhao


HBase interacts with HDFS and obtains an FSDataOutputStream to write HFiles. In 
order to support favoredNodes, reflection is used. The DistributedFileSystem 
has a more direct way to get the FSDataOutputStream, for example: 
dfs.createFile(path).permission(perm).create()...; this builder API allows you 
to set various parameters, including favoredNodes. I think avoiding reflection 
can improve performance, and if you agree with me, I can optimize this part of 
the code.

Module: hbase-server

Class: FSUtils

 
{code:java}
public static FSDataOutputStream create(Configuration conf, FileSystem fs, Path path,
    FsPermission perm, InetSocketAddress[] favoredNodes) throws IOException {
  if (fs instanceof HFileSystem) {
    FileSystem backingFs = ((HFileSystem) fs).getBackingFs();
    if (backingFs instanceof DistributedFileSystem) {
      // Try to use the favoredNodes version via reflection to allow backwards-
      // compatibility.
      short replication = Short.parseShort(conf.get(ColumnFamilyDescriptorBuilder.DFS_REPLICATION,
        String.valueOf(ColumnFamilyDescriptorBuilder.DEFAULT_DFS_REPLICATION)));
      try {
        return (FSDataOutputStream) (DistributedFileSystem.class
          .getDeclaredMethod("create", Path.class, FsPermission.class, boolean.class, int.class,
            short.class, long.class, Progressable.class, InetSocketAddress[].class)
          .invoke(backingFs, path, perm, true, CommonFSUtils.getDefaultBufferSize(backingFs),
            replication > 0 ? replication : CommonFSUtils.getDefaultReplication(backingFs, path),
            CommonFSUtils.getDefaultBlockSize(backingFs, path), null, favoredNodes));
        // ... (catch clause and the non-DFS fallback are elided in the original snippet)
{code}
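
For comparison, here is a sketch of the builder-based approach suggested 
above. The helper name createWithBuilder is hypothetical, and it assumes the 
deployed Hadoop version exposes HdfsDataOutputStreamBuilder (in particular its 
favoredNodes setter), so it should be verified against the Hadoop version in 
use:

{code:java}
import java.io.IOException;
import java.net.InetSocketAddress;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.util.CommonFSUtils;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public final class BuilderCreateSketch {
  public static FSDataOutputStream createWithBuilder(Configuration conf,
      DistributedFileSystem dfs, Path path, FsPermission perm,
      InetSocketAddress[] favoredNodes) throws IOException {
    short replication = Short.parseShort(conf.get(ColumnFamilyDescriptorBuilder.DFS_REPLICATION,
      String.valueOf(ColumnFamilyDescriptorBuilder.DEFAULT_DFS_REPLICATION)));
    return dfs.createFile(path)
      .permission(perm)
      .create()        // set the CREATE flag; build() performs the actual call
      .overwrite(true) // matches the 'true' overwrite argument passed via reflection
      .bufferSize(CommonFSUtils.getDefaultBufferSize(dfs))
      .replication(replication > 0 ? replication : CommonFSUtils.getDefaultReplication(dfs, path))
      .blockSize(CommonFSUtils.getDefaultBlockSize(dfs, path))
      .favoredNodes(favoredNodes) // the setting reflection was needed for before
      .build();
  }
}
{code}

This keeps the favoredNodes hint while avoiding the per-call reflective method 
lookup.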


