[jira] [Created] (HBASE-27509) Possible region gets stuck in CLOSING state
Rajeshbabu Chintaguntla created HBASE-27509:
---

             Summary: Possible region gets stuck in CLOSING state
                 Key: HBASE-27509
                 URL: https://issues.apache.org/jira/browse/HBASE-27509
             Project: HBase
          Issue Type: Bug
          Components: Region Assignment
    Affects Versions: 2.3.4
            Reporter: Rajeshbabu Chintaguntla
            Assignee: Rajeshbabu Chintaguntla

A region may get stuck in the CLOSING state, possibly because of a race between flush and close, or because a read lock acquired on the region somewhere is not being released.

{noformat}
"MemStoreFlusher.1" #236 prio=5 os_prio=0 tid=0x5639266a4000 nid=0x296e waiting on condition [0x7fdc48a63000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x7fdf42dde850> (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
        at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
        at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2397)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:610)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:579)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$1000(MemStoreFlusher.java:67)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:359)

"MemStoreFlusher.0" #234 prio=5 os_prio=0 tid=0x5639266a2800 nid=0x296d waiting on condition [0x7fdc48b64000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x7fdf42dde850> (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
        at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
        at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2397)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:610)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:579)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$1000(MemStoreFlusher.java:67)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:359)
{noformat}

{noformat}
"RS_CLOSE_REGION-regionserver/sl73tskrnsqln00107:16020-0" #6337 daemon prio=5 os_prio=0 tid=0x7fdc05448800 nid=0x15d1 waiting on condition [0x7fdc1befd000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x7fdf42dde850> (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
        at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
        at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1662)
        at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1591)
        - locked <0x7fdf42ddf358> (a java.lang.Object)
        at org.apache.hadoop.hbase.regionserver.handler.UnassignRegionHandler.process(UnassignRegionHandler.java:114)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
{noformat}

From one of the region server logs, the flush had started and the replay edits of the flush were added, then close
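
To illustrate the second hypothesis in the report (a read lock that is never released), here is a minimal stand-alone sketch. It is not HBase code and does not establish the actual root cause. It assumes a fair ReentrantReadWriteLock, like the FairSync shown in the dump; the class FairLockStallDemo, the method newReaderAcquires, and the thread names "leaker" and "closer" are hypothetical. It shows that once a writer is queued behind a leaked read lock, new readers queue behind that writer too, which would be consistent with flushers parked in ReadLock.lock() while the close handler is parked in WriteLock.lock().

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-alone demo; none of these names come from HBase.
class FairLockStallDemo {

    // Returns whether a new reader can take the read lock within 200 ms
    // while an earlier read lock has been leaked (never unlocked).
    static boolean newReaderAcquires(boolean writerQueued) {
        // Fair mode, matching ReentrantReadWriteLock$FairSync in the dump.
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);
        Thread closer = null;
        try {
            // Simulated leak: a thread takes the read lock and exits
            // without ever calling unlock().
            Thread leaker = new Thread(() -> lock.readLock().lock(), "leaker");
            leaker.start();
            leaker.join();

            if (writerQueued) {
                // Stand-in for the close handler waiting on the write lock.
                closer = new Thread(() -> {
                    try {
                        lock.writeLock().lockInterruptibly();
                        lock.writeLock().unlock();
                    } catch (InterruptedException e) {
                        // Interrupted by the cleanup below; nothing to do.
                    }
                }, "closer");
                closer.setDaemon(true);
                closer.start();
                // Wait until the writer is actually parked in the queue.
                while (!lock.hasQueuedThread(closer)) {
                    Thread.sleep(1);
                }
            }

            // Stand-in for a flusher: a fresh reader on the calling thread.
            // The timed tryLock honors fairness, unlike the untimed one.
            boolean acquired =
                lock.readLock().tryLock(200, TimeUnit.MILLISECONDS);
            if (acquired) {
                lock.readLock().unlock();
            }
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            if (closer != null) {
                closer.interrupt();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("no writer queued: " + newReaderAcquires(false));
        System.out.println("writer queued:    " + newReaderAcquires(true));
    }
}
```

With no writer queued, the new reader shares the lock with the leaked hold and the call returns true. With a writer queued, the fair policy makes the new reader wait behind it, so the timed attempt gives up and the call returns false. In a real thread dump the thread that leaked the hold would not appear as a waiter, which is one reason this case is hard to tell apart from a flush/close race.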
Re: [DISCUSS] HBase 2.5 / Hadoop 3 artifacts
I've put up 2.5.2RC0, which contains a hadoop3 dist and also hadoop3
maven artifacts; it is built with hadoop 3.2.4.

The dist is available here

https://dist.apache.org/repos/dist/dev/hbase/2.5.2RC0/

And the maven artifacts are available here

https://repository.apache.org/content/repositories/orgapachehbase-1504/

Notice that the version for hadoop3 maven artifacts is 2.5.2-hadoop3.

Please take a look and have a try.

Thanks.

张铎(Duo Zhang) wrote on Mon, Oct 31, 2022 at 12:02:
>
> Some progress here.
> With other developers' help (especially Nick, Andrew and Guanghao), I've
> successfully made the release scripts able to publish binaries and
> maven artifacts for hadoop3, in a dry run mode,
>
> https://github.com/apache/hbase/pull/4856
>
> I've put up a discussion thread, for quickly releasing 2.5.2 for the
> 2.5 release line, with hadoop3 binaries. Please shout if you have any
> ideas.
>
> Thanks.
>
> 张铎(Duo Zhang) wrote on Mon, Oct 24, 2022 at 12:27:
> >
> > HBASE-27434 has been landed to branch-2.5+. Branch-2.4 does not have a
> > flatten plugin so do not apply HBASE-27434 to it.
> >
> > Filed HBASE-27442 for changing the way of bumping versions in release
> > scripts.
> >
> > After this change, let's finally go back to HBASE-27359 to make the
> > release scripts publish different artifacts for hadoop2 and hadoop3.
> >
> > Thanks.
> >
> > Andrew Purtell wrote on Wed, Oct 19, 2022 at 23:36:
> > >
> > > Suggestions:
> > >
> > > - For HBase 2.x releases, we should continue to publish default builds,
> > > those without any -hadoop3- or -widgetfoo- modifiers, against Hadoop 2.
> > >
> > > - For HBase 3, it makes sense to move the default to Hadoop 3, no other
> > > build variants needed there. This is the kind of thing a major version
> > > increment allows us to do per our dependency compatibility guidelines.
> > >
> > > - While eventually it may be necessary to differentiate between minor
> > > release lines of Hadoop it would be simpler to pick one Hadoop 3 version,
> > > like 3.3.4, and build and publish a -hadoop3- artifact for each current
> > > releasing 2.x code line: 2.4.15-hadoop3, 2.5.2-hadoop3, 2.6.0-hadoop3.
> > >
> > > - The process of building releases is automated by create-release, which
> > > all RMs use now. create-release automates the process of building and
> > > signing tarballs and publishing to Nexus. There should be no significant
> > > new burden on the RM, beyond an increase in time for create-release
> > > execution, to parameterize it and iterate over one or more variant builds.
> > > That is a long way of suggesting we do publish variant tarballs too, they
> > > are almost "for free" if we've gone to the trouble to build for publishing
> > > to Nexus.
> > >
> > >
> > > On Wed, Oct 19, 2022 at 12:52 AM 张铎(Duo Zhang) wrote:
> > >
> > > > After some investigating, I think using the $revision placeholder can
> > > > solve the problem here, i.e, using different command line to publish
> > > > different artifacts for hadoop2 and hadoop3, with the same source code.
> > > > You can see the comment on HBASE-27359 for more details.
> > > >
> > > > Next I will open an issue to land the $revision change. And here, I
> > > > think first we need to discuss how many new artifacts we want to
> > > > publish. For example, for 2.6.0, we only want to publish a
> > > > 2.6.0-hadoop3, with the default hadoop3 version? Or we publish
> > > > 2.6.0-hadoop3.2, 2.6.0-hadoop3.3 for different hadoop minor release
> > > > lines? And do we want to publish different tarballs for hadoop2 and
> > > > hadoop3?
> > > >
> > > > Thanks.
> > > >
> > > > Andrew Purtell wrote on Wed, Aug 31, 2022 at 00:19:
> > > > >
> > > > > I also don't think we should change the defaults in branch-2 until
> > > > > Hadoop 2 is EOLed.
> > > > >
> > > > > On Mon, Aug 29, 2022 at 10:22 AM Sean Busbey wrote:
> > > > >
> > > > > > I think changing the default hadoop profile for builds in branch-2
> > > > > > would unnecessarily complicate our compatibility messaging so long
> > > > > > as Hadoop 2 hasn't gone EOL.
> > > > > >
> > > > > > On Mon, Aug 29, 2022 at 5:30 AM Nick Dimiduk wrote:
> > > > > >
> > > > > > > Should we also make hadoop3 the default active profile for
> > > > > > > branch-2 going forward?
> > > > > > >
> > > > > > > On Fri, Aug 26, 2022 at 5:25 PM Andrew Purtell <andrew.purt...@gmail.com> wrote:
> > > > > > >
> > > > > > > > The security posture of Hadoop 2 in general is a problem, because
> > > > > > > > maintenance on that branch is spotty, that is just how it goes. We
> > > > > > > > had the same situation with our now EOL branch-1. I know Hadoop
> > > > > > > > released 2.10.2 to address some CVE worthy problems but it is
> > > > > > > > unclear if 2.10.2 addresses all known issues, unlike 3.3.4. Also
> > > > > > > > as you know Hadoop 2 has
[jira] [Created] (HBASE-27508) Hbase master is not up due to NotServingRegionException: hbase:meta,,1 is not online
kaushik mandal created HBASE-27508:
--

             Summary: Hbase master is not up due to NotServingRegionException: hbase:meta,,1 is not online
                 Key: HBASE-27508
                 URL: https://issues.apache.org/jira/browse/HBASE-27508
             Project: HBase
          Issue Type: Bug
    Affects Versions: 2.4.13
         Environment: we are using hbase 2.4.13 and hdfs 3.3.0
            Reporter: kaushik mandal

The HBase master stays in the initializing state and never becomes ready. It does not come up because of the following error:

"NotServingRegionException: hbase:meta,,1 is not online on hbase-regionserver-0.hbase-regionserver.default.svc.cluster.local, ..."

When we observe this: after a region has split into multiple regions, followed by a restart of the HBase regionserver and master.

{code:java}
INFO [main] util.HBaseFsck: Validating mapping using HDFS state
Number of live region servers: 1
Number of dead region servers: 2
Master: hbase-master-0.hbase-master.default.svc.cluster.local,
Number of backup masters: 0
Average load: 0.0
Number of requests: 0
Number of regions: 0
Number of regions in transition: 0
INFO [main] client.RpcRetryingCallerImpl: Call exception, tries=6, retries=16, started=4333 ms ago, cancelled=false, msg=org.apache.hadoop.hbase.PleaseHoldException: Master is initializing
        at org.apache.hadoop.hbase.master.HMaster.checkInitialized(HMaster.java:2830)
        at org.apache.hadoop.hbase.master.MasterRpcServices.getTableDescriptors(MasterRpcServices.java:1075)
        at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:384)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:371)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:351)
, details=, see https://s.apache.org/timeout
{code}

Workaround: to bring the master up, we delete /hbase/meta-regionserver using "hbase zkcli delete /hbase/meta-regionserver".

Is there any way to prevent this from occurring by setting some properties in hbase-site.xml?

--
This message was sent by Atlassian Jira
(v8.20.10#820010)