[jira] [Created] (HBASE-27509) Possible region gets stuck in CLOSING state

2022-11-24 Thread Rajeshbabu Chintaguntla (Jira)
Rajeshbabu Chintaguntla created HBASE-27509:
---

 Summary: Possible region gets stuck in CLOSING state
 Key: HBASE-27509
 URL: https://issues.apache.org/jira/browse/HBASE-27509
 Project: HBase
  Issue Type: Bug
  Components: Region Assignment
Affects Versions: 2.3.4
Reporter: Rajeshbabu Chintaguntla
Assignee: Rajeshbabu Chintaguntla


A region can get stuck in the CLOSING state. The cause could be a race between 
flush and close, or a read lock acquired on the region somewhere that is never 
released.
{noformat}
"MemStoreFlusher.1" #236 prio=5 os_prio=0 tid=0x5639266a4000 nid=0x296e waiting on condition [0x7fdc48a63000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x7fdf42dde850> (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
    at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
    at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2397)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:610)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:579)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$1000(MemStoreFlusher.java:67)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:359)

"MemStoreFlusher.0" #234 prio=5 os_prio=0 tid=0x5639266a2800 nid=0x296d waiting on condition [0x7fdc48b64000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x7fdf42dde850> (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
    at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
    at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2397)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:610)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:579)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$1000(MemStoreFlusher.java:67)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:359)
{noformat} 
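
Both flusher threads above park on the same lock object, <0x7fdf42dde850>. To check which other threads in a full dump wait on that lock, the dump text can be grouped by the "parking to wait for" address. The following is only a rough sketch of such a helper, not part of HBase or the JDK tooling; the class name ParkedThreadGrouper and its method are hypothetical, and it assumes jstack-style text like the excerpts in this report.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for reading jstack-style dumps; not an HBase or JDK API.
final class ParkedThreadGrouper {

    // A thread section starts with the quoted thread name.
    private static final Pattern THREAD_HEADER = Pattern.compile("^\"([^\"]+)\"");
    // The lock a parked thread is waiting for, e.g. <0x7fdf42dde850>.
    private static final Pattern PARKED_ON =
            Pattern.compile("parking to wait for\\s+<(0x[0-9a-fA-F]+)>");

    /** Maps each lock address to the names of the threads parked on it, in dump order. */
    static Map<String, List<String>> groupByLock(String dump) {
        Map<String, List<String>> waiters = new LinkedHashMap<>();
        String currentThread = null;
        for (String line : dump.split("\\R")) {
            Matcher header = THREAD_HEADER.matcher(line.trim());
            if (header.find()) {
                currentThread = header.group(1);
                continue;
            }
            Matcher parked = PARKED_ON.matcher(line);
            if (currentThread != null && parked.find()) {
                waiters.computeIfAbsent(parked.group(1), k -> new ArrayList<>())
                        .add(currentThread);
            }
        }
        return waiters;
    }

    public static void main(String[] args) {
        String dump = String.join("\n",
                "\"MemStoreFlusher.1\" #236 prio=5 os_prio=0 waiting on condition",
                "   java.lang.Thread.State: WAITING (parking)",
                "- parking to wait for  <0x7fdf42dde850> (a",
                "\"MemStoreFlusher.0\" #234 prio=5 os_prio=0 waiting on condition",
                "- parking to wait for  <0x7fdf42dde850> (a");
        groupByLock(dump).forEach((lock, threads) ->
                System.out.println(lock + " <- " + threads));
    }
}
```

Lines such as "- locked <...>" are ignored on purpose, since they describe monitors a thread already holds rather than the lock it is waiting for.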
{noformat}
"RS_CLOSE_REGION-regionserver/sl73tskrnsqln00107:16020-0" #6337 daemon prio=5 os_prio=0 tid=0x7fdc05448800 nid=0x15d1 waiting on condition [0x7fdc1befd000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x7fdf42dde850> (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
    at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
    at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1662)
    at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1591)
    - locked <0x7fdf42ddf358> (a java.lang.Object)
    at org.apache.hadoop.hbase.regionserver.handler.UnassignRegionHandler.process(UnassignRegionHandler.java:114)
    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
{noformat}
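
The dumps show the flushers waiting for the read lock and the close handler waiting for the write lock of the same fair ReentrantReadWriteLock. The second hypothesis above (a read lock that is never released) would produce exactly this picture: the writer cannot proceed while any read hold remains, and in fair mode new readers queue behind the waiting writer. Below is a minimal standalone sketch of that behavior using only java.util.concurrent. It is not HBase code and does not claim to be the actual cause; the class FairLockStallSketch, the "leaky-reader" thread, and the timeouts are all assumptions made for illustration.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Standalone sketch, NOT HBase code: all names here are hypothetical.
final class FairLockStallSketch {

    /** What happened in one run of the scenario. */
    static final class Outcome {
        final boolean closeGotWriteLock;
        final boolean flushGotReadLock;

        Outcome(boolean closeGotWriteLock, boolean flushGotReadLock) {
            this.closeGotWriteLock = closeGotWriteLock;
            this.flushGotReadLock = flushGotReadLock;
        }
    }

    static Outcome run(boolean leakReadLock) {
        // Fair mode, matching the FairSync seen in the thread dumps.
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);
        AtomicBoolean closed = new AtomicBoolean(false);
        try {
            // Assumed culprit: some operation takes the read lock and may never release it.
            Thread reader = new Thread(() -> {
                lock.readLock().lock();
                if (!leakReadLock) {
                    lock.readLock().unlock();
                }
            }, "leaky-reader");
            reader.start();
            reader.join();

            // Stand-in for the close path, which needs the write lock.
            Thread closer = new Thread(() -> {
                try {
                    lock.writeLock().lockInterruptibly();
                    closed.set(true);
                    lock.writeLock().unlock();
                } catch (InterruptedException e) {
                    // Interrupted while still waiting: the close never got the lock.
                }
            }, "close-sketch");
            closer.start();

            // Wait until the closer has finished or is parked in the lock's wait queue.
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
            while (closer.isAlive() && !lock.hasQueuedThread(closer)
                    && System.nanoTime() < deadline) {
                Thread.sleep(10);
            }

            // Stand-in for a flush: a timed read-lock attempt, which honors fairness
            // and therefore queues behind a waiting writer. A plain lock() call, as in
            // the dumps, would park here indefinitely instead of timing out.
            boolean flushed = lock.readLock().tryLock(200, TimeUnit.MILLISECONDS);
            if (flushed) {
                lock.readLock().unlock();
            }

            closer.interrupt(); // let the sketch terminate instead of hanging forever
            closer.join();
            return new Outcome(closed.get(), flushed);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (boolean leak : new boolean[] {false, true}) {
            Outcome o = run(leak);
            System.out.println("leak=" + leak + " closeGotWriteLock=" + o.closeGotWriteLock
                    + " flushGotReadLock=" + o.flushGotReadLock);
        }
    }
}
```

With the leak enabled, neither the close stand-in nor the flush stand-in obtains its lock, which mirrors the three parked threads. Note that the read hold survives the termination of the thread that took it, so the leaking thread would not necessarily appear in a later dump.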

From one of the region server logs, the flush had started and the replay edits 
of the flush were added, then close 

Re: [DISCUSS] HBase 2.5 / Hadoop 3 artifacts

2022-11-24 Thread Duo Zhang
I've put up 2.5.2RC0, which contains a hadoop3 dist and also hadoop3
maven artifacts. It is built with hadoop 3.2.4.

The dist is available here
https://dist.apache.org/repos/dist/dev/hbase/2.5.2RC0/

And the maven artifacts are available here
https://repository.apache.org/content/repositories/orgapachehbase-1504/

Notice that the version for hadoop3 maven artifacts is 2.5.2-hadoop3.

Please take a look and have a try.

Thanks.



On Mon, Oct 31, 2022 at 12:02, 张铎(Duo Zhang) wrote:


>
> Some progress here.
> With other developers' help (especially Nick, Andrew and Guanghao), I've
> successfully made the release scripts able to publish binaries and
> maven artifacts for hadoop3 in dry-run mode:
>
> https://github.com/apache/hbase/pull/4856
>
> I've put up a discussion thread, for quickly releasing 2.5.2 for the
> 2.5 release line, with hadoop3 binaries. Please shout if you have any
> ideas.
>
> Thanks.
>
> On Mon, Oct 24, 2022 at 12:27, 张铎(Duo Zhang) wrote:
> >
> > HBASE-27434 has landed on branch-2.5+. Branch-2.4 does not have a
> > flatten plugin, so I did not apply HBASE-27434 to it.
> >
> > Filed HBASE-27442 for changing the way of bumping versions in release 
> > scripts.
> >
> > After this change, let's finally go back to HBASE-27359 to make the
> > release scripts publish different artifacts for hadoop2 and hadoop3.
> >
> > Thanks.
> >
> > On Wed, Oct 19, 2022 at 23:36, Andrew Purtell wrote:
> > >
> > > Suggestions:
> > >
> > > - For HBase 2.x releases, we should continue to publish default builds,
> > > those without any -hadoop3- or -widgetfoo- modifiers, against Hadoop 2.
> > >
> > > - For HBase 3, it makes sense to move the default to Hadoop 3, no other
> > > build variants needed there. This is the kind of thing a major version
> > > increment allows us to do per our dependency compatibility guidelines.
> > >
> > > - While eventually it may be necessary to differentiate between minor
> > > release lines of Hadoop it would be simpler to pick one Hadoop 3 version,
> > > like 3.3.4, and build and publish a -hadoop3- artifact for each current
> > > releasing 2.x code line: 2.4.15-hadoop3, 2.5.2-hadoop3, 2.6.0-hadoop3.
> > >
> > > - The process of building releases is automated by create-release, which
> > > all RMs use now. create-release automates the process of building and
> > > signing tarballs and publishing to Nexus. There should be no significant
> > > new burden on the RM, beyond an increase in time for create-release
> > > execution, to parameterize it and iterate over one or more variant builds.
> > > That is a long way of suggesting we do publish variant tarballs too, they
> > > are almost "for free" if we've gone to the trouble to build for publishing
> > > to Nexus.
> > >
> > >
> > > On Wed, Oct 19, 2022 at 12:52 AM 张铎(Duo Zhang) wrote:
> > >
> > > > After some investigating, I think using the $revision placeholder can
> > > > solve the problem here, i.e., using different command lines to publish
> > > > different artifacts for hadoop2 and hadoop3 from the same source code.
> > > > You can see the comment on HBASE-27359 for more details.
> > > >
> > > > Next I will open an issue to land the $revision change. And here, I
> > > > think first we need to discuss how many new artifacts we want to
> > > > publish. For example, for 2.6.0, do we only want to publish a
> > > > 2.6.0-hadoop3, with the default hadoop3 version? Or do we publish
> > > > 2.6.0-hadoop3.2, 2.6.0-hadoop3.3 for different hadoop minor release
> > > > lines? And do we want to publish different tarballs for hadoop2 and
> > > > hadoop3?
> > > >
> > > > Thanks.
> > > >
> > > > On Wed, Aug 31, 2022 at 00:19, Andrew Purtell wrote:
> > > > >
> > > > > I also don't think we should change the defaults in branch-2 until
> > > > > Hadoop 2 is EOLed.
> > > > >
> > > > > On Mon, Aug 29, 2022 at 10:22 AM Sean Busbey wrote:
> > > > >
> > > > > > I think changing the default hadoop profile for builds in branch-2
> > > > > > would unnecessarily complicate our compatibility messaging so long
> > > > > > as Hadoop 2 hasn't gone EOL.
> > > > > >
> > > > > > On Mon, Aug 29, 2022 at 5:30 AM Nick Dimiduk wrote:
> > > > > >
> > > > > > > Should we also make hadoop3 the default active profile for
> > > > > > > branch-2 going forward?
> > > > > > >
> > > > > > > On Fri, Aug 26, 2022 at 5:25 PM Andrew Purtell
> > > > > > > <andrew.purt...@gmail.com> wrote:
> > > > > > >
> > > > > > > > The security posture of Hadoop 2 in general is a problem,
> > > > > > > > because maintenance on that branch is spotty, that is just how
> > > > > > > > it goes. We had the same situation with our now EOL branch-1.
> > > > > > > > I know Hadoop released 2.10.2 to address some CVE worthy
> > > > > > > > problems but it is unclear if 2.10.2 addresses all known
> > > > > > > > issues, unlike 3.3.4. Also as you know Hadoop 2 has

[jira] [Created] (HBASE-27508) Hbase master is not up due to NotServingRegionException: hbase:meta,,1 is not online

2022-11-24 Thread kaushik mandal (Jira)
kaushik mandal created HBASE-27508:
--

 Summary: Hbase master is not up due to NotServingRegionException: hbase:meta,,1 is not online
 Key: HBASE-27508
 URL: https://issues.apache.org/jira/browse/HBASE-27508
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.4.13
 Environment: We are using HBase 2.4.13 and HDFS 3.3.0
Reporter: kaushik mandal


The HBase master stays in the initializing state and never becomes ready.

It does not come up because of the following error:

"NotServingRegionException: hbase:meta,,1 is not online on 
hbase-regionserver-0.hbase-regionserver.default.svc.cluster.local, ..."

When we observe this:

After a region on the region server has split into multiple regions and the 
HBase regionserver and master are restarted.
{code:java}
INFO  [main] util.HBaseFsck: Validating mapping using HDFS state
Number of live region servers: 1
Number of dead region servers: 2
Master: hbase-master-0.hbase-master.default.svc.cluster.local,
Number of backup masters: 0
Average load: 0.0
Number of requests: 0 
Number of regions: 0
Number of regions in transition: 0
INFO  [main] client.RpcRetryingCallerImpl: Call exception, tries=6, retries=16, started=4333 ms ago, cancelled=false, msg=org.apache.hadoop.hbase.PleaseHoldException: Master is initializing
    at org.apache.hadoop.hbase.master.HMaster.checkInitialized(HMaster.java:2830)
    at org.apache.hadoop.hbase.master.MasterRpcServices.getTableDescriptors(MasterRpcServices.java:1075)
    at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:384)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:371)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:351)
, details=, see https://s.apache.org/timeout
{code}
Workaround:

To bring the master up, we delete /hbase/meta-regionserver using "hbase zkcli 
delete /hbase/meta-regionserver".

Is there any way to prevent this from occurring by setting some properties in 
hbase-site.xml?


