[jira] [Created] (HBASE-19936) Introduce a new base class for replication peer procedure
Duo Zhang created HBASE-19936:
---------------------------------

Summary: Introduce a new base class for replication peer procedure
Key: HBASE-19936
URL: https://issues.apache.org/jira/browse/HBASE-19936
Project: HBase
Issue Type: Sub-task
Reporter: Duo Zhang
Assignee: Duo Zhang
Fix For: 3.0.0

As the sync replication peer state transition will have more steps than normal replication peer, it will be good to have a common base class for them. Since the peer id will be stored in this class, I tend to change the protobuf message name from 'ModifyPeerStateData' to 'ReplicationPeerProcedureStateData'.

This will be committed to master and HBASE-19397-branch-2.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
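A minimal Java sketch of the idea above, not the actual patch: a common base class keeps the peer id, and the normal and sync peer procedures differ only in the state-transition steps they declare. Every class name, step name, and step count here is hypothetical and chosen only for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical base class: holds the peer id shared by all peer procedures.
abstract class PeerProcedureSketch {
  private final String peerId;

  protected PeerProcedureSketch(String peerId) {
    this.peerId = peerId;
  }

  String getPeerId() {
    return peerId;
  }

  // Each subclass declares its own state-transition steps.
  abstract List<String> steps();
}

// Hypothetical normal peer procedure with a short list of steps.
final class NormalPeerProcedureSketch extends PeerProcedureSketch {
  NormalPeerProcedureSketch(String peerId) {
    super(peerId);
  }

  @Override
  List<String> steps() {
    return Arrays.asList("PRE_CHECK", "UPDATE_STORAGE", "REFRESH_PEER", "POST_CHECK");
  }
}

// Hypothetical sync peer procedure: same base class, more steps.
final class SyncPeerProcedureSketch extends PeerProcedureSketch {
  SyncPeerProcedureSketch(String peerId) {
    super(peerId);
  }

  @Override
  List<String> steps() {
    return Arrays.asList("PRE_CHECK", "SET_NEW_STATE", "REFRESH_PEER_BEGIN",
        "REOPEN_REGIONS", "TRANSIT_STATE", "REFRESH_PEER_END", "POST_CHECK");
  }
}
```

Because the peer id lives in the base class, the serialized state for it can also be shared, which is the motivation given for the more general protobuf message name.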
[jira] [Created] (HBASE-19935) Only allow table replication for sync replication for now
Duo Zhang created HBASE-19935:
---------------------------------

Summary: Only allow table replication for sync replication for now
Key: HBASE-19935
URL: https://issues.apache.org/jira/browse/HBASE-19935
Project: HBase
Issue Type: Sub-task
Components: Replication
Reporter: Duo Zhang

Add a pre-check to allow only table replication for now: no namespaces, no replicate-all, and no exclusions. This reduces the difficulty of implementing the sync replication state transition, as we need to reopen all the related regions. We can add support for these features later.
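The pre-check described above could look roughly like the following Java sketch. It assumes a simplified, hypothetical peer config type; the field names and error messages are illustrative and are not taken from the HBase code base.

```java
import java.util.Set;

// Hypothetical, simplified stand-in for a replication peer config (not the HBase class).
final class PeerConfigSketch {
  final boolean syncReplication; // true when the peer is a sync replication peer
  final boolean replicateAll;    // "replicate all" flag
  final Set<String> namespaces;  // namespace-level replication
  final Set<String> tables;      // table-level replication
  final Set<String> excludes;    // exclusions used together with replicateAll

  PeerConfigSketch(boolean syncReplication, boolean replicateAll, Set<String> namespaces,
      Set<String> tables, Set<String> excludes) {
    this.syncReplication = syncReplication;
    this.replicateAll = replicateAll;
    this.namespaces = namespaces;
    this.tables = tables;
    this.excludes = excludes;
  }
}

final class SyncReplicationPreCheckSketch {
  // Throws IllegalArgumentException if a sync peer uses anything but table replication.
  static void check(PeerConfigSketch c) {
    if (!c.syncReplication) {
      return; // normal peers are not restricted by this check
    }
    if (c.replicateAll || !c.excludes.isEmpty()) {
      throw new IllegalArgumentException(
          "sync replication: replicate-all and exclusions are not supported yet");
    }
    if (!c.namespaces.isEmpty()) {
      throw new IllegalArgumentException(
          "sync replication: namespace replication is not supported yet");
    }
    if (c.tables.isEmpty()) {
      throw new IllegalArgumentException(
          "sync replication: at least one table must be configured");
    }
  }
}
```

Rejecting the unsupported shapes up front keeps the set of affected regions explicit, which matters because the state transition has to reopen all of them.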
[jira] [Created] (HBASE-19934) HBaseSnapshotException when read replicas is enabled and online snapshot is taken after region splitting
Toshihiro Suzuki created HBASE-19934: Summary: HBaseSnapshotException when read replicas is enabled and online snapshot is taken after region splitting Key: HBASE-19934 URL: https://issues.apache.org/jira/browse/HBASE-19934 Project: HBase Issue Type: Bug Components: snapshots Reporter: Toshihiro Suzuki Investigating HBASE-19893, I'm encountering another issue. Steps to reproduce are as follows: 1. Create a table {code:java} create "test", "cf", {REGION_REPLICATION => 2}{code} 2. Load data to the table {code:java} (0...2000).each{|i| put "test", "row#{i}", "cf:col", "val"}{code} 3. Split the table {code:java} split "test"{code} 4. Take a snapshot for the table {code:java} snapshot "test", "snap"{code} And I encountered the following error: {code:java} hbase(main):004:0> snapshot "test", "snap" ERROR: org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: Snapshot { ss=snap table=test type=FLUSH } had an error. Procedure snap { waiting=[] done=[] } at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:379) at org.apache.hadoop.hbase.master.MasterRpcServices.isSnapshotDone(MasterRpcServices.java:1144) at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:406) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130) at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324) at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304) Caused by: org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException via Failed taking snapshot { ss=snap table=test type=FLUSH } due to exception:Manifest region info {ENCODED => b910488a686644a7c1c85246d0d123d5, NAME => 'test,,1517808523837_0001.b910488a686644a7c1c85246d0d123d5.', STARTKEY => '', ENDKEY => '', OFFLINE => true, SPLIT => true, REPLICA_ID => 1}doesn't match expected region:{ENCODED => 
ef8665859c0b19927b7dc127ec10120a, NAME => 'test,,1517808523837.ef8665859c0b19927b7dc127ec10120a.', STARTKEY => '', ENDKEY => '', OFFLINE => true, SPLIT => true}:org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Manifest region info {ENCODED => b910488a686644a7c1c85246d0d123d5, NAME => 'test,,1517808523837_0001.b910488a686644a7c1c85246d0d123d5.', STARTKEY => '', ENDKEY => '', OFFLINE => true, SPLIT => true, REPLICA_ID => 1}doesn't match expected region:{ENCODED => ef8665859c0b19927b7dc127ec10120a, NAME => 'test,,1517808523837.ef8665859c0b19927b7dc127ec10120a.', STARTKEY => '', ENDKEY => '', OFFLINE => true, SPLIT => true} at org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher.rethrowException(ForeignExceptionDispatcher.java:82) at org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.rethrowExceptionIfFailed(TakeSnapshotHandler.java:306) at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:368) ... 6 more Caused by: org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Manifest region info {ENCODED => b910488a686644a7c1c85246d0d123d5, NAME => 'test,,1517808523837_0001.b910488a686644a7c1c85246d0d123d5.', STARTKEY => '', ENDKEY => '', OFFLINE => true, SPLIT => true, REPLICA_ID => 1}doesn't match expected region:{ENCODED => ef8665859c0b19927b7dc127ec10120a, NAME => 'test,,1517808523837.ef8665859c0b19927b7dc127ec10120a.', STARTKEY => '', ENDKEY => '', OFFLINE => true, SPLIT => true} at org.apache.hadoop.hbase.master.snapshot.MasterSnapshotVerifier.verifyRegionInfo(MasterSnapshotVerifier.java:223) at org.apache.hadoop.hbase.master.snapshot.MasterSnapshotVerifier.verifyRegions(MasterSnapshotVerifier.java:201) at org.apache.hadoop.hbase.master.snapshot.MasterSnapshotVerifier.verifySnapshot(MasterSnapshotVerifier.java:119) at org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.process(TakeSnapshotHandler.java:202) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104) 
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Take a snapshot of specified table. Examples: hbase> snapshot 'sourceTable', 'snapshotName' hbase> snapshot 'namespace:sourceTable', 'snapshotName', {SKIP_FLUSH => true} Took 0.3390 seconds{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: Going to roll 1.3.2 RC in next few weeks
Andy,

Yup. What still needs to be done?

Francis

On Thu, Feb 1, 2018 at 5:07 PM Andrew Purtell wrote:

> Francis,
>
> Do you still have an interest in running a 1.3.2 release?
>
> On Fri, Jan 5, 2018 at 9:37 AM, Andrew Purtell wrote:
>
>> Ok. Great!
>>
>> Will revisit this next month.
>>
>> On Jan 5, 2018, at 9:35 AM, Francis Christopher Liu wrote:
>>
>> I'm interested in picking it up if that's ok. Tho I'm out till mid next week.
>>
>> Thanks,
>> Francis
>>
>> On Thu, Jan 4, 2018 at 12:00 PM Andrew Purtell wrote:
>>
>>> Ok, I’ll start work on a 1.3.2 RC
>>>
>>> > On Jan 3, 2018, at 12:13 PM, Andrew Purtell wrote:
>>> >
>>> > I volunteer, if nobody else would like to pick it up. I think we should at least have one more 1.3 release. Thanks for running the first two 1.3 releases Mikhail.
>>> >
>>> >> On Wed, Jan 3, 2018 at 6:01 AM, Mikhail Antonov wrote:
>>> >> Sorry, I have been mostly away since Christmas and catching up on emails..
>>> >>
>>> >> So I was going to come up with 1.3.2 RC for some time but never really got to it, life and work intervened.. As I don't expect that to change soon, I suppose it might be better if someone who may have more bandwidth picked it up :( Any volunteers?
>>> >>
>>> >> Since we never moved stable pointer to 1.3 line, I think we can discuss separately how many releases out of this line we want.
>>> >>
>>> >> Thanks,
>>> >> Mikhail
>>> >>
>>> >> On Tue, Dec 26, 2017 at 3:24 PM, 김영우 (YoungWoo Kim) wrote:
>>> >>
>>> >> > Hi Mikhail,
>>> >> >
>>> >> > Any progress on 1.3.2? Just curious If there is a plan for continuing 1.3 line.
>>> >> >
>>> >> > Thanks,
>>> >> > - Youngwoo
>>> >> >
>>> >> > On Tue, Oct 10, 2017 at 12:52 PM, Mikhail Antonov <anto...@apache.org> wrote:
>>> >> >
>>> >> > > This week I'm planning to go through the hanging jiras and see where we are on it.
>>> >> > >
>>> >> > > Please speak up if you have open jiras targeted to 1.3.2 that you'd like to get to 1.3.2 or backport in there, also if you have any concerns regarding the jiras that already went in or are about to, or have any findings or doubts.
>>> >> > >
>>> >> > > Thanks!
>>> >> > > Mikhail
>>> >>
>>> >> --
>>> >> Thanks,
>>> >> Michael Antonov
>>> >
>>> > --
>>> > Best regards,
>>> > Andrew
>>> >
>>> > Words like orphans lost among the crosstalk, meaning torn from truth's decrepit hands
>>> >    - A23, Crosstalk
>
> --
> Best regards,
> Andrew
>
> Words like orphans lost among the crosstalk, meaning torn from truth's decrepit hands
>    - A23, Crosstalk
[jira] [Created] (HBASE-19933) Make use of column family level attribute for skipping hfile range check before create reference during split
Rajeshbabu Chintaguntla created HBASE-19933:
-----------------------------------------------

Summary: Make use of column family level attribute for skipping hfile range check before create reference during split
Key: HBASE-19933
URL: https://issues.apache.org/jira/browse/HBASE-19933
Project: HBase
Issue Type: Bug
Reporter: Rajeshbabu Chintaguntla
Assignee: Rajeshbabu Chintaguntla
Fix For: 2.0.0-beta-2

Currently we use the split policy to decide whether to skip the store file range check at the time of reference creation during split. But the full fledged split with region reference cannot be used in master. So, as an alternative, we need to make use of a column family attribute that is set to true or false at the client level, so that the decision happens accordingly.
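To make the proposal concrete, here is a small Java sketch of a range check that can be switched off through a column family attribute. The attribute name is hypothetical (the issue does not name the real key), row keys are modeled as plain strings compared lexicographically, and the boundary handling is a simplification rather than the exact HBase logic.

```java
import java.util.Map;

final class SplitReferenceSketch {
  // Hypothetical attribute name; the issue does not state the real key.
  static final String SKIP_RANGE_CHECK_ATTR = "SKIP_STORE_FILE_RANGE_CHECK";

  // Decides whether to create a reference to a store file for one daughter region.
  static boolean shouldCreateReference(Map<String, String> familyAttrs, String firstKey,
      String lastKey, String splitRow, boolean topDaughter) {
    // Boolean.parseBoolean(null) is false, so a missing attribute keeps the check on.
    if (Boolean.parseBoolean(familyAttrs.get(SKIP_RANGE_CHECK_ATTR))) {
      return true; // range check skipped: always create the reference
    }
    if (topDaughter) {
      // Top daughter holds rows >= splitRow; nothing to reference if the file ends before it.
      return splitRow.compareTo(lastKey) <= 0;
    }
    // Bottom daughter holds rows < splitRow; nothing to reference if the file starts at or after it.
    return splitRow.compareTo(firstKey) > 0;
  }
}
```

With the attribute stored on the column family, the decision travels with the table descriptor and does not depend on a split policy class being available where the split runs.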
[jira] [Resolved] (HBASE-19803) False positive for the HBASE-Find-Flaky-Tests job
[ https://issues.apache.org/jira/browse/HBASE-19803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Duo Zhang resolved HBASE-19803.
-------------------------------
Resolution: Fixed
Fix Version/s: 2.0.0-beta-2

Fixed by HBASE-19873.

> False positive for the HBASE-Find-Flaky-Tests job
> -------------------------------------------------
>
> Key: HBASE-19803
> URL: https://issues.apache.org/jira/browse/HBASE-19803
> Project: HBase
> Issue Type: Sub-task
> Reporter: Duo Zhang
> Priority: Major
> Fix For: 2.0.0-beta-2
>
> Attachments: 2018-01-24T17-45-37_000-jvmRun1.dumpstream, HBASE-19803.master.001.patch
>
>
> It reports two hangs for TestAsyncTableGetMultiThreaded, but I checked the surefire output
> https://builds.apache.org/job/HBASE-Flaky-Tests/24830/artifact/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.client.TestAsyncTableGetMultiThreaded-output.txt
> This one was likely to be killed in the middle of the run within 20 seconds.
> https://builds.apache.org/job/HBASE-Flaky-Tests/24852/artifact/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.client.TestAsyncTableGetMultiThreaded-output.txt
> This one was also killed within about 1 minute.
> The test is declared as LargeTests so the time limit should be 10 minutes. It seems that the JVM may crash during the mvn test run; we then kill all the running tests and may mark some of them as hung, which leads to the false positive.
[jira] [Resolved] (HBASE-19910) TestBucketCache TimesOut
[ https://issues.apache.org/jira/browse/HBASE-19910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang resolved HBASE-19910. --- Resolution: Fixed Assignee: stack Hadoop Flags: Reviewed Fix Version/s: 2.0.0-beta-2 Resolve. > TestBucketCache TimesOut > > > Key: HBASE-19910 > URL: https://issues.apache.org/jira/browse/HBASE-19910 > Project: HBase > Issue Type: Sub-task >Reporter: stack >Assignee: stack >Priority: Major > Fix For: 2.0.0-beta-2 > > Attachments: 0001-HBASE-19910-TestBucketCache-TimesOut.patch > > > See > https://builds.apache.org/view/H-L/view/HBase/job/PreCommit-HBASE-Build/11303/testReport/org.apache.hadoop.hbase.master.balancer/TestRegionLocationFinder/org_apache_hadoop_hbase_master_balancer_TestRegionLocationFinder/ > This is small test. Runs fast locally. 8 tests. Each is a second or two. Odd > though up on jenkins is that in the middle of one, there is a 19 second > pause. See here: > 2018-02-01 00:56:30,013 INFO [Time-limited test] util.ByteBufferArray(70): > Allocating buffers total=32 MB, sizePerBuffer=2 MB, count=16 > 2018-02-01 00:56:49,678 INFO [Time-limited test] bucket.BucketCache(279): > Instantiating BucketCache with acceptableFactor: 0.95, minFactor: 0.85, > extraFreeFactor: 0.1, singleFactor: 0.25, multiFactor: 0.5, memoryFactor: 0.25 > Here is full test run: > 2018-02-01 00:56:29,981 INFO [Time-limited test] hbase.ResourceChecker(148): > before: io.hfile.bucket.TestBucketCache#testInvalidCacheSplitFactorConfig[1: > blockSize=16,384, bucketSizes=[I@20322d26] Thread=77, OpenFileDescriptor=263, > MaxFileDescriptor=1048576, SystemLoadAverage=2127, ProcessCount=9, > AvailableMemoryMB=7801 > 2018-02-01 00:56:30,013 INFO [Time-limited test] util.ByteBufferArray(70): > Allocating buffers total=32 MB, sizePerBuffer=2 MB, count=16 > 2018-02-01 00:56:49,678 INFO [Time-limited test] bucket.BucketCache(279): > Instantiating BucketCache with acceptableFactor: 0.95, minFactor: 0.85, > extraFreeFactor: 0.1, singleFactor: 
0.25, multiFactor: 0.5, memoryFactor: 0.25 > 2018-02-01 00:56:49,689 INFO [Time-limited test] > bucket.BucketAllocator(334): Cache totalSize=33288192, buckets=63, bucket > capacity=528384=(4*132096)=(FEWEST_ITEMS_IN_BUCKET*(largest configured > bucketcache size)) > 2018-02-01 00:56:49,690 INFO [Time-limited test] bucket.BucketCache(322): > Started bucket cache; ioengine=offheap, capacity=32 MB, blockSize=16 KB, > writerThreadNum=3, writerQLen=64, persistencePath=null, > bucketAllocator=org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocator > 2018-02-01 00:56:50,020 INFO [Time-limited test] util.ByteBufferArray(70): > Allocating buffers total=32 MB, sizePerBuffer=2 MB, count=16 > 2018-02-01 00:56:50,080 ERROR [Time-limited test] util.ByteBufferArray(101): > Buffer creation interrupted > java.lang.InterruptedException > at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404) > at java.util.concurrent.FutureTask.get(FutureTask.java:191) > at > org.apache.hadoop.hbase.util.ByteBufferArray.createBuffers(ByteBufferArray.java:96) > at > org.apache.hadoop.hbase.util.ByteBufferArray.(ByteBufferArray.java:74) > at > org.apache.hadoop.hbase.io.hfile.bucket.ByteBufferIOEngine.(ByteBufferIOEngine.java:86) > at > org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.getIOEngineFromName(BucketCache.java:384) > at > org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.(BucketCache.java:262) > at > org.apache.hadoop.hbase.io.hfile.bucket.TestBucketCache.checkConfigValues(TestBucketCache.java:387) > at > org.apache.hadoop.hbase.io.hfile.bucket.TestBucketCache.testInvalidCacheSplitFactorConfig(TestBucketCache.java:377) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at >
[jira] [Resolved] (HBASE-19916) TestCacheOnWrite Times Out
[ https://issues.apache.org/jira/browse/HBASE-19916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Duo Zhang resolved HBASE-19916.
-------------------------------
Resolution: Fixed
Hadoop Flags: Reviewed

Resolve.

> TestCacheOnWrite Times Out
> --------------------------
>
> Key: HBASE-19916
> URL: https://issues.apache.org/jira/browse/HBASE-19916
> Project: HBase
> Issue Type: Sub-task
> Reporter: stack
> Assignee: stack
> Priority: Major
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19916.master.001.patch
>
>
> All day it has been timing out. It's a medium test. There is a bit in the middle where we are hung up for a minute or more:
> 2018-02-01 23:01:02,471 DEBUG [Time-limited test] hfile.HFile$WriterFactory(336): Unable to set drop behind on /testptch/hbase/hbase-server/target/test-data/6a153924-7f81-4008-ac7e-d0e69384655e/data/default/CompactionCacheOnWrite/6dd6ed35f3b6090bd8d04ed21d687424/.tmp/myCF/c0387b09f82840ab9e636faf5cf02d2d
> 2018-02-01 23:01:03,059 DEBUG [Time-limited test] regionserver.HRegionFileSystem(463): Committing store file /testptch/hbase/hbase-server/target/test-data/6a153924-7f81-4008-ac7e-d0e69384655e/data/default/CompactionCacheOnWrite/6dd6ed35f3b6090bd8d04ed21d687424/.tmp/myCF/c0387b09f82840ab9e63
> ...[truncated 1865657 bytes]...
> b663/myCF/61386855036d4facb75ce7eca2059661, entries=15000, sequenceid=1005, filesize=85.3 K
> 2018-02-01 23:03:50,591 INFO [Time-limited test] regionserver.HRegion(2713): Finished memstore flush of ~1.73 MB/1814000, currentsize=0 B/0 for region CompactionCacheOnWrite,,1517526229883.3b579f93f196a847ed1489e71585b663. in 175ms, sequenceid=1005, compaction requested=false
> 2018-02-01 23:03:50,799 INFO [Time-limited test] regionserver.HRegion(2517): Flushing 1/1 column families, memstore=1.80 MB
> ...
> I've seen this a few times. The test takes 100 seconds locally.
> Let me try changing it to type. If that doesn't work, will be back.
[jira] [Resolved] (HBASE-19868) TestCoprocessorWhitelistMasterObserver is flakey
[ https://issues.apache.org/jira/browse/HBASE-19868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang resolved HBASE-19868. --- Resolution: Fixed Hadoop Flags: Reviewed Resolve. > TestCoprocessorWhitelistMasterObserver is flakey > > > Key: HBASE-19868 > URL: https://issues.apache.org/jira/browse/HBASE-19868 > Project: HBase > Issue Type: Sub-task > Components: flakey, test >Affects Versions: 2.0.0-beta-1 >Reporter: Peter Somogyi >Assignee: Peter Somogyi >Priority: Major > Fix For: 2.0.0-beta-2 > > Attachments: HBASE-19868.branch-2.001.patch, > HBASE-19868.master.002.patch > > > TestCoprocessorWhitelistMasterObserver is failing 33% of the time. In the > logs it looks like the failure is related to Master initialization. > Following log is from > [https://builds.apache.org/job/HBase%20Nightly/job/branch-2/203] > {noformat} > 2018-01-26 02:36:36,686 WARN [M:0;1f0c4777c1ba:35049] > master.TableNamespaceManager(307): Caught exception in initializing namespace > table manager > org.apache.hadoop.hbase.DoNotRetryIOException: hconnection-0x18cd2ac8 closed > at > org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:722) > at > org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:131) > at > org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:714) > at > org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:131) > at > org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:684) > at > org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:131) > at > org.apache.hadoop.hbase.client.ConnectionImplementation.getRegionLocation(ConnectionImplementation.java:562) > at > 
org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.getRegionLocation(ConnectionUtils.java:131) > at > org.apache.hadoop.hbase.client.HRegionLocator.getRegionLocation(HRegionLocator.java:73) > at > org.apache.hadoop.hbase.client.RegionServerCallable.prepare(RegionServerCallable.java:223) > at > org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:105) > at org.apache.hadoop.hbase.client.HTable.get(HTable.java:388) > at org.apache.hadoop.hbase.client.HTable.get(HTable.java:362) > at > org.apache.hadoop.hbase.master.TableNamespaceManager.get(TableNamespaceManager.java:141) > at > org.apache.hadoop.hbase.master.TableNamespaceManager.isTableAvailableAndInitialized(TableNamespaceManager.java:281) > at > org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:103) > at > org.apache.hadoop.hbase.master.ClusterSchemaServiceImpl.doStart(ClusterSchemaServiceImpl.java:62) > at > org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.startAsync(AbstractService.java:226) > at > org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1059) > at > org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:921) > at > org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2034) > at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:553) > at java.lang.Thread.run(Thread.java:748) > 2018-01-26 02:36:36,691 ERROR [M:0;1f0c4777c1ba:35049] > helpers.MarkerIgnoringBase(159): Failed to become active master > java.lang.IllegalStateException: Expected the service > ClusterSchemaServiceImpl [FAILED] to be RUNNING, but the service has FAILED > at > org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:345) > at > org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:291) > at > 
org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1061) > at > org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:921) > at > org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2034) > at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:553) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.hadoop.hbase.DoNotRetryIOException: > hconnection-0x18cd2ac8 closed > at > org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:722) > at > org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:131) > at > org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:714
[jira] [Resolved] (HBASE-19908) TestCoprocessorShortCircuitRPC Timeout....
[ https://issues.apache.org/jira/browse/HBASE-19908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Duo Zhang resolved HBASE-19908.
-------------------------------
Resolution: Fixed
Assignee: stack
Hadoop Flags: Reviewed
Fix Version/s: 2.0.0-beta-2

Resolve.

> TestCoprocessorShortCircuitRPC Timeout
> --------------------------------------
>
> Key: HBASE-19908
> URL: https://issues.apache.org/jira/browse/HBASE-19908
> Project: HBase
> Issue Type: Sub-task
> Reporter: stack
> Assignee: stack
> Priority: Major
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19908.master.001.patch
>
>
> Timed out in HBASE-19906.
> Comparing a local run (16 seconds total) to a timed-out run up on jenkins, I see it takes my local test 5 seconds to get the STOPPED server log line. On jenkins in this timed-out test it takes 30 seconds. Test is still running when it is killed. Let me make it a medium test.
> https://builds.apache.org/view/H-L/view/HBase/job/PreCommit-HBASE-Build/lastCompletedBuild/testReport/org.apache.hadoop.hbase.coprocessor/TestCoprocessorShortCircuitRPC/org_apache_hadoop_hbase_coprocessor_TestCoprocessorShortCircuitRPC/
[jira] [Resolved] (HBASE-19909) TestRegionLocationFinder Timeout
[ https://issues.apache.org/jira/browse/HBASE-19909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Duo Zhang resolved HBASE-19909.
-------------------------------
Resolution: Fixed
Hadoop Flags: Reviewed

Seems worked? Resolve.

> TestRegionLocationFinder Timeout
> --------------------------------
>
> Key: HBASE-19909
> URL: https://issues.apache.org/jira/browse/HBASE-19909
> Project: HBase
> Issue Type: Sub-task
> Reporter: stack
> Assignee: stack
> Priority: Major
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19909.branch-2.001.patch
>
>
> This test is timing out a bunch in runs since we moved over to the nice new fancy, smancy, timeout thingymajig.
> Similar to HBASE-19908, I see that on Jenkins, the test is making progress but is running at a slower rate.
> This is a 'smalltest' that starts a minicluster with 5 servers creating a table with 26 odd regions.
> On my uncontested machine, it takes 20 seconds to complete the create table. On jenkins it takes 29 seconds (see https://builds.apache.org/view/H-L/view/HBase/job/PreCommit-HBASE-Build/11303/testReport/org.apache.hadoop.hbase.master.balancer/TestRegionLocationFinder/org_apache_hadoop_hbase_master_balancer_TestRegionLocationFinder/)
> Small tests are supposed to complete inside 30 seconds.
[jira] [Created] (HBASE-19932) TestSecureIPC in branch-1 fails with NoSuchMethodError against hadoop 3
Ted Yu created HBASE-19932:
------------------------------

Summary: TestSecureIPC in branch-1 fails with NoSuchMethodError against hadoop 3
Key: HBASE-19932
URL: https://issues.apache.org/jira/browse/HBASE-19932
Project: HBase
Issue Type: Sub-task
Reporter: Ted Yu
Fix For: 1.5.0

Error below can be observed when running the test against hadoop 3:
{code}
org.apache.hadoop.hbase.security.TestSecureIPC  Time elapsed: 1.756 sec  <<< ERROR!
java.lang.NoSuchMethodError: org.apache.kerby.kerberos.kerb.server.SimpleKdcServer.getKadmin()Lorg/apache/kerby/kerberos/kerb/admin/kadmin/local/LocalKadmin;
  at org.apache.hadoop.hbase.security.TestSecureIPC.setUp(TestSecureIPC.java:112)
{code}
[jira] [Reopened] (HBASE-19931) TestMetaWithReplicas failing 100% of the time in testHBaseFsckWithMetaReplicas
[ https://issues.apache.org/jira/browse/HBASE-19931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack reopened HBASE-19931:
---------------------------

Reopening. We fixed one test. Now it is flakey in another.

org.junit.runners.model.TestTimedOutException: test timed out after 600 seconds
  at org.apache.hadoop.hbase.client.TestMetaWithReplicas.shutdownMetaAndDoValidations(TestMetaWithReplicas.java:265)
  at org.apache.hadoop.hbase.client.TestMetaWithReplicas.testShutdownHandling(TestMetaWithReplicas.java:191)

> TestMetaWithReplicas failing 100% of the time in testHBaseFsckWithMetaReplicas
> ------------------------------------------------------------------------------
>
> Key: HBASE-19931
> URL: https://issues.apache.org/jira/browse/HBASE-19931
> Project: HBase
> Issue Type: Sub-task
> Reporter: stack
> Assignee: stack
> Priority: Major
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19931.branch-2.001.patch
>
>
> Somehow we missed a test that depends on a run of HBCK. It fails 100% of the time now because of HBASE-19726 (Failed to start HMaster due to infinite retrying on meta assign), where we no longer update hbase:meta with the state of hbase:meta; rather, hbase:meta's always-ENABLED state is inferred. It broke HBCK here.
> So, disable the test and, just in case, add meta as ENABLED to hbck, though hbck as-is is not for hbase2.
[jira] [Resolved] (HBASE-19931) TestMetaWithReplicas failing 100% of the time in testHBaseFsckWithMetaReplicas
[ https://issues.apache.org/jira/browse/HBASE-19931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-19931.
---------------------------
Resolution: Fixed

.001 is what I pushed on master and branch-2.

> TestMetaWithReplicas failing 100% of the time in testHBaseFsckWithMetaReplicas
> ------------------------------------------------------------------------------
>
> Key: HBASE-19931
> URL: https://issues.apache.org/jira/browse/HBASE-19931
> Project: HBase
> Issue Type: Sub-task
> Reporter: stack
> Assignee: stack
> Priority: Major
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19931.branch-2.001.patch
>
>
> Somehow we missed a test that depends on a run of HBCK. It fails 100% of the time now because of HBASE-19726 (Failed to start HMaster due to infinite retrying on meta assign), where we no longer update hbase:meta with the state of hbase:meta; rather, hbase:meta's always-ENABLED state is inferred. It broke HBCK here.
> So, disable the test and, just in case, add meta as ENABLED to hbck, though hbck as-is is not for hbase2.
[jira] [Created] (HBASE-19931) TestMetaWithReplicas failing 100% of the time in testHBaseFsckWithMetaReplicas
stack created HBASE-19931:
-----------------------------

Summary: TestMetaWithReplicas failing 100% of the time in testHBaseFsckWithMetaReplicas
Key: HBASE-19931
URL: https://issues.apache.org/jira/browse/HBASE-19931
Project: HBase
Issue Type: Sub-task
Reporter: stack
Assignee: stack
Fix For: 2.0.0-beta-2

Somehow we missed a test that depends on a run of HBCK. It fails 100% of the time now because of HBASE-19726 (Failed to start HMaster due to infinite retrying on meta assign), where we no longer update hbase:meta with the state of hbase:meta; rather, hbase:meta's always-ENABLED state is inferred. It broke HBCK here.

So, disable the test and, just in case, add meta as ENABLED to hbck, though hbck as-is is not for hbase2.
Re: Considering branching for 1.5 and other branch-1 release planning
Hi Ted,

If Hadoop 3 support is in place for an (eventual) 1.5.0 release, I think that would be great.

On Sun, Feb 4, 2018 at 10:55 AM, Ted Yu wrote:

> Andrew:
> Do you think making 1.5 release support hadoop 3 is among the goals?
>
> Cheers
>
> On Fri, Feb 2, 2018 at 3:28 PM, Andrew Purtell wrote:
>
> > The backport of RSGroups to branch-1 triggered the opening of the 1.4 code line as branch-1.4 and releases 1.4.0 and 1.4.1.
> >
> > After the commit of HBASE-19858 (Backport HBASE-14061 (Support CF-level Storage Policy) to branch-1), storage policy aware file placement might be useful enough to trigger a new minor release from branch-1. This would be branch-1.5, and at least release 1.5.0. I am not sure about this yet. It needs testing. I'd like to mock up a couple of use cases and determine if what we have is sufficient on its own or more changes will be needed. I want to get the idea of a 1.5 on your radar, though.
> >
> > Also, I would like to make one more release of branch-1.3 before we retire it. Mikhail passed the reins. We might have a volunteer to RM 1.3.2. If not, I will do it. I'm expecting 1.4 will supersede 1.3 but this will be decided organically depending on uptake.
> >
> > --
> > Best regards,
> > Andrew
> >
> > Words like orphans lost among the crosstalk, meaning torn from truth's decrepit hands
> >    - A23, Crosstalk

--
Best regards,
Andrew

Words like orphans lost among the crosstalk, meaning torn from truth's decrepit hands
   - A23, Crosstalk
Re: Considering branching for 1.5 and other branch-1 release planning
Andrew:
Do you think making 1.5 release support hadoop 3 is among the goals?

Cheers

On Fri, Feb 2, 2018 at 3:28 PM, Andrew Purtell wrote:

> The backport of RSGroups to branch-1 triggered the opening of the 1.4 code line as branch-1.4 and releases 1.4.0 and 1.4.1.
>
> After the commit of HBASE-19858 (Backport HBASE-14061 (Support CF-level Storage Policy) to branch-1), storage policy aware file placement might be useful enough to trigger a new minor release from branch-1. This would be branch-1.5, and at least release 1.5.0. I am not sure about this yet. It needs testing. I'd like to mock up a couple of use cases and determine if what we have is sufficient on its own or more changes will be needed. I want to get the idea of a 1.5 on your radar, though.
>
> Also, I would like to make one more release of branch-1.3 before we retire it. Mikhail passed the reins. We might have a volunteer to RM 1.3.2. If not, I will do it. I'm expecting 1.4 will supersede 1.3 but this will be decided organically depending on uptake.
>
> --
> Best regards,
> Andrew
>
> Words like orphans lost among the crosstalk, meaning torn from truth's decrepit hands
>    - A23, Crosstalk
[jira] [Created] (HBASE-19930) fix ImmutableMemStoreLAB#forceCopyOfBigCellInto
Gali Sheffi created HBASE-19930:
-----------------------------------

Summary: fix ImmutableMemStoreLAB#forceCopyOfBigCellInto
Key: HBASE-19930
URL: https://issues.apache.org/jira/browse/HBASE-19930
Project: HBase
Issue Type: Bug
Affects Versions: 2.0.0-beta-1
Reporter: Gali Sheffi
Assignee: Gali Sheffi

This issue is about fixing ImmutableMemStoreLAB#forceCopyOfBigCellInto. This method only throws an IllegalStateException, instead of forcing the copy as it is supposed to do.
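The difference between the reported behaviour and a working one can be illustrated with the Java sketch below. It uses a hypothetical, simplified interface in which cells are plain byte arrays; delegating to the first wrapped LAB is only one possible shape of a fix and is an assumption, not a description of the committed patch.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified LAB interface; cells are modeled as byte arrays.
interface LabSketch {
  byte[] forceCopyOfBigCellInto(byte[] cell);
}

// A LAB that always copies, standing in for a regular mutable LAB.
final class CopyingLabSketch implements LabSketch {
  @Override
  public byte[] forceCopyOfBigCellInto(byte[] cell) {
    return Arrays.copyOf(cell, cell.length);
  }
}

// Behaviour described in the issue: the immutable wrapper only throws.
final class ThrowingImmutableLabSketch implements LabSketch {
  @Override
  public byte[] forceCopyOfBigCellInto(byte[] cell) {
    throw new IllegalStateException("copy not supported (illustrative message)");
  }
}

// One possible fix (an assumption): delegate to one of the wrapped LABs.
final class DelegatingImmutableLabSketch implements LabSketch {
  private final List<LabSketch> labs;

  DelegatingImmutableLabSketch(List<LabSketch> labs) {
    this.labs = labs;
  }

  @Override
  public byte[] forceCopyOfBigCellInto(byte[] cell) {
    // Forward the forced copy instead of rejecting it.
    return labs.get(0).forceCopyOfBigCellInto(cell);
  }
}
```

A caller that relies on the "force" contract gets a fresh copy from the delegating variant, while the throwing variant fails at the call site.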