[jira] [Created] (HBASE-19936) Introduce a new base class for replication peer procedure

2018-02-04 Thread Duo Zhang (JIRA)
Duo Zhang created HBASE-19936:
-

 Summary: Introduce a new base class for replication peer procedure
 Key: HBASE-19936
 URL: https://issues.apache.org/jira/browse/HBASE-19936
 Project: HBase
  Issue Type: Sub-task
Reporter: Duo Zhang
Assignee: Duo Zhang
 Fix For: 3.0.0


Because the sync replication peer state transition has more steps than that of a 
normal replication peer, it would be good to have a common base class for both.

Since the peer id will be stored in this class, I intend to rename the protobuf 
message from 'ModifyPeerStateData' to 'ReplicationPeerProcedureStateData'. 
This will be committed to master and HBASE-19397-branch-2.
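
A minimal sketch of the shape such a base class might take, assuming plain Java with no HBase dependencies. The class names (`PeerProcedureSketch`, `ModifyPeerSketch`, `SyncReplicationTransitionSketch`) and the step names are hypothetical and are not taken from the patch; the only point illustrated is that the peer id lives in the shared base while each subclass supplies its own steps.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical base class: holds the peer id shared by every peer procedure.
abstract class PeerProcedureSketch {
  private final String peerId;

  protected PeerProcedureSketch(String peerId) {
    if (peerId == null || peerId.isEmpty()) {
      throw new IllegalArgumentException("peerId must be non-empty");
    }
    this.peerId = peerId;
  }

  final String getPeerId() {
    return peerId;
  }

  // Each subclass lists its own state-transition steps, in order.
  abstract List<String> steps();

  // Shared description built from the shared peer id.
  final String describe() {
    return getClass().getSimpleName() + "(peerId=" + peerId + ", steps=" + steps().size() + ")";
  }
}

// Hypothetical normal peer modification; the step names are invented.
class ModifyPeerSketch extends PeerProcedureSketch {
  ModifyPeerSketch(String peerId) {
    super(peerId);
  }

  @Override
  List<String> steps() {
    return Arrays.asList("PRE_MODIFY", "UPDATE_STORAGE", "REFRESH_ON_RS", "POST_MODIFY");
  }
}

// Hypothetical sync replication transition with more steps; names are invented.
class SyncReplicationTransitionSketch extends PeerProcedureSketch {
  SyncReplicationTransitionSketch(String peerId) {
    super(peerId);
  }

  @Override
  List<String> steps() {
    return Arrays.asList("PRE_TRANSITION", "SET_NEW_STATE", "REFRESH_ON_RS",
        "REOPEN_REGIONS", "COMMIT_STATE", "POST_TRANSITION");
  }
}
```

With this layout, anything that only needs the peer id (for example, serializing it into the state data message) can be written once against the base type.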



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-19935) Only allow table replication for sync replication for now

2018-02-04 Thread Duo Zhang (JIRA)
Duo Zhang created HBASE-19935:
-

 Summary: Only allow table replication for sync replication for now
 Key: HBASE-19935
 URL: https://issues.apache.org/jira/browse/HBASE-19935
 Project: HBase
  Issue Type: Sub-task
  Components: Replication
Reporter: Duo Zhang


Add a pre-check that allows only table-level replication for now: no namespaces, 
no replicate-all, and no exclusions.

This reduces the difficulty of implementing the sync replication 
state transition, since we need to reopen all the related regions.

We can add support for these features later.
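
A minimal sketch of what such a pre-check could look like, assuming a simplified stand-in for the peer config. `PeerConfigSketch` and `SyncReplicationPreCheckSketch` are hypothetical names, not HBase classes, and the final "at least one table" rule is an assumption added for illustration.

```java
import java.util.Set;

// Hypothetical, simplified stand-in for a replication peer config.
final class PeerConfigSketch {
  final boolean replicateAll;
  final Set<String> namespaces;
  final Set<String> tables;
  final Set<String> excludedTables;

  PeerConfigSketch(boolean replicateAll, Set<String> namespaces,
      Set<String> tables, Set<String> excludedTables) {
    this.replicateAll = replicateAll;
    this.namespaces = namespaces;
    this.tables = tables;
    this.excludedTables = excludedTables;
  }
}

final class SyncReplicationPreCheckSketch {
  private SyncReplicationPreCheckSketch() {
  }

  // Rejects every config that is not plain table-level replication.
  static void check(PeerConfigSketch config) {
    if (config.replicateAll) {
      throw new IllegalArgumentException("replicate-all is not supported for sync replication yet");
    }
    if (!config.namespaces.isEmpty()) {
      throw new IllegalArgumentException("namespace replication is not supported for sync replication yet");
    }
    if (!config.excludedTables.isEmpty()) {
      throw new IllegalArgumentException("exclusions are not supported for sync replication yet");
    }
    // Assumption for this sketch: a table-level config must name at least one table.
    if (config.tables.isEmpty()) {
      throw new IllegalArgumentException("at least one table must be configured");
    }
  }
}
```

Failing fast here keeps the set of regions to reopen during a state transition fully determined by the configured tables.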





[jira] [Created] (HBASE-19934) HBaseSnapshotException when read replicas is enabled and online snapshot is taken after region splitting

2018-02-04 Thread Toshihiro Suzuki (JIRA)
Toshihiro Suzuki created HBASE-19934:


 Summary: HBaseSnapshotException when read replicas is enabled and 
online snapshot is taken after region splitting
 Key: HBASE-19934
 URL: https://issues.apache.org/jira/browse/HBASE-19934
 Project: HBase
  Issue Type: Bug
  Components: snapshots
Reporter: Toshihiro Suzuki


While investigating HBASE-19893, I encountered another issue.

Steps to reproduce are as follows:

1. Create a table
{code:java}
create "test", "cf", {REGION_REPLICATION => 2}{code}
2. Load data to the table
{code:java}
(0...2000).each{|i| put "test", "row#{i}", "cf:col", "val"}{code}
3. Split the table
{code:java}
split "test"{code}
4. Take a snapshot for the table
{code:java}
snapshot "test", "snap"{code}
The snapshot then fails with the following error:
{code:java}
hbase(main):004:0> snapshot "test", "snap"

ERROR: org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: Snapshot { 
ss=snap table=test type=FLUSH } had an error. Procedure snap { waiting=[] 
done=[] }
at 
org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:379)
at 
org.apache.hadoop.hbase.master.MasterRpcServices.isSnapshotDone(MasterRpcServices.java:1144)
at 
org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:406)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
Caused by: org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException via 
Failed taking snapshot { ss=snap table=test type=FLUSH } due to 
exception:Manifest region info {ENCODED => b910488a686644a7c1c85246d0d123d5, 
NAME => 'test,,1517808523837_0001.b910488a686644a7c1c85246d0d123d5.', STARTKEY 
=> '', ENDKEY => '', OFFLINE => true, SPLIT => true, REPLICA_ID => 1}doesn't 
match expected region:{ENCODED => ef8665859c0b19927b7dc127ec10120a, NAME => 
'test,,1517808523837.ef8665859c0b19927b7dc127ec10120a.', STARTKEY => '', ENDKEY 
=> '', OFFLINE => true, SPLIT => 
true}:org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Manifest 
region info {ENCODED => b910488a686644a7c1c85246d0d123d5, NAME => 
'test,,1517808523837_0001.b910488a686644a7c1c85246d0d123d5.', STARTKEY => '', 
ENDKEY => '', OFFLINE => true, SPLIT => true, REPLICA_ID => 1}doesn't match 
expected region:{ENCODED => ef8665859c0b19927b7dc127ec10120a, NAME => 
'test,,1517808523837.ef8665859c0b19927b7dc127ec10120a.', STARTKEY => '', ENDKEY 
=> '', OFFLINE => true, SPLIT => true}
at 
org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher.rethrowException(ForeignExceptionDispatcher.java:82)
at 
org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.rethrowExceptionIfFailed(TakeSnapshotHandler.java:306)
at 
org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:368)
... 6 more
Caused by: org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: 
Manifest region info {ENCODED => b910488a686644a7c1c85246d0d123d5, NAME => 
'test,,1517808523837_0001.b910488a686644a7c1c85246d0d123d5.', STARTKEY => '', 
ENDKEY => '', OFFLINE => true, SPLIT => true, REPLICA_ID => 1}doesn't match 
expected region:{ENCODED => ef8665859c0b19927b7dc127ec10120a, NAME => 
'test,,1517808523837.ef8665859c0b19927b7dc127ec10120a.', STARTKEY => '', ENDKEY 
=> '', OFFLINE => true, SPLIT => true}
at 
org.apache.hadoop.hbase.master.snapshot.MasterSnapshotVerifier.verifyRegionInfo(MasterSnapshotVerifier.java:223)
at 
org.apache.hadoop.hbase.master.snapshot.MasterSnapshotVerifier.verifyRegions(MasterSnapshotVerifier.java:201)
at 
org.apache.hadoop.hbase.master.snapshot.MasterSnapshotVerifier.verifySnapshot(MasterSnapshotVerifier.java:119)
at 
org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.process(TakeSnapshotHandler.java:202)
at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Take a snapshot of specified table. Examples:

hbase> snapshot 'sourceTable', 'snapshotName'
hbase> snapshot 'namespace:sourceTable', 'snapshotName', {SKIP_FLUSH => true}

Took 0.3390 seconds{code}





Re: Going to roll 1.3.2 RC in next few weeks

2018-02-04 Thread Francis Christopher Liu
Andy,

Yup. What still needs to be done?

Francis

On Thu, Feb 1, 2018 at 5:07 PM Andrew Purtell 
wrote:

> Francis,
>
> Do you still have an interest in running a 1.3.2 release?
>
> On Fri, Jan 5, 2018 at 9:37 AM, Andrew Purtell 
> wrote:
>
>> Ok. Great!
>>
>> Will revisit this next month.
>>
>> On Jan 5, 2018, at 9:35 AM, Francis Christopher Liu 
>> wrote:
>>
>> I'm interested in picking it up if that's ok. Tho I'm out till mid next
>> week.
>>
>> Thanks,
>> Francis
>>
>> On Thu, Jan 4, 2018 at 12:00 PM Andrew Purtell 
>> wrote:
>>
>>> Ok, I’ll start work on a 1.3.2 RC
>>>
>>>
>>>
>>> > On Jan 3, 2018, at 12:13 PM, Andrew Purtell 
>>> wrote:
>>> >
>>> > I volunteer, if nobody else would like to pick it up. I think we
>>> should at least have one more 1.3 release.  Thanks for running the first
>>> two 1.3 releases Mikhail.
>>> >
>>> >
>>> >> On Wed, Jan 3, 2018 at 6:01 AM, Mikhail Antonov 
>>> wrote:
>>> >> Sorry, I have been mostly away since Christmas and catching up on
>>> emails..
>>> >>
>>> >> So I was going to come up with 1.3.2 RC for some time but never
>>> really got
>>> >> to it, life and work intervened.. As I don't expect that to change
>>> soon, I
>>> >> suppose it might be better if someone who may have more bandwidth
>>> picked it
>>> >> up :(  Any volunteers?
>>> >>
>>> >> Since we never moved stable pointer to 1.3 line, I think we can
>>> discuss
>>> >> separately how many releases out of this line we want.
>>> >>
>>> >> Thanks,
>>> >> Mikhail
>>> >>
>>> >> On Tue, Dec 26, 2017 at 3:24 PM, 김영우 (YoungWoo Kim) >> >
>>> >> wrote:
>>> >>
>>> >> > Hi Mikhail,
>>> >> >
>>> >> > Any progress on 1.3.2? Just curious If there is a plan for
>>> continuing 1.3
>>> >> > line.
>>> >> >
>>> >> > Thanks,
>>> >> > - Youngwoo
>>> >> >
>>> >> > On Tue, Oct 10, 2017 at 12:52 PM, Mikhail Antonov <
>>> anto...@apache.org>
>>> >> > wrote:
>>> >> >
>>> >> > > This week I'm planning to go through the hanging jiras
>>> >> > > and see where we are on it.
>>> >> > >
>>> >> > > Please speak up if you have open jiras targeted to 1.3.2 that
>>> you'd
>>> >> > > like to get to 1.3.2 or backport in there, also if you have any
>>> concerns
>>> >> > > regarding the jiras that already went in or are about to, or have
>>> any
>>> >> > > findings or doubts.
>>> >> > >
>>> >> > > Thanks!
>>> >> > > Mikhail
>>> >> > >
>>> >> >
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Thanks,
>>> >> Michael Antonov
>>> >
>>> >
>>> >
>>> > --
>>> > Best regards,
>>> > Andrew
>>> >
>>> > Words like orphans lost among the crosstalk, meaning torn from truth's
>>> decrepit hands
>>> >- A23, Crosstalk
>>>
>>
>
>
> --
> Best regards,
> Andrew
>
> Words like orphans lost among the crosstalk, meaning torn from truth's
> decrepit hands
>- A23, Crosstalk
>


[jira] [Created] (HBASE-19933) Make use of column family level attribute for skipping hfile range check before create reference during split

2018-02-04 Thread Rajeshbabu Chintaguntla (JIRA)
Rajeshbabu Chintaguntla created HBASE-19933:
---

 Summary: Make use of column family level attribute for skipping 
hfile range check before create reference during split
 Key: HBASE-19933
 URL: https://issues.apache.org/jira/browse/HBASE-19933
 Project: HBase
  Issue Type: Bug
Reporter: Rajeshbabu Chintaguntla
Assignee: Rajeshbabu Chintaguntla
 Fix For: 2.0.0-beta-2


Currently we use the split policy to decide whether to skip the store file 
range check when creating references during a split. But the full-fledged 
split with region references cannot be used in the master. As an 
alternative, we need a column family attribute that can be set to true 
or false at the client level so that the decision is made accordingly.
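
A minimal sketch of reading such a flag from column family attributes, assuming the attributes are plain string key/value pairs. `ColumnFamilySketch` and the key `SKIP_STORE_FILE_RANGE_CHECK` are hypothetical and are not the names used by HBase or by the patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a column family descriptor with string attributes.
final class ColumnFamilySketch {
  // Hypothetical attribute key; not a real HBase constant.
  static final String SKIP_RANGE_CHECK_KEY = "SKIP_STORE_FILE_RANGE_CHECK";

  private final Map<String, String> attributes = new HashMap<>();

  // Client-side setter; returns this so calls can be chained.
  ColumnFamilySketch setValue(String key, String value) {
    attributes.put(key, value);
    return this;
  }

  // Split-time decision: an unset or non-"true" attribute means "do the range check".
  boolean skipStoreFileRangeCheck() {
    return Boolean.parseBoolean(attributes.get(SKIP_RANGE_CHECK_KEY));
  }
}
```

Defaulting to false when the attribute is absent keeps the range check in place for families that never set the flag.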





[jira] [Resolved] (HBASE-19803) False positive for the HBASE-Find-Flaky-Tests job

2018-02-04 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang resolved HBASE-19803.
---
   Resolution: Fixed
Fix Version/s: 2.0.0-beta-2

Fixed by HBASE-19873.

> False positive for the HBASE-Find-Flaky-Tests job
> -
>
> Key: HBASE-19803
> URL: https://issues.apache.org/jira/browse/HBASE-19803
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Duo Zhang
>Priority: Major
> Fix For: 2.0.0-beta-2
>
> Attachments: 2018-01-24T17-45-37_000-jvmRun1.dumpstream, 
> HBASE-19803.master.001.patch
>
>
> It reports two hangs for TestAsyncTableGetMultiThreaded, but I checked the 
> surefire output
> https://builds.apache.org/job/HBASE-Flaky-Tests/24830/artifact/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.client.TestAsyncTableGetMultiThreaded-output.txt
> This one was likely to be killed in the middle of the run within 20 seconds.
> https://builds.apache.org/job/HBASE-Flaky-Tests/24852/artifact/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.client.TestAsyncTableGetMultiThreaded-output.txt
> This one was also killed within about 1 minute.
> The test is declared as LargeTests, so the time limit should be 10 minutes. It 
> seems that the JVM may crash during the mvn test run, after which we kill 
> all the running tests and may mark some of them as hung, which leads 
> to the false positive.





[jira] [Resolved] (HBASE-19910) TestBucketCache TimesOut

2018-02-04 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang resolved HBASE-19910.
---
   Resolution: Fixed
 Assignee: stack
 Hadoop Flags: Reviewed
Fix Version/s: 2.0.0-beta-2

Resolve.

> TestBucketCache TimesOut
> 
>
> Key: HBASE-19910
> URL: https://issues.apache.org/jira/browse/HBASE-19910
> Project: HBase
>  Issue Type: Sub-task
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 2.0.0-beta-2
>
> Attachments: 0001-HBASE-19910-TestBucketCache-TimesOut.patch
>
>
> See 
> https://builds.apache.org/view/H-L/view/HBase/job/PreCommit-HBASE-Build/11303/testReport/org.apache.hadoop.hbase.master.balancer/TestRegionLocationFinder/org_apache_hadoop_hbase_master_balancer_TestRegionLocationFinder/
> This is a small test. It runs fast locally: 8 tests, each a second or two. The 
> odd thing up on Jenkins is that in the middle of one there is a 19-second 
> pause. See here:
> 2018-02-01 00:56:30,013 INFO  [Time-limited test] util.ByteBufferArray(70): 
> Allocating buffers total=32 MB, sizePerBuffer=2 MB, count=16
> 2018-02-01 00:56:49,678 INFO  [Time-limited test] bucket.BucketCache(279): 
> Instantiating BucketCache with acceptableFactor: 0.95, minFactor: 0.85, 
> extraFreeFactor: 0.1, singleFactor: 0.25, multiFactor: 0.5, memoryFactor: 0.25
> Here is full test run:
> 2018-02-01 00:56:29,981 INFO  [Time-limited test] hbase.ResourceChecker(148): 
> before: io.hfile.bucket.TestBucketCache#testInvalidCacheSplitFactorConfig[1: 
> blockSize=16,384, bucketSizes=[I@20322d26] Thread=77, OpenFileDescriptor=263, 
> MaxFileDescriptor=1048576, SystemLoadAverage=2127, ProcessCount=9, 
> AvailableMemoryMB=7801
> 2018-02-01 00:56:30,013 INFO  [Time-limited test] util.ByteBufferArray(70): 
> Allocating buffers total=32 MB, sizePerBuffer=2 MB, count=16
> 2018-02-01 00:56:49,678 INFO  [Time-limited test] bucket.BucketCache(279): 
> Instantiating BucketCache with acceptableFactor: 0.95, minFactor: 0.85, 
> extraFreeFactor: 0.1, singleFactor: 0.25, multiFactor: 0.5, memoryFactor: 0.25
> 2018-02-01 00:56:49,689 INFO  [Time-limited test] 
> bucket.BucketAllocator(334): Cache totalSize=33288192, buckets=63, bucket 
> capacity=528384=(4*132096)=(FEWEST_ITEMS_IN_BUCKET*(largest configured 
> bucketcache size))
> 2018-02-01 00:56:49,690 INFO  [Time-limited test] bucket.BucketCache(322): 
> Started bucket cache; ioengine=offheap, capacity=32 MB, blockSize=16 KB, 
> writerThreadNum=3, writerQLen=64, persistencePath=null, 
> bucketAllocator=org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocator
> 2018-02-01 00:56:50,020 INFO  [Time-limited test] util.ByteBufferArray(70): 
> Allocating buffers total=32 MB, sizePerBuffer=2 MB, count=16
> 2018-02-01 00:56:50,080 ERROR [Time-limited test] util.ByteBufferArray(101): 
> Buffer creation interrupted
> java.lang.InterruptedException
>   at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404)
>   at java.util.concurrent.FutureTask.get(FutureTask.java:191)
>   at 
> org.apache.hadoop.hbase.util.ByteBufferArray.createBuffers(ByteBufferArray.java:96)
>   at 
> org.apache.hadoop.hbase.util.ByteBufferArray.(ByteBufferArray.java:74)
>   at 
> org.apache.hadoop.hbase.io.hfile.bucket.ByteBufferIOEngine.(ByteBufferIOEngine.java:86)
>   at 
> org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.getIOEngineFromName(BucketCache.java:384)
>   at 
> org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.(BucketCache.java:262)
>   at 
> org.apache.hadoop.hbase.io.hfile.bucket.TestBucketCache.checkConfigValues(TestBucketCache.java:387)
>   at 
> org.apache.hadoop.hbase.io.hfile.bucket.TestBucketCache.testInvalidCacheSplitFactorConfig(TestBucketCache.java:377)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
>

[jira] [Resolved] (HBASE-19916) TestCacheOnWrite Times Out

2018-02-04 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang resolved HBASE-19916.
---
  Resolution: Fixed
Hadoop Flags: Reviewed

Resolve.

> TestCacheOnWrite Times Out
> --
>
> Key: HBASE-19916
> URL: https://issues.apache.org/jira/browse/HBASE-19916
> Project: HBase
>  Issue Type: Sub-task
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19916.master.001.patch
>
>
> All day it has been timing out. It's a medium test. There is a bit in the 
> middle where we are hung up for a minute or more:
> 2018-02-01 23:01:02,471 DEBUG [Time-limited test] 
> hfile.HFile$WriterFactory(336): Unable to set drop behind on 
> /testptch/hbase/hbase-server/target/test-data/6a153924-7f81-4008-ac7e-d0e69384655e/data/default/CompactionCacheOnWrite/6dd6ed35f3b6090bd8d04ed21d687424/.tmp/myCF/c0387b09f82840ab9e636faf5cf02d2d
> 2018-02-01 23:01:03,059 DEBUG [Time-limited test] 
> regionserver.HRegionFileSystem(463): Committing store file 
> /testptch/hbase/hbase-server/target/test-data/6a153924-7f81-4008-ac7e-d0e69384655e/data/default/CompactionCacheOnWrite/6dd6ed35f3b6090bd8d04ed21d687424/.tmp/myCF/c0387b09f82840ab9e63
> ...[truncated 1865657 bytes]...
> b663/myCF/61386855036d4facb75ce7eca2059661, entries=15000, sequenceid=1005, 
> filesize=85.3 K
> 2018-02-01 23:03:50,591 INFO  [Time-limited test] regionserver.HRegion(2713): 
> Finished memstore flush of ~1.73 MB/1814000, currentsize=0 B/0 for region 
> CompactionCacheOnWrite,,1517526229883.3b579f93f196a847ed1489e71585b663. in 
> 175ms, sequenceid=1005, compaction requested=false
> 2018-02-01 23:03:50,799 INFO  [Time-limited test] regionserver.HRegion(2517): 
> Flushing 1/1 column families, memstore=1.80 MB
> ...
> I've seen this a few times. The test takes 100 seconds locally.
> Let me try changing it to type. If that doesn't work, will be back.





[jira] [Resolved] (HBASE-19868) TestCoprocessorWhitelistMasterObserver is flakey

2018-02-04 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang resolved HBASE-19868.
---
  Resolution: Fixed
Hadoop Flags: Reviewed

Resolve.

> TestCoprocessorWhitelistMasterObserver is flakey
> 
>
> Key: HBASE-19868
> URL: https://issues.apache.org/jira/browse/HBASE-19868
> Project: HBase
>  Issue Type: Sub-task
>  Components: flakey, test
>Affects Versions: 2.0.0-beta-1
>Reporter: Peter Somogyi
>Assignee: Peter Somogyi
>Priority: Major
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19868.branch-2.001.patch, 
> HBASE-19868.master.002.patch
>
>
> TestCoprocessorWhitelistMasterObserver is failing 33% of the time. In the 
> logs it looks like the failure is related to Master initialization.
> Following log is from 
> [https://builds.apache.org/job/HBase%20Nightly/job/branch-2/203] 
> {noformat}
> 2018-01-26 02:36:36,686 WARN [M:0;1f0c4777c1ba:35049] 
> master.TableNamespaceManager(307): Caught exception in initializing namespace 
> table manager
> org.apache.hadoop.hbase.DoNotRetryIOException: hconnection-0x18cd2ac8 closed
> at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:722)
> at 
> org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:131)
> at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:714)
> at 
> org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:131)
> at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:684)
> at 
> org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:131)
> at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.getRegionLocation(ConnectionImplementation.java:562)
> at 
> org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.getRegionLocation(ConnectionUtils.java:131)
> at 
> org.apache.hadoop.hbase.client.HRegionLocator.getRegionLocation(HRegionLocator.java:73)
> at 
> org.apache.hadoop.hbase.client.RegionServerCallable.prepare(RegionServerCallable.java:223)
> at 
> org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:105)
> at org.apache.hadoop.hbase.client.HTable.get(HTable.java:388)
> at org.apache.hadoop.hbase.client.HTable.get(HTable.java:362)
> at 
> org.apache.hadoop.hbase.master.TableNamespaceManager.get(TableNamespaceManager.java:141)
> at 
> org.apache.hadoop.hbase.master.TableNamespaceManager.isTableAvailableAndInitialized(TableNamespaceManager.java:281)
> at 
> org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:103)
> at 
> org.apache.hadoop.hbase.master.ClusterSchemaServiceImpl.doStart(ClusterSchemaServiceImpl.java:62)
> at 
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.startAsync(AbstractService.java:226)
> at 
> org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1059)
> at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:921)
> at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2034)
> at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:553)
> at java.lang.Thread.run(Thread.java:748)
> 2018-01-26 02:36:36,691 ERROR [M:0;1f0c4777c1ba:35049] 
> helpers.MarkerIgnoringBase(159): Failed to become active master
> java.lang.IllegalStateException: Expected the service 
> ClusterSchemaServiceImpl [FAILED] to be RUNNING, but the service has FAILED
> at 
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:345)
> at 
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:291)
> at 
> org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1061)
> at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:921)
> at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2034)
> at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:553)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hbase.DoNotRetryIOException: 
> hconnection-0x18cd2ac8 closed
> at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:722)
> at 
> org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:131)
> at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:714

[jira] [Resolved] (HBASE-19908) TestCoprocessorShortCircuitRPC Timeout....

2018-02-04 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang resolved HBASE-19908.
---
   Resolution: Fixed
 Assignee: stack
 Hadoop Flags: Reviewed
Fix Version/s: 2.0.0-beta-2

Resolve.

> TestCoprocessorShortCircuitRPC Timeout
> --
>
> Key: HBASE-19908
> URL: https://issues.apache.org/jira/browse/HBASE-19908
> Project: HBase
>  Issue Type: Sub-task
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19908.master.001.patch
>
>
> Timed out in HBASE-19906.
> Comparing a local run (16 seconds total) to a timed-out run up on Jenkins, I 
> see it takes my local test 5 seconds to get the STOPPED server log line. On 
> Jenkins, in this timed-out test, it takes 30 seconds. The test is still running 
> when it is killed. Let me make it a medium test.
> https://builds.apache.org/view/H-L/view/HBase/job/PreCommit-HBASE-Build/lastCompletedBuild/testReport/org.apache.hadoop.hbase.coprocessor/TestCoprocessorShortCircuitRPC/org_apache_hadoop_hbase_coprocessor_TestCoprocessorShortCircuitRPC/





[jira] [Resolved] (HBASE-19909) TestRegionLocationFinder Timeout

2018-02-04 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang resolved HBASE-19909.
---
  Resolution: Fixed
Hadoop Flags: Reviewed

Seems to have worked? Resolve.

> TestRegionLocationFinder Timeout
> 
>
> Key: HBASE-19909
> URL: https://issues.apache.org/jira/browse/HBASE-19909
> Project: HBase
>  Issue Type: Sub-task
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19909.branch-2.001.patch
>
>
> This test is timing out a bunch in runs since we moved over to the nice new 
> fancy, smancy, timeout thingymajig.
> Similar to HBASE-19908, I see that on Jenkins, the test is making progress 
> but is running at a slower rate.
> This is a 'smalltest' that starts a minicluster with 5 servers creating a 
> table with 26 odd regions.
> On my uncontested machine, it takes 20 seconds to complete the create table. 
> On Jenkins it takes 29 seconds (see 
> https://builds.apache.org/view/H-L/view/HBase/job/PreCommit-HBASE-Build/11303/testReport/org.apache.hadoop.hbase.master.balancer/TestRegionLocationFinder/org_apache_hadoop_hbase_master_balancer_TestRegionLocationFinder/)
>  Small tests are supposed to complete inside 30 seconds.





[jira] [Created] (HBASE-19932) TestSecureIPC in branch-1 fails with NoSuchMethodError against hadoop 3

2018-02-04 Thread Ted Yu (JIRA)
Ted Yu created HBASE-19932:
--

 Summary: TestSecureIPC in branch-1 fails with NoSuchMethodError 
against hadoop 3
 Key: HBASE-19932
 URL: https://issues.apache.org/jira/browse/HBASE-19932
 Project: HBase
  Issue Type: Sub-task
Reporter: Ted Yu
 Fix For: 1.5.0


The error below can be observed when running the test against Hadoop 3:
{code}
org.apache.hadoop.hbase.security.TestSecureIPC  Time elapsed: 1.756 sec  <<< 
ERROR!
java.lang.NoSuchMethodError: 
org.apache.kerby.kerberos.kerb.server.SimpleKdcServer.getKadmin()Lorg/apache/kerby/kerberos/kerb/admin/kadmin/local/LocalKadmin;
at 
org.apache.hadoop.hbase.security.TestSecureIPC.setUp(TestSecureIPC.java:112)
{code}
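
The trailing `()Lorg/apache/kerby/.../LocalKadmin;` in the message is a JVM method descriptor: no arguments, return type `LocalKadmin`. The JVM links a call by name plus descriptor, so a library version that declares `getKadmin()` with a different return type, or not at all, produces this error at run time. Which of the two applies here is not stated above. A minimal sketch of probing for a signature by reflection follows; `MethodSignatureProbe` is a hypothetical helper, not part of HBase or Kerby.

```java
import java.lang.reflect.Method;

// Hypothetical helper: checks whether a class exposes a public no-arg method
// with the given name AND the given return type.
final class MethodSignatureProbe {
  private MethodSignatureProbe() {
  }

  static boolean hasNoArgMethodReturning(Class<?> owner, String name, String returnTypeName) {
    for (Method m : owner.getMethods()) {
      if (m.getName().equals(name)
          && m.getParameterCount() == 0
          && m.getReturnType().getName().equals(returnTypeName)) {
        return true;
      }
    }
    return false;
  }
}
```

Called against the `SimpleKdcServer` class actually on the test classpath, with `getKadmin` and the `LocalKadmin` class name from the descriptor, a `false` result would indicate that the compiled test and the runtime library disagree on the signature.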





[jira] [Reopened] (HBASE-19931) TestMetaWithReplicas failing 100% of the time in testHBaseFsckWithMetaReplicas

2018-02-04 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack reopened HBASE-19931:
---

Reopening. We fixed one test. Now it is flaky in another.

org.junit.runners.model.TestTimedOutException: test timed out after 600 seconds
at 
org.apache.hadoop.hbase.client.TestMetaWithReplicas.shutdownMetaAndDoValidations(TestMetaWithReplicas.java:265)
at 
org.apache.hadoop.hbase.client.TestMetaWithReplicas.testShutdownHandling(TestMetaWithReplicas.java:191)



> TestMetaWithReplicas failing 100% of the time in testHBaseFsckWithMetaReplicas
> --
>
> Key: HBASE-19931
> URL: https://issues.apache.org/jira/browse/HBASE-19931
> Project: HBase
>  Issue Type: Sub-task
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19931.branch-2.001.patch
>
>
> Somehow we missed a test that depends on a run of HBCK. It fails 100% of the 
> time now because of HBASE-19726 Failed to start HMaster due to infinite 
> retrying on meta assign where we no longer update hbase:meta with the state 
> of hbase:meta; rather, hbase:meta's always-ENABLED state is inferred. It 
> broke HBCK here.
> So, disable the test and just-in-case add meta as ENABLED to hbck though hbck 
> as is is not for hbase2.





[jira] [Resolved] (HBASE-19931) TestMetaWithReplicas failing 100% of the time in testHBaseFsckWithMetaReplicas

2018-02-04 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-19931.
---
Resolution: Fixed

.001 is what I pushed on master and branch-2.

> TestMetaWithReplicas failing 100% of the time in testHBaseFsckWithMetaReplicas
> --
>
> Key: HBASE-19931
> URL: https://issues.apache.org/jira/browse/HBASE-19931
> Project: HBase
>  Issue Type: Sub-task
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19931.branch-2.001.patch
>
>
> Somehow we missed a test that depends on a run of HBCK. It fails 100% of the 
> time now because of HBASE-19726 Failed to start HMaster due to infinite 
> retrying on meta assign where we no longer update hbase:meta with the state 
> of hbase:meta; rather, hbase:meta's always-ENABLED state is inferred. It 
> broke HBCK here.
> So, disable the test and just-in-case add meta as ENABLED to hbck though hbck 
> as is is not for hbase2.





[jira] [Created] (HBASE-19931) TestMetaWithReplicas failing 100% of the time in testHBaseFsckWithMetaReplicas

2018-02-04 Thread stack (JIRA)
stack created HBASE-19931:
-

 Summary: TestMetaWithReplicas failing 100% of the time in 
testHBaseFsckWithMetaReplicas
 Key: HBASE-19931
 URL: https://issues.apache.org/jira/browse/HBASE-19931
 Project: HBase
  Issue Type: Sub-task
Reporter: stack
Assignee: stack
 Fix For: 2.0.0-beta-2


Somehow we missed a test that depends on a run of HBCK. It fails 100% of the 
time now because of HBASE-19726 Failed to start HMaster due to infinite 
retrying on meta assign where we no longer update hbase:meta with the state of 
hbase:meta; rather, hbase:meta's always-ENABLED state is inferred. It broke 
HBCK here.

So, disable the test and just-in-case add meta as ENABLED to hbck though hbck 
as is is not for hbase2.





Re: Considering branching for 1.5 and other branch-1 release planning

2018-02-04 Thread Andrew Purtell
Hi Ted,

If Hadoop 3 support is in place for an (eventual) 1.5.0 release, I think
that would be great.


On Sun, Feb 4, 2018 at 10:55 AM, Ted Yu  wrote:

> Andrew:
> Do you think making the 1.5 release support Hadoop 3 is among the goals?
>
> Cheers
>
> On Fri, Feb 2, 2018 at 3:28 PM, Andrew Purtell 
> wrote:
>
> > The backport of RSGroups to branch-1 triggered the opening of the 1.4
> code
> > line as branch-1.4 and releases 1.4.0 and 1.4.1.
> >
> > After the commit of HBASE-19858 (Backport HBASE-14061 (Support CF-level
> > Storage Policy) to branch-1), storage policy aware file placement might
> be
> > useful enough to trigger a new minor release from branch-1. This would be
> > branch-1.5, and at least release 1.5.0. I am not sure about this yet. It
> > needs testing. I'd like to mock up a couple of use cases and determine if
> > what we have is sufficient on its own or more changes will be needed. I
> > want to get the idea of a 1.5 on your radar, though.
> >
> > Also, I would like to make one more release of branch-1.3 before we
> retire
> > it. Mikhail passed the reins. We might have a volunteer to RM 1.3.2. If
> > not, I will do it. I'm expecting 1.4 will supersede 1.3 but this will be
> > decided organically depending on uptake.
> >
> > --
> > Best regards,
> > Andrew
> >
> > Words like orphans lost among the crosstalk, meaning torn from truth's
> > decrepit hands
> >- A23, Crosstalk
> >
>



-- 
Best regards,
Andrew

Words like orphans lost among the crosstalk, meaning torn from truth's
decrepit hands
   - A23, Crosstalk


Re: Considering branching for 1.5 and other branch-1 release planning

2018-02-04 Thread Ted Yu
Andrew:
Do you think making the 1.5 release support Hadoop 3 is among the goals?

Cheers

On Fri, Feb 2, 2018 at 3:28 PM, Andrew Purtell  wrote:

> The backport of RSGroups to branch-1 triggered the opening of the 1.4 code
> line as branch-1.4 and releases 1.4.0 and 1.4.1.
>
> After the commit of HBASE-19858 (Backport HBASE-14061 (Support CF-level
> Storage Policy) to branch-1), storage policy aware file placement might be
> useful enough to trigger a new minor release from branch-1. This would be
> branch-1.5, and at least release 1.5.0. I am not sure about this yet. It
> needs testing. I'd like to mock up a couple of use cases and determine if
> what we have is sufficient on its own or more changes will be needed. I
> want to get the idea of a 1.5 on your radar, though.
>
> Also, I would like to make one more release of branch-1.3 before we retire
> it. Mikhail passed the reins. We might have a volunteer to RM 1.3.2. If
> not, I will do it. I'm expecting 1.4 will supersede 1.3 but this will be
> decided organically depending on uptake.
>
> --
> Best regards,
> Andrew
>
> Words like orphans lost among the crosstalk, meaning torn from truth's
> decrepit hands
>- A23, Crosstalk
>


[jira] [Created] (HBASE-19930) fix ImmutableMemStoreLAB#forceCopyOfBigCellInto

2018-02-04 Thread Gali Sheffi (JIRA)
Gali Sheffi created HBASE-19930:
---

 Summary: fix ImmutableMemStoreLAB#forceCopyOfBigCellInto
 Key: HBASE-19930
 URL: https://issues.apache.org/jira/browse/HBASE-19930
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.0-beta-1
Reporter: Gali Sheffi
Assignee: Gali Sheffi


This issue is about fixing ImmutableMemStoreLAB#forceCopyOfBigCellInto. This 
method only throws an IllegalStateException, instead of forcing the copy as it 
is supposed to do.


