Re: [VOTE] First release candidate for HBase 1.1.11 (RC0) is available
+1

Checked sums and signatures: ok
Built from source: ok (7u79)
RAT check: ok (7u79)
Compatibility check: ok (7u79); ran check_compatibility.sh locally, 100% compatible with 1.1.10 (both binary and source)
Unit tests pass: ok (8u101); passed with -Dsurefire.rerunFailingTestsCount=2
Shell commands: ok (8u101); ran DDL/flush/compact/split commands, everything looks good
Loaded 1M rows with LTT: ok (8u101); all keys verified, latency and logs look good

Best Regards,
Yu

On 16 June 2017 at 01:35, Josh Elser wrote:
> +1 (binding)
>
> * No unexpected binaries in source release
> * L look good
> * Could build from source
> * Could run bin-tarball as-is
> * Checked compat report (thanks for publishing)
> * xsum/sigs OK
> * Ran a PE randomwrite test
>
> On 6/10/17 7:40 PM, Nick Dimiduk wrote:
>> I'm happy to announce the first release candidate of HBase 1.1.11
>> (HBase-1.1.11RC0) is available for download at
>> https://dist.apache.org/repos/dist/dev/hbase/hbase-1.1.11RC0/
>>
>> Maven artifacts are also available in the staging repository
>> https://repository.apache.org/content/repositories/orgapachehbase-1170/
>>
>> Artifacts are signed with my code signing subkey 0xAD9039071C3489BD,
>> available in the Apache keys directory
>> https://people.apache.org/keys/committer/ndimiduk.asc and in our KEYS file
>> http://www-us.apache.org/dist/hbase/KEYS.
>>
>> There's also a signed tag for this release at
>> https://git-wip-us.apache.org/repos/asf?p=hbase.git;a=tag;h=d318fbeef0990e53efb313c1a459cce73ed4edb8
>>
>> The detailed source and binary compatibility report vs 1.1.10 has been
>> published for your review at
>> https://home.apache.org/~ndimiduk/1.1.10_1.1.11RC0_compat_report.html
>>
>> HBase 1.1.11 is the eleventh patch release in the HBase 1.1 line,
>> continuing on the theme of bringing a stable, reliable database to the
>> Hadoop and NoSQL communities. This release includes nearly 20 bug fixes
>> since the 1.1.10 release. Notable correctness fixes include HBASE-17937,
>> HBASE-18036, HBASE-18081, HBASE-18093, HBASE-16011, and HBASE-18066.
>>
>> The full list of fixes included in this release is available at
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310753=12340375
>> and in the CHANGES.txt file included in the distribution.
>>
>> Let's leave this vote up an extra couple days since HBaseCon is this week.
>> Please try out this candidate and vote +/-1 by 23:59 Pacific time on
>> Sunday, 2017-06-18 as to whether we should release these artifacts as
>> HBase 1.1.11.
>>
>> Thanks,
>> Nick
[jira] [Created] (HBASE-18225) Fix findbugs regression calling toString() on an array
Josh Elser created HBASE-18225:
----------------------------------
Summary: Fix findbugs regression calling toString() on an array
Key: HBASE-18225
URL: https://issues.apache.org/jira/browse/HBASE-18225
Project: HBase
Issue Type: Bug
Reporter: Josh Elser
Assignee: Josh Elser
Priority: Trivial
Fix For: 2.0.0, 3.0.0

Looks like we got a findbugs warning as a result of HBASE-18166

{code}
diff --git a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java
index 1d04944250..b7e0244aa2 100644
--- a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java
+++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java
@@ -2807,8 +2807,8 @@ public class RSRpcServices implements HBaseRPCErrorHandler,
       HRegionInfo hri = rsh.s.getRegionInfo();
       // Yes, should be the same instance
       if (regionServer.getOnlineRegion(hri.getRegionName()) != rsh.r) {
-        String msg = "Region was re-opened after the scanner" + scannerName + " was created: "
-            + hri.getRegionNameAsString();
+        String msg = "Region has changed on the scanner " + scannerName + ": regionName="
+            + hri.getRegionName() + ", scannerRegionName=" + rsh.r;
{code}

Looks like {{hri.getRegionNameAsString()}} was unintentionally changed to {{hri.getRegionName()}}, [~syuanjiang]/[~stack]?

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
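[Editor's note] As a quick illustration of the findbugs pitfall the issue describes: concatenating a byte[] into a String uses Object.toString(), which renders the array's identity hash (e.g. "[B@1a2b3c") rather than its contents. The sketch below uses hypothetical stand-in names, not actual HBase APIs; in HBase the fix is to go back to getRegionNameAsString().

```java
import java.nio.charset.StandardCharsets;

public class ArrayToStringDemo {
    // Hypothetical stand-in for HRegionInfo#getRegionName(), which returns a byte[].
    static byte[] getRegionName() {
        return new byte[] {'r', '1'};
    }

    // Buggy form: the byte[] is concatenated directly, so the message
    // contains "[B@<hash>" instead of the region name.
    static String buggyMessage() {
        return "Region has changed on the scanner: regionName=" + getRegionName();
    }

    // Fixed form: render the bytes as a readable string first.
    static String fixedMessage() {
        return "Region has changed on the scanner: regionName="
            + new String(getRegionName(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(buggyMessage()); // contains "[B@..."
        System.out.println(fixedMessage()); // ends with "r1"
    }
}
```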
[jira] [Created] (HBASE-18224) Upgrade jetty and thrift
Balazs Meszaros created HBASE-18224:
----------------------------------------
Summary: Upgrade jetty and thrift
Key: HBASE-18224
URL: https://issues.apache.org/jira/browse/HBASE-18224
Project: HBase
Issue Type: Sub-task
Reporter: Balazs Meszaros

Jetty can be updated to 9.4.6 and thrift can be updated to 0.10.0. I tried to update them in HBASE-17898 but some unit tests failed, so I created a sub-task for them.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Resolved] (HBASE-18134) Re-think if the FileSystemUtilizationChore is still necessary
[ https://issues.apache.org/jira/browse/HBASE-18134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Elser resolved HBASE-18134.
--------------------------------
Resolution: Won't Fix

I did some more thinking about this while working on HBASE-18135.

A goal/feature of HBASE-16961 was that when a RegionServer fails to regularly submit size reports for a Region, and a significant percentage of the Regions are missing (e.g. >5% of regions by default), the Master will not enforce a quota violation on the table. This is meant to be a failsafe for regions stuck in transition or generic bugs/flakiness of RegionServers.

This feature is implemented by the Master aging off recorded sizes for regions after a given amount of time. As long as the size is (re)reported by a RegionServer, the Master continues to acknowledge the size of a Region.

If the FileSystemUtilizationChore is removed, the Master will age off the size reports for regions which are idle but may contain space. This would result in a situation where the Master would stop enforcing a violation policy for a table that is over quota and not accepting new updates. As such, we cannot implement this improvement while also doing the region size report age-off.

My feeling is to avoid the optimization described in this ticket and see what some real-life usage of the feature brings. We have metrics which will help us understand, at scale, what the impact of this chore is. If scanning the Region size on disk is of large impact, we can reconsider.

> Re-think if the FileSystemUtilizationChore is still necessary
> -------------------------------------------------------------
>
> Key: HBASE-18134
> URL: https://issues.apache.org/jira/browse/HBASE-18134
> Project: HBase
> Issue Type: Task
> Reporter: Josh Elser
> Assignee: Josh Elser
>
> On the heels of HBASE-18133, we need to put some thought into whether or not
> there are cases in which the RegionServer should still report sizes directly
> from HDFS.
> The cases I have in mind are primarily in the face of RS failure/restart.
> Ideally, we could get rid of this chore completely.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
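[Editor's note] The age-off failsafe described in the resolution above can be sketched roughly as follows. All class, field, and method names here are hypothetical illustrations, not HBase APIs; the constants mirror the ">5% of regions" default mentioned in the message.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: the Master ages off region size reports and skips quota
// enforcement when too many regions lack a recent report.
public class QuotaFailsafeSketch {
    // Assumed values for illustration only.
    static final long MAX_REPORT_AGE_MS = 5 * 60 * 1000L;
    static final double MAX_MISSING_FRACTION = 0.05; // ">5% missing" default

    final Map<String, Long> lastReportTime = new HashMap<>();

    // A RegionServer (re)reports a region's size; record when it arrived.
    void reportRegionSize(String region, long nowMs) {
        lastReportTime.put(region, nowMs);
    }

    // Enforcement is only safe when enough regions have fresh reports.
    boolean canEnforceQuota(int totalRegions, long nowMs) {
        long fresh = lastReportTime.values().stream()
            .filter(t -> nowMs - t <= MAX_REPORT_AGE_MS)
            .count();
        double missing = 1.0 - (double) fresh / totalRegions;
        return missing <= MAX_MISSING_FRACTION;
    }
}
```

This makes the conflict in the resolution concrete: if idle regions are never re-scanned and re-reported, their entries age out, the missing fraction rises, and enforcement is wrongly suspended for a table that is genuinely over quota.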
Re: Problem with IntegrationTestRegionReplicaReplication
I'd start trying read_delay_ms=6, region_replication=2, num_keys_per_server=5000, num_regions_per_server=5, with maybe tens of reader and writer threads. Again, this can be quite dependent on the kind of hardware you have. You'll definitely have to tweak ;)

On 6/15/17 4:44 AM, Peter Somogyi wrote:
> Thanks Josh and Devaraj! I will try to increase the timeouts.
> Devaraj, could you share the parameters you used for this test which worked?
>
> On Thu, Jun 15, 2017 at 6:44 AM, Devaraj Das wrote:
>> That sounds about right, Josh. Peter, in our internal testing we have seen
>> this test failing, and increasing timeouts (look at the test code options to
>> do with increasing timeout) helped quite some.
>>
>> From: Josh Elser
>> Sent: Wednesday, June 14, 2017 3:17 PM
>> To: dev@hbase.apache.org
>> Subject: Re: Problem with IntegrationTestRegionReplicaReplication
>>
>> On 6/14/17 3:53 AM, Peter Somogyi wrote:
>>> Hi,
>>>
>>> As one of my first tasks with HBase I started to look into why
>>> IntegrationTestRegionReplicaReplication fails. I would like to get some
>>> suggestions from you.
>>>
>>> I noticed when I run the test using a normal cluster or minicluster I get
>>> the same error messages: "Error checking data for key [null], no data
>>> returned". I looked into the code and here are my conclusions.
>>>
>>> There are multiple threads writing data in parallel which are read by
>>> multiple reader threads simultaneously. Each writer gets a portion of the
>>> keys to write (e.g. 0-2000) and these keys are added to a ConstantDelayQueue.
>>> The reader threads get the elements (e.g. key=1000) from the queue and
>>> these reader threads assume that all the keys up to this one are already in
>>> the database. Since we're using multiple writers it can happen that another
>>> thread has not yet written key=500 and verifying these keys will cause the
>>> test failure.
>>>
>>> Do you think my assumption is correct?
>>
>> Hi Peter,
>>
>> No, as my memory serves, this is not correct. Readers are not made aware
>> of keys to verify until the write occurs plus some delay. The delay is used
>> to provide enough time for the internal region replication to take effect.
>>
>> So: primary-write, pause, [region replication happens in background],
>> add updated key to read queue, reader gets key from queue, verifies the
>> value on a replica.
>>
>> The primary should always have seen the new value for a key. If the test
>> is showing that a replica does not see the result, it's either a timing
>> issue (you need to give a larger delay for HBase to perform the region
>> replication) or a bug in the region replication framework itself. That
>> said, if you can show that you are seeing what you describe, that sounds
>> like the test framework itself is broken :)
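[Editor's note] The write-pause-verify flow Josh describes can be sketched with a standard java.util.concurrent DelayQueue: a written key only becomes visible to readers after a fixed delay, which is what gives region replication time to catch up. DelayedKey and verifyAfterDelay are illustrative stand-ins, not the actual test's ConstantDelayQueue API.

```java
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

public class ReplicaVerifySketch {
    // A key that only becomes available to readers after a fixed delay.
    static class DelayedKey implements Delayed {
        final long key;
        final long readyAtNanos;
        DelayedKey(long key, long delayMs) {
            this.key = key;
            this.readyAtNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(delayMs);
        }
        @Override
        public long getDelay(TimeUnit unit) {
            return unit.convert(readyAtNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
        }
        @Override
        public int compareTo(Delayed o) {
            return Long.compare(getDelay(TimeUnit.NANOSECONDS), o.getDelay(TimeUnit.NANOSECONDS));
        }
    }

    // Simulates: primary write happens, key is enqueued with a delay,
    // and the reader only sees it once the delay has elapsed.
    static long verifyAfterDelay(long key, long delayMs) {
        DelayQueue<DelayedKey> toVerify = new DelayQueue<>();
        toVerify.put(new DelayedKey(key, delayMs)); // write done; replication in flight
        long start = System.nanoTime();
        try {
            DelayedKey k = toVerify.take(); // reader blocks until the delay elapses
            // here the real test would read k.key from a replica and compare values
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
    }
}
```

Because take() cannot return early, a reader never sees a key before the write plus the configured delay, which is why Peter's "reader outran a slower writer" hypothesis doesn't apply.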
[jira] [Created] (HBASE-18223) Track the effort to improve/bug fix read replica feature
huaxiang sun created HBASE-18223:
------------------------------------
Summary: Track the effort to improve/bug fix read replica feature
Key: HBASE-18223
URL: https://issues.apache.org/jira/browse/HBASE-18223
Project: HBase
Issue Type: Task
Components: Client
Affects Versions: 2.0.0
Reporter: huaxiang sun

During HBaseCon 2017, a group of people met and agreed to collaborate on the effort to improve and bug-fix the read replica feature so users can enable it in their clusters. This jira is created to track jiras which are known to be related to the read replica feature.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
Re: [VOTE] First release candidate for HBase 1.1.11 (RC0) is available
+1 (binding)

* No unexpected binaries in source release
* L look good
* Could build from source
* Could run bin-tarball as-is
* Checked compat report (thanks for publishing)
* xsum/sigs OK
* Ran a PE randomwrite test

On 6/10/17 7:40 PM, Nick Dimiduk wrote:
> I'm happy to announce the first release candidate of HBase 1.1.11
> (HBase-1.1.11RC0) is available for download at
> https://dist.apache.org/repos/dist/dev/hbase/hbase-1.1.11RC0/
>
> Maven artifacts are also available in the staging repository
> https://repository.apache.org/content/repositories/orgapachehbase-1170/
>
> Artifacts are signed with my code signing subkey 0xAD9039071C3489BD,
> available in the Apache keys directory
> https://people.apache.org/keys/committer/ndimiduk.asc and in our KEYS file
> http://www-us.apache.org/dist/hbase/KEYS.
>
> There's also a signed tag for this release at
> https://git-wip-us.apache.org/repos/asf?p=hbase.git;a=tag;h=d318fbeef0990e53efb313c1a459cce73ed4edb8
>
> The detailed source and binary compatibility report vs 1.1.10 has been
> published for your review at
> https://home.apache.org/~ndimiduk/1.1.10_1.1.11RC0_compat_report.html
>
> HBase 1.1.11 is the eleventh patch release in the HBase 1.1 line,
> continuing on the theme of bringing a stable, reliable database to the
> Hadoop and NoSQL communities. This release includes nearly 20 bug fixes
> since the 1.1.10 release. Notable correctness fixes include HBASE-17937,
> HBASE-18036, HBASE-18081, HBASE-18093, HBASE-16011, and HBASE-18066.
>
> The full list of fixes included in this release is available at
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310753=12340375
> and in the CHANGES.txt file included in the distribution.
>
> Let's leave this vote up an extra couple days since HBaseCon is this week.
> Please try out this candidate and vote +/-1 by 23:59 Pacific time on
> Sunday, 2017-06-18 as to whether we should release these artifacts as
> HBase 1.1.11.
>
> Thanks,
> Nick
[jira] [Resolved] (HBASE-18166) [AMv2] We are splitting already-split files
[ https://issues.apache.org/jira/browse/HBASE-18166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-18166.
---------------------------
Resolution: Fixed

Pushed to branch-2 and master.

> [AMv2] We are splitting already-split files
> -------------------------------------------
>
> Key: HBASE-18166
> URL: https://issues.apache.org/jira/browse/HBASE-18166
> Project: HBase
> Issue Type: Bug
> Components: Region Assignment
> Affects Versions: 2.0.0
> Reporter: stack
> Assignee: stack
> Fix For: 2.0.0
>
> Attachments: HBASE-18166.master.001.patch, HBASE-18166.master.002.patch
>
> Interesting issue. The below adds a lag cleaning up files after a compaction
> in case of on-going Scanners (for read replicas/offheap):
>
> HBASE-14970 Backport HBASE-13082 and its sub-jira to branch-1 - recommit (Ram)
>
> What the lag means is that now that split is run from the HMaster in the master
> branch, when it goes to get a listing of the files to split, it can pick up
> files that are for archiving but that have not been archived yet. When it
> does, it goes ahead and splits them... making references of references.
> It's a mess.
>
> I added asking the Region if it is splittable a while back. The Master calls
> this from SplitTableRegionProcedure during preparation. If the RegionServer
> asked for the split, it is sort of redundant work given the RS asks itself if
> any references remain; if any, it'll wait before asking for a split. But if a
> user/client asks, then this isSplittable over RPC comes in handy.
>
> I was thinking that isSplittable could return a list of files.
> Or, easier, given we know a region is splittable by the time we go to split
> the files, then I think master-side we can just skip any references found,
> presuming ready-for-archive.
>
> Will be back with a patch. Want to test on cluster first. (Side-effect is
> regions are offline because the file at the end of the reference to a reference
> is removed... and so the open fails.)

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Created] (HBASE-18222) Tests fail as lock on table is not released
Anup Halarnkar created HBASE-18222:
--------------------------------------
Summary: Tests fail as lock on table is not released
Key: HBASE-18222
URL: https://issues.apache.org/jira/browse/HBASE-18222
Project: HBase
Issue Type: Bug
Components: hbase
Affects Versions: 3.0.0
Environment:
[INFO] Detecting the operating system and CPU architecture
[INFO] os.detected.name: linux
[INFO] os.detected.arch: ppcle_64
[INFO] os.detected.version: 3.16
[INFO] os.detected.version.major: 3
[INFO] os.detected.version.minor: 16
[INFO] os.detected.release: ubuntu
[INFO] os.detected.release.version: 14.04
[INFO] os.detected.release.like.ubuntu: true
[INFO] os.detected.release.like.debian: true
[INFO] os.detected.classifier: linux-ppcle_64
Reporter: Anup Halarnkar
Fix For: 3.0.0

*All Failed Tests*

1. org.apache.hadoop.hbase.coprocessor.TestCoprocessorMetrics.testRegionObserverAfterRegionClosed

org.apache.hadoop.hbase.exceptions.TimeoutIOException: java.util.concurrent.TimeoutException: The procedure 14 is still running
    at org.apache.hadoop.hbase.client.HBaseAdmin$ProcedureFuture.waitProcedureResult(HBaseAdmin.java:3418)
    at org.apache.hadoop.hbase.client.HBaseAdmin$ProcedureFuture.get(HBaseAdmin.java:3339)
    at org.apache.hadoop.hbase.client.HBaseAdmin.get(HBaseAdmin.java:1962)
    at org.apache.hadoop.hbase.client.HBaseAdmin.disableTable(HBaseAdmin.java:764)
    at org.apache.hadoop.hbase.coprocessor.TestCoprocessorMetrics.testRegionObserverAfterRegionClosed(TestCoprocessorMetrics.java:487)

2. org.apache.hadoop.hbase.coprocessor.TestCoprocessorMetrics.testMasterObserver
3. org.apache.hadoop.hbase.coprocessor.TestCoprocessorMetrics.testRegionObserverEndpoint
4. org.apache.hadoop.hbase.coprocessor.TestCoprocessorMetrics.testRegionObserverMultiRegion
5. org.apache.hadoop.hbase.coprocessor.TestCoprocessorMetrics.testRegionObserverSingleRegion
6. org.apache.hadoop.hbase.coprocessor.TestCoprocessorMetrics.testWALObserver
7. org.apache.hadoop.hbase.coprocessor.TestCoprocessorMetrics.testRegionServerObserver
8. org.apache.hadoop.hbase.coprocessor.TestCoprocessorMetrics.testRegionObserverMultiTable

*The stack trace for tests 2-8 is given below*

org.apache.hadoop.hbase.TableNotDisabledException: testRegionObserverAfterRegionClosed
    at org.apache.hadoop.hbase.master.HMaster.checkTableModifiable(HMaster.java:2470)
    at org.apache.hadoop.hbase.master.procedure.DeleteTableProcedure.prepareDelete(DeleteTableProcedure.java:241)
    at org.apache.hadoop.hbase.master.procedure.DeleteTableProcedure.executeFromState(DeleteTableProcedure.java:91)
    at org.apache.hadoop.hbase.master.procedure.DeleteTableProcedure.executeFromState(DeleteTableProcedure.java:58)
    at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:155)
    at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:843)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1373)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1142)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:78)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1651)

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
Re: Problem with IntegrationTestRegionReplicaReplication
Thanks Josh and Devaraj! I will try to increase the timeouts.
Devaraj, could you share the parameters you used for this test which worked?

On Thu, Jun 15, 2017 at 6:44 AM, Devaraj Das wrote:
> That sounds about right, Josh. Peter, in our internal testing we have seen
> this test failing, and increasing timeouts (look at the test code options to
> do with increasing timeout) helped quite some.
>
> From: Josh Elser
> Sent: Wednesday, June 14, 2017 3:17 PM
> To: dev@hbase.apache.org
> Subject: Re: Problem with IntegrationTestRegionReplicaReplication
>
> On 6/14/17 3:53 AM, Peter Somogyi wrote:
> > Hi,
> >
> > As one of my first tasks with HBase I started to look into
> > why IntegrationTestRegionReplicaReplication fails. I would like to get some
> > suggestions from you.
> >
> > I noticed when I run the test using a normal cluster or minicluster I get the
> > same error messages: "Error checking data for key [null], no data
> > returned". I looked into the code and here are my conclusions.
> >
> > There are multiple threads writing data in parallel which are read by multiple
> > reader threads simultaneously. Each writer gets a portion of the keys to
> > write (e.g. 0-2000) and these keys are added to a ConstantDelayQueue.
> > The reader threads get the elements (e.g. key=1000) from the queue and
> > these reader threads assume that all the keys up to this one are already in the
> > database. Since we're using multiple writers it can happen that another
> > thread has not yet written key=500 and verifying these keys will cause the
> > test failure.
> >
> > Do you think my assumption is correct?
>
> Hi Peter,
>
> No, as my memory serves, this is not correct. Readers are not made aware
> of keys to verify until the write occurs plus some delay. The delay is
> used to provide enough time for the internal region replication to take
> effect.
>
> So: primary-write, pause, [region replication happens in background],
> add updated key to read queue, reader gets key from queue, verifies the
> value on a replica.
>
> The primary should always have seen the new value for a key. If the test
> is showing that a replica does not see the result, it's either a timing
> issue (you need to give a larger delay for HBase to perform the region
> replication) or a bug in the region replication framework itself. That
> said, if you can show that you are seeing what you describe, that sounds
> like the test framework itself is broken :)
[jira] [Created] (HBASE-18221) Switch from pread to stream should happen under HStore's reentrant lock
ramkrishna.s.vasudevan created HBASE-18221:
----------------------------------------------
Summary: Switch from pread to stream should happen under HStore's reentrant lock
Key: HBASE-18221
URL: https://issues.apache.org/jira/browse/HBASE-18221
Project: HBase
Issue Type: Sub-task
Components: Scanners
Affects Versions: 2.0.0, 3.0.0, 2.0.0-alpha-1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Fix For: 2.0.0, 3.0.0, 2.0.0-alpha-2

Found this while debugging HBASE-18186. When we try to reopen the scanners on the storefiles while trying to switch over from pread to stream, we do not use the HStore's reentrant lock to get the current storefiles from the StoreFileManager. All the scan APIs are guarded by that lock and we must use it here also; otherwise the CompactedHfileDischarger may cause race issues with the HStore's data structures, like here:

{code}
2017-06-14 18:16:17,223 WARN [RpcServer.default.FPBQ.Fifo.handler=23,queue=1,port=16020] regionserver.StoreScanner: failed to switch to stream read
java.lang.NullPointerException
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.getScannersForStoreFiles(StoreFileScanner.java:133)
    at org.apache.hadoop.hbase.regionserver.HStore.getScanners(HStore.java:1221)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.trySwitchToStreamRead(StoreScanner.java:997)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.shipped(StoreScanner.java:1134)
    at org.apache.hadoop.hbase.regionserver.KeyValueHeap.shipped(KeyValueHeap.java:445)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.shipped(HRegion.java:6459)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices$RegionScannerShippedCallBack.run(RSRpcServices.java:339)
    at org.apache.hadoop.hbase.ipc.ServerCall.setResponse(ServerCall.java:252)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:166)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:278)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:258)
{code}

I have a working patch fixing this problem. Will do some more testing and try to upload the patch after I write a test case for this.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
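[Editor's note] The locking discipline the issue calls for can be sketched with a standard ReentrantReadWriteLock: a scanner switching from pread to stream must snapshot the current store file list under the same lock that guards compacted-file cleanup, so the discharger cannot swap the list out from underneath it. Class and method names below are illustrative, not the actual HStore API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class StoreSwitchSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String> storeFiles = new ArrayList<>();

    // Stand-in for the CompactedHfileDischarger replacing compacted files.
    void replaceCompactedFiles(List<String> fresh) {
        lock.writeLock().lock();
        try {
            storeFiles.clear();
            storeFiles.addAll(fresh);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Safe reopen for the pread-to-stream switch: snapshot the file list
    // while holding the read lock, so the scanner never observes a
    // half-updated list (the NPE in the stack trace above).
    List<String> snapshotForStreamRead() {
        lock.readLock().lock();
        try {
            return new ArrayList<>(storeFiles);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```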
[jira] [Created] (HBASE-18220) Compaction scanners need not reopen storefile scanners while trying to switch over
ramkrishna.s.vasudevan created HBASE-18220:
----------------------------------------------
Summary: Compaction scanners need not reopen storefile scanners while trying to switch over
Key: HBASE-18220
URL: https://issues.apache.org/jira/browse/HBASE-18220
Project: HBase
Issue Type: Sub-task
Components: Compaction
Affects Versions: 2.0.0, 3.0.0, 2.0.0-alpha-1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Fix For: 2.0.0, 3.0.0, 2.0.0-alpha-2

We try to switch over to a stream scanner if we have read more than a certain number of bytes. In the case of compaction we already have stream-based scanners only, but on calling shipped() we again close and reopen the scanners, which is unwanted. [~Apache9]

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)