[jira] [Updated] (HBASE-5633) NPE reading ZK config in HBase
[ https://issues.apache.org/jira/browse/HBASE-5633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matteo Bertozzi updated HBASE-5633:
---
Attachment: HBASE-5633-v1.patch

Added a default value (HConstants.DEFAULT_CLUSTER_DISTRIBUTED) for the cluster.distributed property to config.get(HConstants.CLUSTER_DISTRIBUTED) in parseZooCfg(). We get null if the property doesn't exist, and the result of conf.get() is compared directly without checking for null.

NPE reading ZK config in HBase
--
Key: HBASE-5633
URL: https://issues.apache.org/jira/browse/HBASE-5633
Project: HBase
Issue Type: Bug
Components: zookeeper
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Minor
Attachments: HBASE-5633-v1.patch

If zoo.cfg contains server.* entries (server.0=server0:2888:3888\n) and the cluster.distributed property (in hbase-site.xml) is empty, we get an NPE in parseZooCfg().
The easy way to reproduce the bug is to run org.apache.hbase.zookeeper.TestHQuorumPeer with an hbase-site.xml containing:
{code}
<property>
  <name>hbase.cluster.distributed</name>
  <value></value>
</property>
{code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
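The failure mode described above can be illustrated with a small standalone sketch. This is not the HBase code: it uses java.util.Properties as a stand-in for the HBase configuration object, and the class name, method names, and the default value "false" are assumptions made for illustration only.

```java
import java.util.Properties;

/** Hypothetical stand-in for the lookup in parseZooCfg(); not HBase code. */
final class ClusterDistributedLookup {
    // Assumed key and default value, for illustration only.
    static final String CLUSTER_DISTRIBUTED = "hbase.cluster.distributed";
    static final String DEFAULT_CLUSTER_DISTRIBUTED = "false";

    /** Unsafe: getProperty returns null when the key is absent, so equals() throws an NPE. */
    static boolean isDistributedUnsafe(Properties conf) {
        return conf.getProperty(CLUSTER_DISTRIBUTED).equals("true");
    }

    /** Safe: passing a default means the looked-up value is never null. */
    static boolean isDistributedSafe(Properties conf) {
        return conf.getProperty(CLUSTER_DISTRIBUTED, DEFAULT_CLUSTER_DISTRIBUTED).equals("true");
    }

    public static void main(String[] args) {
        Properties conf = new Properties(); // the property is not set at all
        System.out.println("safe lookup: " + isDistributedSafe(conf));
        try {
            isDistributedUnsafe(conf);
        } catch (NullPointerException e) {
            System.out.println("unsafe lookup threw an NPE");
        }
    }
}
```

The same idea applies to any configuration lookup whose result is dereferenced immediately: either supply a default at the call site or check for null before comparing.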
[jira] [Updated] (HBASE-5633) NPE reading ZK config in HBase
[ https://issues.apache.org/jira/browse/HBASE-5633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-5633: --- Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-5633) NPE reading ZK config in HBase
[ https://issues.apache.org/jira/browse/HBASE-5633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237465#comment-13237465 ]

Hadoop QA commented on HBASE-5633:
--
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12519785/HBASE-5633-v1.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
    org.apache.hadoop.hbase.io.hfile.TestForceCacheImportantBlocks
    org.apache.hadoop.hbase.mapreduce.TestImportTsv
    org.apache.hadoop.hbase.mapred.TestTableMapReduce
    org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1297//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1297//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1297//console

This message is automatically generated.
[jira] [Commented] (HBASE-5451) Switch RPC call envelope/headers to PBs
[ https://issues.apache.org/jira/browse/HBASE-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237472#comment-13237472 ]

jirapos...@reviews.apache.org commented on HBASE-5451:
--
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4096/#review6302
---

http://svn.apache.org/repos/asf/hbase/trunk/pom.xml
https://reviews.apache.org/r/4096/#comment13673
    Trailing whitespaces.

http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
https://reviews.apache.org/r/4096/#comment13674
    You already import RpcRequestWithHeaderProto, so just use RpcRequestWithHeaderProto.Builder here, drop the leading RPCMessageProtos.

http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
https://reviews.apache.org/r/4096/#comment13675
    Trailing whitespaces here and below. Kill them all.

http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
https://reviews.apache.org/r/4096/#comment13677
    If you cast it to RpcRequestProto, then why not check if param is an instance of RpcRequestProto? Also you're missing a space right before param.

http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
https://reviews.apache.org/r/4096/#comment13678
    Throw a ClassCastException.

http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
https://reviews.apache.org/r/4096/#comment13679
    Argh, no, don't change this! I got other HBase devs to promise to not change this as it makes backwards compatible clients impossibly complicated.

http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
https://reviews.apache.org/r/4096/#comment13680
    Trailing whitespace.

http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
https://reviews.apache.org/r/4096/#comment13739
    Is there a way to avoid code duplication and unify this method with the one in the HBaseClient class?

http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
https://reviews.apache.org/r/4096/#comment13740
    Why wrap this line and the next around? I think this fits on one line without exceeding 80 columns.

http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/security/User.java
https://reviews.apache.org/r/4096/#comment13670
    just do return create(null) ?

http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/security/User.java
https://reviews.apache.org/r/4096/#comment13671
    Why use the fully qualified names here? Also kill the trailing whitespace.

http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/security/User.java
https://reviews.apache.org/r/4096/#comment13672
    Trailing whitespaces.

http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/security/User.java
https://reviews.apache.org/r/4096/#comment13741
    This seems unnecessary.

http://svn.apache.org/repos/asf/hbase/trunk/src/main/proto/RPCMessageProto.proto
https://reviews.apache.org/r/4096/#comment13742
    Kill all the trailing whitespaces!

http://svn.apache.org/repos/asf/hbase/trunk/src/main/proto/RPCMessageProto.proto
https://reviews.apache.org/r/4096/#comment13743
    I don't see how this is graceful.

http://svn.apache.org/repos/asf/hbase/trunk/src/main/proto/RPCMessageProto.proto
https://reviews.apache.org/r/4096/#comment13744
    Why keep this oddity of Hadoop RPC? Either rely on TCP keepalive, or add a Ping method to the RPC interface.

http://svn.apache.org/repos/asf/hbase/trunk/src/main/proto/RPCMessageProto.proto
https://reviews.apache.org/r/4096/#comment13745
    What's the point of this message? Why not just put the callId in RpcRequestProto and be done with it?

http://svn.apache.org/repos/asf/hbase/trunk/src/main/proto/RPCMessageProto.proto
https://reviews.apache.org/r/4096/#comment13746
    Why is this optional?

http://svn.apache.org/repos/asf/hbase/trunk/src/main/proto/RPCMessageProto.proto
https://reviews.apache.org/r/4096/#comment13747
    Why is this optional? It should be required and it should be first.

http://svn.apache.org/repos/asf/hbase/trunk/src/main/proto/RPCMessageProto.proto
https://reviews.apache.org/r/4096/#comment13748
    Ditto, why have an extra PB?

http://svn.apache.org/repos/asf/hbase/trunk/src/main/proto/RPCMessageProto.proto
https://reviews.apache.org/r/4096/#comment13749
    This should be first and it should be required.
[jira] [Commented] (HBASE-5128) [uber hbck] Online automated repair of table integrity and region consistency problems
[ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237474#comment-13237474 ]

Jonathan Hsieh commented on HBASE-5128:
---
Thanks Ted. I've updated the rest. Will do better next time. :)

[uber hbck] Online automated repair of table integrity and region consistency problems
--
Key: HBASE-5128
URL: https://issues.apache.org/jira/browse/HBASE-5128
Project: HBase
Issue Type: New Feature
Components: hbck
Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
Attachments: 5128-trunk.addendum, hbase-5128-0.90-v2.patch, hbase-5128-0.90-v2b.patch, hbase-5128-0.90-v4.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.92-v4.patch, hbase-5128-0.94-v2.patch, hbase-5128-0.94-v4.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch, hbase-5128-v3.patch, hbase-5128-v4.patch

The current (0.90.5, 0.92.0rc2) versions of hbck detect most region consistency and table integrity invariant violations. However, with '-fix' it can only automatically repair region consistency cases having to do with deployment problems. This updated version should be able to handle all cases (including a new orphan regiondir case). When complete, it will likely deprecate the OfflineMetaRepair tool and subsume several open META-hole related issues.
Here's the approach (from the comment at the top of the new version of the file).
{code}
/**
 * HBaseFsck (hbck) is a tool for checking and repairing region consistency and
 * table integrity.
 *
 * Region consistency checks verify that META, region deployment on
 * region servers and the state of data in HDFS (.regioninfo files) all are in
 * accordance.
 *
 * Table integrity checks verify that all possible row keys can resolve to
 * exactly one region of a table. This means there are no individual degenerate
 * or backwards regions; no holes between regions; and that there are no
 * overlapping regions.
 *
 * The general repair strategy works in these steps.
 * 1) Repair Table Integrity on HDFS. (merge or fabricate regions)
 * 2) Repair Region Consistency with META and assignments
 *
 * For table integrity repairs, the tables' region directories are scanned
 * for .regioninfo files. Each table's integrity is then verified. If there
 * are any orphan regions (regions with no .regioninfo files), or holes, new
 * regions are fabricated. Backwards regions are sidelined as well as empty
 * degenerate (endkey==startkey) regions. If there are any overlapping regions,
 * a new region is created and all data is merged into the new region.
 *
 * Table integrity repairs deal solely with HDFS and can be done offline -- the
 * hbase region servers or master do not need to be running. This phase can be
 * used to completely reconstruct the META table in an offline fashion.
 *
 * Region consistency requires three conditions -- 1) valid .regioninfo file
 * present in an hdfs region dir, 2) valid row with .regioninfo data in META,
 * and 3) a region is deployed only at the regionserver that it was assigned to.
 *
 * Region consistency requires hbck to contact the HBase master and region
 * servers, so the connect() must first be called successfully. Much of the
 * region consistency information is transient and less risky to repair.
 */
{code}
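The table integrity invariant quoted above (every row key resolves to exactly one region) can be sketched as a simple scan over regions sorted by start key. This is only an illustration under stated assumptions, not the HBaseFsck implementation: keys are modelled as Strings rather than byte arrays, an empty string stands for an unbounded start or end key, and the class, method, and message names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative check for holes, overlaps, degenerate and backwards regions; not hbck code. */
final class TableIntegritySketch {
    /** A region covering [startKey, endKey); "" means unbounded on that side (assumed model). */
    static final class Region {
        final String startKey;
        final String endKey;
        Region(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }
    }

    static List<String> check(List<Region> regions) {
        List<String> problems = new ArrayList<>();
        List<Region> usable = new ArrayList<>();
        // First pass: set aside individually broken regions.
        for (Region r : regions) {
            if (!r.endKey.isEmpty() && r.startKey.equals(r.endKey)) {
                problems.add("degenerate:" + r.startKey);
            } else if (!r.endKey.isEmpty() && r.startKey.compareTo(r.endKey) > 0) {
                problems.add("backwards:" + r.startKey + ".." + r.endKey);
            } else {
                usable.add(r);
            }
        }
        // Second pass: walk the remaining regions in start-key order and track coverage.
        usable.sort(Comparator.comparing((Region r) -> r.startKey));
        String coveredUpTo = "";      // key space is covered from the beginning up to this key
        boolean unboundedEnd = false; // true once a region with an unbounded end key was seen
        for (Region r : usable) {
            int cmp = r.startKey.compareTo(coveredUpTo);
            if (unboundedEnd || cmp < 0) {
                problems.add("overlap:" + r.startKey);
            } else if (cmp > 0) {
                problems.add("hole:" + coveredUpTo + ".." + r.startKey);
            }
            if (r.endKey.isEmpty()) {
                unboundedEnd = true;
            } else if (r.endKey.compareTo(coveredUpTo) > 0) {
                coveredUpTo = r.endKey;
            }
        }
        if (!unboundedEnd) {
            problems.add("hole:" + coveredUpTo + "..");
        }
        return problems;
    }

    public static void main(String[] args) {
        List<Region> regions = new ArrayList<>();
        regions.add(new Region("", "b"));
        regions.add(new Region("c", ""));
        System.out.println(check(regions));
    }
}
```

A real repair tool would also have to decide what to do with each problem (fabricate, sideline, or merge regions, as the comment describes); the sketch only detects them.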
[jira] [Updated] (HBASE-5563) HRegionInfo#compareTo should compare regionId as well
[ https://issues.apache.org/jira/browse/HBASE-5563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hsieh updated HBASE-5563:
--
Fix Version/s: 0.90.7

HRegionInfo#compareTo should compare regionId as well
-
Key: HBASE-5563
URL: https://issues.apache.org/jira/browse/HBASE-5563
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
Reporter: chunhui shen
Assignee: chunhui shen
Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
Attachments: HBASE-5563.patch, HBASE-5563v2.patch, HBASE-5563v2.patch, hbase-5563-0.90.patch, hbase-5563-v3-0.92.patch, hbase-5563-v3.patch

When one region is assigned more than once, we can find two regions with the same table name, the same startKey and the same endKey but different regionIds, so these two regions are treated as the same key in a TreeMap but as different keys in a HashMap.
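The TreeMap versus HashMap discrepancy described above comes from an ordering that is inconsistent with equals. The following is a minimal sketch, not HRegionInfo itself: RegionKey, its fields, and the two comparators are hypothetical names chosen for illustration.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

/** Illustrates an ordering that ignores a field used by equals(); not HBase code. */
final class RegionKeySketch {
    static final class RegionKey {
        final String table;
        final String startKey;
        final String endKey;
        final long regionId;

        RegionKey(String table, String startKey, String endKey, long regionId) {
            this.table = table;
            this.startKey = startKey;
            this.endKey = endKey;
            this.regionId = regionId;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof RegionKey)) {
                return false;
            }
            RegionKey other = (RegionKey) o;
            return table.equals(other.table) && startKey.equals(other.startKey)
                    && endKey.equals(other.endKey) && regionId == other.regionId;
        }

        @Override
        public int hashCode() {
            return Objects.hash(table, startKey, endKey, regionId);
        }
    }

    /** Ordering that ignores regionId, so two distinct keys can compare as equal. */
    static final Comparator<RegionKey> WITHOUT_ID =
            Comparator.comparing((RegionKey r) -> r.table)
                    .thenComparing(r -> r.startKey)
                    .thenComparing(r -> r.endKey);

    /** Ordering that also compares regionId, consistent with equals(). */
    static final Comparator<RegionKey> WITH_ID = WITHOUT_ID.thenComparingLong(r -> r.regionId);

    static int sizeInTreeMap(Comparator<RegionKey> order, RegionKey... keys) {
        Map<RegionKey, String> map = new TreeMap<>(order);
        for (RegionKey k : keys) {
            map.put(k, "server");
        }
        return map.size();
    }

    static int sizeInHashMap(RegionKey... keys) {
        Map<RegionKey, String> map = new HashMap<>();
        for (RegionKey k : keys) {
            map.put(k, "server");
        }
        return map.size();
    }

    public static void main(String[] args) {
        RegionKey a = new RegionKey("t", "a", "m", 1L);
        RegionKey b = new RegionKey("t", "a", "m", 2L);
        System.out.println("TreeMap without regionId: " + sizeInTreeMap(WITHOUT_ID, a, b));
        System.out.println("TreeMap with regionId:    " + sizeInTreeMap(WITH_ID, a, b));
        System.out.println("HashMap:                  " + sizeInHashMap(a, b));
    }
}
```

With the ordering that ignores regionId, the second put silently replaces the first entry in the TreeMap, while the HashMap keeps both; including regionId in the comparison makes the two containers agree.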
[jira] [Commented] (HBASE-5615) the master never do balance becauseof balance the parent region
[ https://issues.apache.org/jira/browse/HBASE-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237484#comment-13237484 ]

gaojinchao commented on HBASE-5615:
---
+1

the master never do balance becauseof balance the parent region
Key: HBASE-5615
URL: https://issues.apache.org/jira/browse/HBASE-5615
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.7
Reporter: xufeng
Assignee: xufeng
Priority: Critical
Attachments: HBASE-5615-90.patch, HBASE-5615.patch, NoPatched-surefire-report-5615-90.html, Patched_surefire-report-5615-90.html

The master never balances because, when it runs rebuildUserRegions(), it adds the parent region to AssignmentManager#servers. If the balancer then tries to move the parent region, the parent stays in RIT forever, so balancing is never executed.
[jira] [Commented] (HBASE-5128) [uber hbck] Online automated repair of table integrity and region consistency problems
[ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237490#comment-13237490 ]

Hudson commented on HBASE-5128:
---
Integrated in HBase-0.94 #51 (See [https://builds.apache.org/job/HBase-0.94/51/])
HBASE-5128 Addendum adds two missing new files (Revision 1304722)
Result = FAILURE
jmhsieh :
Files :
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandlerImpl.java
[jira] [Commented] (HBASE-5128) [uber hbck] Online automated repair of table integrity and region consistency problems
[ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237493#comment-13237493 ]

Hudson commented on HBASE-5128:
---
Integrated in HBase-0.92 #338 (See [https://builds.apache.org/job/HBase-0.92/338/])
HBASE-5128 Addendum adds two missing new files (Revision 1304723)
Result = FAILURE
jmhsieh :
Files :
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandlerImpl.java
[jira] [Commented] (HBASE-5128) [uber hbck] Online automated repair of table integrity and region consistency problems
[ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237531#comment-13237531 ]

Hudson commented on HBASE-5128:
---
Integrated in HBase-TRUNK #2694 (See [https://builds.apache.org/job/HBase-TRUNK/2694/])
HBASE-5128 Addendum adds two new files Jon forgot to add (Revision 1304702)
HBASE-5128 [uber hbck] Online automated repair of table integrity and region consistency problems (Revision 1304665)
Result = SUCCESS
tedyu :
Files :
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandlerImpl.java
jmhsieh :
Files :
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java
[jira] [Commented] (HBASE-5616) Make compaction code standalone
[ https://issues.apache.org/jira/browse/HBASE-5616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237532#comment-13237532 ]

Hudson commented on HBASE-5616:
---
Integrated in HBase-TRUNK #2694 (See [https://builds.apache.org/job/HBase-TRUNK/2694/])
HBASE-5616 Make compaction code standalone; ADDENDUM -- ADD LICENSES (Revision 1304624)
HBASE-5616 Make compaction code standalone (Revision 1304616)
Result = SUCCESS
stack :
Files :
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Compactor.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/CompactionTool.java
stack :
Files :
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/LruBlockCache.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Compactor.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/CompactionProgress.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/ChecksumType.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/FSUtils.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/CompactionTool.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java

Make compaction code standalone
---
Key: HBASE-5616
URL: https://issues.apache.org/jira/browse/HBASE-5616
Project: HBase
Issue Type: Improvement
Reporter: stack
Assignee: stack
Fix For: 0.96.0
Attachments: 5616.txt, 5616v3.txt, 5616v6.txt, 5616v7.txt, 5616v7.txt, addlicense.txt, standalone.txt

This is part of hbase-2462. Make the compaction code standalone so that it can be run independently of hbase. This will make it easier to profile and try stuff out.
[jira] [Commented] (HBASE-5190) Limit the IPC queue size based on calls' payload size
[ https://issues.apache.org/jira/browse/HBASE-5190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237529#comment-13237529 ] Hudson commented on HBASE-5190: --- Integrated in HBase-TRUNK #2694 (See [https://builds.apache.org/job/HBase-TRUNK/2694/]) HBASE-5190 Limit the IPC queue size based on calls' payload size (Revision 1304634) Result = SUCCESS jdcryans : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java Limit the IPC queue size based on calls' payload size - Key: HBASE-5190 URL: https://issues.apache.org/jira/browse/HBASE-5190 Project: HBase Issue Type: Improvement Affects Versions: 0.90.5 Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Fix For: 0.94.0, 0.96.0 Attachments: HBASE-5190-v2.patch, HBASE-5190-v3.patch, HBASE-5190.patch Currently we limit the calls in the IPC queue only by their count. The limit used to be really high and was recently dropped to num_handlers * 10 (so 100 by default) because it was easy to OOME yourself when huge calls were being queued. It's still possible to hit this problem if you use really big values and/or a lot of handlers, so the idea is that we should take the payload size into account. I can see 3 solutions: - Do the accounting outside of the queue itself for all calls coming in and out and, when a call doesn't fit, throw a retryable exception. - Same accounting, but instead block the call when it comes in until space is made available. - Add a new parameter for the maximum size (in bytes) of a Call and then set the size of the IPC queue (in terms of the number of items) so that it can only contain as many items as fit in some predefined maximum size (in bytes) for the whole queue.
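To make the first proposed option concrete, here is a minimal Java sketch of byte-based accounting kept alongside the queue. It is not taken from the attached patches: the class PayloadBoundedQueue, the Call stand-in and the maxQueuedBytes parameter are hypothetical names, and a real server would presumably signal rejection with a retryable exception rather than a boolean.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical call queue bounded by total payload bytes rather than by call count. */
final class PayloadBoundedQueue {
    /** Hypothetical stand-in for an IPC call; only the payload size matters here. */
    static final class Call {
        final int payloadBytes;

        Call(int payloadBytes) {
            this.payloadBytes = payloadBytes;
        }
    }

    private final Queue<Call> calls = new ArrayDeque<>();
    private final long maxQueuedBytes;
    private long queuedBytes = 0;

    PayloadBoundedQueue(long maxQueuedBytes) {
        this.maxQueuedBytes = maxQueuedBytes;
    }

    /** Returns false when the call does not fit; the caller could then reply with a retryable error. */
    synchronized boolean offer(Call call) {
        if (queuedBytes + call.payloadBytes > maxQueuedBytes) {
            return false;
        }
        queuedBytes += call.payloadBytes;
        calls.add(call);
        return true;
    }

    /** Removes the oldest call and releases its bytes; returns null when the queue is empty. */
    synchronized Call poll() {
        Call call = calls.poll();
        if (call != null) {
            queuedBytes -= call.payloadBytes;
        }
        return call;
    }

    synchronized long queuedBytes() {
        return queuedBytes;
    }
}
```

The second option in the list would differ only in offer(): instead of returning false it would wait until poll() has released enough bytes.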
[jira] [Commented] (HBASE-5469) Add baseline compression efficiency to DataBlockEncodingTool
[ https://issues.apache.org/jira/browse/HBASE-5469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237530#comment-13237530 ] Hudson commented on HBASE-5469: --- Integrated in HBase-TRUNK #2694 (See [https://builds.apache.org/job/HBase-TRUNK/2694/]) [jira] [HBASE-5469] Add baseline compression efficiency to DataBlockEncodingTool Summary: DataBlockEncodingTool currently does not provide baseline compression efficiency, e.g. Hadoop compression codec applied to unencoded data. E.g. if we are using LZO to compress blocks, we would like to have the following columns in the report (possibly as percentages of raw data size). Baseline K+V in blockcache | Baseline K + V on disk (LZO compressed) | K + V DataBlockEncoded in block cache | K + V DataBlockEncoded + LZOCompressed (on disk) Background: we never store compressed blocks in cache, but we always store encoded data blocks in cache if data block encoding is enabled for the column family. This patch also has multiple bugfixes and improvements to DataBlockEncodingTool, including presentation format, memory requirements (reduced 3x) and fixing the handling of compression. Test Plan: * Run unit tests. * Run DataBlockEncodingTool on a variety of real-world HFiles. 
Reviewers: JIRA, dhruba, tedyu, stack, heyongqiang Reviewed By: tedyu Differential Revision: https://reviews.facebook.net/D2409 (Revision 1304626) Result = SUCCESS mbautin : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/encoding/BufferedDataBlockEncoder.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/encoding/CopyKeyDataBlockEncoder.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/encoding/DataBlockEncoder.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/encoding/DiffKeyDeltaEncoder.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/encoding/EncodedDataBlock.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/encoding/FastDiffDeltaEncoder.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/encoding/PrefixKeyDeltaEncoder.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoderImpl.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/encoding/TestDataBlockEncoders.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/DataBlockEncodingTool.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/EncodedSeekPerformanceTest.java Add baseline compression efficiency to DataBlockEncodingTool Key: HBASE-5469 URL: https://issues.apache.org/jira/browse/HBASE-5469 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Assignee: Mikhail Bautin Priority: Minor Attachments: D2409.1.patch, D2409.2.patch, jira-HBASE-5469-Add-baseline-compression-efficiency--2012-03-23_15_04_41.patch DataBlockEncodingTool currently does not provide baseline compression efficiency, e.g. Hadoop compression codec applied to unencoded data. E.g. if we are using LZO to compress blocks, we would like to have the following columns in the report (possibly as percentages of raw data size). 
Baseline K+V in blockcache | Baseline K + V on disk (LZO compressed) | K + V DataBlockEncoded in block cache | K + V DataBlockEncoded + LZOCompressed (on disk) Background: we never store compressed blocks in cache, but we always store encoded data blocks in cache if data block encoding is enabled for the column family. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5604) HLog replay tool that generates HFiles for use by LoadIncrementalHFiles.
[ https://issues.apache.org/jira/browse/HBASE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237543#comment-13237543 ] Jonathan Hsieh commented on HBASE-5604: --- Hm, so this is similar to http://www.postgresql.org/docs/8.2/static/continuous-archiving.html? For HBase, this would enable a consistent warm backup (though with no strong guarantees across regions) that would be cheaper than the full replication mechanism? HLog replay tool that generates HFiles for use by LoadIncrementalHFiles. Key: HBASE-5604 URL: https://issues.apache.org/jira/browse/HBASE-5604 Project: HBase Issue Type: New Feature Reporter: Lars Hofhansl Just an idea I had. Might be useful for restoring a backup using the HLogs. This could be an M/R job (with a mapper per HLog file). The tool would get a time range and a (set of) table(s). We'd pick the right HLogs based on time before the M/R job is started and then have a mapper per HLog file. The mapper would then go through the HLog, filter out all WALEdits that don't fit into the time range or don't belong to any of the tables, and then use HFileOutputFormat to generate HFiles. We would need to indicate the splits we want, probably from a live table.
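The filtering step each mapper would perform can be sketched in plain Java. This is only an illustration of the idea, not HBase code: WalEditFilter and its Edit class are hypothetical stand-ins for the real WAL types, and treating the time range as half-open [minTs, maxTs) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the per-mapper filter: keep edits by table and time range. */
final class WalEditFilter {
    /** Hypothetical stand-in for one logged edit; real WAL entries carry much more. */
    static final class Edit {
        final String table;
        final long timestamp;

        Edit(String table, long timestamp) {
            this.table = table;
            this.timestamp = timestamp;
        }
    }

    /** Keeps edits for the requested tables whose timestamp lies in [minTs, maxTs). */
    static List<Edit> select(List<Edit> edits, Set<String> tables, long minTs, long maxTs) {
        List<Edit> kept = new ArrayList<>();
        for (Edit e : edits) {
            boolean inRange = e.timestamp >= minTs && e.timestamp < maxTs;
            if (inRange && tables.contains(e.table)) {
                kept.add(e);
            }
        }
        return kept;
    }
}
```

In the proposed tool the surviving edits would then be handed to HFileOutputFormat; that part is omitted here.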
[jira] [Updated] (HBASE-5618) SplitLogManager - prevent unnecessary attempts to resubmits
[ https://issues.apache.org/jira/browse/HBASE-5618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-5618: -- Component/s: zookeeper wal SplitLogManager - prevent unnecessary attempts to resubmits --- Key: HBASE-5618 URL: https://issues.apache.org/jira/browse/HBASE-5618 Project: HBase Issue Type: Improvement Components: wal, zookeeper Reporter: Prakash Khemani Currently, once a watch fires indicating that the task node has been updated (heartbeated) by the worker, the SplitLogManager still takes quite some time before it updates the task's last-heard-from time. This is because the manager currently schedules another getDataSetWatch() and only after that finishes does it update the task's last-heard-from time. This leads to a large number of zk-BadVersion warnings when resubmission is continuously attempted and fails. Two changes should be made: (1) On a resubmission failure because of BadVersion, the task's lastUpdate time should get upped. (2) The task's lastUpdate time should get upped as soon as the nodeDataChanged() watch fires, without waiting for getDataSetWatch() to complete.
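The two proposed changes amount to treating both events as heartbeats right away. A minimal sketch of that bookkeeping follows; TaskHeartbeat and its method names are hypothetical and do not mirror the real SplitLogManager code, and the current time is passed in explicitly so the behaviour is easy to follow.

```java
/** Hypothetical, simplified model of one split-log task's heartbeat bookkeeping. */
final class TaskHeartbeat {
    private final long timeoutMs;
    private long lastUpdateMs;

    TaskHeartbeat(long timeoutMs, long nowMs) {
        this.timeoutMs = timeoutMs;
        this.lastUpdateMs = nowMs;
    }

    /** Change (2): bump the time directly in the watch callback, before re-reading the node. */
    synchronized void onNodeDataChanged(long nowMs) {
        lastUpdateMs = nowMs;
    }

    /** Change (1): a BadVersion failure means the worker touched the node, so count it as a heartbeat. */
    synchronized void onResubmitBadVersion(long nowMs) {
        lastUpdateMs = nowMs;
    }

    /** The manager would only try to resubmit a task that has been silent longer than the timeout. */
    synchronized boolean shouldResubmit(long nowMs) {
        return nowMs - lastUpdateMs > timeoutMs;
    }
}
```

With this shape, a heartbeat that arrives just before the timeout suppresses the resubmission attempt instead of racing with it.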
[jira] [Updated] (HBASE-4910) thrift scannerstopwithfilter not honoring stop row
[ https://issues.apache.org/jira/browse/HBASE-4910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-4910: -- Component/s: thrift thrift scannerstopwithfilter not honoring stop row -- Key: HBASE-4910 URL: https://issues.apache.org/jira/browse/HBASE-4910 Project: HBase Issue Type: Sub-task Components: client, regionserver, thrift Reporter: Nicolas Spiegelberg Fix For: 0.96.0 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5599) The hbkc tool can not fix the six scenarios, it is NO_VERSION_FILE, NOT_IN_META_OR_DEPLOYED, NOT_IN_META, SHOULD_NOT_BE_DEPLOYED, FIRST_REGION_STARTKEY_NOT_EMPTY, HOLE_
[ https://issues.apache.org/jira/browse/HBASE-5599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237554#comment-13237554 ] Jonathan Hsieh commented on HBASE-5599: --- Hi Fulin, I've committed HBASE-5128 so let's work together to port the functionality you've implemented into it as well. * Some of the methods you've added in HBaseFsckRepair are similar to ones in HBASE-5128. Let's consolidate. * It seems that you are focused on the 0.90 versions -- it is important that whatever changes we make also make it into the newer 0.92/0.94/trunk versions. I can definitely help out there. * It is important to add some testing for the new scenarios as well -- check out TestHbckFsck for some examples to emulate. I've basically started to have one test for each error condition. The hbkc tool can not fix the six scenarios, it is NO_VERSION_FILE, NOT_IN_META_OR_DEPLOYED, NOT_IN_META, SHOULD_NOT_BE_DEPLOYED, FIRST_REGION_STARTKEY_NOT_EMPTY, HOLE_IN_REGION_CHAIN. Key: HBASE-5599 URL: https://issues.apache.org/jira/browse/HBASE-5599 Project: HBase Issue Type: New Feature Components: hbck Affects Versions: 0.90.6 Reporter: fulin wang Fix For: 0.90.6 Attachments: hbase-5599-0.90.patch, hbase-5599-0.90_v2.patch, hbase-5599-0.90_v3.patch The hbck tool cannot fix the following six scenarios. 1. Version file does not exist in root dir. Fix: I try to create a version file with the 'FSUtils.setVersion' method. 2. [REGIONNAME][KEY] on HDFS, but not listed in META or deployed on any region server. Fix: I get the region info from the HDFS file and write it to the '.META.' table. 3. [REGIONNAME][KEY] not in META, but deployed on [SERVERNAME] Fix: I get the region info from the HDFS file and write it to the '.META.' table. 4. [REGIONNAME] should not be deployed according to META, but is deployed on [SERVERNAME] Fix: Close this region. 5. First region should start with an empty key. You need to create a new region and regioninfo in HDFS to plug the hole. Fix: The region info is in neither HDFS nor .META., so the tool creates an empty region for this error. 6. There is a hole in the region chain between [KEY] and [KEY]. You need to create a new regioninfo and region dir in hdfs to plug the hole. Fix: The region info is in neither HDFS nor .META., so the tool creates an empty region for this hole.
[jira] [Updated] (HBASE-5599) The hbck tool can not fix the six scenarios, it is NO_VERSION_FILE, NOT_IN_META_OR_DEPLOYED, NOT_IN_META, SHOULD_NOT_BE_DEPLOYED, FIRST_REGION_STARTKEY_NOT_EMPTY, HOLE_IN
[ https://issues.apache.org/jira/browse/HBASE-5599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-5599: -- Summary: The hbck tool can not fix the six scenarios, it is NO_VERSION_FILE, NOT_IN_META_OR_DEPLOYED, NOT_IN_META, SHOULD_NOT_BE_DEPLOYED, FIRST_REGION_STARTKEY_NOT_EMPTY, HOLE_IN_REGION_CHAIN. (was: The hbkc tool can not fix the six scenarios, it is NO_VERSION_FILE, NOT_IN_META_OR_DEPLOYED, NOT_IN_META, SHOULD_NOT_BE_DEPLOYED, FIRST_REGION_STARTKEY_NOT_EMPTY, HOLE_IN_REGION_CHAIN.) The hbck tool can not fix the six scenarios, it is NO_VERSION_FILE, NOT_IN_META_OR_DEPLOYED, NOT_IN_META, SHOULD_NOT_BE_DEPLOYED, FIRST_REGION_STARTKEY_NOT_EMPTY, HOLE_IN_REGION_CHAIN. Key: HBASE-5599 URL: https://issues.apache.org/jira/browse/HBASE-5599 Project: HBase Issue Type: New Feature Components: hbck Affects Versions: 0.90.6 Reporter: fulin wang Fix For: 0.90.6 Attachments: hbase-5599-0.90.patch, hbase-5599-0.90_v2.patch, hbase-5599-0.90_v3.patch The hbck tool cannot fix the following six scenarios. 1. Version file does not exist in root dir. Fix: I try to create a version file with the 'FSUtils.setVersion' method. 2. [REGIONNAME][KEY] on HDFS, but not listed in META or deployed on any region server. Fix: I get the region info from the HDFS file and write it to the '.META.' table. 3. [REGIONNAME][KEY] not in META, but deployed on [SERVERNAME] Fix: I get the region info from the HDFS file and write it to the '.META.' table. 4. [REGIONNAME] should not be deployed according to META, but is deployed on [SERVERNAME] Fix: Close this region. 5. First region should start with an empty key. You need to create a new region and regioninfo in HDFS to plug the hole. Fix: The region info is in neither HDFS nor .META., so the tool creates an empty region for this error. 6. There is a hole in the region chain between [KEY] and [KEY]. You need to create a new regioninfo and region dir in hdfs to plug the hole. Fix: The region info is in neither HDFS nor .META., so the tool creates an empty region for this hole.
[jira] [Commented] (HBASE-5615) the master never does balance because of balancing the parent region
[ https://issues.apache.org/jira/browse/HBASE-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237556#comment-13237556 ] Ted Yu commented on HBASE-5615: --- Integrated to 0.90 branch. Thanks for the patch Xufeng. Thanks for the review Ramkrishna and Jinchao. Patch for TRUNK to follow. the master never does balance because of balancing the parent region Key: HBASE-5615 URL: https://issues.apache.org/jira/browse/HBASE-5615 Project: HBase Issue Type: Bug Affects Versions: 0.90.7 Reporter: xufeng Assignee: xufeng Priority: Critical Fix For: 0.90.7, 0.96.0 Attachments: 5615-trunk.txt, HBASE-5615-90.patch, HBASE-5615.patch, NoPatched-surefire-report-5615-90.html, Patched_surefire-report-5615-90.html The master never balances because, when the master runs rebuildUserRegions(), it adds the parent region into AssignmentManager#servers. If the balancer then tries to move the parent region, the parent stays in RIT forever, so balancing is never executed.
[jira] [Created] (HBASE-5634) document how to use uberhbck
document how to use uberhbck Key: HBASE-5634 URL: https://issues.apache.org/jira/browse/HBASE-5634 Project: HBase Issue Type: Improvement Components: documentation, hbck Affects Versions: 0.90.7, 0.92.2, 0.94.0, 0.96.0 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh The updated hbck from HBASE-5128 introduces many new repair options and, as a side effect, offers many new opportunities to durably shoot oneself in the foot. Docs need to be written and added to the ref guide to explain its usage and ramifications and discuss repair strategies. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5615) the master never does balance because of balancing the parent region
[ https://issues.apache.org/jira/browse/HBASE-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-5615: -- Attachment: 5615-trunk.txt the master never does balance because of balancing the parent region Key: HBASE-5615 URL: https://issues.apache.org/jira/browse/HBASE-5615 Project: HBase Issue Type: Bug Affects Versions: 0.90.7 Reporter: xufeng Assignee: xufeng Priority: Critical Fix For: 0.90.7, 0.96.0 Attachments: 5615-trunk.txt, HBASE-5615-90.patch, HBASE-5615.patch, NoPatched-surefire-report-5615-90.html, Patched_surefire-report-5615-90.html The master never balances because, when the master runs rebuildUserRegions(), it adds the parent region into AssignmentManager#servers. If the balancer then tries to move the parent region, the parent stays in RIT forever, so balancing is never executed.
[jira] [Updated] (HBASE-5615) the master never does balance because of balancing the parent region
[ https://issues.apache.org/jira/browse/HBASE-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-5615: -- Fix Version/s: 0.96.0 Status: Patch Available (was: Open) the master never does balance because of balancing the parent region Key: HBASE-5615 URL: https://issues.apache.org/jira/browse/HBASE-5615 Project: HBase Issue Type: Bug Affects Versions: 0.90.7 Reporter: xufeng Assignee: xufeng Priority: Critical Fix For: 0.90.7, 0.96.0 Attachments: 5615-trunk.txt, HBASE-5615-90.patch, HBASE-5615.patch, NoPatched-surefire-report-5615-90.html, Patched_surefire-report-5615-90.html The master never balances because, when the master runs rebuildUserRegions(), it adds the parent region into AssignmentManager#servers. If the balancer then tries to move the parent region, the parent stays in RIT forever, so balancing is never executed.
[jira] [Updated] (HBASE-3852) ThriftServer leaks scanners
[ https://issues.apache.org/jira/browse/HBASE-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-3852: -- Component/s: thrift ThriftServer leaks scanners --- Key: HBASE-3852 URL: https://issues.apache.org/jira/browse/HBASE-3852 Project: HBase Issue Type: Bug Components: thrift Affects Versions: 0.90.2 Reporter: Jean-Daniel Cryans Priority: Critical Fix For: 0.94.1 Attachments: 3852.txt, thrift2-scanner.patch The scannerMap in ThriftServer relies on the user to clean it by closing the scanner. If that doesn't happen, the ResultScanner will stay in the thrift server's memory and if any pre-fetching was done, it will also start accumulating Results (with all their data). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5466) Opening a table also opens the metatable and never closes it.
[ https://issues.apache.org/jira/browse/HBASE-5466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-5466: -- Fix Version/s: 0.94.0 This happened pre-0.94 branch; added fixver to indicate that this is in 0.94. Opening a table also opens the metatable and never closes it. - Key: HBASE-5466 URL: https://issues.apache.org/jira/browse/HBASE-5466 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.5, 0.92.0 Reporter: Ashley Taylor Assignee: Ashley Taylor Fix For: 0.90.7, 0.92.1, 0.94.0 Attachments: MetaScanner_HBASE_5466(2).patch, MetaScanner_HBASE_5466(3).patch, MetaScanner_HBASE_5466.patch Having upgraded to the CDH3U3 version of HBase, we found we had a ZooKeeper connection leak. Tracking it down, we found that closing the connection will only close the ZooKeeper connection if all calls to get the connection have been closed; there are incCount and decCount in the HConnection class. When a table is opened, it makes a call to the MetaScanner class, which opens a connection to the meta table, and this table never gets closed. This caused the count in the HConnection class to never return to zero, meaning that the ZooKeeper connection will not close when we close all the tables or call HConnectionManager.deleteConnection(config, true);
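The leak described above is a reference count that never returns to zero. A minimal Java model of that mechanism is sketched below; RefCountedConnection is a hypothetical class, and only the incCount/decCount names are borrowed from the report. It does not reproduce the real HConnection implementation.

```java
/** Hypothetical model of a shared connection that closes its ZooKeeper session at count zero. */
final class RefCountedConnection {
    private int refCount = 0;
    private boolean zkClosed = false;

    /** Called by every user of the connection, e.g. a table or a meta scan. */
    synchronized void incCount() {
        refCount++;
    }

    /** Called when a user is done; the ZooKeeper session closes only when nobody is left. */
    synchronized void decCount() {
        if (refCount > 0) {
            refCount--;
        }
        if (refCount == 0) {
            zkClosed = true; // stand-in for closing the real ZooKeeper connection
        }
    }

    synchronized boolean isZkClosed() {
        return zkClosed;
    }

    synchronized int refCount() {
        return refCount;
    }
}
```

In this model, opening a table calls incCount() twice (once for the table, once for the meta scan) while closing the table calls decCount() only once, so the count stays at 1 and the session stays open until the missing decCount() is added.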
[jira] [Updated] (HBASE-5586) [replication] NPE in ReplicationSource when creating a stream to an inexistent cluster
[ https://issues.apache.org/jira/browse/HBASE-5586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-5586: -- Component/s: replication [replication] NPE in ReplicationSource when creating a stream to an inexistent cluster -- Key: HBASE-5586 URL: https://issues.apache.org/jira/browse/HBASE-5586 Project: HBase Issue Type: Bug Components: replication Affects Versions: 0.90.5 Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Fix For: 0.92.2, 0.94.0, 0.96.0 Attachments: 5586-v2.txt, HBASE-5586-trunk.patch, HBASE-5586.java, HBASE-5586.java This is from 0.92.1-ish: {noformat} 2012-03-15 09:52:16,589 ERROR org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Unexpected exception in ReplicationSource, currentPath=null java.lang.NullPointerException at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.chooseSinks(ReplicationSource.java:223) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.connectToPeers(ReplicationSource.java:442) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:246) {noformat} I wanted to add a replication stream to a cluster that wasn't existing yet so that the logs would be buffered until then. This should just be treated as if there was no region servers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5615) the master never does balance because of balancing the parent region
[ https://issues.apache.org/jira/browse/HBASE-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237581#comment-13237581 ] Hadoop QA commented on HBASE-5615: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12519805/5615-trunk.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.io.hfile.TestForceCacheImportantBlocks org.apache.hadoop.hbase.mapreduce.TestImportTsv org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1298//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1298//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1298//console This message is automatically generated. 
the master never does balance because of balancing the parent region Key: HBASE-5615 URL: https://issues.apache.org/jira/browse/HBASE-5615 Project: HBase Issue Type: Bug Affects Versions: 0.90.7 Reporter: xufeng Assignee: xufeng Priority: Critical Fix For: 0.90.7, 0.96.0 Attachments: 5615-trunk.txt, HBASE-5615-90.patch, HBASE-5615.patch, NoPatched-surefire-report-5615-90.html, Patched_surefire-report-5615-90.html The master never balances because, when the master runs rebuildUserRegions(), it adds the parent region into AssignmentManager#servers. If the balancer then tries to move the parent region, the parent stays in RIT forever, so balancing is never executed.
[jira] [Commented] (HBASE-5598) Analyse and fix the findbugs reporting by QA and add invalid bugs into findbugs-excludeFilter file
[ https://issues.apache.org/jira/browse/HBASE-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237582#comment-13237582 ] Jonathan Hsieh commented on HBASE-5598: --- Currently we are somewhere around the 770 warnings/errors mark. We should chop this into subtasks to break down the work and knock out related issues. For this to last, once we get the findbugs warnings to 0, committers need to enforce a no-new-findbugs-errors policy on reviews. Agreed? Analyse and fix the findbugs reporting by QA and add invalid bugs into findbugs-excludeFilter file -- Key: HBASE-5598 URL: https://issues.apache.org/jira/browse/HBASE-5598 Project: HBase Issue Type: Bug Components: scripts Affects Versions: 0.92.1, 0.94.0, 0.96.0 Reporter: Uma Maheswara Rao G Priority: Minor There are many findbugs errors reported by HbaseQA. HBASE-5597 is going to up the OK count. This may lead to other issues when we refactor the code: if we introduce new valid bugs while removing invalid ones, QA cannot report them. So, I would propose adding an exclude filter file for findbugs (for the invalid bugs). If we find any valid ones, we can fix them under this JIRA.
[jira] [Updated] (HBASE-4393) Implement a canary monitoring program
[ https://issues.apache.org/jira/browse/HBASE-4393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-4393: --- Attachment: HBaseCanary.java I've attached a simple draft canary tool that, for each table (or for the specified tables), tries to fetch a row from each region server, and collects and prints failures and times. Should this tool be a service that collects/exposes stats for each region/column family, or just a tool to get an idea of the cluster state? In case this should be just a tool, any ideas on the output format and the metrics that we want to collect and output? Implement a canary monitoring program - Key: HBASE-4393 URL: https://issues.apache.org/jira/browse/HBASE-4393 Project: HBase Issue Type: New Feature Components: monitoring Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: Amandeep Khurana Attachments: HBaseCanary.java This JIRA is to implement a standalone program that can be used to do canary monitoring of a running HBase cluster. This program would gather a list of the regions in the cluster, then iterate over them doing lightweight operations (e.g. short scans) to provide metrics about latency as well as alert on availability issues.
[jira] [Commented] (HBASE-5606) SplitLogManger async delete node hangs log splitting when ZK connection is lost
[ https://issues.apache.org/jira/browse/HBASE-5606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237647#comment-13237647 ] Jimmy Xiang commented on HBASE-5606: This is a similar issue to HBASE-5081, right? Will my original fix proposed for HBASE-5081 help: don't retry distributed log splitting before the tasks are actually deleted? We can abort the master after several retries to delete the tasks. SplitLogManger async delete node hangs log splitting when ZK connection is lost Key: HBASE-5606 URL: https://issues.apache.org/jira/browse/HBASE-5606 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.92.0 Reporter: Gopinathan A Priority: Critical Fix For: 0.92.2 Attachments: 5606.txt 1. One RS died; the ServerShutdownHandler noticed and started the distributed log splitting; 2. All tasks failed because the ZK connection was lost, so all the tasks were deleted asynchronously; 3. The ServerShutdownHandler retried the log splitting; 4. The asynchronous deletion from step 2 finally happened, but hit the new task; 5. This left the SplitLogManager in a hanging state. This leads to .META.
region not assigened for long time {noformat} hbase-root-master-HOST-192-168-47-204.log.2012-03-14(55413,79):2012-03-14 19:28:47,932 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: put up splitlog task at znode /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170 hbase-root-master-HOST-192-168-47-204.log.2012-03-14(89303,79):2012-03-14 19:34:32,387 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: put up splitlog task at znode /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170 {noformat} {noformat} hbase-root-master-HOST-192-168-47-204.log.2012-03-14(80417,99):2012-03-14 19:34:31,196 DEBUG org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback: deleted /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170 hbase-root-master-HOST-192-168-47-204.log.2012-03-14(89456,99):2012-03-14 19:34:32,497 DEBUG org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback: deleted /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170 {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237658#comment-13237658 ] Lars Hofhansl commented on HBASE-5623: -- Was going by the assumption that an IOException here is actually not bad. Either the writer was concurrently closed (which means it was rolled as well), or any persistent HDFS problem will be detected on the next write attempt. Could LOG.debug something like: Informational: Log roll failed. Will be retried. Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at
org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close 
the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code} logSyncerThread.hlogFlush(this.writer); {code} without holding the updateLock. LogSyncer only synchronizes against concurrent appends and flush(), but not on the passed writer, which can be closed already by rollWriter().
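The race described in this issue can be modelled in a few lines. The sketch below is not the HLog code and not any of the attached patches: RollingLogSketch, Writer, rollWriter(), sync() and currentWriter() are hypothetical stand-ins. It only illustrates one possible way to close the window, namely having the syncing thread take the same lock that the roll holds while it swaps and closes the writer.

```java
import java.util.ArrayList;
import java.util.List;

public class RollingLogSketch {
    /** Hypothetical stand-in for a log writer that rejects appends once closed. */
    static final class Writer {
        private final List<String> entries = new ArrayList<>();
        private boolean closed;

        void append(String entry) {
            if (closed) {
                throw new IllegalStateException("writer already closed");
            }
            entries.add(entry);
        }

        void close() { closed = true; }

        int size() { return entries.size(); }
    }

    private final Object updateLock = new Object();
    private Writer writer = new Writer();

    /** Swaps in a new writer and closes the old one while holding updateLock. */
    Writer rollWriter() {
        synchronized (updateLock) {
            Writer old = writer;
            writer = new Writer();
            old.close();
            return old;
        }
    }

    /** Appends under the same lock, so it can never see a writer that a roll has closed. */
    void sync(String entry) {
        synchronized (updateLock) {
            writer.append(entry);
        }
    }

    Writer currentWriter() {
        synchronized (updateLock) {
            return writer;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        RollingLogSketch log = new RollingLogSketch();
        Thread roller = new Thread(() -> {
            for (int i = 0; i < 1000; i++) {
                log.rollWriter();
            }
        });
        roller.start();
        for (int i = 0; i < 1000; i++) {
            log.sync("edit-" + i); // would throw if it ever reached a closed writer
        }
        roller.join();
        System.out.println("all edits appended to an open writer");
    }
}
```

Holding the lock for the whole flush serializes syncs against rolls, which may cost throughput; the alternative raised in the comment above is to tolerate the exception from a concurrently closed writer and rely on the next write attempt.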
[jira] [Updated] (HBASE-5613) ThriftServer getTableRegions does not return serverName and port
[ https://issues.apache.org/jira/browse/HBASE-5613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-5613: - Resolution: Fixed Status: Resolved (was: Patch Available) ThriftServer getTableRegions does not return serverName and port Key: HBASE-5613 URL: https://issues.apache.org/jira/browse/HBASE-5613 Project: HBase Issue Type: Bug Components: thrift Reporter: Scott Chen Assignee: Scott Chen Priority: Minor Fix For: 0.94.0, 0.96.0 Attachments: HBASE-5613.0.94.2.txt, HBASE-5613.0.94.txt, HBASE-5613.D2403.1.patch, HBASE-5613.D2403.2.patch, HBASE-5613.D2403.3.patch, HBASE-5613.D2403.4.patch, HBASE-5613.D2403.5.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4957) Clean up some log messages, code in RecoverableZooKeeper
[ https://issues.apache.org/jira/browse/HBASE-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-4957: - Status: Patch Available (was: Open) Clean up some log messages, code in RecoverableZooKeeper Key: HBASE-4957 URL: https://issues.apache.org/jira/browse/HBASE-4957 Project: HBase Issue Type: Improvement Components: zookeeper Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Fix For: 0.94.0 Attachments: hbase-4957.txt, hbase-4957.txt, hbase-4957.txt, hbase-4957.txt In RecoverableZooKeeper, there are a number of log messages and comments which don't really read correctly, and some other pieces of code that can be cleaned up. Simple cleanup - shouldn't be any actual behavioral changes.
[jira] [Updated] (HBASE-4720) Implement atomic update operations (checkAndPut, checkAndDelete) for REST client/server
[ https://issues.apache.org/jira/browse/HBASE-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-4720: - Fix Version/s: (was: 0.94.0) 0.94.1 Let's try for 0.94.1 Implement atomic update operations (checkAndPut, checkAndDelete) for REST client/server Key: HBASE-4720 URL: https://issues.apache.org/jira/browse/HBASE-4720 Project: HBase Issue Type: Improvement Reporter: Daniel Lord Assignee: Mubarak Seyed Fix For: 0.94.1 Attachments: HBASE-4720.trunk.v1.patch, HBASE-4720.trunk.v2.patch, HBASE-4720.trunk.v3.patch, HBASE-4720.trunk.v4.patch, HBASE-4720.trunk.v5.patch, HBASE-4720.trunk.v6.patch, HBASE-4720.trunk.v7.patch, HBASE-4720.v1.patch, HBASE-4720.v3.patch I have several large application/HBase clusters where an application node will occasionally need to talk to HBase from a different cluster. In order to help ensure some of my consistency guarantees I have a sentinel table that is updated atomically as users interact with the system. This works quite well for the regular hbase client but the REST client does not implement the checkAndPut and checkAndDelete operations. This exposes the application to some race conditions that have to be worked around. It would be ideal if the same checkAndPut/checkAndDelete operations could be supported by the REST client.
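For readers unfamiliar with the semantics being requested, here is a minimal in-memory sketch of compare-and-set style updates. It is not the HBase client or REST API: CheckAndPutSketch and its single-value "cells" map are hypothetical, and a real row has families, qualifiers and timestamps that are ignored here. The point is only that the check and the write happen as one atomic step, which is what protects a sentinel table from the races mentioned above.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class CheckAndPutSketch {
    // Hypothetical one-value-per-row store standing in for a sentinel table.
    private final ConcurrentMap<String, String> cells = new ConcurrentHashMap<>();

    /** Writes newValue only if the current value equals expected (null means "row must be absent"). */
    boolean checkAndPut(String row, String expected, String newValue) {
        if (expected == null) {
            return cells.putIfAbsent(row, newValue) == null;
        }
        return cells.replace(row, expected, newValue);
    }

    /** Deletes the row only if its current value equals expected. */
    boolean checkAndDelete(String row, String expected) {
        return cells.remove(row, expected);
    }

    public static void main(String[] args) {
        CheckAndPutSketch table = new CheckAndPutSketch();
        System.out.println(table.checkAndPut("sentinel", null, "v1"));    // row absent, write succeeds
        System.out.println(table.checkAndPut("sentinel", "stale", "v2")); // check fails, nothing written
        System.out.println(table.checkAndPut("sentinel", "v1", "v2"));    // check passes, value replaced
    }
}
```

Without such an operation a client has to read, compare and write in separate requests, and another writer can slip in between them.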
[jira] [Commented] (HBASE-5598) Analyse and fix the findbugs reporting by QA and add invalid bugs into findbugs-excludeFilter file
[ https://issues.apache.org/jira/browse/HBASE-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237668#comment-13237668 ] stack commented on HBASE-5598: -- I'd say propose it up on the dev list, Jon. More folks will see your question. Analyse and fix the findbugs reporting by QA and add invalid bugs into findbugs-excludeFilter file -- Key: HBASE-5598 URL: https://issues.apache.org/jira/browse/HBASE-5598 Project: HBase Issue Type: Bug Components: scripts Affects Versions: 0.92.1, 0.94.0, 0.96.0 Reporter: Uma Maheswara Rao G Priority: Minor HbaseQA reports many findbugs errors. HBASE-5597 is going to raise the OK count. This may lead to other issues when we refactor the code: if we introduce new valid warnings while removing invalid ones, QA cannot report them. So I would propose adding an exclude filter file for findbugs (for the invalid bugs). If we find any valid ones, we can fix them under this JIRA.
[jira] [Commented] (HBASE-5128) [uber hbck] Online automated repair of table integrity and region consistency problems
[ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237669#comment-13237669 ] stack commented on HBASE-5128: -- Hurray! Would suggest you stick something in the release note section, Jon, as a means of spreading the good news about this fat tool. What about this section in the reference manual: http://hbase.apache.org/book.html#hbck Should we update it some? Good stuff. [uber hbck] Online automated repair of table integrity and region consistency problems -- Key: HBASE-5128 URL: https://issues.apache.org/jira/browse/HBASE-5128 Project: HBase Issue Type: New Feature Components: hbck Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0 Attachments: 5128-trunk.addendum, hbase-5128-0.90-v2.patch, hbase-5128-0.90-v2b.patch, hbase-5128-0.90-v4.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.92-v4.patch, hbase-5128-0.94-v2.patch, hbase-5128-0.94-v4.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch, hbase-5128-v3.patch, hbase-5128-v4.patch The current (0.90.5, 0.92.0rc2) versions of hbck detect most region consistency and table integrity invariant violations. However, with '-fix' it can only automatically repair region consistency cases having to do with deployment problems. This updated version should be able to handle all cases (including a new orphan regiondir case). When complete, it will likely deprecate the OfflineMetaRepair tool and subsume several open META-hole related issues. Here's the approach (from the comment at the top of the new version of the file).
{code}
/**
 * HBaseFsck (hbck) is a tool for checking and repairing region consistency and
 * table integrity.
 *
 * Region consistency checks verify that META, region deployment on
 * region servers and the state of data in HDFS (.regioninfo files) all are in
 * accordance.
 *
 * Table integrity checks verify that all possible row keys can resolve to
 * exactly one region of a table. This means there are no individual degenerate
 * or backwards regions; no holes between regions; and that there are no overlapping
 * regions.
 *
 * The general repair strategy works in these steps.
 * 1) Repair Table Integrity on HDFS. (merge or fabricate regions)
 * 2) Repair Region Consistency with META and assignments
 *
 * For table integrity repairs, the tables' region directories are scanned
 * for .regioninfo files. Each table's integrity is then verified. If there
 * are any orphan regions (regions with no .regioninfo files), or holes, new
 * regions are fabricated. Backwards regions are sidelined as well as empty
 * degenerate (endkey==startkey) regions. If there are any overlapping regions,
 * a new region is created and all data is merged into the new region.
 *
 * Table integrity repairs deal solely with HDFS and can be done offline -- the
 * hbase region servers or master do not need to be running. This phase can be
 * used to completely reconstruct the META table in an offline fashion.
 *
 * Region consistency requires three conditions -- 1) valid .regioninfo file
 * present in an hdfs region dir, 2) valid row with .regioninfo data in META,
 * and 3) a region is deployed only at the regionserver that it was assigned to.
 *
 * Region consistency requires hbck to contact the HBase master and region
 * servers, so the connect() must first be called successfully. Much of the
 * region consistency information is transient and less risky to repair.
 */
{code}
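The table integrity invariant quoted in this message (every row key resolves to exactly one region) can be illustrated with a small standalone check. This is a sketch under stated assumptions, not HBaseFsck code: TableIntegritySketch, Region and check() are hypothetical names, keys are compared as Java strings rather than byte arrays, and an empty end key is taken to mean "to the end of the table". It detects degenerate, backwards, overlapping and missing ranges but does not attempt any repair.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class TableIntegritySketch {
    /** Hypothetical region covering [start, end); an empty end key means "to the end of the table". */
    static final class Region {
        final String start;
        final String end;

        Region(String start, String end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + ")";
        }
    }

    /** Returns a description of every integrity problem found. */
    static List<String> check(List<Region> regions) {
        List<String> problems = new ArrayList<>();
        List<Region> usable = new ArrayList<>();
        for (Region r : regions) {
            if (!r.end.isEmpty() && r.start.equals(r.end)) {
                problems.add("degenerate " + r);   // endkey == startkey
            } else if (!r.end.isEmpty() && r.start.compareTo(r.end) > 0) {
                problems.add("backwards " + r);    // startkey sorts after endkey
            } else {
                usable.add(r);
            }
        }
        usable.sort(Comparator.comparing((Region r) -> r.start));

        String covered = "";            // every key below this one is covered so far
        boolean coveredToEnd = false;   // set once a region with an empty end key is seen
        for (Region r : usable) {
            if (coveredToEnd || r.start.compareTo(covered) < 0) {
                problems.add("overlap at " + r);
            } else if (r.start.compareTo(covered) > 0) {
                problems.add("hole [" + covered + ", " + r.start + ")");
            }
            if (r.end.isEmpty()) {
                coveredToEnd = true;
            } else if (r.end.compareTo(covered) > 0) {
                covered = r.end;
            }
        }
        if (!coveredToEnd) {
            problems.add("hole [" + covered + ", )");
        }
        return problems;
    }

    public static void main(String[] args) {
        List<Region> regions = Arrays.asList(
            new Region("", "g"), new Region("k", "t"), new Region("p", ""));
        check(regions).forEach(System.out::println);
    }
}
```

Sorting by start key and sweeping once keeps the check linear after the sort, which matters little for a sketch but mirrors how holes and overlaps fall out naturally when regions are visited in key order.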
[jira] [Updated] (HBASE-5434) [REST] Include more metrics in cluster status request
[ https://issues.apache.org/jira/browse/HBASE-5434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-5434: - Resolution: Fixed Fix Version/s: 0.96.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to 0.94 and 0.96. [REST] Include more metrics in cluster status request - Key: HBASE-5434 URL: https://issues.apache.org/jira/browse/HBASE-5434 Project: HBase Issue Type: Improvement Components: metrics, rest Affects Versions: 0.94.0 Reporter: Mubarak Seyed Assignee: Mubarak Seyed Priority: Minor Labels: noob Fix For: 0.94.0, 0.96.0 Attachments: HBASE-5434.trunk.v1.patch, HBASE-5434.trunk.v2.patch, HBASE-5434.trunk.v2.patch, HBASE-5434.trunk.v2.patch /status/cluster shows only {code} stores=2 storefiless=0 storefileSizeMB=0 memstoreSizeMB=0 storefileIndexSizeMB=0 {code} for a region but master web-ui shows {code} stores=1, storefiles=0, storefileUncompressedSizeMB=0 storefileSizeMB=0 memstoreSizeMB=0 storefileIndexSizeMB=0 readRequestsCount=0 writeRequestsCount=0 rootIndexSizeKB=0 totalStaticIndexSizeKB=0 totalStaticBloomSizeKB=0 totalCompactingKVs=0 currentCompactedKVs=0 compactionProgressPct=NaN {code} In a write-heavy, REST-gateway-based production environment, the ops team needs to verify whether write counters are getting incremented per region (they do run /status/cluster on each REST server). We can get the same values from *rpc.metrics.put_num_ops* and *hbase.regionserver.writeRequestsCount*, but some home-grown tools need to parse the output of /status/cluster and update the dashboard.
[jira] [Commented] (HBASE-5128) [uber hbck] Online automated repair of table integrity and region consistency problems
[ https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237675#comment-13237675 ] Lars Hofhansl commented on HBASE-5128: -- Thanks for getting this done for 0.94, Jon! +1 on release notes and book update, but doesn't need to hold up 0.94rc [uber hbck] Online automated repair of table integrity and region consistency problems -- Key: HBASE-5128 URL: https://issues.apache.org/jira/browse/HBASE-5128 Project: HBase Issue Type: New Feature Components: hbck Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0 Attachments: 5128-trunk.addendum, hbase-5128-0.90-v2.patch, hbase-5128-0.90-v2b.patch, hbase-5128-0.90-v4.patch, hbase-5128-0.92-v2.patch, hbase-5128-0.92-v4.patch, hbase-5128-0.94-v2.patch, hbase-5128-0.94-v4.patch, hbase-5128-trunk-v2.patch, hbase-5128-trunk.patch, hbase-5128-v3.patch, hbase-5128-v4.patch
[jira] [Commented] (HBASE-5434) [REST] Include more metrics in cluster status request
[ https://issues.apache.org/jira/browse/HBASE-5434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237677#comment-13237677 ] stack commented on HBASE-5434: -- +1 on commit [REST] Include more metrics in cluster status request - Key: HBASE-5434 URL: https://issues.apache.org/jira/browse/HBASE-5434 Project: HBase Issue Type: Improvement Components: metrics, rest Affects Versions: 0.94.0 Reporter: Mubarak Seyed Assignee: Mubarak Seyed Priority: Minor Labels: noob Fix For: 0.94.0, 0.96.0 Attachments: HBASE-5434.trunk.v1.patch, HBASE-5434.trunk.v2.patch, HBASE-5434.trunk.v2.patch, HBASE-5434.trunk.v2.patch
[jira] [Commented] (HBASE-4393) Implement a canary monitoring program
[ https://issues.apache.org/jira/browse/HBASE-4393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237678#comment-13237678 ] Lars Hofhansl commented on HBASE-4393: -- @Matteo: Ideally this could be used for trending. So output that is suitable for Ganglia or OpenTSDB (whatever that means in both cases) would be cool. Even just a cluster state tool is great. Implement a canary monitoring program - Key: HBASE-4393 URL: https://issues.apache.org/jira/browse/HBASE-4393 Project: HBase Issue Type: New Feature Components: monitoring Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: Amandeep Khurana Attachments: HBaseCanary.java This JIRA is to implement a standalone program that can be used to do canary monitoring of a running HBase cluster. This program would gather a list of the regions in the cluster, then iterate over them doing lightweight operations (eg short scans) to provide metrics about latency as well as alert on availability issues.
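The loop described in this issue (list the regions, run a lightweight operation against each, record latency, flag failures) can be sketched without any HBase dependency. This is not the attached HBaseCanary.java: CanarySketch and sniff() are hypothetical, regions are plain strings, and the probe is passed in as a callback that a real tool would implement with a short scan or a Get.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class CanarySketch {
    /**
     * Probes every region once. The result maps each region name to the probe
     * latency in milliseconds, or to -1 if the probe threw an exception.
     */
    static Map<String, Long> sniff(List<String> regions, Consumer<String> probe) {
        Map<String, Long> latencies = new LinkedHashMap<>();
        for (String region : regions) {
            long start = System.nanoTime();
            try {
                probe.accept(region); // placeholder for a short scan or Get
                latencies.put(region, (System.nanoTime() - start) / 1_000_000);
            } catch (RuntimeException e) {
                latencies.put(region, -1L); // availability problem worth alerting on
            }
        }
        return latencies;
    }

    public static void main(String[] args) {
        List<String> regions = Arrays.asList("region-a", "region-b");
        Map<String, Long> result = sniff(regions, region -> {
            if (region.equals("region-b")) {
                throw new IllegalStateException("region not serving");
            }
        });
        result.forEach((region, ms) ->
            System.out.println(region + (ms < 0 ? " FAILED" : " " + ms + "ms")));
    }
}
```

Returning a map rather than printing inside the loop keeps the measurement separate from the output format, so the same data could be fed to whatever trending system is preferred.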
[jira] [Updated] (HBASE-5633) NPE reading ZK config in HBase
[ https://issues.apache.org/jira/browse/HBASE-5633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5633: - Attachment: HBASE-5633-v2.patch What I'm committing... wraps a very long line, else what Matteo supplied. NPE reading ZK config in HBase -- Key: HBASE-5633 URL: https://issues.apache.org/jira/browse/HBASE-5633 Project: HBase Issue Type: Bug Components: zookeeper Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Minor Attachments: HBASE-5633-v1.patch, HBASE-5633-v2.patch If zoo.cfg contains server.* (server.0=server0:2888:3888\n) and the cluster.distributed property (in hbase-site.xml) is empty, we get an NPE in parseZooCfg(). The easy way to reproduce the bug is running org.apache.hbase.zookeeper.TestHQuorumPeer with hbase-site.xml containing: {code} <property> <name>hbase.cluster.distributed</name> <value></value> </property> {code}
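The failure mode in this issue is the common one of calling a method on the result of a lookup that can return null. The sketch below reproduces the idea with a plain Map instead of the Hadoop Configuration class, so ClusterDistributedSketch, get() and isDistributed() are hypothetical, and the default value "false" is an assumption made for illustration. It shows the shape of the fix described for the patch: always supply a default when reading the property.

```java
import java.util.HashMap;
import java.util.Map;

public class ClusterDistributedSketch {
    static final String CLUSTER_DISTRIBUTED = "hbase.cluster.distributed";
    // Assumed default for this sketch; the real constant lives in HConstants.
    static final String DEFAULT_CLUSTER_DISTRIBUTED = "false";

    /** Returns the configured value, or defaultValue when the key is missing or empty. */
    static String get(Map<String, String> conf, String key, String defaultValue) {
        String value = conf.get(key);
        return (value == null || value.isEmpty()) ? defaultValue : value;
    }

    static boolean isDistributed(Map<String, String> conf) {
        // conf.get(CLUSTER_DISTRIBUTED).equals("true") would throw an NPE for a missing key.
        return get(conf, CLUSTER_DISTRIBUTED, DEFAULT_CLUSTER_DISTRIBUTED).equals("true");
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>(); // property absent, as in the bug report
        System.out.println("distributed = " + isDistributed(conf));
    }
}
```

An equivalent defensive style is to put the constant on the left of the comparison, but supplying a default keeps the intended behaviour explicit at the point where the property is read.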
[jira] [Commented] (HBASE-4393) Implement a canary monitoring program
[ https://issues.apache.org/jira/browse/HBASE-4393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237681#comment-13237681 ] Lars Hofhansl commented on HBASE-4393: -- Java code looks great. Maybe instead of using a scanner in sniffRegion, you could use a Get? Implement a canary monitoring program - Key: HBASE-4393 URL: https://issues.apache.org/jira/browse/HBASE-4393 Project: HBase Issue Type: New Feature Components: monitoring Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: Amandeep Khurana Fix For: 0.94.0 Attachments: HBaseCanary.java
[jira] [Updated] (HBASE-4393) Implement a canary monitoring program
[ https://issues.apache.org/jira/browse/HBASE-4393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-4393: - Fix Version/s: 0.94.0 I would like to get this into 0.94. This needs some sort of usage description so that folks can find out what they are supposed to pass on the command line. Implement a canary monitoring program - Key: HBASE-4393 URL: https://issues.apache.org/jira/browse/HBASE-4393 Project: HBase Issue Type: New Feature Components: monitoring Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: Amandeep Khurana Fix For: 0.94.0 Attachments: HBaseCanary.java
[jira] [Updated] (HBASE-5633) NPE reading ZK config in HBase
[ https://issues.apache.org/jira/browse/HBASE-5633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5633: - Resolution: Fixed Fix Version/s: 0.94.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed 0.94 branch and trunk NPE reading ZK config in HBase -- Key: HBASE-5633 URL: https://issues.apache.org/jira/browse/HBASE-5633 Project: HBase Issue Type: Bug Components: zookeeper Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Minor Fix For: 0.94.0 Attachments: HBASE-5633-v1.patch, HBASE-5633-v2.patch
[jira] [Commented] (HBASE-5633) NPE reading ZK config in HBase
[ https://issues.apache.org/jira/browse/HBASE-5633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237687#comment-13237687 ] stack commented on HBASE-5633: -- Thanks for the patch Matteo NPE reading ZK config in HBase -- Key: HBASE-5633 URL: https://issues.apache.org/jira/browse/HBASE-5633 Project: HBase Issue Type: Bug Components: zookeeper Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Minor Fix For: 0.94.0 Attachments: HBASE-5633-v1.patch, HBASE-5633-v2.patch
[jira] [Commented] (HBASE-5633) NPE reading ZK config in HBase
[ https://issues.apache.org/jira/browse/HBASE-5633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237688#comment-13237688 ] Lars Hofhansl commented on HBASE-5633: -- +1 NPE reading ZK config in HBase -- Key: HBASE-5633 URL: https://issues.apache.org/jira/browse/HBASE-5633 Project: HBase Issue Type: Bug Components: zookeeper Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Minor Fix For: 0.94.0 Attachments: HBASE-5633-v1.patch, HBASE-5633-v2.patch
[jira] [Commented] (HBASE-5598) Analyse and fix the findbugs reporting by QA and add invalid bugs into findbugs-excludeFilter file
[ https://issues.apache.org/jira/browse/HBASE-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237694#comment-13237694 ] Lars Hofhansl commented on HBASE-5598: -- +1 on no-new-findbugs-policy once it's down to 0. (We do the same at Salesforce.) Not sure, though, that the right method to get this to 0 is to use an exclude filter; we should use findbugs annotations in the files. Analyse and fix the findbugs reporting by QA and add invalid bugs into findbugs-excludeFilter file -- Key: HBASE-5598 URL: https://issues.apache.org/jira/browse/HBASE-5598 Project: HBase Issue Type: Bug Components: scripts Affects Versions: 0.92.1, 0.94.0, 0.96.0 Reporter: Uma Maheswara Rao G Priority: Minor
[jira] [Commented] (HBASE-4393) Implement a canary monitoring program
[ https://issues.apache.org/jira/browse/HBASE-4393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237697#comment-13237697 ] stack commented on HBASE-4393: -- I wrote a note to suggest this NOT be added to 0.94 because it's only basic and it's apart from hbase, so we shouldn't have to hold up hbase to get it in. Was also going to talk about this tool being too basic -- emissions are on stdout only rather than up in jmx, formatted as json or whatever -- but then I thought we have to start somewhere. We can add to this basic tool later. The class needs a license and a class comment. Should be called Canary rather than HBaseCanary. Put it into a package. Would suggest we start a tool package, so o.a.h.h.tool. Should implement Tool and be run using ToolRunner. Tool adds a little useful util. Needs usage as per Lars. Could be added to bin/hbase as 'canary' -- could start/stop it like we start/stop region. If you do this, then things like log name and location will be set up for you as it is for rest server and thrift server etc. Should output be via LOG rather than stdout? Then we can hook its output up variously. Skip formatting in the output...the ' - Region ..' i.e. remove the ' - ' prefix. I think make the few small changes above and we'd have a good start. Thanks lads. Good stuff. Implement a canary monitoring program - Key: HBASE-4393 URL: https://issues.apache.org/jira/browse/HBASE-4393 Project: HBase Issue Type: New Feature Components: monitoring Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: Amandeep Khurana Fix For: 0.94.0 Attachments: HBaseCanary.java
[jira] [Updated] (HBASE-4393) Implement a canary monitoring program
[ https://issues.apache.org/jira/browse/HBASE-4393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-4393: - Fix Version/s: (was: 0.94.0) You are more iron-handy than me, stack. Your points are well taken, unscheduling. Implement a canary monitoring program - Key: HBASE-4393 URL: https://issues.apache.org/jira/browse/HBASE-4393 Project: HBase Issue Type: New Feature Components: monitoring Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: Amandeep Khurana Attachments: HBaseCanary.java
[jira] [Commented] (HBASE-4393) Implement a canary monitoring program
[ https://issues.apache.org/jira/browse/HBASE-4393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237699#comment-13237699 ] stack commented on HBASE-4393: -- Hmm... this thing comes up, reports, and goes down immediately, so some of my suggestions above may be OTT. So, I don't think we need the following to get the script in (we can add it later): Could be added to bin/hbase as 'canary' -- could start/stop it like we start/stop region. If you do this, then things like log name and location will be set up for you as it is for rest server and thrift server etc. Implement a canary monitoring program - Key: HBASE-4393 URL: https://issues.apache.org/jira/browse/HBASE-4393 Project: HBase Issue Type: New Feature Components: monitoring Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: Amandeep Khurana Attachments: HBaseCanary.java This JIRA is to implement a standalone program that can be used to do canary monitoring of a running HBase cluster. This program would gather a list of the regions in the cluster, then iterate over them doing lightweight operations (e.g. short scans) to provide metrics about latency as well as alert on availability issues.
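The loop this issue describes (list the regions, run a lightweight operation against each, report latency and failures) can be sketched roughly as follows. This is only an illustration: `CanarySketch`, `RegionProbe`, and the region names are hypothetical stand-ins, not the attached HBaseCanary.java. A real version would issue short scans through the HBase client and, per the review comments above, would likely implement Tool and report via LOG.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of one canary pass; not the attached HBaseCanary.java. */
class CanarySketch {

    /** Stand-in for a lightweight per-region operation such as a short scan. */
    interface RegionProbe {
        void probe(String regionName) throws Exception;
    }

    /** Outcome of probing one region. */
    static final class Result {
        final String region;
        final long latencyNanos;
        final boolean ok;

        Result(String region, long latencyNanos, boolean ok) {
            this.region = region;
            this.latencyNanos = latencyNanos;
            this.ok = ok;
        }
    }

    /** Probes every region once, recording latency and whether the probe succeeded. */
    static List<Result> runOnce(List<String> regions, RegionProbe probe) {
        List<Result> results = new ArrayList<>();
        for (String region : regions) {
            long start = System.nanoTime();
            boolean ok = true;
            try {
                probe.probe(region);
            } catch (Exception e) {
                ok = false; // availability problem: record it and keep going
            }
            results.add(new Result(region, System.nanoTime() - start, ok));
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> regions = Arrays.asList("region-a", "region-b");
        // Fake probe that fails for region-b to exercise the alerting path.
        RegionProbe probe = name -> {
            if (name.equals("region-b")) {
                throw new IllegalStateException("region unavailable");
            }
        };
        for (Result r : runOnce(regions, probe)) {
            System.out.println(r.region + " ok=" + r.ok
                + " latencyMs=" + r.latencyNanos / 1_000_000);
        }
    }
}
```

Keeping the probe behind an interface means the reporting side (stdout today, LOG or JMX later) can change without touching the loop.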
[jira] [Commented] (HBASE-4957) Clean up some log messages, code in RecoverableZooKeeper
[ https://issues.apache.org/jira/browse/HBASE-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237700#comment-13237700 ] Hadoop QA commented on HBASE-4957: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12519579/hbase-4957.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.mapreduce.TestImportTsv org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat org.apache.hadoop.hbase.master.TestSplitLogManager Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1299//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1299//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1299//console This message is automatically generated. 
Clean up some log messages, code in RecoverableZooKeeper Key: HBASE-4957 URL: https://issues.apache.org/jira/browse/HBASE-4957 Project: HBase Issue Type: Improvement Components: zookeeper Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Fix For: 0.94.0 Attachments: hbase-4957.txt, hbase-4957.txt, hbase-4957.txt, hbase-4957.txt In RecoverableZooKeeper, there are a number of log messages and comments which don't really read correctly, and some other pieces of code that can be cleaned up. Simple cleanup - shouldn't be any actual behavioral changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5633) NPE reading ZK config in HBase
[ https://issues.apache.org/jira/browse/HBASE-5633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237701#comment-13237701 ] Hudson commented on HBASE-5633: --- Integrated in HBase-0.94 #53 (See [https://builds.apache.org/job/HBase-0.94/53/]) HBASE-5633 NPE reading ZK config in HBase (Revision 1304925) Result = SUCCESS stack : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/HConstants.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKConfig.java NPE reading ZK config in HBase -- Key: HBASE-5633 URL: https://issues.apache.org/jira/browse/HBASE-5633 Project: HBase Issue Type: Bug Components: zookeeper Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Minor Fix For: 0.94.0 Attachments: HBASE-5633-v1.patch, HBASE-5633-v2.patch If zoo.cfg contains server.* (server.0=server0:2888:3888\n) and cluster.distributed property (in hbase-site.xml) is empty we get an NPE in parseZooCfg(). The easy way to reproduce the bug is running org.apache.hbase.zookeeper.TestHQuorumPeer with hbase-site.xml containing: {code} property namehbase.cluster.distributed/name value/value /property {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
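The failure mode reported here can be sketched without HBase on the classpath. The class below is a hypothetical stand-in for the configuration lookup: a plain Map that treats an empty value as absent, which is the behaviour the report describes for the empty property. The constant names mirror HConstants.CLUSTER_DISTRIBUTED and HConstants.DEFAULT_CLUSTER_DISTRIBUTED from the patch note, but the values used here are assumptions, and this is not the actual ZKConfig code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a null-safe lookup; not the actual ZKConfig code. */
class ClusterDistributedSketch {
    // Assumed values, for illustration only.
    static final String CLUSTER_DISTRIBUTED = "hbase.cluster.distributed";
    static final String DEFAULT_CLUSTER_DISTRIBUTED = "false";

    /** Mimics a config lookup that yields null for a missing or empty property. */
    static String get(Map<String, String> conf, String key) {
        String value = conf.get(key);
        return (value == null || value.isEmpty()) ? null : value;
    }

    /** Same lookup, but falls back to a default instead of returning null. */
    static String get(Map<String, String> conf, String key, String defaultValue) {
        String value = get(conf, key);
        return value == null ? defaultValue : value;
    }

    /** Safe: the default guarantees a non-null receiver for equals(). */
    static boolean isDistributed(Map<String, String> conf) {
        return get(conf, CLUSTER_DISTRIBUTED, DEFAULT_CLUSTER_DISTRIBUTED).equals("true");
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(CLUSTER_DISTRIBUTED, ""); // empty value, as in the reproduction

        try {
            // Unsafe pattern: comparing the raw result dereferences null.
            get(conf, CLUSTER_DISTRIBUTED).equals("true");
        } catch (NullPointerException e) {
            System.out.println("unguarded lookup threw NPE");
        }
        System.out.println("guarded lookup: " + isDistributed(conf));
    }
}
```

An alternative with the same effect is to flip the comparison so the constant is the receiver (`"true".equals(value)`); supplying a default keeps the intent more explicit.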
[jira] [Commented] (HBASE-5434) [REST] Include more metrics in cluster status request
[ https://issues.apache.org/jira/browse/HBASE-5434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237702#comment-13237702 ] Hudson commented on HBASE-5434: --- Integrated in HBase-0.94 #53 (See [https://builds.apache.org/job/HBase-0.94/53/]) HBASE-5434 [REST] Include more metrics in cluster status request (Mubarak Seyed) (Revision 1304918) Result = SUCCESS larsh : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/HServerLoad.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/rest/StorageClusterStatusResource.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/rest/model/StorageClusterStatusModel.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/rest/protobuf/generated/StorageClusterStatusMessage.java * /hbase/branches/0.94/src/main/resources/org/apache/hadoop/hbase/rest/XMLSchema.xsd * /hbase/branches/0.94/src/main/resources/org/apache/hadoop/hbase/rest/protobuf/StorageClusterStatusMessage.proto * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/rest/model/TestStorageClusterStatusModel.java [REST] Include more metrics in cluster status request - Key: HBASE-5434 URL: https://issues.apache.org/jira/browse/HBASE-5434 Project: HBase Issue Type: Improvement Components: metrics, rest Affects Versions: 0.94.0 Reporter: Mubarak Seyed Assignee: Mubarak Seyed Priority: Minor Labels: noob Fix For: 0.94.0, 0.96.0 Attachments: HBASE-5434.trunk.v1.patch, HBASE-5434.trunk.v2.patch, HBASE-5434.trunk.v2.patch, HBASE-5434.trunk.v2.patch /status/cluster shows only {code} stores=2 storefiless=0 storefileSizeMB=0 memstoreSizeMB=0 storefileIndexSizeMB=0 {code} for a region but master web-ui shows {code} stores=1, storefiles=0, storefileUncompressedSizeMB=0 storefileSizeMB=0 memstoreSizeMB=0 storefileIndexSizeMB=0 readRequestsCount=0 writeRequestsCount=0 rootIndexSizeKB=0 totalStaticIndexSizeKB=0 totalStaticBloomSizeKB=0 totalCompactingKVs=0 currentCompactedKVs=0 
compactionProgressPct=NaN {code} In a write-heavy, REST-gateway-based production environment, the ops team needs to verify whether write counters are being incremented per region (they run /status/cluster on each REST server). We can get the same values from *rpc.metrics.put_num_ops* and *hbase.regionserver.writeRequestsCount*, but some home-grown tools need to parse the output of /status/cluster and update the dashboard.
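For the home-grown tools mentioned above, pulling counters out of this kind of output amounts to splitting key=value tokens. A minimal sketch, assuming the comma- or whitespace-separated key=value layout shown in the {code} samples; `StatusLineParser` is a hypothetical helper, not part of the REST gateway.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for dashboards that scrape key=value status text. */
class StatusLineParser {

    /** Splits "k1=v1, k2=v2 k3=v3" into an insertion-ordered key/value map. */
    static Map<String, String> parse(String line) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String token : line.trim().split("[,\\s]+")) {
            int eq = token.indexOf('=');
            if (eq > 0) { // skip empty tokens and tokens without a key
                out.put(token.substring(0, eq), token.substring(eq + 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> m =
            parse("stores=1, storefiles=0, readRequestsCount=0 writeRequestsCount=0");
        System.out.println("writeRequestsCount=" + m.get("writeRequestsCount"));
    }
}
```

Values are kept as strings on purpose, since fields such as compactionProgressPct can be NaN.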
[jira] [Commented] (HBASE-4957) Clean up some log messages, code in RecoverableZooKeeper
[ https://issues.apache.org/jira/browse/HBASE-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237703#comment-13237703 ] Lars Hofhansl commented on HBASE-4957: -- TestSplitLogManager passed locally. Going to commit. Clean up some log messages, code in RecoverableZooKeeper Key: HBASE-4957 URL: https://issues.apache.org/jira/browse/HBASE-4957 Project: HBase Issue Type: Improvement Components: zookeeper Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Fix For: 0.94.0 Attachments: hbase-4957.txt, hbase-4957.txt, hbase-4957.txt, hbase-4957.txt In RecoverableZooKeeper, there are a number of log messages and comments which don't really read correctly, and some other pieces of code that can be cleaned up. Simple cleanup - shouldn't be any actual behavioral changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4147) StoreFile query usage report
[ https://issues.apache.org/jira/browse/HBASE-4147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-4147: - Priority: Critical (was: Minor) Fix Version/s: 0.96.0 Upping priority and marking against 0.96.0 so it gets more consideration. StoreFile query usage report Key: HBASE-4147 URL: https://issues.apache.org/jira/browse/HBASE-4147 Project: HBase Issue Type: Improvement Reporter: Doug Meil Priority: Critical Fix For: 0.96.0 Attachments: hbase_4147_storefilereport.pdf, hbase_4147_storefilereport_2011_08_10.pdf Detailed information on what HBase is doing in terms of reads is hard to come by. What would be useful is to have a periodic StoreFile query report. Specifically, this could run on a configured interval (e.g., every 30 seconds, 60 seconds) and dump the output to the log files. This would have all StoreFiles accessed during the reporting period (and with the Path we would also know region, CF, and table), # of times the StoreFile was accessed, the size of the StoreFile, and the total time (ms) spent processing that StoreFile. Even this level of summary would be useful to detect which tables and CFs are being accessed the most, and including the StoreFile would provide insight into relative uncompaction (i.e., lots of StoreFiles). I think the log-output, as opposed to UI, is an important facet with this. I'm assuming that users will slice and dice this data on their own so I think we should skip any kind of admin view for now (i.e., new JSPs, new APIs to expose this data). Just getting this to log-file would be a big improvement. Will this have a non-zero performance impact? Yes. Hopefully small, but yes it will. However, flying a plane without any instrumentation isn't fun. :-)
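One way to picture the bookkeeping behind such a report is sketched below: per-StoreFile counters are accumulated during the interval, then drained and formatted for the log. All names (`StoreFileReportSketch`, `recordAccess`, `drainReport`) and the output layout are hypothetical; the issue does not prescribe an implementation, and the StoreFile size column is left out for brevity.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical sketch of per-StoreFile read accounting for a periodic log report. */
class StoreFileReportSketch {

    /** Access count and cumulative processing time for one StoreFile. */
    static final class Stats {
        final LongAdder accesses = new LongAdder();
        final LongAdder totalMillis = new LongAdder();
    }

    private final ConcurrentHashMap<String, Stats> byPath = new ConcurrentHashMap<>();

    /** Called on the read path; cheap, but not free. */
    void recordAccess(String storeFilePath, long elapsedMillis) {
        Stats s = byPath.computeIfAbsent(storeFilePath, p -> new Stats());
        s.accesses.increment();
        s.totalMillis.add(elapsedMillis);
    }

    /**
     * Builds one line per StoreFile seen in the period, sorted by path, then resets.
     * Best effort only: an access racing with the reset may be dropped.
     */
    String drainReport() {
        Map<String, Stats> sorted = new TreeMap<>(byPath);
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Stats> e : sorted.entrySet()) {
            sb.append(e.getKey())
              .append(" accesses=").append(e.getValue().accesses.sum())
              .append(" totalMs=").append(e.getValue().totalMillis.sum())
              .append('\n');
        }
        byPath.clear();
        return sb.toString();
    }

    public static void main(String[] args) {
        StoreFileReportSketch report = new StoreFileReportSketch();
        report.recordAccess("/hbase/t1/region1/cf/file1", 5);
        report.recordAccess("/hbase/t1/region1/cf/file1", 7);
        report.recordAccess("/hbase/t1/region1/cf/file2", 3);
        System.out.print(report.drainReport());
    }
}
```

To run this on a configured interval, `drainReport()` could be scheduled with `ScheduledExecutorService.scheduleAtFixedRate` and its result written to the log.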
[jira] [Commented] (HBASE-4147) StoreFile query usage report
[ https://issues.apache.org/jira/browse/HBASE-4147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237708#comment-13237708 ] Lars Hofhansl commented on HBASE-4147: -- Should we output the statistics via JMX as well? StoreFile query usage report Key: HBASE-4147 URL: https://issues.apache.org/jira/browse/HBASE-4147 Project: HBase Issue Type: Improvement Reporter: Doug Meil Priority: Critical Fix For: 0.96.0 Attachments: hbase_4147_storefilereport.pdf, hbase_4147_storefilereport_2011_08_10.pdf Detailed information on what HBase is doing in terms of reads is hard to come by. What would be useful is to have a periodic StoreFile query report. Specifically, this could run on a configured interval (e.g., every 30 seconds, 60 seconds) and dump the output to the log files. This would have all StoreFiles accessed during the reporting period (and with the Path we would also know region, CF, and table), # of times the StoreFile was accessed, the size of the StoreFile, and the total time (ms) spent processing that StoreFile. Even this level of summary would be useful to detect which tables and CFs are being accessed the most, and including the StoreFile would provide insight into relative uncompaction (i.e., lots of StoreFiles). I think the log-output, as opposed to UI, is an important facet with this. I'm assuming that users will slice and dice this data on their own so I think we should skip any kind of admin view for now (i.e., new JSPs, new APIs to expose this data). Just getting this to log-file would be a big improvement. Will this have a non-zero performance impact? Yes. Hopefully small, but yes it will. However, flying a plane without any instrumentation isn't fun. :-)
[jira] [Updated] (HBASE-5533) Add more metrics to HBase
[ https://issues.apache.org/jira/browse/HBASE-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5533: - Status: Open (was: Patch Available) Add more metrics to HBase - Key: HBASE-5533 URL: https://issues.apache.org/jira/browse/HBASE-5533 Project: HBase Issue Type: Improvement Affects Versions: 0.92.2, 0.94.0 Reporter: Shaneal Manek Assignee: Shaneal Manek Priority: Minor Attachments: BlockingQueueContention.java, HBASE-5533-0.92-v4.patch, HBASE-5533-TRUNK-v6.patch, TimingOverhead.java, hbase-5533-0.92.patch, hbase5533-0.92-v2.patch, hbase5533-0.92-v3.patch, hbase5533-0.92-v5.patch, histogram_web_ui.png To debug/monitor production clusters, there are some more metrics I wish I had available. In particular: - Although the average FS latencies are useful, a 'histogram' of recent latencies (90% of reads completed in under 100ms, 99% in under 200ms, etc) would be more useful - Similar histograms of latencies on common operations (GET, PUT, DELETE) would be useful - Counting the number of accesses to each region to detect hotspotting - Exposing the current number of HLog files -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5533) Add more metrics to HBase
[ https://issues.apache.org/jira/browse/HBASE-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5533: - Status: Patch Available (was: Open) Add more metrics to HBase - Key: HBASE-5533 URL: https://issues.apache.org/jira/browse/HBASE-5533 Project: HBase Issue Type: Improvement Affects Versions: 0.92.2, 0.94.0 Reporter: Shaneal Manek Assignee: Shaneal Manek Priority: Minor Attachments: BlockingQueueContention.java, HBASE-5533-0.92-v4.patch, HBASE-5533-TRUNK-v6.patch, TimingOverhead.java, hbase-5533-0.92.patch, hbase5533-0.92-v2.patch, hbase5533-0.92-v3.patch, hbase5533-0.92-v5.patch, histogram_web_ui.png To debug/monitor production clusters, there are some more metrics I wish I had available. In particular: - Although the average FS latencies are useful, a 'histogram' of recent latencies (90% of reads completed in under 100ms, 99% in under 200ms, etc) would be more useful - Similar histograms of latencies on common operations (GET, PUT, DELETE) would be useful - Counting the number of accesses to each region to detect hotspotting - Exposing the current number of HLog files -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
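To make the 'histogram of recent latencies' request concrete, here is a minimal sketch that keeps the last N samples and answers percentile queries with the nearest-rank method. `RecentLatencies` is hypothetical and unrelated to the attached patches, which may well use a different sampling scheme.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sliding window of latency samples with nearest-rank percentiles. */
class RecentLatencies {
    private final int capacity;
    private final Deque<Long> window = new ArrayDeque<>();

    RecentLatencies(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Adds a sample, evicting the oldest one once the window is full. */
    synchronized void record(long latencyMs) {
        if (window.size() == capacity) {
            window.removeFirst();
        }
        window.addLast(latencyMs);
    }

    /** Nearest rank: the smallest sample such that at least p% of samples are <= it. */
    synchronized long percentile(double p) {
        if (window.isEmpty()) {
            throw new IllegalStateException("no samples recorded");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = window.stream().mapToLong(Long::longValue).sorted().toArray();
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        RecentLatencies reads = new RecentLatencies(1000);
        for (long ms = 1; ms <= 200; ms++) {
            reads.record(ms);
        }
        System.out.println("90% of reads completed within " + reads.percentile(90) + " ms");
        System.out.println("99% of reads completed within " + reads.percentile(99) + " ms");
    }
}
```

Sorting on every query is fine for a sketch; on a hot path the timing overhead matters (the TimingOverhead.java attachment suggests the author measured it), so a bucketed histogram would be the more likely production choice.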
[jira] [Updated] (HBASE-4957) Clean up some log messages, code in RecoverableZooKeeper
[ https://issues.apache.org/jira/browse/HBASE-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-4957: - Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to 0.94 and 0.96. Clean up some log messages, code in RecoverableZooKeeper Key: HBASE-4957 URL: https://issues.apache.org/jira/browse/HBASE-4957 Project: HBase Issue Type: Improvement Components: zookeeper Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Fix For: 0.94.0 Attachments: hbase-4957.txt, hbase-4957.txt, hbase-4957.txt, hbase-4957.txt In RecoverableZooKeeper, there are a number of log messages and comments which don't really read correctly, and some other pieces of code that can be cleaned up. Simple cleanup - shouldn't be any actual behavioral changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4957) Clean up some log messages, code in RecoverableZooKeeper
[ https://issues.apache.org/jira/browse/HBASE-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-4957: - Fix Version/s: 0.96.0 Clean up some log messages, code in RecoverableZooKeeper Key: HBASE-4957 URL: https://issues.apache.org/jira/browse/HBASE-4957 Project: HBase Issue Type: Improvement Components: zookeeper Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Fix For: 0.94.0, 0.96.0 Attachments: hbase-4957.txt, hbase-4957.txt, hbase-4957.txt, hbase-4957.txt In RecoverableZooKeeper, there are a number of log messages and comments which don't really read correctly, and some other pieces of code that can be cleaned up. Simple cleanup - shouldn't be any actual behavioral changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237713#comment-13237713 ] Lars Hofhansl commented on HBASE-5623: -- Any strong opinions other than the log message? This is the last 0.94.0 issue. I think my latest proposed patch fixes the problem while not impacting maintainability. Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at 
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing 
syncer() calls {code} logSyncerThread.hlogFlush(this.writer); {code} without holding the updateLock. LogSyncer only synchronizes against concurrent appends and flush(), but not on the passed writer, which can be closed already by rollWriter(). In this case, since SequenceFile#Writer.close() sets its out field to null, we get the NPE.
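The race described above, and one common way to close it, can be sketched in plain Java. This is an illustration with hypothetical names (`RollingLogSketch`, `FakeWriter`); it is not the HBASE-5623 patch, which may solve the problem differently. The idea shown is that the sync path uses the writer under the same lock the roll path holds, so it never touches a writer that has already been closed.

```java
/** Hypothetical sketch of a roll/sync race fix; not the actual HLog code. */
class RollingLogSketch {

    /** Stand-in for a log writer whose close() nulls an internal field. */
    static final class FakeWriter {
        private StringBuilder out = new StringBuilder();

        void append(String edit) {
            out.append(edit); // would throw NPE if close() had already run
        }

        long getLength() {
            return out.length();
        }

        void close() {
            out = null;
        }
    }

    private final Object updateLock = new Object();
    private FakeWriter writer = new FakeWriter();

    /** Swaps in a new writer and closes the old one while holding updateLock. */
    void rollWriter() {
        synchronized (updateLock) {
            FakeWriter old = writer;
            writer = new FakeWriter();
            old.close();
        }
    }

    /** Reads and uses the current writer under the same lock as rollWriter(). */
    long sync(String edit) {
        synchronized (updateLock) {
            writer.append(edit);
            return writer.getLength();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        RollingLogSketch log = new RollingLogSketch();
        Thread roller = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) {
                log.rollWriter();
            }
        });
        Thread syncer = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) {
                log.sync("edit");
            }
        });
        roller.start();
        syncer.start();
        roller.join();
        syncer.join();
        System.out.println("finished: both threads shared updateLock");
    }
}
```

Holding the lock for the whole sync serializes syncs against rolls and costs throughput with many handlers, which is presumably why the patches on this issue went through several revisions.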
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237714#comment-13237714 ] Ted Yu commented on HBASE-5623: --- Adding LOG.debug should be fine. Race condition when rolling the HLog and hlogFlush -- Key: HBASE-5623 URL: https://issues.apache.org/jira/browse/HBASE-5623 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Critical Fix For: 0.94.0 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions: {code} Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920) at 
org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152) at $Proxy1.multi(Unknown Source) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689) at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214) {code} and {code} java.lang.NullPointerException at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279) at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237) at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271) at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400) at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351) {code} It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls {code} logSyncerThread.hlogFlush(this.writer); {code} without holding the updateLock. 
LogSyncer only synchronizes against concurrent appends and flush(), but not on the passed writer, which can be closed already by rollWriter(). In this case, since SequenceFile#Writer.close() sets its out field to null, we get the NPE.
[jira] [Commented] (HBASE-4957) Clean up some log messages, code in RecoverableZooKeeper
[ https://issues.apache.org/jira/browse/HBASE-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237716#comment-13237716 ] Hudson commented on HBASE-4957: --- Integrated in HBase-0.94 #54 (See [https://builds.apache.org/job/HBase-0.94/54/]) HBASE-4957 Clean up some log messages, code in RecoverableZooKeeper (Todd) (Revision 1304941) Result = SUCCESS larsh : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/util/RetryCounter.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/zookeeper/RecoverableZooKeeper.java Clean up some log messages, code in RecoverableZooKeeper Key: HBASE-4957 URL: https://issues.apache.org/jira/browse/HBASE-4957 Project: HBase Issue Type: Improvement Components: zookeeper Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Fix For: 0.94.0, 0.96.0 Attachments: hbase-4957.txt, hbase-4957.txt, hbase-4957.txt, hbase-4957.txt In RecoverableZooKeeper, there are a number of log messages and comments which don't really read correctly, and some other pieces of code that can be cleaned up. Simple cleanup - shouldn't be any actual behavioral changes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5604) HLog replay tool that generates HFiles for use by LoadIncrementalHFiles.
[ https://issues.apache.org/jira/browse/HBASE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237726#comment-13237726 ] stack commented on HBASE-5604: -- I like the postgres link. Should point at that when we doc this tool. bq. This could be an M/R job (with a mapper per HLog file). You'll need a reduce (I think you deduce this yourself above but saying it for completeness). If a map-only MR job, if many WALs, say 100s, you could end up w/ the same number of hfiles per region (if each WAL had at least one edit for this region). You'd need a reducer to coalesce by region. This tool would not apply the edits in the order in which we received them. We'd be reliant on sort order only which should be fine I think since this is what happens if they were instead inserted via the memstore anyways. bq. Hmm... Maybe this is only useful when we have a lot of logs; maybe there would be no advantage here turning this into an M/R job, but maybe it should just be a standalone client...? Well, even if tens of files only, you'd want to //ize it to do the filtering, etc., so MR sounds right. Or you could hack on the distributed split code to add a 'filtering' facility... so it dropped edits that were outside of a range -- e.g. not one of the specified tables or not of a time range. The output of distributed log splitting is only replayed on region open so you'd need to figure how to get the region to load the edits (An MR job to write hfiles sounds way more straightforward relatively). HLog replay tool that generates HFiles for use by LoadIncrementalHFiles. Key: HBASE-5604 URL: https://issues.apache.org/jira/browse/HBASE-5604 Project: HBase Issue Type: New Feature Reporter: Lars Hofhansl Just an idea I had. Might be useful for restore of a backup using the HLogs. This could be an M/R job (with a mapper per HLog file). The tool would get a timerange and a (set of) table(s). We'd pick the right HLogs based on time before the M/R job is started and then have a mapper per HLog file. The mapper would then go through the HLog, filter out all WALEdits that didn't fit into the time range or are not for any of the tables, and then use HFileOutputFormat to generate HFiles. Would need to indicate the splits we want, probably from a live table.
[jira] [Commented] (HBASE-5604) HLog replay tool that generates HFiles for use by LoadIncrementalHFiles.
[ https://issues.apache.org/jira/browse/HBASE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237744#comment-13237744 ]

Lars Hofhansl commented on HBASE-5604:

I went the distributed log splitting route first a while back. Had it all working in fact. Or so I thought. Then I tried playing logs to a new table and realized that distributed log splitting only works for crash recovery (before the regions could split further) because it splits using the region name in the log.

That makes sense, because otherwise each region server participating in log splitting would need to look up the current region for each encountered row (the region could have split and the row in question needs to go to one of the daughters). That is essentially what the high-level API does anyway.

Yes, definitely need a reducer. Similar to what I did for Import, I can see this working in two modes:
# The mappers directly apply changes to a running HBase cluster (TableOutputFormat). No reducers needed in this case.
# Create HFiles via HFileOutputFormat in the reduce phase.

In fact this tool would probably be very much like Import, just with a different InputFormat.

HLog replay tool that generates HFiles for use by LoadIncrementalHFiles.
Key: HBASE-5604
URL: https://issues.apache.org/jira/browse/HBASE-5604
Project: HBase
Issue Type: New Feature
Reporter: Lars Hofhansl

Just an idea I had. Might be useful for restore of a backup using the HLogs. This could be an M/R (with a mapper per HLog file). The tool would get a timerange and a (set of) table(s). We'd pick the right HLogs based on time before the M/R job is started and then have a mapper per HLog file. The mapper would then go through the HLog, filter out all WALEdits that don't fit into the time range or don't belong to any of the tables, and then use HFileOutputFormat to generate HFiles. Would need to indicate the splits we want, probably from a live table.
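The point about region names going stale after a split can be illustrated with a small lookup sketch. It assumes regions are identified by their start keys and that a row belongs to the region with the greatest start key not larger than the row. `RowToRegionIndexSketch` is a hypothetical class, and plain `String` keys are used here for brevity where HBase uses byte arrays.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical index from region start key to region name; not an HBase class. */
final class RowToRegionIndexSketch {
  private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

  /** Registers (or replaces) the region that starts at startKey. */
  void putRegion(String startKey, String regionName) {
    regionsByStartKey.put(startKey, regionName);
  }

  /** Finds the region whose start key is the greatest one not above the row. */
  String regionFor(String row) {
    Map.Entry<String, String> entry = regionsByStartKey.floorEntry(row);
    if (entry == null) {
      throw new IllegalStateException("no region covers row " + row);
    }
    return entry.getValue();
  }
}
```

If a region starting at `m` later splits at `t`, a row such as `x` that the log recorded under the parent region now resolves to the daughter starting at `t`. That per-row lookup is the work that splitting by the logged region name avoids, which is why that approach only holds while regions have not split further.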
[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush
[ https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237746#comment-13237746 ]

Lars Hofhansl commented on HBASE-5623:

Will wait for Enis to have a look at latest patch.

Race condition when rolling the HLog and hlogFlush
Key: HBASE-5623
URL: https://issues.apache.org/jira/browse/HBASE-5623
Project: HBase
Issue Type: Bug
Components: wal
Affects Versions: 0.94.0
Reporter: Enis Soztutar
Assignee: Enis Soztutar
Priority: Critical
Fix For: 0.94.0
Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch

When doing a ycsb test with a large number of handlers (regionserver.handler.count=60), I get the following exceptions:
{code}
Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException
  at org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099)
  at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314)
  at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291)
  at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388)
  at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192)
  at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985)
  at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400)
  at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366)
  at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351)
  at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920)
  at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152)
  at $Proxy1.multi(Unknown Source)
  at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691)
  at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689)
  at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214)
{code}
and
{code}
java.lang.NullPointerException
  at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026)
  at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068)
  at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035)
  at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279)
  at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237)
  at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271)
  at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391)
  at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192)
  at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985)
  at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400)
  at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366)
  at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351)
{code}
It seems the root cause of the issue is that we open a new log writer and close the old one at HLog#rollWriter() holding the updateLock, but the other threads doing syncer() calls
{code}
logSyncerThread.hlogFlush(this.writer);
{code}
without holding the updateLock.
LogSyncer only synchronizes against concurrent appends and flush(), but not on the passed writer, which can be closed already by rollWriter(). In this case, since SequenceFile#Writer.close() sets its out field to null, we get the NPE.
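The race described above can be reduced to a small model. The sketch below shows one possible way to avoid it: append and swap the writer under the same lock, and close the old writer only after the swap. It is not the approach taken in any of the attached patches (those are not shown here); `WriterSketch` and `RollingLogSketch` are hypothetical classes that only mimic a writer whose `close()` nulls its output.

```java
/** Hypothetical stand-in for a log writer whose close() nulls its stream. */
final class WriterSketch {
  private StringBuilder out = new StringBuilder();

  void append(String edit) {
    out.append(edit); // throws NullPointerException once close() has run
  }

  void close() {
    out = null;
  }
}

/** Hypothetical log that appends and swaps writers under the same lock. */
final class RollingLogSketch {
  private final Object updateLock = new Object();
  private WriterSketch writer = new WriterSketch();
  private long appended;
  private int rolls;

  /** Appends while holding updateLock, so the writer cannot be swapped mid-call. */
  void sync(String edit) {
    synchronized (updateLock) {
      writer.append(edit);
      appended++;
    }
  }

  /** Swaps in a new writer under the lock, then closes the old one. */
  void roll() {
    WriterSketch old;
    synchronized (updateLock) {
      old = writer;
      writer = new WriterSketch();
      rolls++;
    }
    old.close(); // safe: no thread can still reach the old writer
  }

  long appendedCount() {
    synchronized (updateLock) {
      return appended;
    }
  }

  int rollCount() {
    synchronized (updateLock) {
      return rolls;
    }
  }
}
```

Holding the lock for the whole append serializes syncs against rolls, which may cost throughput with many handlers; that trade-off is presumably part of what the patches under review have to weigh.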
[jira] [Commented] (HBASE-5604) HLog replay tool that generates HFiles for use by LoadIncrementalHFiles.
[ https://issues.apache.org/jira/browse/HBASE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237778#comment-13237778 ]

stack commented on HBASE-5604:

So, just need to write a smart mapper that takes filtering params and a WALInputFormat?

HLog replay tool that generates HFiles for use by LoadIncrementalHFiles.
Key: HBASE-5604
URL: https://issues.apache.org/jira/browse/HBASE-5604
Project: HBase
Issue Type: New Feature
Reporter: Lars Hofhansl

Just an idea I had. Might be useful for restore of a backup using the HLogs. This could be an M/R (with a mapper per HLog file). The tool would get a timerange and a (set of) table(s). We'd pick the right HLogs based on time before the M/R job is started and then have a mapper per HLog file. The mapper would then go through the HLog, filter out all WALEdits that don't fit into the time range or don't belong to any of the tables, and then use HFileOutputFormat to generate HFiles. Would need to indicate the splits we want, probably from a live table.
[jira] [Updated] (HBASE-5533) Add more metrics to HBase
[ https://issues.apache.org/jira/browse/HBASE-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-5533:
Status: Open (was: Patch Available)

Add more metrics to HBase
Key: HBASE-5533
URL: https://issues.apache.org/jira/browse/HBASE-5533
Project: HBase
Issue Type: Improvement
Affects Versions: 0.92.2, 0.94.0
Reporter: Shaneal Manek
Assignee: Shaneal Manek
Priority: Minor
Attachments: BlockingQueueContention.java, HBASE-5533-0.92-v4.patch, HBASE-5533-TRUNK-v6.patch, TimingOverhead.java, hbase-5533-0.92.patch, hbase5533-0.92-v2.patch, hbase5533-0.92-v3.patch, hbase5533-0.92-v5.patch, histogram_web_ui.png

To debug/monitor production clusters, there are some more metrics I wish I had available. In particular:
- Although the average FS latencies are useful, a 'histogram' of recent latencies (90% of reads completed in under 100ms, 99% in under 200ms, etc) would be more useful
- Similar histograms of latencies on common operations (GET, PUT, DELETE) would be useful
- Counting the number of accesses to each region to detect hotspotting
- Exposing the current number of HLog files
[jira] [Updated] (HBASE-5533) Add more metrics to HBase
[ https://issues.apache.org/jira/browse/HBASE-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-5533:
Attachment: HBASE-5533-TRUNK-v6.patch

Reupload to see if it triggers hadoopqa

Add more metrics to HBase
Key: HBASE-5533
URL: https://issues.apache.org/jira/browse/HBASE-5533
Project: HBase
Issue Type: Improvement
Affects Versions: 0.92.2, 0.94.0
Reporter: Shaneal Manek
Assignee: Shaneal Manek
Priority: Minor
Attachments: BlockingQueueContention.java, HBASE-5533-0.92-v4.patch, HBASE-5533-TRUNK-v6.patch, HBASE-5533-TRUNK-v6.patch, TimingOverhead.java, hbase-5533-0.92.patch, hbase5533-0.92-v2.patch, hbase5533-0.92-v3.patch, hbase5533-0.92-v5.patch, histogram_web_ui.png

To debug/monitor production clusters, there are some more metrics I wish I had available. In particular:
- Although the average FS latencies are useful, a 'histogram' of recent latencies (90% of reads completed in under 100ms, 99% in under 200ms, etc) would be more useful
- Similar histograms of latencies on common operations (GET, PUT, DELETE) would be useful
- Counting the number of accesses to each region to detect hotspotting
- Exposing the current number of HLog files
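As an illustration of the kind of figure the description asks for ("90% of reads completed in under 100ms"), here is a minimal sketch of a percentile computed over a fixed window of recent latencies. It is not taken from the attached patches: `LatencyWindowSketch` is a hypothetical class, and both the ring-buffer window and the nearest-rank percentile definition are assumptions made for the example.

```java
import java.util.Arrays;

/** Hypothetical fixed-size window of recent latencies, in milliseconds. */
final class LatencyWindowSketch {
  private final long[] samples;
  private int count; // number of valid samples, at most samples.length
  private int next;  // slot that the next sample overwrites

  LatencyWindowSketch(int capacity) {
    if (capacity <= 0) {
      throw new IllegalArgumentException("capacity must be positive");
    }
    samples = new long[capacity];
  }

  /** Records one latency, overwriting the oldest sample once the window is full. */
  synchronized void record(long millis) {
    samples[next] = millis;
    next = (next + 1) % samples.length;
    if (count < samples.length) {
      count++;
    }
  }

  /** Nearest-rank percentile of the current window, for p in (0, 100]. */
  synchronized long percentile(double p) {
    if (p <= 0 || p > 100) {
      throw new IllegalArgumentException("p must be in (0, 100]");
    }
    if (count == 0) {
      throw new IllegalStateException("no samples recorded");
    }
    long[] sorted = Arrays.copyOf(samples, count);
    Arrays.sort(sorted);
    int rank = (int) Math.ceil(p * count / 100.0);
    return sorted[Math.max(rank, 1) - 1];
  }
}
```

Sorting a copy on every query keeps the sketch short; a metrics implementation on a hot path would more likely use bucketed counters or sampling to keep the recording overhead low.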
[jira] [Commented] (HBASE-5564) Bulkload is discarding duplicate records
[ https://issues.apache.org/jira/browse/HBASE-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237781#comment-13237781 ]

stack commented on HBASE-5564:

Patch seems reasonable. Add curlies here:
{code}
+ if (parser.getTimestampKeyColumnIndex() != -1)
+ts = parsed.getTimestamp();
{code}
Convention is you can do w/o curlies if all in one line (as you do later in this file) but if not on one line, need curlies.

Can you confirm that current behavior -- setting ts to System.currentTimeMillis -- is default? It seems to be ... we set System.currentTimeMillis as time to use setting up the job.

A define for NO_TIMESTAMP_KEYCOLUMN_INDEX instead of using -1 directly might help for timestampKeyColumnIndex == -1? Or put this test into a method whose name makes it obvious what the test is about ... e.g. hasTimeStampColumn()

Patch adds nice usage commentary explaining new facility. Looks good.

Bulkload is discarding duplicate records
Key: HBASE-5564
URL: https://issues.apache.org/jira/browse/HBASE-5564
Project: HBase
Issue Type: Bug
Components: mapreduce
Affects Versions: 0.90.7, 0.92.2, 0.94.0, 0.96.0
Environment: HBase 0.92
Reporter: Laxman
Assignee: Laxman
Labels: bulkloader
Fix For: 0.96.0
Attachments: 5564.lint, HBASE-5564_trunk.1.patch, HBASE-5564_trunk.1.patch, HBASE-5564_trunk.patch

Duplicate records are getting discarded when they exist in the same input file, and more specifically when they exist in the same split. Duplicate records are considered if the records are from different splits. Version under test: HBase 0.92
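The review suggestions above (braces around a multi-line `if`, a named constant instead of a bare `-1`, and a self-describing test method) could look roughly like the following. This is a sketch only: `TsvParserSketch` is a hypothetical class, not the parser from the patch, and reading the timestamp with `Long.parseLong` from a field array is an assumption made to keep the example self-contained.

```java
/** Hypothetical parser fragment illustrating the review suggestions. */
final class TsvParserSketch {
  /** Named sentinel meaning "the input has no timestamp column". */
  static final int NO_TIMESTAMP_KEYCOLUMN_INDEX = -1;

  private final int timestampKeyColumnIndex;

  TsvParserSketch(int timestampKeyColumnIndex) {
    this.timestampKeyColumnIndex = timestampKeyColumnIndex;
  }

  /** Says what the comparison against the sentinel actually means. */
  boolean hasTimeStampColumn() {
    return timestampKeyColumnIndex != NO_TIMESTAMP_KEYCOLUMN_INDEX;
  }

  /** Uses the row's own timestamp if one is configured, else the default. */
  long timestampFor(String[] fields, long defaultTs) {
    long ts = defaultTs;
    if (hasTimeStampColumn()) {
      ts = Long.parseLong(fields[timestampKeyColumnIndex]);
    }
    return ts;
  }
}
```

With the default passed in from job setup (for example a single `System.currentTimeMillis()` value, as the comment describes), rows without a timestamp column keep the existing behavior.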