[jira] [Reopened] (HBASE-19483) Add proper privilege check for rsgroup commands
[ https://issues.apache.org/jira/browse/HBASE-19483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell reopened HBASE-19483:

Removal of methods from a LimitedPrivate interface is not allowed in a patch release. Please provide an addendum for branch-1.4 which restores these methods:

Removed Methods (4)

hbase-server-1.4.0.jar, AccessController.class
package org.apache.hadoop.hbase.security.access
AccessController.isAuthorizationSupported ( Configuration conf ) [static] : boolean
AccessController.requireNamespacePermission ( String request, String namespace, Permission.Action... permissions ) : void
AccessController.requireNamespacePermission ( String request, String namespace, TableName tableName, Map> familyMap, Permission.Action... permissions ) : void

hbase-server-1.4.0.jar, VisibilityController.class
package org.apache.hadoop.hbase.security.visibility
VisibilityController.isAuthorizationSupported ( Configuration conf ) [static] : boolean

> Add proper privilege check for rsgroup commands
> ---
>
> Key: HBASE-19483
> URL: https://issues.apache.org/jira/browse/HBASE-19483
> Project: HBase
> Issue Type: Bug
> Reporter: Ted Yu
> Assignee: Guangxu Cheng
> Fix For: 1.4.1, 1.5.0, 2.0.0-beta-1
>
> Attachments: 19483.master.011.patch, 19483.v11.patch, 19483.v11.patch, HBASE-19483.addendum-1.patch, HBASE-19483.addendum.patch, HBASE-19483.branch-1.001.patch, HBASE-19483.branch-2.001.patch, HBASE-19483.branch-2.002.patch, HBASE-19483.branch-2.003.patch, HBASE-19483.master.001.patch, HBASE-19483.master.002.patch, HBASE-19483.master.003.patch, HBASE-19483.master.004.patch, HBASE-19483.master.005.patch, HBASE-19483.master.006.patch, HBASE-19483.master.007.patch, HBASE-19483.master.008.patch, HBASE-19483.master.009.patch, HBASE-19483.master.010.patch, HBASE-19483.master.011.patch, HBASE-19483.master.011.patch, HBASE-19483.master.012.patch, HBASE-19483.master.013.patch, HBASE-19483.master.014.patch
>
> Currently the list_rsgroups command can be executed by any user.
> This is inconsistent with other list commands such as list_peers and list_peer_configs.
> We should add a proper privilege check for the list_rsgroups command.
> Privilege checks should also be added for the get_table_rsgroup / get_server_rsgroup / get_rsgroup commands.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
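An addendum restoring a removed method typically reintroduces it as a deprecated stub that delegates to its replacement, so LimitedPrivate consumers compiled against 1.4.0 keep working. The sketch below illustrates only the pattern; the class name, method bodies, and boolean return are simplified stand-ins, not the actual AccessController code:

```java
// Hypothetical, simplified stand-in showing how a removed method can be
// restored as a deprecated delegating stub in a patch release.
class AccessCheckerSketch {
    // The current API that the rest of the code base uses.
    public boolean hasNamespacePermission(String request, String namespace) {
        return namespace != null && !namespace.isEmpty();
    }

    // Restored signature: deprecated and delegating to the new path, so
    // coprocessors built against the old jar continue to compile and run.
    @Deprecated
    public boolean requireNamespacePermission(String request, String namespace) {
        return hasNamespacePermission(request, namespace);
    }
}
```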
[jira] [Reopened] (HBASE-19752) RSGroupBasedLoadBalancer#getMisplacedRegions() should handle the case where rs group cannot be determined
[ https://issues.apache.org/jira/browse/HBASE-19752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell reopened HBASE-19752:

Reverted from branch-1.4 due to compilation failure. Please fix and reapply.

> RSGroupBasedLoadBalancer#getMisplacedRegions() should handle the case where rs group cannot be determined
> -
>
> Key: HBASE-19752
> URL: https://issues.apache.org/jira/browse/HBASE-19752
> Project: HBase
> Issue Type: Bug
> Reporter: Ted Yu
> Assignee: Ted Yu
> Fix For: 1.4.1, 1.5.0, 2.0.0-beta-1
>
> Attachments: 19752.v1.txt, 19752.v2.txt, 19752.v3.txt, 19752.v4.txt, 19752.v5.txt, 19752.v6.txt, 19752.v7.branch-1.txt, 19752.v7.txt
>
> Observed the following in rs group test output:
> {code}
> 2018-01-10 14:17:23,006 DEBUG [AssignmentThread] rsgroup.RSGroupBasedLoadBalancer(316): Found misplaced region: hbase:acl,,1515593841277.ecf47ecb7522d7fab40db0a237f973fd. on server: localhost,1,1 found in group: null outside of group: UNKNOWN
> {code}
> Here is the corresponding code:
> {code}
> if (assignedServer != null &&
>     (info == null ||
>      !info.containsServer(assignedServer.getAddress()))) {
>   RSGroupInfo otherInfo = null;
>   otherInfo = rsGroupInfoManager.getRSGroupOfServer(assignedServer.getAddress());
>   LOG.debug("Found misplaced region: " + regionInfo.getRegionNameAsString() +
> {code}
> As you can see, both info and otherInfo were null.
> In this case, the region should not be placed in misplacedRegions.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
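The null-handling the issue asks for reduces to: only treat a region as misplaced when both its assigned group and its server's group can be positively determined. A minimal sketch with hypothetical names (group names as plain strings, not the actual RSGroupInfo API):

```java
import java.util.Objects;

// Illustrative null-safe misplaced-region check; not the actual patch.
class MisplacedRegionCheck {
    // A region counts as misplaced only when we can positively determine
    // that the server hosting it belongs to a different, known group.
    static boolean isMisplaced(String regionGroup, String serverGroup) {
        if (regionGroup == null || serverGroup == null) {
            // Group cannot be determined: do not flag as misplaced.
            return false;
        }
        return !Objects.equals(regionGroup, serverGroup);
    }
}
```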
[jira] [Created] (HBASE-19790) Fix compatibility break in 1.3.2-SNAPSHOT
Andrew Purtell created HBASE-19790:

Summary: Fix compatibility break in 1.3.2-SNAPSHOT
Key: HBASE-19790
URL: https://issues.apache.org/jira/browse/HBASE-19790
Project: HBase
Issue Type: Bug
Affects Versions: 1.3.2
Reporter: Andrew Purtell
Assignee: Andrew Purtell
Priority: Blocker
Fix For: 1.3.2

This change is disallowed in a patch release:

{code}
package org.apache.hadoop.hbase.regionserver
interface Region
Abstract method closeRegionOperation ( Region.Operation ) has been added to this interface.
Recompilation of a client program may be terminated with the message: a client class C is not abstract and does not override abstract method closeRegionOperation ( Region.Operation ) in Region.
{code}

Region is a LimitedPrivate interface. See https://hbase.apache.org/book.html#hbase.versioning

{quote}
New APIs introduced in a patch version will only be added in a source compatible way [1]: i.e. code that implements public APIs will continue to compile.
{quote}

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
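For illustration, on Java 8 a method can be added to an existing interface source-compatibly as a default method, so pre-existing implementors keep compiling. The interface below is a hypothetical stand-in, not the actual Region API; note that on branches which must still build with Java 7, default methods are unavailable and the only safe course in a patch release is not adding the abstract method at all:

```java
// Hypothetical stand-in for an interface evolved in a patch release.
interface RegionLike {
    void startRegionOperation();

    // Added as a default method: an implementor written before this method
    // existed inherits this body instead of failing to compile with
    // "does not override abstract method".
    default void closeRegionOperation(String op) {
        // no-op fallback for older implementations
    }
}

// An "old" client class that predates closeRegionOperation still compiles.
class OldRegionImpl implements RegionLike {
    public void startRegionOperation() { }
}
```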
[jira] [Created] (HBASE-19842) Cell ACLs v2
Andrew Purtell created HBASE-19842:

Summary: Cell ACLs v2
Key: HBASE-19842
URL: https://issues.apache.org/jira/browse/HBASE-19842
Project: HBase
Issue Type: New Feature
Components: security
Reporter: Andrew Purtell

Per-cell ACLs as currently implemented (HBASE-7662) embed the serialized ACL in a tag stored with each cell. This was done for performance. It has some drawbacks, most significantly unnecessary duplication, and a grant or revoke requires a rewrite of every affected cell.

We could implement them in a space-efficient (and management-efficient) way, at the cost of some performance, like so: First, allow storage of cell-level ACLs in the ACL table. The rowkey could be a hash of the serialized ACL. We just have to avoid using rowkeys that associate the ACL with a cf, table, or namespace, and handle entries in the ACL table which don't conform to today's keying strategy. Then provide the option of storing the rowkey of an entry in the ACL table in the cell ACL tag, instead of the complete serialization.

The advantages would be a reduction of unnecessary duplication and, like ACLs at other granularities, a GRANT or REVOKE which updates the ACL table will update access control rules for all affected cells. The disadvantage would be that, in order to process the reference for each cell carrying an ACL reference in a tag, we will need to look up the ACL in the ACL table.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
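A sketch of the proposed keying, where the hash algorithm (SHA-256) and helper name are assumptions chosen only for illustration: identical ACLs collapse to a single row in the ACL table, and a cell's tag can carry the fixed-length digest instead of the full serialization.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative only: derive the ACL-table rowkey as a digest of the
// serialized ACL bytes, so duplicate ACLs share one row.
class AclRef {
    static byte[] rowkeyFor(byte[] serializedAcl) {
        try {
            // SHA-256 is an example choice; any stable digest would do.
            return MessageDigest.getInstance("SHA-256").digest(serializedAcl);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```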
[jira] [Created] (HBASE-19858) Backport HBASE-14061 (Support CF-level Storage Policy) to branch-1
Andrew Purtell created HBASE-19858: -- Summary: Backport HBASE-14061 (Support CF-level Storage Policy) to branch-1 Key: HBASE-19858 URL: https://issues.apache.org/jira/browse/HBASE-19858 Project: HBase Issue Type: Task Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 1.5.0 Backport the following commits to branch-1: * HBASE-14061 Support CF-level Storage Policy * HBASE-14061 Support CF-level Storage Policy (addendum) * HBASE-14061 Support CF-level Storage Policy (addendum2) * HBASE-15172 Support setting storage policy in bulkload * HBASE-17538 HDFS.setStoragePolicy() logs errors on local fs * HBASE-18015 Storage class aware block placement for procedure v2 WALs * HBASE-18017 Reduce frequency of setStoragePolicy failure warnings * HBASE-19016 Coordinate storage policy property name for table schema and bulkload Fix * Default storage policy if not configured cannot be "NONE" -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-19466) Rare failure in TestScannerCursor
[ https://issues.apache.org/jira/browse/HBASE-19466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-19466. Resolution: Cannot Reproduce > Rare failure in TestScannerCursor > - > > Key: HBASE-19466 > URL: https://issues.apache.org/jira/browse/HBASE-19466 > Project: HBase > Issue Type: Bug >Affects Versions: 1.4.0 >Reporter: Andrew Purtell >Priority: Minor > > I think we just need to increase the timeout interval to deal with occasional > slowdowns on test executors. 1998 ms is a pretty short timeout. > By the way "rpcTimetout" in the exception message is a misspelling. > [ERROR] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: > 37.412 s <<< FAILURE! - in > org.apache.hadoop.hbase.regionserver.TestScannerCursor > [ERROR] > testHeartbeatWithSparseFilter(org.apache.hadoop.hbase.regionserver.TestScannerCursor) > Time elapsed: 35.604 s <<< ERROR! > org.apache.hadoop.hbase.client.RetriesExhaustedException: > Failed after attempts=36, exceptions: > Thu Dec 07 22:27:16 UTC 2017, null, java.net.SocketTimeoutException: > callTimeout=4000, callDuration=4108: Call to > ip-172-31-47-35.us-west-2.compute.internal/172.31.47.35:35690 failed on local > exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=52, > waitTime=2002, rpcTimetout=1998 row '' on table 'TestScannerCursor' at > region=TestScannerCursor,,1512685598567.1d4e59215a881d6ccbd0b5b5bdec5587., > hostname=ip-172-31-47-35.us-west-2.compute.internal,35690,1512685593244, > seqNum=2 > at > org.apache.hadoop.hbase.regionserver.TestScannerCursor.testHeartbeatWithSparseFilter(TestScannerCursor.java:154) > Caused by: java.net.SocketTimeoutException: callTimeout=4000, > callDuration=4108: Call to > ip-172-31-47-35.us-west-2.compute.internal/172.31.47.35:35690 failed on local > exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=52, > waitTime=2002, rpcTimetout=1998 row '' on table 'TestScannerCursor' at > 
region=TestScannerCursor,,1512685598567.1d4e59215a881d6ccbd0b5b5bdec5587., > hostname=ip-172-31-47-35.us-west-2.compute.internal,35690,1512685593244, > seqNum=2 > Caused by: java.io.IOException: Call to > ip-172-31-47-35.us-west-2.compute.internal/172.31.47.35:35690 failed on local > exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=52, > waitTime=2002, rpcTimetout=1998 > Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=52, > waitTime=2002, rpcTimetout=1998 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-17883) release 1.4.0
[ https://issues.apache.org/jira/browse/HBASE-17883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell resolved HBASE-17883.

Resolution: Fixed

> release 1.4.0
> -
>
> Key: HBASE-17883
> URL: https://issues.apache.org/jira/browse/HBASE-17883
> Project: HBase
> Issue Type: Task
> Components: community
> Affects Versions: 1.4.0
> Reporter: Sean Busbey
> Assignee: Andrew Purtell
> Priority: Critical
>
> Let's start working through doing the needful; it's been almost 3 months since 1.3.0.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-19967) Add Major Compaction Tool options for off-peak / on-peak hours
Andrew Purtell created HBASE-19967:

Summary: Add Major Compaction Tool options for off-peak / on-peak hours
Key: HBASE-19967
URL: https://issues.apache.org/jira/browse/HBASE-19967
Project: HBase
Issue Type: Improvement
Reporter: Andrew Purtell

After HBASE-19528 an operator can disable automatic major compaction and manage major compaction impact on cluster operations more intelligently with an external tool that drives the compaction activity. This tool can be invoked on whatever schedule is desirable, and can restrict activity by table and column family, with a given number of regionservers compacting concurrently at any given time.

Add Major Compaction Tool options for off-peak / on-peak hours. Allow for the definition of "off-peak" as a time range bounded by two points on a 24-hour clock, and for two concurrency targets for global compaction activity: one for the off-peak interval, the other for the remainder.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
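The proposed window logic, sketched with hypothetical names: an hour range on a 24-hour clock that may wrap past midnight, selecting one of two concurrency targets.

```java
// Illustrative sketch of the off-peak window and target selection; the
// class and method names are assumptions, not the actual tool's API.
class OffPeakWindow {
    // True if 'hour' (0-23) falls in [start, end), where the window may
    // wrap midnight (e.g. start=22, end=6 covers 22..23 and 0..5).
    static boolean isOffPeak(int hour, int start, int end) {
        if (start <= end) {
            return hour >= start && hour < end;
        }
        return hour >= start || hour < end;
    }

    // Pick the global compaction concurrency target for the current hour.
    static int concurrency(int hour, int start, int end,
                           int offPeakTarget, int onPeakTarget) {
        return isOffPeak(hour, start, end) ? offPeakTarget : onPeakTarget;
    }
}
```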
[jira] [Resolved] (HBASE-14610) IntegrationTestRpcClient from HBASE-14535 is failing with Async RPC client
[ https://issues.apache.org/jira/browse/HBASE-14610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-14610. Resolution: Incomplete Fix Version/s: (was: 1.4.2) (was: 1.2.8) (was: 1.5.0) (was: 1.3.2) (was: 3.0.0) (was: 2.0.0) > IntegrationTestRpcClient from HBASE-14535 is failing with Async RPC client > -- > > Key: HBASE-14610 > URL: https://issues.apache.org/jira/browse/HBASE-14610 > Project: HBase > Issue Type: Bug > Components: IPC/RPC >Reporter: Enis Soztutar >Priority: Major > Attachments: output > > > HBASE-14535 introduces an IT to simulate a running cluster with RPC servers > and RPC clients doing requests against the servers. > It passes with the sync client, but fails with async client. Probably we need > to take a look. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20012) Backport filesystem quotas (HBASE-16961) to branch-1
Andrew Purtell created HBASE-20012:

Summary: Backport filesystem quotas (HBASE-16961) to branch-1
Key: HBASE-20012
URL: https://issues.apache.org/jira/browse/HBASE-20012
Project: HBase
Issue Type: New Feature
Reporter: Andrew Purtell
Fix For: 1.5.0

Filesystem quotas (HBASE-16961) is an experimental feature committed to branch-2 and up. We are thinking about chargeback and share-back models at work, and this begins to look compelling. I wish this meant we'd give HBase 2 a spin, but that's unfortunately not realistic. It is very likely we will want to make use of this before we are up on HBase 2.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20018) Safe online META repair
Andrew Purtell created HBASE-20018:

Summary: Safe online META repair
Key: HBASE-20018
URL: https://issues.apache.org/jira/browse/HBASE-20018
Project: HBase
Issue Type: New Feature
Components: hbck
Reporter: Andrew Purtell

HBCK is a tank, or a giant shotgun, or choose the battlefield metaphor you feel is most appropriate. It rolls onto the field and leaves problems crushed in its wake, but if you point it in the wrong direction it will crush your production data too. As such it is a means of last resort for fixing an ailing cluster. It is also imperative that user request traffic, writes in particular, is stopped before attempting a number of the fixes. It is unlikely the default "-repair" option is what you want; it turns on too many fixes to risk at one time. There are a large number of command line switches for individual checks and fixes which are very useful, but also error prone when cobbling together a command line for a cluster fix under pressure.

An operations team might hesitate to employ hbck to fix some accumulating bad state because of the disruption its use requires, and the risk of compounding the problem if not carefully done. That of course would be bad, because the accumulating bad state will eventually have an availability impact.

It should be safer to use hbck, but changing hbck also carries risk. We can leave it be as the useful (but dangerous) tool it is and focus on making a subset of its functionality safer. There is a class of META corruptions of mild to moderate severity which could in theory be handled more safely in an online manner, without requiring a suspension of user traffic. Some things hbck does are safe enough to use directly for this. Others need tweaks to do more preflight checks (like checking region states) first. Develop these as a separate tool, maybe even a new HMaster or Admin component. Look for opportunities to share code with existing hbck, via refactor into a shared library.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20027) Port assignments in site configuration are ignored
Andrew Purtell created HBASE-20027: -- Summary: Port assignments in site configuration are ignored Key: HBASE-20027 URL: https://issues.apache.org/jira/browse/HBASE-20027 Project: HBase Issue Type: Bug Affects Versions: 1.4.3 Reporter: Andrew Purtell Port assignments for master and regionserver RPC and info ports in site configuration appear to be ignored. We are not catching this in tests because there appears to be no positive test for port assignment and the only fixed information we require is the zookeeper quorum and client port. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20063) Port HBASE-19799 (Add web UI to rsgroup) to branch-1
Andrew Purtell created HBASE-20063: -- Summary: Port HBASE-19799 (Add web UI to rsgroup) to branch-1 Key: HBASE-20063 URL: https://issues.apache.org/jira/browse/HBASE-20063 Project: HBase Issue Type: Task Components: rsgroup, UI Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 1.5.0, 1.4.3 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20087) Periodically attempt redeploy of regions in FAILED_OPEN state
Andrew Purtell created HBASE-20087:

Summary: Periodically attempt redeploy of regions in FAILED_OPEN state
Key: HBASE-20087
URL: https://issues.apache.org/jira/browse/HBASE-20087
Project: HBase
Issue Type: Improvement
Components: master, Region Assignment
Reporter: Andrew Purtell
Assignee: Andrew Purtell
Fix For: 2.0.0, 1.5.0

Because RSGroups can cause permanent RIT with regions in FAILED_OPEN state, we added logic to the master portion of the RSGroups extension to enumerate RITs and retry assignment of regions in FAILED_OPEN state. However, this strategy can be applied generally to reduce the need for operator involvement in cluster operations. Today an operator has to manually resolve FAILED_OPEN assignments, but there is little risk in automatically retrying them after a while. If the reason the assignment failed has not cleared, the assignment will just fail again. Should the reason the assignment failed be resolved, operators don't have to do anything more for the cluster to fully heal.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
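The retry policy reduces to periodically scanning region states and re-queuing assignment for everything stuck in FAILED_OPEN. A sketch of the selection step, with region names and states as plain strings rather than the actual RegionState API (a periodic chore would call this and resubmit each returned region to the assignment manager):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative only: select regions eligible for an automatic
// reassignment retry from a snapshot of region -> state.
class FailedOpenRetry {
    static List<String> regionsToRetry(Map<String, String> regionStates) {
        List<String> retry = new ArrayList<>();
        for (Map.Entry<String, String> e : regionStates.entrySet()) {
            if ("FAILED_OPEN".equals(e.getValue())) {
                retry.add(e.getKey());
            }
        }
        return retry;
    }
}
```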
[jira] [Created] (HBASE-20088) Update copyright notices to year 2018
Andrew Purtell created HBASE-20088: -- Summary: Update copyright notices to year 2018 Key: HBASE-20088 URL: https://issues.apache.org/jira/browse/HBASE-20088 Project: HBase Issue Type: Task Reporter: Andrew Purtell NOTICE file, UIs, etc. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20089) make_rc.sh should name SHA-512 checksum files with the extension .sha512
Andrew Purtell created HBASE-20089:

Summary: make_rc.sh should name SHA-512 checksum files with the extension .sha512
Key: HBASE-20089
URL: https://issues.apache.org/jira/browse/HBASE-20089
Project: HBase
Issue Type: Task
Reporter: Andrew Purtell

From [~elserj]:

{quote}
we need to update the checksum naming convention for SHA*. Per [1], .sha filenames should only contain SHA1, and .sha512 file names should be used for SHA512 xsum. I believe this means we just need to modify make_rc.sh to put the xsum into .sha512 instead of .sha. We do not need to distribute SHA1 xsums and, afaik, there is little cryptographic value to this.

[1] http://www.apache.org/dev/release-distribution.html#sigs-and-sums
{quote}

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
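The change amounts to writing the SHA-512 digest into a file with the .sha512 extension. An illustrative round-trip (not the actual make_rc.sh change; the artifact name is a stand-in, and a dummy file is created so the commands run end to end):

```shell
# Stand-in for the real release tarball that make_rc.sh would produce.
artifact="hbase-1.4.2-bin.tar.gz"
echo "dummy release tarball" > "$artifact"

# SHA-512 digest goes into a .sha512 file (formerly .sha),
# per the ASF release distribution policy.
sha512sum "$artifact" > "$artifact.sha512"

# Verification round-trip against the checksum file.
sha512sum -c "$artifact.sha512"
```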
[jira] [Created] (HBASE-20096) Missing version warning for exec-maven-plugin in hbase-shaded-check-invariants
Andrew Purtell created HBASE-20096: -- Summary: Missing version warning for exec-maven-plugin in hbase-shaded-check-invariants Key: HBASE-20096 URL: https://issues.apache.org/jira/browse/HBASE-20096 Project: HBase Issue Type: Bug Components: build Reporter: Andrew Purtell Fix For: 1.5.0, 1.4.3 Reported by [~dbist13]: Affects branch-1 and branch-1.4 {noformat} [WARNING] Some problems were encountered while building the effective model for org.apache.hbase:hbase-shaded-check-invariants:pom:1.5.0-SNAPSHOT [WARNING] 'build.plugins.plugin.version' for org.codehaus.mojo:exec-maven-plugin is missing. @ org.apache.hbase:hbase-shaded-check-invariants:[unknown-version], /Users/apurtell/src/hbase/hbase-shaded/hbase-shaded-check-invariants/pom.xml, line 161, column 15 {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
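The warning goes away once the plugin version is pinned, typically under pluginManagement in the affected pom. A sketch of the fix (the version number below is only an example, not necessarily what the project settled on):

```xml
<!-- Pin exec-maven-plugin so Maven stops warning about a missing
     'build.plugins.plugin.version'; 1.6.0 is an example version. -->
<pluginManagement>
  <plugins>
    <plugin>
      <groupId>org.codehaus.mojo</groupId>
      <artifactId>exec-maven-plugin</artifactId>
      <version>1.6.0</version>
    </plugin>
  </plugins>
</pluginManagement>
```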
[jira] [Resolved] (HBASE-20096) Missing version warning for exec-maven-plugin in hbase-shaded-check-invariants
[ https://issues.apache.org/jira/browse/HBASE-20096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-20096. Resolution: Duplicate Fix Version/s: (was: 1.4.3) (was: 1.5.0) Dup of HBASE-20091 > Missing version warning for exec-maven-plugin in hbase-shaded-check-invariants > -- > > Key: HBASE-20096 > URL: https://issues.apache.org/jira/browse/HBASE-20096 > Project: HBase > Issue Type: Bug > Components: build >Reporter: Andrew Purtell >Priority: Minor > > Reported by [~dbist13]: > Affects branch-1 and branch-1.4 > {noformat} > [WARNING] Some problems were encountered while building the effective model > for org.apache.hbase:hbase-shaded-check-invariants:pom:1.5.0-SNAPSHOT > [WARNING] 'build.plugins.plugin.version' for > org.codehaus.mojo:exec-maven-plugin is missing. @ > org.apache.hbase:hbase-shaded-check-invariants:[unknown-version], > /Users/apurtell/src/hbase/hbase-shaded/hbase-shaded-check-invariants/pom.xml, > line 161, column 15 > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20102) AssignmentManager#shutdown doesn't shut down scheduled executor
Andrew Purtell created HBASE-20102: -- Summary: AssignmentManager#shutdown doesn't shut down scheduled executor Key: HBASE-20102 URL: https://issues.apache.org/jira/browse/HBASE-20102 Project: HBase Issue Type: Bug Components: master, Region Assignment Affects Versions: 1.4.2 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 1.5.0, 1.4.3 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (HBASE-19989) READY_TO_MERGE and READY_TO_SPLIT do not update region state correctly
[ https://issues.apache.org/jira/browse/HBASE-19989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell reopened HBASE-19989: The new tests TestZKLessMergeOnCluster and TestZKLessSplitOnCluster consistently fail for me on branch-1.3 and branch-1.4. {code} java.lang.RuntimeException: org.apache.hadoop.hbase.exceptions.DeserializationException: Missing pb magic PBUF prefix {code} > READY_TO_MERGE and READY_TO_SPLIT do not update region state correctly > -- > > Key: HBASE-19989 > URL: https://issues.apache.org/jira/browse/HBASE-19989 > Project: HBase > Issue Type: Bug >Affects Versions: 1.3.1, 1.4.1 >Reporter: Ben Lau >Assignee: Ben Lau >Priority: Major > Fix For: 1.3.2, 1.5.0, 1.4.3 > > Attachments: HBASE-19989-branch-1.patch > > > Region state transitions do not work correctly for READY_TO_MERGE/SPLIT. > [~thiruvel] and I noticed this is due to break statements being in the wrong > place in AssignmentManager. This allows a race condition for example in > which one of the regions being merged could be moved concurrently, resulting > in the merge transaction failing and then double assignment and/or dataloss. > This bug appears to only affect branch-1 (for example 1.3 and 1.4) and not > branch-2 as the relevant code in AM has since been rewritten. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-19989) READY_TO_MERGE and READY_TO_SPLIT do not update region state correctly
[ https://issues.apache.org/jira/browse/HBASE-19989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-19989. Resolution: Fixed Pushed addendum to branch-1.3, branch-1.4, and branch-1 > READY_TO_MERGE and READY_TO_SPLIT do not update region state correctly > -- > > Key: HBASE-19989 > URL: https://issues.apache.org/jira/browse/HBASE-19989 > Project: HBase > Issue Type: Bug >Affects Versions: 1.3.1, 1.4.1 >Reporter: Ben Lau >Assignee: Ben Lau >Priority: Major > Fix For: 1.3.2, 1.5.0, 1.4.3 > > Attachments: HBASE-19989-ADDENDUM-branch-1.patch, > HBASE-19989-branch-1.patch > > > Region state transitions do not work correctly for READY_TO_MERGE/SPLIT. > [~thiruvel] and I noticed this is due to break statements being in the wrong > place in AssignmentManager. This allows a race condition for example in > which one of the regions being merged could be moved concurrently, resulting > in the merge transaction failing and then double assignment and/or dataloss. > This bug appears to only affect branch-1 (for example 1.3 and 1.4) and not > branch-2 as the relevant code in AM has since been rewritten. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (HBASE-17448) Export metrics from RecoverableZooKeeper
[ https://issues.apache.org/jira/browse/HBASE-17448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell reopened HBASE-17448: Assignee: Andrew Purtell (was: Chinmay Kulkarni) > Export metrics from RecoverableZooKeeper > > > Key: HBASE-17448 > URL: https://issues.apache.org/jira/browse/HBASE-17448 > Project: HBase > Issue Type: Improvement > Components: Zookeeper >Affects Versions: 1.3.1 >Reporter: Andrew Purtell >Assignee: Andrew Purtell >Priority: Major > Labels: patch > Fix For: 2.0.0, 1.4.0 > > Attachments: HBASE-17448-branch-1.patch, HBASE-17448.patch, > HBASE-17448.patch > > > Consider adding instrumentation to RecoverableZooKeeper that exposes metrics > on the performance and health of the embedded ZooKeeper client: latency > histograms for each op type, number of reconnections, number of ops where a > reconnection was necessary to proceed, number of failed ops due to > CONNECTIONLOSS, number of failed ops due to SESSIONEXIPRED, number of failed > ops due to OPERATIONTIMEOUT. > RecoverableZooKeeper is a class in hbase-client so we can hook up the new > metrics to both client- and server-side metrics reporters. Probably this > metrics source should be a process singleton. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20109) Add Admin#getMasterAddress API for lightweight discovery of the active master location
Andrew Purtell created HBASE-20109: -- Summary: Add Admin#getMasterAddress API for lightweight discovery of the active master location Key: HBASE-20109 URL: https://issues.apache.org/jira/browse/HBASE-20109 Project: HBase Issue Type: Improvement Components: Client Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 2.0.0, 1.5.0 Right now the only public API available to the client to learn the server name of the active master is Admin#getClusterStatus#getMaster, returning ServerName. On a cluster of any size getClusterStatus is expensive, especially if used only to retrieve the active master name. Let's add a simple API {code} ServerName Admin#getMasterAddress() {code} for lightweight discovery of the active master location. This makes sense because, weirdly, Admin already has a method getMasterInfoPort(), returning int. Internally the client has a notion of the active master because there is a connection open to it, or one that can be reopened, or if for some reason it's not easy to make a ServerName for that state, the ServerName can be deserialized out of the znode tracking the active master location. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-17448) Export metrics from RecoverableZooKeeper
[ https://issues.apache.org/jira/browse/HBASE-17448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-17448. Resolution: Fixed > Export metrics from RecoverableZooKeeper > > > Key: HBASE-17448 > URL: https://issues.apache.org/jira/browse/HBASE-17448 > Project: HBase > Issue Type: Improvement > Components: Zookeeper >Affects Versions: 1.3.1 >Reporter: Andrew Purtell >Assignee: Chinmay Kulkarni >Priority: Major > Labels: patch > Fix For: 1.4.2, 1.4.1, 1.4.0 > > Attachments: HBASE-17448-branch-1.patch, HBASE-17448.patch, > HBASE-17448.patch > > > Consider adding instrumentation to RecoverableZooKeeper that exposes metrics > on the performance and health of the embedded ZooKeeper client: latency > histograms for each op type, number of reconnections, number of ops where a > reconnection was necessary to proceed, number of failed ops due to > CONNECTIONLOSS, number of failed ops due to SESSIONEXIPRED, number of failed > ops due to OPERATIONTIMEOUT. > RecoverableZooKeeper is a class in hbase-client so we can hook up the new > metrics to both client- and server-side metrics reporters. Probably this > metrics source should be a process singleton. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (HBASE-20146) Regions are stuck while opening when WAL is disabled
[ https://issues.apache.org/jira/browse/HBASE-20146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell reopened HBASE-20146: Reopened because there is an addendum in progress and some discussion about it. Please commit the addendum as soon as the discussion is settled, or revert the original commit. Thanks! > Regions are stuck while opening when WAL is disabled > > > Key: HBASE-20146 > URL: https://issues.apache.org/jira/browse/HBASE-20146 > Project: HBase > Issue Type: Bug > Components: wal >Affects Versions: 1.3.1 >Reporter: Ashish Singhi >Assignee: Ashish Singhi >Priority: Critical > Fix For: 2.0.0, 3.0.0, 1.3.2, 1.5.0, 1.2.7, 1.4.3 > > Attachments: HBASE-20146-addendum.patch, HBASE-20146.patch, > HBASE-20146.v1.patch > > > On a running cluster we had set {{hbase.regionserver.hlog.enabled}} to false, > to disable the WAL for complete cluster, after restarting HBase service, > regions are not getting opened leading to HMaster abort as Namespace table > regions are not getting assigned. 
> jstack for region open: > {noformat} > "RS_OPEN_PRIORITY_REGION-BLR106595:16045-1" #159 prio=5 os_prio=0 > tid=0x7fdfa4341000 nid=0x419d waiting on condition [0x7fdfa0467000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x87554448> (a > java.util.concurrent.CountDownLatch$Sync) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304) > at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231) > at org.apache.hadoop.hbase.wal.WALKey.getWriteEntry(WALKey.java:98) > at > org.apache.hadoop.hbase.regionserver.wal.WALUtil.writeMarker(WALUtil.java:131) > at > org.apache.hadoop.hbase.regionserver.wal.WALUtil.writeRegionEventMarker(WALUtil.java:88) > at > org.apache.hadoop.hbase.regionserver.HRegion.writeRegionOpenMarker(HRegion.java:1026) > at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6849) > at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6803) > at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6774) > at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6730) > at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6681) > at > org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:363) > at > org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:129) > at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:129) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {noformat} > This used to work with HBase 1.0.2 version. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
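The jstack shows the hang shape: WALKey#getWriteEntry blocks on a CountDownLatch that only a WAL append would count down, and with the WAL disabled no append ever happens. A guard sketch with simplified, hypothetical names (not the actual patch):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Illustrative only: when the WAL is disabled, nothing will ever count the
// sequence-id latch down, so waiting on it blocks region open forever.
class WriteEntryGuard {
    static boolean awaitWriteEntry(boolean walDisabled, CountDownLatch seqIdLatch,
                                   long timeoutMs) {
        if (walDisabled) {
            return true; // nothing to wait for: no marker was appended
        }
        try {
            return seqIdLatch.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```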
[jira] [Resolved] (HBASE-20063) Port HBASE-19799 (Add web UI to rsgroup) to branch-1
[ https://issues.apache.org/jira/browse/HBASE-20063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-20063. Resolution: Later Assignee: (was: Andrew Purtell) Fix Version/s: (was: 1.4.3) (was: 1.5.0) There are some java 8-isms and a difficult problem with how to handle jsp pages meant for hbase-server that must be separated out into hbase-rsgroup on branch-1 and dynamically loaded. Not worth it at this time I think, so resolving as Later (probably Never) > Port HBASE-19799 (Add web UI to rsgroup) to branch-1 > > > Key: HBASE-20063 > URL: https://issues.apache.org/jira/browse/HBASE-20063 > Project: HBase > Issue Type: Task > Components: rsgroup, UI >Reporter: Andrew Purtell >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20315) Document post release process steps for RM
Andrew Purtell created HBASE-20315: -- Summary: Document post release process steps for RM Key: HBASE-20315 URL: https://issues.apache.org/jira/browse/HBASE-20315 Project: HBase Issue Type: Task Components: build, documentation Reporter: Andrew Purtell We should document post release steps that RMs have to take and add it to the 'How To Release' section of the refguide.
[jira] [Created] (HBASE-20318) Lower "Set storagePolicy=XXX for path=YYY" INFO level logging to DEBUG
Andrew Purtell created HBASE-20318: -- Summary: Lower "Set storagePolicy=XXX for path=YYY" INFO level logging to DEBUG Key: HBASE-20318 URL: https://issues.apache.org/jira/browse/HBASE-20318 Project: HBase Issue Type: Task Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 3.0.0, 2.1.0, 1.5.0, 2.0.1 Set storagePolicy=XXX for path=YYY INFO level logging is too chatty, drop to DEBUG.
[jira] [Resolved] (HBASE-20318) Lower "Set storagePolicy=XXX for path=YYY" INFO level logging to DEBUG
[ https://issues.apache.org/jira/browse/HBASE-20318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-20318. Resolution: Invalid Fix Version/s: (was: 2.0.1) (was: 1.5.0) (was: 2.1.0) (was: 3.0.0) Turns out this is just an issue on an internal backport. > Lower "Set storagePolicy=XXX for path=YYY" INFO level logging to DEBUG > -- > > Key: HBASE-20318 > URL: https://issues.apache.org/jira/browse/HBASE-20318 > Project: HBase > Issue Type: Task >Reporter: Andrew Purtell >Assignee: Andrew Purtell >Priority: Trivial > > Set storagePolicy=XXX for path=YYY INFO level logging is too chatty, drop to > DEBUG. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (HBASE-14729) SplitLogManager does not clean files from WALs folder in case of master failover
[ https://issues.apache.org/jira/browse/HBASE-14729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell reopened HBASE-14729:
> SplitLogManager does not clean files from WALs folder in case of master failover
>
> Key: HBASE-14729
> URL: https://issues.apache.org/jira/browse/HBASE-14729
> Project: HBase
> Issue Type: Bug
> Components: wal
> Affects Versions: 2.0.0
> Reporter: Samir Ahmic
> Assignee: Samir Ahmic
> Priority: Major
> Fix For: 2.0.0
>
> Attachments: HBASE-14729.patch
>
> While i was testing master failover process on master branch (distributed cluster setup) i notice following:
> 1. List of dead regionservers was increasing every time active master was restarted.
> 2. Number of folders in /hbase/WALs folder was increasing every time active master was restarted
> Here is exception from master logs showing why this is happening:
> {code}
> 2015-10-30 09:41:49,238 INFO [ProcedureExecutor-3] master.SplitLogManager: finished splitting (more than or equal to) 0 bytes in 0 log files in [hdfs://P3cluster/hbase/WALs/hnode1,16000,1446043659224-splitting] in 21ms
> 2015-10-30 09:41:49,235 WARN [ProcedureExecutor-2] master.SplitLogManager: Returning success without actually splitting and deleting all the log files in path hdfs://P3cluster/hbase/WALs/hnode1,16000,1446046595488-splitting: [FileStatus{path=hdfs://P3cluster/hbase/WALs/hnode1,16000,1446046595488-splitting/hnode1%2C16000%2C1446046595488.meta.1446046691314.meta; isDirectory=false; length=39944; replication=3; blocksize=268435456; modification_time=1446050348104; access_time=1446046691317; owner=hbase; group=supergroup; permission=rw-r--r--; isSymlink=false}]
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.fs.PathIsNotEmptyDirectoryException): `/hbase/WALs/hnode1,16000,1446046595488-splitting is non empty': Directory is not empty
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:3524)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:3479)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3463)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:751)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:562)
> at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
> at org.apache.hadoop.ipc.Client.call(Client.java:1411)
> at org.apache.hadoop.ipc.Client.call(Client.java:1364)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
> at com.sun.proxy.$Proxy15.delete(Unknown Source)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.delete(ClientNamenodeProtocolTranslatorPB.java:490)
> at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy16.delete(Unknown Source)
> at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:279)
> at com.sun.proxy.$Proxy17.delete(Unknown Source)
> at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(
[jira] [Created] (HBASE-20429) Support for mixed or write-heavy workloads on non-HDFS filesystems
Andrew Purtell created HBASE-20429: -- Summary: Support for mixed or write-heavy workloads on non-HDFS filesystems Key: HBASE-20429 URL: https://issues.apache.org/jira/browse/HBASE-20429 Project: HBase Issue Type: Umbrella Reporter: Andrew Purtell We can reasonably well support use cases on non-HDFS filesystems, like S3, where an external writer has loaded (and continues to load) HFiles via the bulk load mechanism, and then we serve out a read-only workload at the HBase API. Mixed workloads or write-heavy workloads won't fare as well. In fact, data loss seems certain. It will depend on the specific filesystem, but all of the S3-backed Hadoop filesystems suffer from a couple of obvious problems, notably a lack of atomic rename. This umbrella will serve to collect some related ideas for consideration.
[jira] [Created] (HBASE-20430) Improve store file management for non-HDFS filesystems
Andrew Purtell created HBASE-20430: -- Summary: Improve store file management for non-HDFS filesystems Key: HBASE-20430 URL: https://issues.apache.org/jira/browse/HBASE-20430 Project: HBase Issue Type: Sub-task Reporter: Andrew Purtell HBase keeps a file open for every active store file so no additional round trips to the NameNode are needed after the initial open. HDFS internally multiplexes open files, but the Hadoop S3 filesystem implementations do not, or, at least, not as well. As the bulk of data under management increases we observe the required number of concurrently open connections will rise, and expect it will eventually exhaust a limit somewhere (the client, the OS file descriptor table or open file limits, or the S3 service). Initially we can simply introduce an option to close every store file after the reader has finished, and determine the performance impact. Use cases backed by non-HDFS filesystems will already have to cope with a different read performance profile. Based on experiments with the S3 backed Hadoop filesystems, notably S3A, even with aggressively tuned options simple reads can be very slow when there are blockcache misses, 15-20 seconds observed for Get of a single small row, for example. We expect extensive use of the BucketCache to mitigate in this application already. Could be backed by offheap storage, but more likely a large number of cache files managed by the file engine on local SSD storage. If misses are already going to be super expensive, then the motivation to do more than simply open store files on demand is largely absent. Still, we could employ a predictive cache. Where frequent access to a given store file (or, at least, its store) is predicted, keep a reference to the store file open. Can keep statistics about read frequency, write it out to HFiles during compaction, and note these stats when opening the region, perhaps by reading all meta blocks of region HFiles when opening. 
Otherwise, close the file after reading and open again on demand. Need to be careful not to use ARC or equivalent as cache replacement strategy as it is encumbered. The size of the cache can be determined at startup after detecting the underlying filesystem. Eg. setCacheSize(VERY_LARGE_CONSTANT) if (fs instanceof DistributedFileSystem), so we don't lose much when on HDFS still. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
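The open-on-demand idea in the description above, close cold store file readers and reopen them on access, can be sketched with a small LRU map. This is an illustration only; `StoreFileReaderCache` and its methods are hypothetical names for this write-up, not HBase API, and a real implementation would add the read-frequency statistics discussed in the issue:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: keep at most maxOpen store file readers open.
// Touching an entry marks it recently used; inserting past the limit
// closes and evicts the least recently used reader.
public class StoreFileReaderCache<K, R extends AutoCloseable> {
    private final Map<K, R> open;

    public StoreFileReaderCache(final int maxOpen) {
        // accessOrder=true makes LinkedHashMap iterate in LRU order
        this.open = new LinkedHashMap<K, R>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, R> eldest) {
                if (size() > maxOpen) {
                    try {
                        eldest.getValue().close(); // close the coldest reader
                    } catch (Exception e) {
                        // a real implementation would log this
                    }
                    return true; // evict it from the map
                }
                return false;
            }
        };
    }

    // Return a cached reader, or open one on demand via the supplied opener.
    public synchronized R get(K key, Function<K, R> opener) {
        R reader = open.get(key); // records the access for LRU ordering
        if (reader == null) {
            reader = opener.apply(key);
            open.put(key, reader); // may evict and close the eldest entry
        }
        return reader;
    }

    public synchronized int size() {
        return open.size();
    }
}
```

Sizing the cache at startup by filesystem type, as the issue suggests, would then just mean passing a large `maxOpen` when the underlying filesystem is HDFS.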
[jira] [Created] (HBASE-20431) Store commit transaction for filesystems that do not support an atomic rename
Andrew Purtell created HBASE-20431: -- Summary: Store commit transaction for filesystems that do not support an atomic rename Key: HBASE-20431 URL: https://issues.apache.org/jira/browse/HBASE-20431 Project: HBase Issue Type: Sub-task Reporter: Andrew Purtell HBase expects the Hadoop filesystem implementation to support an atomic rename() operation. HDFS does. The S3-backed filesystems do not. The fundamental issue is the non-atomic and eventually consistent nature of the S3 service. An S3 bucket is not a filesystem. S3 is not always immediately read-your-writes. Object metadata can be temporarily inconsistent just after new objects are stored. There can be a settling period to ride over. Renaming/moving objects from one path to another is a copy operation with O(file) complexity and O(data) time, followed by a series of deletes with O(file) complexity. Failures at any point prior to completion will leave the operation in an inconsistent state. The missing atomic rename semantic opens opportunities for corruption and data loss, which may or may not be repairable with HBCK. Handling this at the HBase level could be done with a new multi-step filesystem transaction framework. Call it StoreCommitTransaction. SplitTransaction and MergeTransaction are well-established cases where even on HDFS we have non-atomic filesystem changes and are our implementation template for the new work. In this new StoreCommitTransaction we'd be moving flush and compaction temporaries out of the temporary directory into the region store directory. On HDFS the implementation would be easy. We can rely on the filesystem's atomic rename semantics. On S3 it would be work: first we would build the list of objects to move, then copy each object into the destination, and then finally delete all objects at the original path. We must handle transient errors with retry strategies appropriate for the action at hand.
We must handle serious or permanent errors where the RS doesn't need to be aborted with a rollback that cleans it all up. Finally, we must handle permanent errors where the RS must be aborted with a rollback during region open/recovery. Note that after all objects have been copied and we are deleting obsolete source objects we must roll forward, not back. To support recovery after an abort we must utilize the WAL to track transaction progress. Put markers in for StoreCommitTransaction start and completion state, with details of the store file(s) involved, so it can be rolled back during region recovery at open. This will be significant work in HFile, HStore, flusher, compactor, and HRegion. Wherever we use HDFS's rename now we would substitute the running of this new multi-step filesystem transaction. We need to determine this for certain, but I believe the PUT or multipart upload of an object must complete before the object is visible, so we don't have to worry about the case where an object is visible before fully uploaded as part of normal operations. So an individual object copy will either happen entirely and the target will then become visible, or it won't and the target won't exist. S3 has an optimization, PUT COPY (https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectCOPY.html), which the AmazonClient embedded in S3A utilizes for moves. When designing the StoreCommitTransaction be sure to allow for filesystem implementations that leverage a server side copy operation. Doing a get-then-put should be optional. (Not sure Hadoop has an interface that advertises this capability yet; we can add one if not.) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
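The list-copy-delete sequence with rollback-before-completion and roll-forward-after-copy described above can be illustrated with a small sketch. This uses java.nio.file on a local filesystem purely as a stand-in for an S3-backed Hadoop FileSystem; `StoreCommitSketch` is a hypothetical name, and a real StoreCommitTransaction would add the WAL progress markers and retry strategies discussed in the issue:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch (not HBase code) of a copy-then-delete store
// commit for a filesystem without atomic rename. A failure during the
// copy phase rolls back by deleting the partial copies; once every
// object has been copied we roll forward and only delete sources.
public class StoreCommitSketch {
    public static void commit(Path tmpDir, Path storeDir) throws IOException {
        List<Path> sources = new ArrayList<>();
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(tmpDir)) {
            for (Path p : ds) sources.add(p); // step 1: list objects to move
        }
        List<Path> copied = new ArrayList<>();
        try {
            for (Path src : sources) {        // step 2: copy into destination
                Path dst = storeDir.resolve(src.getFileName());
                Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
                copied.add(dst);
            }
        } catch (IOException e) {
            // rollback: remove partial copies, leave the sources intact
            for (Path dst : copied) Files.deleteIfExists(dst);
            throw e;
        }
        // step 3: all copies exist, so roll forward and delete temporaries
        for (Path src : sources) Files.delete(src);
    }
}
```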
[jira] [Created] (HBASE-20445) Defer work when a row lock is busy
Andrew Purtell created HBASE-20445: -- Summary: Defer work when a row lock is busy Key: HBASE-20445 URL: https://issues.apache.org/jira/browse/HBASE-20445 Project: HBase Issue Type: Improvement Reporter: Andrew Purtell Instead of blocking on row locks, defer the call and make the call runner available so it can service other activity. Have runners pick up deferred calls in the background after servicing the other request. Spin briefly on tryLock() wherever we are now using lock() to acquire a row lock. Introduce two new configuration parameters: one for the amount of time to wait between lock acquisition attempts, and another for the total number of times we wait before deferring the work. If the lock cannot be acquired, put the call back into the call queue. Call queues therefore should be priority queues sorted by deadline. Currently they are implemented with LinkedBlockingQueue (which isn't), or AdaptiveLifoCoDelCallQueue (which is) if the CoDel scheduler is enabled. Perhaps we could just require use of AdaptiveLifoCoDelCallQueue. Runners will be picking up work from the head of the queues as long as they are not empty, so deferred calls will be serviced again, or dropped if the deadline has passed. Implementing continuations for simple operations should be straightforward. Batch mutations try to acquire as many rowlocks as they can, then do the partial batch over the successfully locked rows, then loop back to attempt the remaining work. This is a partial implementation of what we need so we can build on it. Rather than loop around, save the partial batch completion state and put a pointer to it along with the call back into the RPC queue. For scans where allowPartialResults has been set to true we can simply complete the call at the point we fail to acquire a row lock. The client will handle the rest. 
For scans where allowPartialResults is false we have to save the scanner state and partial results, and put a pointer to this state along with the call back into the queue. We could approach this in phases: Phase 0 - Sort out the call queuing details. Do we require AdaptiveLifoCoDelCallQueue? Certainly we can make use of it. Can we also have RWQueueRpcExecutor create queues as PriorityBlockingQueue instead of LinkedBlockingQueue? There must be a reason why not already. Phase 1 - Implement deferral of simple ops only. (Batch mutations and scans will still block on rowlocks.) Phase 2 - Implement deferral of batch mutations. (Scans will still block on rowlocks.) Phase 3 - Implement deferral of scans where allowPartialResults is false. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
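The spin-on-tryLock-then-defer idea above can be sketched as follows. `RowLockDeferral`, `waitMillis`, and `maxAttempts` are illustrative stand-ins for the two proposed configuration parameters, not actual HBase code, and the queue here stands in for the (deadline-sorted) call queue:

```java
import java.util.Queue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: spin briefly on tryLock(); if the row lock stays
// busy, re-queue the call instead of blocking the handler thread.
public class RowLockDeferral {
    // Returns true if the work ran; false if it was deferred.
    public static boolean runOrDefer(ReentrantLock rowLock, Runnable work,
            Queue<Runnable> callQueue, long waitMillis, int maxAttempts)
            throws InterruptedException {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (rowLock.tryLock(waitMillis, TimeUnit.MILLISECONDS)) {
                try {
                    work.run();
                    return true;
                } finally {
                    rowLock.unlock();
                }
            }
        }
        // Lock is busy: put the call back so a handler retries it later.
        callQueue.add(() -> {
            try {
                runOrDefer(rowLock, work, callQueue, waitMillis, maxAttempts);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
        });
        return false;
    }
}
```

A real implementation would carry the partial batch or scanner state along with the re-queued call, as the phases above describe, rather than a bare Runnable.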
[jira] [Created] (HBASE-20453) Shell fails to start with SyntaxError
Andrew Purtell created HBASE-20453: -- Summary: Shell fails to start with SyntaxError Key: HBASE-20453 URL: https://issues.apache.org/jira/browse/HBASE-20453 Project: HBase Issue Type: Bug Affects Versions: 1.5.0, 1.4.4 Reporter: Andrew Purtell
SyntaxError: hbase/bin/../hbase-shell/src/main/ruby/hbase/table.rb:724: syntax error, unexpected tDOT
.map { |i| Bytes.toStringBinary(i.getRegionInfo().getStartKey) }
^
require at org/jruby/RubyKernel.java:1062
(root) at /Users/apurtell/src/hbase/bin/../hbase-shell/src/main/ruby/hbase/table.rb:25
require at org/jruby/RubyKernel.java:1062
(root) at /Users/apurtell/src/hbase/bin/../hbase-shell/src/main/ruby/hbase/hbase.rb:102
require at org/jruby/RubyKernel.java:1062
(root) at /Users/apurtell/src/hbase/bin/../bin/hirb.rb:107
[jira] [Reopened] (HBASE-20276) [shell] Revert shell REPL change and document
[ https://issues.apache.org/jira/browse/HBASE-20276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell reopened HBASE-20276: At least in branch-1 and branch-1.4 this broke the shell, please fix:
SyntaxError: hbase/bin/../hbase-shell/src/main/ruby/hbase/table.rb:724: syntax error, unexpected tDOT
.map { |i| Bytes.toStringBinary(i.getRegionInfo().getStartKey) }
^
require at org/jruby/RubyKernel.java:1062
(root) at /Users/apurtell/src/hbase/bin/../hbase-shell/src/main/ruby/hbase/table.rb:25
require at org/jruby/RubyKernel.java:1062
(root) at /Users/apurtell/src/hbase/bin/../hbase-shell/src/main/ruby/hbase/hbase.rb:102
require at org/jruby/RubyKernel.java:1062
(root) at /Users/apurtell/src/hbase/bin/../bin/hirb.rb:107
> [shell] Revert shell REPL change and document
> -
>
> Key: HBASE-20276
> URL: https://issues.apache.org/jira/browse/HBASE-20276
> Project: HBase
> Issue Type: Sub-task
> Components: documentation, shell
> Affects Versions: 1.4.0, 2.0.0
> Reporter: Sean Busbey
> Assignee: Sean Busbey
> Priority: Blocker
> Fix For: 1.4.4, 2.0.0
>
> Attachments: HBASE-20276.0.patch, HBASE-20276.1.patch, HBASE-20276.2.patch, HBASE-20276.3.patch
>
> Feedback from [~mdrob] on HBASE-19158:
> {quote}
> Shell:
> HBASE-19770. There was another issue opened where this was identified as a problem so maybe the shape will change further, but I can't find it now.
> {quote}
> New commentary from [~busbey]:
> This was a follow on to HBASE-15965. That change effectively makes it so none of our ruby wrappers can be used to build expressions in an interactive REPL. This is a pretty severe change (most of my tips on HBASE-15611 will break, I think).
> I think we should
> a) Have a DISCUSS thread, spanning dev@ and user@
> b) based on the outcome of that thread, either default to the new behavior or the old behavior
> c) if we keep the HBASE-15965 behavior as the default, flag it as incompatible, call it out in the hbase 2.0 upgrade section, and update docs (two examples: the output in the shell_exercises sections would be wrong, and the _table_variables section won't work)
> d) In either case document the new flag in the ref guide
[jira] [Resolved] (HBASE-20453) Shell fails to start with SyntaxError
[ https://issues.apache.org/jira/browse/HBASE-20453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-20453. Resolution: Duplicate Reopened HBASE-20276 instead, duping this
> Shell fails to start with SyntaxError
> -
>
> Key: HBASE-20453
> URL: https://issues.apache.org/jira/browse/HBASE-20453
> Project: HBase
> Issue Type: Bug
> Reporter: Andrew Purtell
> Priority: Major
>
> SyntaxError: hbase/bin/../hbase-shell/src/main/ruby/hbase/table.rb:724: syntax error, unexpected tDOT
> .map { |i| Bytes.toStringBinary(i.getRegionInfo().getStartKey) }
> ^
> require at org/jruby/RubyKernel.java:1062
> (root) at /Users/apurtell/src/hbase/bin/../hbase-shell/src/main/ruby/hbase/table.rb:25
> require at org/jruby/RubyKernel.java:1062
> (root) at /Users/apurtell/src/hbase/bin/../hbase-shell/src/main/ruby/hbase/hbase.rb:102
> require at org/jruby/RubyKernel.java:1062
> (root) at /Users/apurtell/src/hbase/bin/../bin/hirb.rb:107
[jira] [Created] (HBASE-20486) Change default compaction throughput controller to PressureAwareThroughputController in branch-1
Andrew Purtell created HBASE-20486: -- Summary: Change default compaction throughput controller to PressureAwareThroughputController in branch-1 Key: HBASE-20486 URL: https://issues.apache.org/jira/browse/HBASE-20486 Project: HBase Issue Type: Task Reporter: Andrew Purtell Fix For: 1.5.0 Switch the default compaction throughput controller from NoLimitThroughputController to PressureAwareThroughputController in branch-1.
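For reference, the switch amounts to changing the default of the compaction throughput controller setting. A hypothetical hbase-site.xml override is sketched below; the property and implementation class names are my best recollection of the branch-1 code and should be verified against the source before use:

```xml
<!-- Assumed property and class names; verify against branch-1 before use. -->
<property>
  <name>hbase.regionserver.throughput.controller</name>
  <value>org.apache.hadoop.hbase.regionserver.compactions.PressureAwareCompactionThroughputController</value>
</property>
```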
[jira] [Reopened] (HBASE-9465) Push entries to peer clusters serially
[ https://issues.apache.org/jira/browse/HBASE-9465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell reopened HBASE-9465: --- > Push entries to peer clusters serially > -- > > Key: HBASE-9465 > URL: https://issues.apache.org/jira/browse/HBASE-9465 > Project: HBase > Issue Type: New Feature > Components: regionserver, Replication >Affects Versions: 1.4.0, 2.0.0 >Reporter: Honghua Feng >Assignee: Phil Yang >Priority: Critical > Attachments: HBASE-9465-branch-1-v1.patch, > HBASE-9465-branch-1-v1.patch, HBASE-9465-branch-1-v2.patch, > HBASE-9465-branch-1-v3.patch, HBASE-9465-branch-1-v4.patch, > HBASE-9465-branch-1-v4.patch, HBASE-9465-branch-1.v4.revert.patch, > HBASE-9465-v1.patch, HBASE-9465-v2.patch, HBASE-9465-v2.patch, > HBASE-9465-v3.patch, HBASE-9465-v4.patch, HBASE-9465-v5.patch, > HBASE-9465-v6.patch, HBASE-9465-v6.patch, HBASE-9465-v7.patch, > HBASE-9465-v7.patch, HBASE-9465.pdf > > > When region-move or RS failure occurs in master cluster, the hlog entries > that are not pushed before region-move or RS-failure will be pushed by > original RS(for region move) or another RS which takes over the remained hlog > of dead RS(for RS failure), and the new entries for the same region(s) will > be pushed by the RS which now serves the region(s), but they push the hlog > entries of a same region concurrently without coordination. > This treatment can possibly lead to data inconsistency between master and > peer clusters: > 1. there are put and then delete written to master cluster > 2. due to region-move / RS-failure, they are pushed by different > replication-source threads to peer cluster > 3. 
if delete is pushed to peer cluster before put, and flush and > major-compact occurs in peer cluster before put is pushed to peer cluster, > the delete is collected and the put remains in peer cluster > In this scenario, the put remains in peer cluster, but in master cluster the > put is masked by the delete, hence data inconsistency between master and peer > clusters -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20493) Port HBASE-19994 (Create a new class for RPC throttling exception, make it retryable) to branch-1
Andrew Purtell created HBASE-20493: -- Summary: Port HBASE-19994 (Create a new class for RPC throttling exception, make it retryable) to branch-1 Key: HBASE-20493 URL: https://issues.apache.org/jira/browse/HBASE-20493 Project: HBase Issue Type: Task Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 1.5.0 Port HBASE-19994 (Create a new class for RPC throttling exception, make it retryable). Need to preserve the current behavior where the client gets a non-retryable ThrottlingException and only optionally throw back the retryable RpcThrottlingException if explicitly allowed by configuration.
[jira] [Created] (HBASE-20496) TestGlobalThrottler failing on branch-1 since revert of HBASE-9465
Andrew Purtell created HBASE-20496: -- Summary: TestGlobalThrottler failing on branch-1 since revert of HBASE-9465 Key: HBASE-20496 URL: https://issues.apache.org/jira/browse/HBASE-20496 Project: HBase Issue Type: Bug Reporter: Andrew Purtell Not sure why we didn't catch it earlier, but with my latest dev setup including 8u JVM, TestGlobalThrottler fails reliably, and a git bisect finds the problem at this revert:
{noformat}
commit ba7a936f74985eb9d974fdc87b0d06cb8cd8473d
Author: Sean Busbey
Date: Tue Nov 7 23:50:35 2017 -0600

    Revert "HBASE-9465 Push entries to peer clusters serially"

    This reverts commit 441bc050b991c14c048617bc443b97f46e21b76f.

    Conflicts:
        hbase-client/src/main/java/org/apache/hadoop/hbase/MetaTableAccessor.java
        hbase-client/src/main/java/org/apache/hadoop/hbase/client/replication/ReplicationAdmin.java
        hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/WALProtos.java
        hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
        hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/Replication.java
        hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java

    Signed-off-by: Andrew Purtell
{noformat}
For now I'm going to disable the test. Leaving this open for debugging.
[jira] [Resolved] (HBASE-9465) Push entries to peer clusters serially
[ https://issues.apache.org/jira/browse/HBASE-9465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-9465. --- Resolution: Fixed > Push entries to peer clusters serially > -- > > Key: HBASE-9465 > URL: https://issues.apache.org/jira/browse/HBASE-9465 > Project: HBase > Issue Type: New Feature > Components: regionserver, Replication >Affects Versions: 1.4.0, 2.0.0 >Reporter: Honghua Feng >Assignee: Phil Yang >Priority: Critical > Attachments: HBASE-9465-branch-1-v1.patch, > HBASE-9465-branch-1-v1.patch, HBASE-9465-branch-1-v2.patch, > HBASE-9465-branch-1-v3.patch, HBASE-9465-branch-1-v4.patch, > HBASE-9465-branch-1-v4.patch, HBASE-9465-branch-1.v4.revert.patch, > HBASE-9465-v1.patch, HBASE-9465-v2.patch, HBASE-9465-v2.patch, > HBASE-9465-v3.patch, HBASE-9465-v4.patch, HBASE-9465-v5.patch, > HBASE-9465-v6.patch, HBASE-9465-v6.patch, HBASE-9465-v7.patch, > HBASE-9465-v7.patch, HBASE-9465.pdf > > > When region-move or RS failure occurs in master cluster, the hlog entries > that are not pushed before region-move or RS-failure will be pushed by > original RS(for region move) or another RS which takes over the remained hlog > of dead RS(for RS failure), and the new entries for the same region(s) will > be pushed by the RS which now serves the region(s), but they push the hlog > entries of a same region concurrently without coordination. > This treatment can possibly lead to data inconsistency between master and > peer clusters: > 1. there are put and then delete written to master cluster > 2. due to region-move / RS-failure, they are pushed by different > replication-source threads to peer cluster > 3. 
if delete is pushed to peer cluster before put, and flush and > major-compact occurs in peer cluster before put is pushed to peer cluster, > the delete is collected and the put remains in peer cluster > In this scenario, the put remains in peer cluster, but in master cluster the > put is masked by the delete, hence data inconsistency between master and peer > clusters -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-20493) Port HBASE-19994 (Create a new class for RPC throttling exception, make it retryable) to branch-1
[ https://issues.apache.org/jira/browse/HBASE-20493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-20493. Resolution: Fixed Hadoop Flags: Reviewed > Port HBASE-19994 (Create a new class for RPC throttling exception, make it > retryable) to branch-1 > - > > Key: HBASE-20493 > URL: https://issues.apache.org/jira/browse/HBASE-20493 > Project: HBase > Issue Type: Task >Reporter: Andrew Purtell >Assignee: Andrew Purtell >Priority: Minor > Fix For: 1.5.0 > > Attachments: HBASE-20493-branch-1.patch > > > Port HBASE-19994 (Create a new class for RPC throttling exception, make it > retryable). Need to preserve the current behavior where the client gets a > non-retryable ThrottlingException and only optionally throw back the > retryable RpcThrottlingException if explicitly allowed by configuration. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20501) Update Hadoop minimum version to 2.7
Andrew Purtell created HBASE-20501: -- Summary: Update Hadoop minimum version to 2.7 Key: HBASE-20501 URL: https://issues.apache.org/jira/browse/HBASE-20501 Project: HBase Issue Type: Task Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 1.5.0 See discussion thread on dev@ "[DISCUSS] Branching for HBase 1.5 and Hadoop minimum version update (to 2.7)" Consensus:
* This is a needed change due to the practicalities of having Hadoop as a dependency
* Let's move up the minimum supported version of Hadoop to 2.7.1.
* Update documentation (support matrix, compatibility discussion) to call this out.
* Be sure to call out this change in the release notes.
* Take the opportunity to remind users about our callout "Replace the Hadoop Bundled With HBase!" recommending users upgrade their Hadoop if < 2.7.1.
[jira] [Created] (HBASE-20505) PE should support multi column family read and write cases
Andrew Purtell created HBASE-20505: -- Summary: PE should support multi column family read and write cases Key: HBASE-20505 URL: https://issues.apache.org/jira/browse/HBASE-20505 Project: HBase Issue Type: Test Reporter: Andrew Purtell Fix For: 3.0.0, 2.1.0, 1.5.0 PerformanceEvaluation has a --columns parameter but this adjusts the number of distinct column qualifiers to write (and, with --addColumns, to add to the scan), not the number of column families. We need something like a new --families parameter that will increase the number of column families defined in the test table schema, written to, and included in gets and scans. Default is 1, current behavior.
[jira] [Created] (HBASE-20513) Collect and emit ScanMetrics in PerformanceEvaluation
Andrew Purtell created HBASE-20513: -- Summary: Collect and emit ScanMetrics in PerformanceEvaluation Key: HBASE-20513 URL: https://issues.apache.org/jira/browse/HBASE-20513 Project: HBase Issue Type: Improvement Components: test Reporter: Andrew Purtell Fix For: 3.0.0, 2.1.0, 1.5.0 To better understand changes in scanning behavior between versions, enable ScanMetrics collection in PerformanceEvaluation and collect and roll up the results into a report at termination.
[jira] [Created] (HBASE-20517) Fix PerformanceEvaluation 'column' parameter
Andrew Purtell created HBASE-20517: -- Summary: Fix PerformanceEvaluation 'column' parameter Key: HBASE-20517 URL: https://issues.apache.org/jira/browse/HBASE-20517 Project: HBase Issue Type: Bug Components: test Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 3.0.0, 2.1.0, 1.5.0, 1.2.7, 1.3.3, 2.0.1, 1.4.5 PerformanceEvaluation's 'column' parameter looks broken to me. To test: 1. Write some data with 20 columns. 2. Do a scan test selecting one column. 3. Do a scan test selecting ten columns. You'd expect the amount of data returned to vary but no, because the read side isn't selecting the same qualifiers that are written. Bytes returned in case 3 should be 10x those in case 2. I'm in branch-1 code at the moment. Probably affects trunk too. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-20517) Fix PerformanceEvaluation 'column' parameter
[ https://issues.apache.org/jira/browse/HBASE-20517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-20517. Resolution: Fixed Pushed to 1.2 and up > Fix PerformanceEvaluation 'column' parameter > > > Key: HBASE-20517 > URL: https://issues.apache.org/jira/browse/HBASE-20517 > Project: HBase > Issue Type: Bug > Components: test >Reporter: Andrew Purtell >Assignee: Andrew Purtell >Priority: Major > Fix For: 3.0.0, 2.1.0, 1.5.0, 1.2.7, 1.3.3, 2.0.1, 1.4.5 > > Attachments: HBASE-20517-branch-1.patch, HBASE-20517.patch > > > PerformanceEvaluation's 'column' parameter looks broken to me. > To test: > 1. Write some data with 20 columns. > 2. Do a scan test selecting one column. > 3. Do a scan test selecting ten columns. > You'd expect the amount of data returned to vary but no, because the read > side isn't selecting the same qualifiers that are written. Bytes returned in > case 3 should be 10x those in case 2. > I'm in branch-1 code at the moment. Probably affects trunk too. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-20513) Collect and emit ScanMetrics in PerformanceEvaluation
[ https://issues.apache.org/jira/browse/HBASE-20513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-20513. Resolution: Fixed Pushed to 1.3 and up > Collect and emit ScanMetrics in PerformanceEvaluation > - > > Key: HBASE-20513 > URL: https://issues.apache.org/jira/browse/HBASE-20513 > Project: HBase > Issue Type: Test > Components: test >Reporter: Andrew Purtell >Assignee: Andrew Purtell >Priority: Minor > Fix For: 3.0.0, 2.1.0, 1.5.0, 1.3.3, 2.0.1, 1.4.5 > > Attachments: HBASE-20513-branch-1.patch, HBASE-20513.patch > > > To better understand changes in scanning behavior between versions, enable > ScanMetrics collection in PerformanceEvaluation and roll up the > results into a report at termination. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-20505) PE should support multi column family read and write cases
[ https://issues.apache.org/jira/browse/HBASE-20505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-20505. Resolution: Fixed Pushed to 1.2 and up > PE should support multi column family read and write cases > -- > > Key: HBASE-20505 > URL: https://issues.apache.org/jira/browse/HBASE-20505 > Project: HBase > Issue Type: Test >Reporter: Andrew Purtell >Assignee: Andrew Purtell >Priority: Minor > Fix For: 3.0.0, 2.1.0, 1.5.0, 1.3.3, 2.0.1, 1.4.5 > > Attachments: HBASE-20505-branch-1.patch, HBASE-20505.patch > > > PerformanceEvaluation has a --columns parameter but this adjusts the number > of distinct column qualifiers to write (and, with --addColumns, to add to the > scan), not the number of column families. > We need something like a new --families parameter that will increase the > number of column families defined in the test table schema, written to, and > included in gets and scans. Default is 1, current behavior. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20554) "WALs outstanding" message from CleanerChore is noisy
Andrew Purtell created HBASE-20554: -- Summary: "WALs outstanding" message from CleanerChore is noisy Key: HBASE-20554 URL: https://issues.apache.org/jira/browse/HBASE-20554 Project: HBase Issue Type: Bug Reporter: Andrew Purtell WARN-level "WALs outstanding" messages from CleanerChore should be DEBUG and are not always correct. I left a cluster configured for ITBLL (retaining all WALs for post hoc analysis) and in the morning found the master log full of "WALs outstanding" warnings from CleanerChore. Should this really be a warning? Perhaps better logged at DEBUG level. {quote}2018-05-09 16:42:03,893 WARN [node-1.cluster,16000,1525851521469_ChoreService_2] cleaner.CleanerChore: WALs outstanding under hdfs://node-1.cluster/hbase/oldWALs{quote} If someone has configured really long WAL retention then having WALs in oldWALs will be normal. Also, it seems the warning is sometimes incorrect. {quote}2018-05-09 16:42:24,751 WARN [node-1.cluster,16000,1525851521469_ChoreService_1] cleaner.CleanerChore: WALs outstanding under hdfs://node-1.cluster/hbase/archive{quote} There are no WALs under archive/. Even at DEBUG level, if the message is not correct it can lead an operator to be concerned about nothing, so it would be better to just remove it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
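The change suggested above can be sketched as follows. This is a minimal, hypothetical sketch (it uses java.util.logging for self-containment; HBase's CleanerChore uses its own logging facade, and the method name here is invented): emit the message at DEBUG, and only when outstanding files were actually found.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch only: log "WALs outstanding" at DEBUG (FINE here), and only when
// the chore actually found files, so normal long-retention setups stay quiet.
public class CleanerLogSketch {
    private static final Logger LOG = Logger.getLogger("CleanerChore");

    // Hypothetical helper standing in for the chore's reporting step
    static void reportOutstanding(String dir, int fileCount) {
        if (fileCount > 0 && LOG.isLoggable(Level.FINE)) {
            LOG.fine("WALs outstanding under " + dir);
        }
    }

    public static void main(String[] args) {
        // Zero files found: nothing is logged, no false alarm for operators
        reportOutstanding("hdfs://node-1.cluster/hbase/archive", 0);
        System.out.println("ok");
    }
}
```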
[jira] [Created] (HBASE-20595) Remove the concept of 'special tables' from rsgroups
Andrew Purtell created HBASE-20595: -- Summary: Remove the concept of 'special tables' from rsgroups Key: HBASE-20595 URL: https://issues.apache.org/jira/browse/HBASE-20595 Project: HBase Issue Type: Task Components: Region Assignment, rsgroup Reporter: Andrew Purtell Fix For: 3.0.0, 2.1.0, 1.5.0 The regionserver groups feature needs to specially handle what it calls "special tables": tables upon which core or other modular functionality depends. They need to be excluded from normal rsgroup processing during bootstrap to avoid circular dependencies or errors due to insufficiently initialized state. I think we also want to ensure that such tables are always given a rsgroup assignment with nonzero servers. (IIRC another issue already raises that point; we can link it later.) Special tables include: * The system tables in the 'hbase:' namespace * The ACL table if the AccessController coprocessor is installed * The Labels table if the VisibilityController coprocessor is installed * The Quotas table if the FS quotas feature is active Either we need a facility in core where "special tables" can be registered, or we institute a blanket rule that core and all extensions that need a "special table" must put it into the 'hbase:' namespace, so the TableName#isSystemTable() test will return TRUE for all of them, and then rsgroups simply needs to test for that. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
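The blanket-rule option can be sketched like this. The helper below is a hypothetical stand-in (in HBase the real check is TableName#isSystemTable()); the point is that one namespace test replaces any registry of special tables.

```java
// Sketch of the blanket rule: anything in the 'hbase:' namespace is treated
// as special, so rsgroups needs only one check during bootstrap.
public class SpecialTableRule {
    // Hypothetical stand-in for TableName#isSystemTable()
    static boolean isSystemTable(String fullTableName) {
        return fullTableName.startsWith("hbase:");
    }

    public static void main(String[] args) {
        // hbase:acl would be excluded from normal rsgroup processing;
        // an ordinary user table would not.
        System.out.println(isSystemTable("hbase:acl"));
        System.out.println(isSystemTable("default:usertable"));
    }
}
```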
[jira] [Created] (HBASE-20597) Use a lock to serialize access to a shared reference to ZooKeeperWatcher in HBaseReplicationEndpoint
Andrew Purtell created HBASE-20597: -- Summary: Use a lock to serialize access to a shared reference to ZooKeeperWatcher in HBaseReplicationEndpoint Key: HBASE-20597 URL: https://issues.apache.org/jira/browse/HBASE-20597 Project: HBase Issue Type: Bug Affects Versions: 1.4.4, 1.3.2 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 3.0.0, 2.1.0, 1.5.0, 1.3.3, 2.0.1, 1.4.5 The code that closes down a ZKW that fails to initialize when attempting to connect to the remote cluster is not MT safe and can in theory leak ZooKeeperWatcher instances. The allocation of a new ZKW and the store to the reference are not atomic. There might be concurrent allocations with only one winning store, leading to leaked ZKW instances. If the connection problem is persistent, like loss of shared trust between the clusters, we may accumulate unclosed ZKW instances over time, each with a ZK send thread and event thread, and eventually have enough leaked threads to cause OOME (cannot allocate native thread). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
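The fix named in the summary can be sketched like this. These are simplified stand-in types, not the actual HBaseReplicationEndpoint code: every allocate, close, and store on the shared watcher reference happens under one lock, so a losing allocation can never escape unclosed.

```java
// Sketch: serialize reconnects with a lock so concurrent callers cannot each
// allocate a watcher and leak the instances that lose the store race.
public class SafeReconnect {
    // Hypothetical stand-in for ZooKeeperWatcher
    static class Watcher {
        private boolean closed = false;
        void close() { closed = true; }
        boolean isClosed() { return closed; }
    }

    private final Object zkwLock = new Object();
    private Watcher zkw;

    // At most one Watcher is live at a time: the old one is always closed
    // under the same lock that guards the new allocation and store.
    Watcher reconnect() {
        synchronized (zkwLock) {
            if (zkw != null) {
                zkw.close();
            }
            zkw = new Watcher();
            return zkw;
        }
    }

    public static void main(String[] args) {
        SafeReconnect s = new SafeReconnect();
        Watcher first = s.reconnect();
        Watcher second = s.reconnect();
        if (!first.isClosed()) throw new AssertionError("old watcher leaked");
        if (second.isClosed()) throw new AssertionError("new watcher closed");
        System.out.println("ok");
    }
}
```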
[jira] [Created] (HBASE-20603) Histogram metrics should reset min and max in snapshotAndReset
Andrew Purtell created HBASE-20603: -- Summary: Histogram metrics should reset min and max in snapshotAndReset Key: HBASE-20603 URL: https://issues.apache.org/jira/browse/HBASE-20603 Project: HBase Issue Type: Bug Components: metrics Reporter: Andrew Purtell Assignee: Andrew Purtell It's weird that the bins are reset at every monitoring interval but min and max are tracked over the lifetime of the process. Makes it impossible to set alarms on max value as they'll never shut off unless the process is restarted. Histogram metrics should reset min and max in snapshotAndReset. For discussion. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
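The proposed behavior can be sketched as follows. This is a hypothetical minimal class, not the actual HBase metrics code: min and max are swapped out atomically in snapshotAndReset, so each monitoring interval reports its own extremes rather than process-lifetime values.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch: a histogram whose snapshotAndReset() re-arms min and max along
// with the bins, so an alarm on max can shut off after the bad interval.
public class IntervalHistogram {
    private final AtomicLong min = new AtomicLong(Long.MAX_VALUE);
    private final AtomicLong max = new AtomicLong(Long.MIN_VALUE);

    void update(long value) {
        min.accumulateAndGet(value, Math::min);
        max.accumulateAndGet(value, Math::max);
        // ... bin update elided ...
    }

    long[] snapshotAndReset() {
        // getAndSet returns the interval's extreme and resets it in one step
        long intervalMin = min.getAndSet(Long.MAX_VALUE);
        long intervalMax = max.getAndSet(Long.MIN_VALUE);
        return new long[] { intervalMin, intervalMax };
    }

    public static void main(String[] args) {
        IntervalHistogram h = new IntervalHistogram();
        h.update(5); h.update(100);
        long[] s1 = h.snapshotAndReset();
        h.update(7);
        long[] s2 = h.snapshotAndReset();
        // The second interval's max is 7, not the lifetime max of 100
        System.out.println(s1[1] + " " + s2[1]);
    }
}
```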
[jira] [Created] (HBASE-20619) TestWeakObjectPool occasionally times out
Andrew Purtell created HBASE-20619: -- Summary: TestWeakObjectPool occasionally times out Key: HBASE-20619 URL: https://issues.apache.org/jira/browse/HBASE-20619 Project: HBase Issue Type: Test Components: test Affects Versions: 1.4.4, 1.5.0 Reporter: Andrew Purtell TestWeakObjectPool occasionally times out. Failure is rare and executor is an EC2 instance, so I think it's just a question of the timeout being too small. [ERROR] testCongestion(org.apache.hadoop.hbase.util.TestWeakObjectPool) Time elapsed: 1.049 s <<< ERROR! org.junit.runners.model.TestTimedOutException: test timed out after 1000 milliseconds at org.apache.hadoop.hbase.util.TestWeakObjectPool.testCongestion(TestWeakObjectPool.java:102) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-20486) Change default throughput controller to PressureAwareThroughputController in branch-1
[ https://issues.apache.org/jira/browse/HBASE-20486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-20486. Resolution: Fixed Hadoop Flags: Reviewed Pushed to branch-1. Thanks for the patch [~xucang] > Change default throughput controller to PressureAwareThroughputController in > branch-1 > - > > Key: HBASE-20486 > URL: https://issues.apache.org/jira/browse/HBASE-20486 > Project: HBase > Issue Type: Task >Reporter: Andrew Purtell >Assignee: Xu Cang >Priority: Major > Fix For: 1.5.0 > > Attachments: HBASE-20486.branch-1.001.patch > > > Switch the default throughput controller from NoLimitThroughputController to > PressureAwareThroughputController in branch-1. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-20608) Remove build option of error prone profile for branch-1 after HBASE-12350
[ https://issues.apache.org/jira/browse/HBASE-20608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-20608. Resolution: Fixed Assignee: Andrew Purtell (was: Mike Drob) Fix Version/s: 1.5.0 Committed my hack. We can open another issue for a more nuanced fix > Remove build option of error prone profile for branch-1 after HBASE-12350 > - > > Key: HBASE-20608 > URL: https://issues.apache.org/jira/browse/HBASE-20608 > Project: HBase > Issue Type: Task > Components: build >Affects Versions: 1.4.4, 1.4.5 >Reporter: Tak Lon (Stephen) Wu >Assignee: Andrew Purtell >Priority: Major > Fix For: 1.5.0 > > > After HBASE-12350, error prone profile was introduced/backported to branch-1 > and branch-2. However, branch-1 is still building with JDK 7 and is > incompatible with this error prone profile such that `mvn test-compile` > failed since then. > Open this issue to track the removal of `-PerrorProne` in the build command > (in Jenkins) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-20606) hbase:acl table is listed in list_rsgroups output even when acl is not enabled
[ https://issues.apache.org/jira/browse/HBASE-20606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-20606. Resolution: Duplicate > hbase:acl table is listed in list_rsgroups output even when acl is not enabled > -- > > Key: HBASE-20606 > URL: https://issues.apache.org/jira/browse/HBASE-20606 > Project: HBase > Issue Type: Bug > Components: master >Reporter: Biju Nair >Priority: Major > > Steps to reproduce > - > {noformat} > add_rsgroup 'test_rsgroup'{noformat} > - Add a server to the new {{rsgroup}} > - > {noformat} > hbase(main):002:0> list_rsgroups > NAME SERVER / TABLE > > > test_rsgroup > > > default server dob2-bach-r3n13:16020 > > server dob2-bach-r3n13:16022 > > server dob2-bach-r3n13:16023 > > server dob2-bach-r3n13:16024 > > server dob2-bach-r3n13:16025 > > server dob2-bach-r3n13:16026 > > table hbase:meta > > > table hbase:acl > > > table hbase:namespace > > > table hbase:rsgroup > move_servers_rsgroup 'test_rsgroup',['dob2-bach-r3n13:16020']{noformat} > - Move {{hbase}} namespace to the new {{rsgroup}} > - > {noformat} > hbase(main):005:0> move_namespaces_rsgroup 'test_rsgroup',['hbase']{noformat} > - List {{Rsgroups}} to verify all the {{hbase}} tables are moved > - > {noformat} > hbase(main):006:0> list_rsgroups > NAME SERVER / TABLE > > > test_rsgroup server dob2-bach-r3n13:16020 > > table hbase:meta > > > table hbase:namespace > > > table hbase:rsgroup > > > default server dob2-bach-r3n13:16022 > > server dob2-bach-r3n13:16023 > > server dob2-bach-r3n13:16024 > > server dob2-bach-r3n13:16025 > > server dob2-bach-r3n13:16026 > > table hbase:acl {noformat} > - {{hbase:acl}} table is not moved to the new {{rsgroup}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20646) TestWALProcedureStoreOnHDFS failing on branch-1
Andrew Purtell created HBASE-20646: -- Summary: TestWALProcedureStoreOnHDFS failing on branch-1 Key: HBASE-20646 URL: https://issues.apache.org/jira/browse/HBASE-20646 Project: HBase Issue Type: Test Affects Versions: 1.4.4 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 1.5.0, 1.4.5 TestWALProcedureStoreOnHDFS fails sometimes on branch-1 depending on junit particulars. An @After decoration was improperly added. Remove to fix. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-20646) TestWALProcedureStoreOnHDFS failing on branch-1
[ https://issues.apache.org/jira/browse/HBASE-20646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-20646. Resolution: Fixed > TestWALProcedureStoreOnHDFS failing on branch-1 > --- > > Key: HBASE-20646 > URL: https://issues.apache.org/jira/browse/HBASE-20646 > Project: HBase > Issue Type: Test >Affects Versions: 1.4.4 >Reporter: Andrew Purtell >Assignee: Andrew Purtell >Priority: Trivial > Fix For: 1.5.0, 1.4.5 > > Attachments: HBASE-20646-branch-1.patch > > > TestWALProcedureStoreOnHDFS fails sometimes on branch-1 depending on junit > particulars. An @After decoration was improperly added. Remove to fix. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-20646) TestWALProcedureStoreOnHDFS failing on branch-1
[ https://issues.apache.org/jira/browse/HBASE-20646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-20646. Resolution: Fixed Fix Version/s: 2.1.0 3.0.0 Committed an addendum to branch-1.4 and branch-1 that suppresses the warning. Synced this change to branch-2 and master since the issue is there too even if we are not tripping over it today. > TestWALProcedureStoreOnHDFS failing on branch-1 > --- > > Key: HBASE-20646 > URL: https://issues.apache.org/jira/browse/HBASE-20646 > Project: HBase > Issue Type: Test >Affects Versions: 1.4.4 >Reporter: Andrew Purtell >Assignee: Andrew Purtell >Priority: Trivial > Fix For: 3.0.0, 2.1.0, 1.5.0, 1.4.5 > > Attachments: HBASE-20646-branch-1.patch > > > TestWALProcedureStoreOnHDFS fails sometimes on branch-1 depending on junit > particulars. An @After decoration was improperly added. Remove to fix. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (HBASE-18116) Replication source in-memory accounting should not include bulk transfer hfiles
[ https://issues.apache.org/jira/browse/HBASE-18116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell reopened HBASE-18116: I ran all TestReplication\*\* tests before commit, but forgot about TestGlobalThrottler (should really be renamed to TestReplicationGlobalThrottler). Reverted my commits for now. Can reapply once all tests are passing. {noformat} [ERROR] Failures: [ERROR] TestGlobalThrottler.testQuota:180{noformat} > Replication source in-memory accounting should not include bulk transfer > hfiles > --- > > Key: HBASE-18116 > URL: https://issues.apache.org/jira/browse/HBASE-18116 > Project: HBase > Issue Type: Bug > Components: Replication >Reporter: Andrew Purtell >Assignee: Xu Cang >Priority: Major > Fix For: 3.0.0, 2.1.0, 1.5.0 > > Attachments: HBASE-18116.master.001.patch, > HBASE-18116.master.002.patch > > > In ReplicationSourceWALReaderThread we maintain a global quota on enqueued > replication work for preventing OOM by queuing up too many edits into queues > on heap. When calculating the size of a given replication queue entry, if it > has associated hfiles (is a bulk load to be replicated as a batch of hfiles), > we get the file sizes and include the sum. We then apply that result to the > quota. This isn't quite right. Those hfiles will be pulled by the sink as a > file copy, not pushed by the source. The cells in those files are not queued > in memory at the source and therefore shouldn't be counted against the quota. > Related, the sum of the hfile sizes are also included when checking if queued > work exceeds the configured replication queue capacity, which is by default > 64 MB. HFiles are commonly much larger than this. > So what happens is when we encounter a bulk load replication entry typically > both the quota and capacity limits are exceeded, we break out of loops, and > send right away. What is transferred on the wire via HBase RPC though has > only a partial relationship to the calculation. 
> Depending how you look at it, it makes sense to factor hfile file sizes > against replication queue capacity limits. The sink will be occupied > transferring those files at the HDFS level. Anyway, this is how we have been > doing it and it is too late to change now. I do not however think it is > correct to apply hfile file sizes against a quota for in memory state on the > source. The source doesn't queue or even transfer those bytes. > Something I noticed while working on HBASE-18027. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
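The accounting distinction argued above can be sketched as follows, with hypothetical types and field names: hfile sizes count toward the batch-capacity check (the sink is occupied transferring them) but not toward the in-memory quota, since those bytes are pulled by the sink as a file copy and never sit on the source heap.

```java
// Sketch: separate the two accountings for a replication queue entry.
public class ReplicationAccounting {
    // Hypothetical simplified entry
    static class Entry {
        long editHeapSize;   // bytes of cells queued on the source heap
        long hfileTotalSize; // bytes of bulk-load hfiles referenced by the entry
        Entry(long edits, long hfiles) { editHeapSize = edits; hfileTotalSize = hfiles; }
    }

    // In-memory quota: hfiles excluded, they are never held in source memory
    static long quotaUsage(Entry e) {
        return e.editHeapSize;
    }

    // Batch capacity: hfiles included, the sink must transfer those bytes
    static long capacityUsage(Entry e) {
        return e.editHeapSize + e.hfileTotalSize;
    }

    public static void main(String[] args) {
        // A bulk-load entry: tiny on-heap footprint, large hfile payload
        Entry bulkLoad = new Entry(1_000, 128L * 1024 * 1024);
        System.out.println(quotaUsage(bulkLoad));
        System.out.println(capacityUsage(bulkLoad));
    }
}
```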
[jira] [Created] (HBASE-20667) Rename TestGlobalThrottler to TestReplicationGlobalThrottler
Andrew Purtell created HBASE-20667: -- Summary: Rename TestGlobalThrottler to TestReplicationGlobalThrottler Key: HBASE-20667 URL: https://issues.apache.org/jira/browse/HBASE-20667 Project: HBase Issue Type: Test Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 3.0.0, 2.1.0, 1.5.0 If running replication unit tests, perhaps like {{mvn test -Dtest=TestReplication\*,Test\*Replication\*}}, then you will miss TestGlobalThrottler. It should be renamed to TestReplicationGlobalThrottler. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20670) NPE in HMaster#isInMaintenanceMode
Andrew Purtell created HBASE-20670: -- Summary: NPE in HMaster#isInMaintenanceMode Key: HBASE-20670 URL: https://issues.apache.org/jira/browse/HBASE-20670 Project: HBase Issue Type: Bug Affects Versions: 1.3.2 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 1.5.0, 1.3.3, 1.4.5 {noformat} Problem accessing /master-status. Reason: INTERNAL_SERVER_ERROR Caused by: java.lang.NullPointerException at org.apache.hadoop.hbase.master.HMaster.isInMaintenanceMode(HMaster.java:2559) {noformat} The ZK trackers, including the maintenance mode tracker, are initialized only after we try to bring up the filesystem. If HDFS is in safe mode then an access to the master status page trips over this problem. There might be other issues after we fix this, but an NPE is always a bug, so let's address it. One option is to connect the ZK-based components with ZK before attempting to bring up the filesystem. Let me try that first. If that doesn't work we could at least throw an IOE. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
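The "throw an IOE" fallback can be sketched like this, with simplified stand-in types rather than the actual HMaster code: read the tracker reference once, and while initialization is still pending fail with a clean exception instead of an NPE.

```java
import java.io.IOException;

// Sketch: a ZK tracker reference may still be null while the master waits on
// HDFS, so the accessor fails cleanly rather than dereferencing null.
public class MaintenanceModeCheck {
    // Hypothetical stand-in for the maintenance mode ZK tracker
    static class Tracker {
        boolean inMaintenance() { return false; }
    }

    private volatile Tracker maintenanceModeTracker; // set during late init

    boolean isInMaintenanceMode() throws IOException {
        Tracker t = maintenanceModeTracker; // single read of the shared field
        if (t == null) {
            // Not yet initialized, e.g. HDFS still in safe mode
            throw new IOException("Master not fully initialized yet");
        }
        return t.inMaintenance();
    }

    public static void main(String[] args) {
        MaintenanceModeCheck m = new MaintenanceModeCheck();
        try {
            m.isInMaintenanceMode();
            System.out.println("no exception");
        } catch (IOException e) {
            System.out.println("IOE");
        }
    }
}
```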
[jira] [Resolved] (HBASE-20496) TestGlobalThrottler failing on branch-1 since revert of HBASE-9465
[ https://issues.apache.org/jira/browse/HBASE-20496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-20496. Resolution: Fixed Resolved as part of HBASE-18116 > TestGlobalThrottler failing on branch-1 since revert of HBASE-9465 > -- > > Key: HBASE-20496 > URL: https://issues.apache.org/jira/browse/HBASE-20496 > Project: HBase > Issue Type: Bug >Reporter: Andrew Purtell >Priority: Minor > > Not sure why we didn't catch it earlier, but with my latest dev setup > including 8u JVM, TestGlobalThrottler fails reliably, and a git bisect finds > the problem at this revert: > {noformat} > commit ba7a936f74985eb9d974fdc87b0d06cb8cd8473d > Author: Sean Busbey > Date: Tue Nov 7 23:50:35 2017 -0600 > Revert "HBASE-9465 Push entries to peer clusters serially" > This reverts commit 441bc050b991c14c048617bc443b97f46e21b76f. > Conflicts: > hbase-client/src/main/java/org/apache/hadoop/hbase/MetaTableAccessor.java > hbase-client/src/main/java/org/apache/hadoop/hbase/client/replication/ReplicationAdmin.java > hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/WALProtos.java > hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java > hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/Replication.java > hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java > Signed-off-by: Andrew Purtell > {noformat} > For now I'm going to disable the test. Leaving this open for debugging. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-18116) Replication source in-memory accounting should not include bulk transfer hfiles
[ https://issues.apache.org/jira/browse/HBASE-18116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-18116. Resolution: Fixed > Replication source in-memory accounting should not include bulk transfer > hfiles > --- > > Key: HBASE-18116 > URL: https://issues.apache.org/jira/browse/HBASE-18116 > Project: HBase > Issue Type: Bug > Components: Replication >Reporter: Andrew Purtell >Assignee: Xu Cang >Priority: Major > Fix For: 3.0.0, 2.1.0, 1.5.0 > > Attachments: HBASE-18116.master.001.patch, > HBASE-18116.master.002.patch, HBASE-18116.master.003.patch > > > In ReplicationSourceWALReaderThread we maintain a global quota on enqueued > replication work for preventing OOM by queuing up too many edits into queues > on heap. When calculating the size of a given replication queue entry, if it > has associated hfiles (is a bulk load to be replicated as a batch of hfiles), > we get the file sizes and include the sum. We then apply that result to the > quota. This isn't quite right. Those hfiles will be pulled by the sink as a > file copy, not pushed by the source. The cells in those files are not queued > in memory at the source and therefore shouldn't be counted against the quota. > Related, the sum of the hfile sizes are also included when checking if queued > work exceeds the configured replication queue capacity, which is by default > 64 MB. HFiles are commonly much larger than this. > So what happens is when we encounter a bulk load replication entry typically > both the quota and capacity limits are exceeded, we break out of loops, and > send right away. What is transferred on the wire via HBase RPC though has > only a partial relationship to the calculation. > Depending how you look at it, it makes sense to factor hfile file sizes > against replication queue capacity limits. The sink will be occupied > transferring those files at the HDFS level. 
Anyway, this is how we have been > doing it and it is too late to change now. I do not however think it is > correct to apply hfile file sizes against a quota for in memory state on the > source. The source doesn't queue or even transfer those bytes. > Something I noticed while working on HBASE-18027. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20799) TestBucketCache#testCacheBlockNextBlockMetadataMissing is flaky
Andrew Purtell created HBASE-20799: -- Summary: TestBucketCache#testCacheBlockNextBlockMetadataMissing is flaky Key: HBASE-20799 URL: https://issues.apache.org/jira/browse/HBASE-20799 Project: HBase Issue Type: Bug Affects Versions: 1.5.0 Reporter: Andrew Purtell {noformat} [ERROR] testCacheBlockNextBlockMetadataMissing[1: blockSize=16,384, bucketSizes=[I@29ee9faa](org.apache.hadoop.hbase.io.hfile.bucket.TestBucketCache) Time elapsed: 0.066 s <<< FAILURE! java.lang.AssertionError: expected: java.nio.HeapByteBuffer but was: java.nio.HeapByteBuffer at org.apache.hadoop.hbase.io.hfile.bucket.TestBucketCache.testCacheBlockNextBlockMetadataMissing(TestBucketCache.java:424) {noformat} [~zyork] any idea what is going on here? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
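One reason the failure message above reads so strangely ("expected: java.nio.HeapByteBuffer but was: java.nio.HeapByteBuffer") is that ByteBuffer#toString does not render the buffer's bytes, so two unequal heap buffers print near-identically in an assertion message. A small demonstration:

```java
import java.nio.ByteBuffer;

// Two buffers that differ in content but look the same in an assert message,
// since ByteBuffer#toString shows only position/limit/capacity, not bytes.
public class BufferEqualsDemo {
    public static void main(String[] args) {
        ByteBuffer a = ByteBuffer.wrap(new byte[] {1, 2, 3});
        ByteBuffer b = ByteBuffer.wrap(new byte[] {1, 2, 4});
        // equals() compares the remaining bytes, so this is false
        System.out.println(a.equals(b));
    }
}
```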
[jira] [Resolved] (HBASE-20799) TestBucketCache#testCacheBlockNextBlockMetadataMissing is flaky
[ https://issues.apache.org/jira/browse/HBASE-20799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-20799. Resolution: Duplicate > TestBucketCache#testCacheBlockNextBlockMetadataMissing is flaky > --- > > Key: HBASE-20799 > URL: https://issues.apache.org/jira/browse/HBASE-20799 > Project: HBase > Issue Type: Bug >Affects Versions: 1.5.0, 1.4.5 >Reporter: Andrew Purtell >Priority: Major > > {noformat} > [ERROR] testCacheBlockNextBlockMetadataMissing[1: blockSize=16,384, > bucketSizes=[I@29ee9faa](org.apache.hadoop.hbase.io.hfile.bucket.TestBucketCache) > Time elapsed: 0.066 s <<< FAILURE! > java.lang.AssertionError: expected: > java.nio.HeapByteBuffer but > was: java.nio.HeapByteBuffer > at > org.apache.hadoop.hbase.io.hfile.bucket.TestBucketCache.testCacheBlockNextBlockMetadataMissing(TestBucketCache.java:424) > {noformat} > [~zyork] any idea what is going on here? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (HBASE-20450) Provide metrics for number of total active, priority and replication rpc handlers
[ https://issues.apache.org/jira/browse/HBASE-20450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell reopened HBASE-20450: This was only committed to trunk but I think it would be useful to bring back to branch-1 (for 1.5). I have a branch-1 patch ready so no need for anyone else to backport. Curious if you'd mind this in branch-2 [~stack] (not branch-2.0!) Going to assume ok if I don't hear anything for a few days. > Provide metrics for number of total active, priority and replication rpc > handlers > - > > Key: HBASE-20450 > URL: https://issues.apache.org/jira/browse/HBASE-20450 > Project: HBase > Issue Type: Improvement > Components: metrics >Reporter: Nihal Jain >Assignee: Nihal Jain >Priority: Major > Fix For: 3.0.0 > > Attachments: HBASE-20450.master.001.patch, > HBASE-20450.master.002.patch > > > Currently hbase provides a metric for [number of total active rpc > handlers|https://github.com/apache/hbase/blob/f4f2b68238a094d7b1931dc8b7939742ccbb2b57/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/SimpleRpcScheduler.java#L187] > which is a sum of the following: > * number of active general rpc handlers > * number of active priority rpc handlers > * number of active replication rpc handlers > I think we can have 3 different metrics corresponding to the above mentioned > handlers which will allow us to see detailed information about number of > active handlers running for a particular type of handler. > We can have following new metrics: > * numActiveGeneralHandler > * numActivePriorityHandler > * numActiveReplicationHandler > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-20450) Provide metrics for number of total active, priority and replication rpc handlers
[ https://issues.apache.org/jira/browse/HBASE-20450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-20450. Resolution: Fixed > Provide metrics for number of total active, priority and replication rpc > handlers > - > > Key: HBASE-20450 > URL: https://issues.apache.org/jira/browse/HBASE-20450 > Project: HBase > Issue Type: Improvement > Components: metrics >Reporter: Nihal Jain >Assignee: Nihal Jain >Priority: Major > Fix For: 3.0.0, 2.1.0, 1.5.0 > > Attachments: HBASE-20450.master.001.patch, > HBASE-20450.master.002.patch > > > Currently hbase provides a metric for [number of total active rpc > handlers|https://github.com/apache/hbase/blob/f4f2b68238a094d7b1931dc8b7939742ccbb2b57/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/SimpleRpcScheduler.java#L187] > which is a sum of the following: > * number of active general rpc handlers > * number of active priority rpc handlers > * number of active replication rpc handlers > I think we can have 3 different metrics corresponding to the above mentioned > handlers which will allow us to see detailed information about number of > active handlers running for a particular type of handler. > We can have following new metrics: > * numActiveGeneralHandler > * numActivePriorityHandler > * numActiveReplicationHandler > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20895) NPE in RpcServer#readAndProcess
Andrew Purtell created HBASE-20895: -- Summary: NPE in RpcServer#readAndProcess Key: HBASE-20895 URL: https://issues.apache.org/jira/browse/HBASE-20895 Project: HBase Issue Type: Bug Components: rpc Affects Versions: 1.3.2 Reporter: Andrew Purtell Assignee: Monani Mihir Fix For: 1.5.0, 1.3.3, 1.4.6 {noformat} 2018-07-10 16:25:55,005 DEBUG [.sfdc.net,port=60020] ipc.RpcServer - RpcServer.listener,port=60020: Caught exception while reading: java.lang.NullPointerException at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1761) at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:949) at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:730) at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:706) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {noformat} This looks like it could be a use-after-close problem if there is concurrent access to a Connection. In process() we might store a null back to the 'data' field. Meanwhile in readAndProcess() we have a case where we might be blocked on a channel read, and after coming back from the read we go to use 'data' after a null has been written back, leading to an NPE. {quote} count = channelRead(channel, data); 1761 ---> if (count >= 0 && *data.remaining()* == 0) { // count==0 if dataLength == 0 process(); } {quote} Whether an NPE happens or not is going to depend on the timing of the store back to 'data' in another thread, the use of 'data' in this thread, and whether or not the JVM has optimized away a reload of 'data' (it's not declared volatile). We should do a null check here just to be defensive. We should also look at whether the concurrent access to the Connection is intended. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
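The defensive fix suggested above can be sketched like this. The class is a simplified hypothetical Connection (only the field name 'data' is borrowed from the report): declare the field volatile so the reload cannot be optimized away, and read it once into a local before dereferencing.

```java
import java.nio.ByteBuffer;

// Sketch of the race and the defensive pattern: another thread may null out
// 'data' after process(), so the reader re-checks before dereferencing.
public class ConnectionSketch {
    private volatile ByteBuffer data = ByteBuffer.allocate(4);

    void process() {
        data = null; // the store that creates the use-after-close hazard
    }

    boolean readAndProcess(int count) {
        ByteBuffer local = data; // single read into a local avoids TOCTOU
        if (local == null) {
            return false; // connection already torn down; nothing to process
        }
        return count >= 0 && local.remaining() == 0;
    }

    public static void main(String[] args) {
        ConnectionSketch c = new ConnectionSketch();
        c.process(); // simulate the concurrent store of null
        System.out.println(c.readAndProcess(0)); // no NPE
    }
}
```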
[jira] [Created] (HBASE-20897) Port HBASE-20866 to branch-2 and up
Andrew Purtell created HBASE-20897: -- Summary: Port HBASE-20866 to branch-2 and up Key: HBASE-20897 URL: https://issues.apache.org/jira/browse/HBASE-20897 Project: HBase Issue Type: Sub-task Reporter: Andrew Purtell Assignee: Vikas Vishwakarma -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20896) Port HBASE-20866 to branch-1 and branch-1.4
Andrew Purtell created HBASE-20896: -- Summary: Port HBASE-20866 to branch-1 and branch-1.4 Key: HBASE-20896 URL: https://issues.apache.org/jira/browse/HBASE-20896 Project: HBase Issue Type: Sub-task Reporter: Andrew Purtell Assignee: Vikas Vishwakarma -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20931) [branch-1] Add -Dhttps.protocols=TLSv1.2 to Maven command line in make_rc.sh
Andrew Purtell created HBASE-20931: -- Summary: [branch-1] Add -Dhttps.protocols=TLSv1.2 to Maven command line in make_rc.sh Key: HBASE-20931 URL: https://issues.apache.org/jira/browse/HBASE-20931 Project: HBase Issue Type: Task Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 1.5.0, 1.2.7, 1.3.3, 1.4.6 As of June 2018 the insecure TLS 1.0 and 1.1 protocols are no longer supported for SSL connections to Maven Central and perhaps other public Maven repositories. The branch-1 builds which require Java 7, of which the latest public release was 7u80, need to add {{-Dhttps.protocols=TLSv1.2}} to the Maven command line in order to avoid artifact retrieval problems during builds. We especially need this in make_rc.sh which starts up with an empty local Maven cache. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
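The {{https.protocols}} system property described above is honored by the JVM's HTTPS client (HttpsURLConnection), which is how Maven reaches repositories. A tiny illustration of setting it programmatically, equivalent to the {{-D}} flag on the command line:

```java
public class HttpsProtocols {
    public static void main(String[] args) {
        // Equivalent of passing -Dhttps.protocols=TLSv1.2 on the JVM/Maven
        // command line: restrict client-side HTTPS connections to TLS 1.2.
        System.setProperty("https.protocols", "TLSv1.2");
        System.out.println(System.getProperty("https.protocols"));
    }
}
```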
[jira] [Resolved] (HBASE-20931) [branch-1] Add -Dhttps.protocols=TLSv1.2 to Maven command line in make_rc.sh
[ https://issues.apache.org/jira/browse/HBASE-20931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-20931. Resolution: Fixed > [branch-1] Add -Dhttps.protocols=TLSv1.2 to Maven command line in make_rc.sh > > > Key: HBASE-20931 > URL: https://issues.apache.org/jira/browse/HBASE-20931 > Project: HBase > Issue Type: Task >Reporter: Andrew Purtell >Assignee: Andrew Purtell >Priority: Trivial > Fix For: 1.5.0, 1.2.7, 1.3.3, 1.4.6 > > Attachments: HBASE-20931-branch-1.patch > > > As of June 2018 the insecure TLS 1.0 and 1.1 protocols are no longer > supported for SSL connections to Maven Central and perhaps other public Maven > repositories. The branch-1 builds which require Java 7, of which the latest > public release was 7u80, need to add {{-Dhttps.protocols=TLSv1.2}} to the > Maven command line in order to avoid artifact retrieval problems during > builds. > We especially need this in make_rc.sh which starts up with an empty local > Maven cache. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20982) [branch-1] TestExportSnapshot is flaky
Andrew Purtell created HBASE-20982: -- Summary: [branch-1] TestExportSnapshot is flaky Key: HBASE-20982 URL: https://issues.apache.org/jira/browse/HBASE-20982 Project: HBase Issue Type: Bug Components: test Affects Versions: 1.4.6 Reporter: Andrew Purtell Passes for me {noformat} [INFO] Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 390.02 s - in org.apache.hadoop.hbase.snapshot.TestExportSnapshot [INFO] Tests run: 9, Failures: 0, Errors: 0, Skipped: 0 {noformat} but fails or times out for others. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21000) Default limits for PressureAwareCompactionThroughputController are too low
Andrew Purtell created HBASE-21000: -- Summary: Default limits for PressureAwareCompactionThroughputController are too low Key: HBASE-21000 URL: https://issues.apache.org/jira/browse/HBASE-21000 Project: HBase Issue Type: Improvement Affects Versions: 1.5.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 3.0.0, 1.5.0, 2.2.0 In PressureAwareCompactionThroughputController: {code:java} /** * A throughput controller which uses the follow schema to limit throughput * * If compaction pressure is greater than 1.0, no limitation. * In off peak hours, use a fixed throughput limitation * {@value #HBASE_HSTORE_COMPACTION_MAX_THROUGHPUT_OFFPEAK} * In normal hours, the max throughput is tuned between * {@value #HBASE_HSTORE_COMPACTION_MAX_THROUGHPUT_LOWER_BOUND} and * {@value #HBASE_HSTORE_COMPACTION_MAX_THROUGHPUT_HIGHER_BOUND}, using the formula "lower + * (higher - lower) * compactionPressure", where compactionPressure is in range [0.0, 1.0] * */ {code} The lower and upper bounds are 10MB/sec and 20MB/sec, respectively: {code:java} public static final String HBASE_HSTORE_COMPACTION_MAX_THROUGHPUT_HIGHER_BOUND = "hbase.hstore.compaction.throughput.higher.bound"; private static final long DEFAULT_HBASE_HSTORE_COMPACTION_MAX_THROUGHPUT_HIGHER_BOUND = 20L * 1024 * 1024; public static final String HBASE_HSTORE_COMPACTION_MAX_THROUGHPUT_LOWER_BOUND = "hbase.hstore.compaction.throughput.lower.bound"; private static final long DEFAULT_HBASE_HSTORE_COMPACTION_MAX_THROUGHPUT_LOWER_BOUND = 10L * 1024 * 1024; {code} (In contrast, in PressureAwareFlushThroughputController the lower and upper bounds are 10x of those limits, at 100MB/sec and 200MB/sec, respectively.) In fairly light load scenarios we see compaction quickly falls behind and write clients are backed off or failing due to RegionTooBusy exceptions. 
Although compaction throughput becomes unbounded after the store reaches the blocking file count, in the lead-up to this the default settings do not provide enough bandwidth to stave off blocking. The defaults should be increased. I'm not sure what good new defaults would be. We could start by doubling them to 20MB/sec and 40MB/sec respectively. They might need to be doubled again. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
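The tuning formula quoted from the Javadoc above ("lower + (higher - lower) * compactionPressure") can be sketched as follows, using the branch-1 defaults of 10MB/sec and 20MB/sec; names are illustrative, not the actual HBase implementation:

```java
public class CompactionThroughput {
    static final long LOWER = 10L * 1024 * 1024;  // default lower bound, 10 MB/s
    static final long HIGHER = 20L * 1024 * 1024; // default higher bound, 20 MB/s

    // Max compaction throughput tuned between the bounds by pressure in [0.0, 1.0].
    static long maxThroughput(double compactionPressure) {
        if (compactionPressure > 1.0) {
            return Long.MAX_VALUE; // no limitation once pressure exceeds 1.0
        }
        return (long) (LOWER + (HIGHER - LOWER) * compactionPressure);
    }

    public static void main(String[] args) {
        System.out.println(maxThroughput(0.0) / (1024 * 1024)); // 10 MB/s
        System.out.println(maxThroughput(0.5) / (1024 * 1024)); // 15 MB/s
        System.out.println(maxThroughput(1.0) / (1024 * 1024)); // 20 MB/s
    }
}
```

The complaint in the report is that even at full pressure (short of 1.0 being exceeded) this tops out at 20MB/sec, which light write loads can easily outrun.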
[jira] [Resolved] (HBASE-20407) Retry HBase admin API if master failover is in progress
[ https://issues.apache.org/jira/browse/HBASE-20407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-20407. Resolution: Duplicate Duping this out in favor of HBASE-20408 > Retry HBase admin API if master failover is in progress > --- > > Key: HBASE-20407 > URL: https://issues.apache.org/jira/browse/HBASE-20407 > Project: HBase > Issue Type: Improvement > Components: Admin >Reporter: Divesh Jain >Assignee: Divesh Jain >Priority: Minor > > When a master switch over is in progress and an admin API is called, perform > a retry operation before throwing an exception. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21099) NPE in TestTableResource.setUpBeforeClass (TestTableResource.java:134)
Andrew Purtell created HBASE-21099: -- Summary: NPE in TestTableResource.setUpBeforeClass (TestTableResource.java:134) Key: HBASE-21099 URL: https://issues.apache.org/jira/browse/HBASE-21099 Project: HBase Issue Type: Bug Components: REST, test Reporter: Andrew Purtell Fix For: 2.0.2, 2.2.0, 2.1.1 TestTableResource fails consistently with an NPE, but only in the branch-2s. Both master and branch-1 are fine. {noformat} [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 54.397 s <<< FAILURE! - in org.apache.hadoop.hbase.rest.TestTableResource [ERROR] org.apache.hadoop.hbase.rest.TestTableResource Time elapsed: 54.395 s <<< ERROR! java.lang.NullPointerException at org.apache.hadoop.hbase.rest.TestTableResource.setUpBeforeClass(TestTableResource.java:134) {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (HBASE-20940) HStore.cansplit should not allow split to happen if it has references
[ https://issues.apache.org/jira/browse/HBASE-20940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell reopened HBASE-20940: We didn't catch some test issues; see HBASE-21105. Reopening; needs an addendum. > HStore.cansplit should not allow split to happen if it has references > - > > Key: HBASE-20940 > URL: https://issues.apache.org/jira/browse/HBASE-20940 > Project: HBase > Issue Type: Bug >Affects Versions: 1.3.2 >Reporter: Vishal Khandelwal >Assignee: Vishal Khandelwal >Priority: Major > Fix For: 3.0.0, 1.5.0, 1.3.3, 2.2.0, 2.1.1, 2.0.2, 1.4.7 > > Attachments: HBASE-20940.branch-1.3.v1.patch, > HBASE-20940.branch-1.3.v2.patch, HBASE-20940.branch-1.v1.patch, > HBASE-20940.branch-1.v2.patch, HBASE-20940.branch-1.v3.patch, > HBASE-20940.v1.patch, HBASE-20940.v2.patch, HBASE-20940.v3.patch, > HBASE-20940.v4.patch, result_HBASE-20940.branch-1.v2.log > > > When a split happens and immediately another split happens, it may result in > a split of a region that still has references to its parent. More details > about the scenario can be found in HBASE-20933. > HStore.hasReferences should check fs.storefile rather than in-memory > objects. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (HBASE-20890) PE filterScan seems to be stuck forever
[ https://issues.apache.org/jira/browse/HBASE-20890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell reopened HBASE-20890: > PE filterScan seems to be stuck forever > --- > > Key: HBASE-20890 > URL: https://issues.apache.org/jira/browse/HBASE-20890 > Project: HBase > Issue Type: Bug >Affects Versions: 1.3.3 >Reporter: Vikas Vishwakarma >Assignee: Abhishek Goyal >Priority: Minor > > Command Used > {code:java} > ~/current/bigdata-hbase/hbase/hbase/bin/hbase pe --nomapred randomWrite 1 > > write 2>&1 > ~/current/bigdata-hbase/hbase/hbase/bin/hbase pe --nomapred filterScan 1 > > filterScan 2>&1 > {code} > > Output > This kept running for several hours just printing the below messages in logs > > {code:java} > -bash-4.1$ grep "Advancing internal scanner to startKey" filterScan.1 | head > 2018-07-13 10:44:45,188 DEBUG [TestClient-0] client.ClientScanner - Advancing > internal scanner to startKey at '52359' > 2018-07-13 10:44:45,976 DEBUG [TestClient-0] client.ClientScanner - Advancing > internal scanner to startKey at '52359' > 2018-07-13 10:44:46,695 DEBUG [TestClient-0] client.ClientScanner - Advancing > internal scanner to startKey at '52359' > . > -bash-4.1$ grep "Advancing internal scanner to startKey" filterScan.1 | tail > 2018-07-15 06:20:22,353 DEBUG [TestClient-0] client.ClientScanner - Advancing > internal scanner to startKey at '52359' > 2018-07-15 06:20:23,044 DEBUG [TestClient-0] client.ClientScanner - Advancing > internal scanner to startKey at '52359' > 2018-07-15 06:20:23,768 DEBUG [TestClient-0] client.ClientScanner - Advancing > internal scanner to startKey at '52359' > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21162) Revert suspicious change to BoundedByteBufferPool and disable use of direct buffers for IPC reservoir by default
Andrew Purtell created HBASE-21162: -- Summary: Revert suspicious change to BoundedByteBufferPool and disable use of direct buffers for IPC reservoir by default Key: HBASE-21162 URL: https://issues.apache.org/jira/browse/HBASE-21162 Project: HBase Issue Type: Bug Affects Versions: 1.4.7 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 1.5.0, 1.4.8 We had a production incident where we traced the issue to a direct buffer leak. On a hunch we tried setting hbase.ipc.server.reservoir.enabled = false and after that no native memory leak could be observed in any regionserver process under the triggering load. On HBASE-19239 (Fix findbugs and error-prone issues) I made a change to BoundedByteBufferPool that is suspicious given this finding. It was committed to branch-1.4 and branch-1. I'm going to revert this change. In addition the allocation of direct memory for the server RPC reservoir is a bit problematic in that tracing native memory or direct buffer leaks to a particular class or compilation unit is difficult, so I also propose allocating the reservoir on the heap by default instead. Should there be a leak it is much easier to do an analysis of a heap dump with familiar tools to find it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
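The heap-versus-direct trade-off described above can be illustrated with plain NIO; the {{useDirect}} flag here stands in for a hypothetical reservoir config switch and is not HBase's actual option name:

```java
import java.nio.ByteBuffer;

public class ReservoirAllocation {
    // Heap buffers show up in ordinary heap dumps, so leaks are attributable
    // with familiar tools; direct buffers live in native memory and are much
    // harder to trace back to a particular class or compilation unit.
    static ByteBuffer allocate(int size, boolean useDirect) {
        return useDirect ? ByteBuffer.allocateDirect(size) : ByteBuffer.allocate(size);
    }

    public static void main(String[] args) {
        ByteBuffer heap = allocate(64 * 1024, false);
        System.out.println(heap.isDirect());  // false: heap-backed, dump-friendly
        System.out.println(heap.capacity());
    }
}
```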
[jira] [Resolved] (HBASE-20307) LoadTestTool prints too much zookeeper logging
[ https://issues.apache.org/jira/browse/HBASE-20307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-20307. Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.0.3 2.1.1 1.4.8 2.2.0 1.2.8 1.3.3 1.5.0 3.0.0 > LoadTestTool prints too much zookeeper logging > -- > > Key: HBASE-20307 > URL: https://issues.apache.org/jira/browse/HBASE-20307 > Project: HBase > Issue Type: Bug > Components: tooling >Reporter: Mike Drob >Assignee: Colin Garcia >Priority: Major > Labels: beginner > Fix For: 3.0.0, 1.5.0, 1.3.3, 1.2.8, 2.2.0, 1.4.8, 2.1.1, 2.0.3 > > Attachments: HBASE-20307.000.patch, HBASE-20307.001.patch > > > When running ltt there is a ton of ZK related cruft that I probably don't > care about. Hide it behind -verbose flag or point people at log4j > configuration but don't print it by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21203) TestZKMainServer#testCommandLineWorks won't pass with default 4lw whitelist
Andrew Purtell created HBASE-21203: -- Summary: TestZKMainServer#testCommandLineWorks won't pass with default 4lw whitelist Key: HBASE-21203 URL: https://issues.apache.org/jira/browse/HBASE-21203 Project: HBase Issue Type: Bug Reporter: Andrew Purtell Assignee: Andrew Purtell Recent versions of ZooKeeper restrict the so-called 4-letter-word admin commands to a whitelist, and 'stat' is not in the default whitelist, so TestZKMainServer#testCommandLineWorks cannot get off the ground. Set the system property zookeeper.4lw.commands.whitelist=* in MiniZooKeeperCluster#setupTestEnv, since there is no need to restrict 4-letter commands in unit tests. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
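A minimal sketch of the fix described above: set ZooKeeper's real {{zookeeper.4lw.commands.whitelist}} property before the embedded test cluster starts. The class and method here mirror the MiniZooKeeperCluster#setupTestEnv idea but are simplified for illustration:

```java
public class ZkTestEnv {
    static void setupTestEnv() {
        // '*' whitelists every 4-letter-word command, including 'stat',
        // so tests that probe the server with 4lw commands can run.
        System.setProperty("zookeeper.4lw.commands.whitelist", "*");
    }

    public static void main(String[] args) {
        setupTestEnv();
        System.out.println(System.getProperty("zookeeper.4lw.commands.whitelist"));
    }
}
```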
[jira] [Resolved] (HBASE-10342) RowKey Prefix Bloom Filter
[ https://issues.apache.org/jira/browse/HBASE-10342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-10342. Resolution: Duplicate Duped by HBASE-20636 > RowKey Prefix Bloom Filter > -- > > Key: HBASE-10342 > URL: https://issues.apache.org/jira/browse/HBASE-10342 > Project: HBase > Issue Type: New Feature >Reporter: Liyin Tang >Priority: Major > > When designing HBase schema for some use cases, it is quite common to combine > multiple information within the RowKey. For instance, assuming that rowkey is > constructed as md5(id1) + id1 + id2, and user wants to scan all the rowkeys > which starting by id1. In such case, the rowkey bloom filter is able to cut > more unnecessary seeks during the scan. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
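The composite-rowkey schema in the description (rowkey = md5(id1) + id1 + id2) can be sketched as below. This is illustrative application-side code, not HBase internals; a ROWPREFIX-style bloom filter would index the shared leading bytes so prefix scans on id1 can skip store files:

```java
import java.security.MessageDigest;
import java.util.Arrays;

public class PrefixKey {
    // Build a rowkey as md5(id1) + id1 + id2, per the schema in the report.
    static byte[] rowKey(byte[] id1, byte[] id2) throws Exception {
        byte[] md5 = MessageDigest.getInstance("MD5").digest(id1);
        byte[] key = new byte[md5.length + id1.length + id2.length];
        System.arraycopy(md5, 0, key, 0, md5.length);
        System.arraycopy(id1, 0, key, md5.length, id1.length);
        System.arraycopy(id2, 0, key, md5.length + id1.length, id2.length);
        return key;
    }

    public static void main(String[] args) throws Exception {
        byte[] k1 = rowKey("user1".getBytes(), "a".getBytes());
        byte[] k2 = rowKey("user1".getBytes(), "b".getBytes());
        // Both keys share the md5(id1) + id1 prefix a prefix bloom would index
        int prefixLen = 16 + "user1".length(); // MD5 digests are 16 bytes
        System.out.println(Arrays.equals(
            Arrays.copyOf(k1, prefixLen), Arrays.copyOf(k2, prefixLen)));
    }
}
```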
[jira] [Created] (HBASE-21220) Port HBASE-20636 (Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED) to branch-1
Andrew Purtell created HBASE-21220: -- Summary: Port HBASE-20636 (Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED) to branch-1 Key: HBASE-21220 URL: https://issues.apache.org/jira/browse/HBASE-21220 Project: HBase Issue Type: Sub-task Reporter: Andrew Purtell Fix For: 1.5.0 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-19418) RANGE_OF_DELAY in PeriodicMemstoreFlusher should be configurable.
[ https://issues.apache.org/jira/browse/HBASE-19418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-19418. Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.1.1 1.4.8 2.2.0 1.3.3 1.5.0 3.0.0 Pushed up, thanks for the contribution [~ramatronics] > RANGE_OF_DELAY in PeriodicMemstoreFlusher should be configurable. > - > > Key: HBASE-19418 > URL: https://issues.apache.org/jira/browse/HBASE-19418 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0-alpha-4 >Reporter: Jean-Marc Spaggiari >Assignee: Ramie Raufdeen >Priority: Major > Fix For: 3.0.0, 1.5.0, 1.3.3, 2.2.0, 1.4.8, 2.1.1 > > Attachments: HBASE-19418.master.000.patch > > > When RSs have a LOT of regions and CFs, flushing everything within 5 minutes > is not always doable. It might be interesting to be able to increase the > RANGE_OF_DELAY. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (HBASE-21258) Add resetting of flags for RS Group pre/post hooks in TestRSGroups
[ https://issues.apache.org/jira/browse/HBASE-21258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell reopened HBASE-21258: Pardon me, there has been a review error. Reopening because I'm reverting what was committed to branch-1. > Add resetting of flags for RS Group pre/post hooks in TestRSGroups > -- > > Key: HBASE-21258 > URL: https://issues.apache.org/jira/browse/HBASE-21258 > Project: HBase > Issue Type: Test >Reporter: Ted Yu >Assignee: Ted Yu >Priority: Major > Fix For: 3.0.0, 2.2.0 > > Attachments: 21258.branch-1.04.txt, 21258.branch-1.05.txt, > 21258.branch-2.v1.patch, 21258.v1.txt > > > Over HBASE-20627, [~xucang] reminded me that the resetting of flags for RS > Group pre/post hooks in TestRSGroups was absent. > This issue is to add the resetting of these flags before each subtest starts. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-21258) Add resetting of flags for RS Group pre/post hooks in TestRSGroups
[ https://issues.apache.org/jira/browse/HBASE-21258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-21258. Resolution: Fixed Fix Version/s: 1.4.8 1.5.0 The branch-2 patch applies without any changes needed. Resolving this as fixed. If additional changes are needed, let's open a new issue rather than do something radical with a branch-1 patch. > Add resetting of flags for RS Group pre/post hooks in TestRSGroups > -- > > Key: HBASE-21258 > URL: https://issues.apache.org/jira/browse/HBASE-21258 > Project: HBase > Issue Type: Test >Reporter: Ted Yu >Assignee: Ted Yu >Priority: Major > Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.8 > > Attachments: 21258.branch-1.04.txt, 21258.branch-1.05.txt, > 21258.branch-2.v1.patch, 21258.v1.txt > > > Over HBASE-20627, [~xucang] reminded me that the resetting of flags for RS > Group pre/post hooks in TestRSGroups was absent. > This issue is to add the resetting of these flags before each subtest starts. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-21117) Backport HBASE-18350 (fix RSGroups) to branch-1 (Only port the part fixing table locking issue.)
[ https://issues.apache.org/jira/browse/HBASE-21117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-21117. Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 1.4.8 1.5.0 > Backport HBASE-18350 (fix RSGroups) to branch-1 (Only port the part fixing > the table locking issue.) > -- > > Key: HBASE-21117 > URL: https://issues.apache.org/jira/browse/HBASE-21117 > Project: HBase > Issue Type: Bug > Components: backport, rsgroup, shell >Affects Versions: 1.3.2 >Reporter: Xu Cang >Assignee: Xu Cang >Priority: Major > Labels: backport > Fix For: 1.5.0, 1.4.8 > > Attachments: HBASE-21117-branch-1.001.patch, > HBASE-21117-branch-1.002.patch > > > When working on HBASE-20666, I found out HBASE-18350 did not get ported to > branch-1, which sometimes causes a procedure to hang when #moveTables is called. > After looking into the 18350 patch, it seems important, since it fixes 4 > issues. This Jira is an attempt to backport it to branch-1. > > > Edited: Aug 26. > After reviewing the HBASE-18350 patch, I decided to only port part 2 of the > patch, because part 1 and part 3 are AMv2-related. I won't touch them, since AMv2 is only > for branch-2. > > {quote} > Subject: [PATCH] HBASE-18350 RSGroups are broken under AMv2 > - Table moving to RSG was buggy, because it left the table unassigned. > Now it is fixed we immediately assign to an appropriate RS > (MoveRegionProcedure). > *- Table was locked while moving, but unassign operation hung, because* > *locked table queues are not scheduled while locked. Fixed. port > this one.* > - ProcedureSyncWait was buggy, because it searched the procId in > executor, but executor does not store the return values of internal > operations (they are stored, but immediately removed by the cleaner). > - list_rsgroups in the shell show also the assigned tables and servers. > {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-21261) Add log4j.properties for hbase-rsgroup tests
[ https://issues.apache.org/jira/browse/HBASE-21261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-21261. Resolution: Fixed Fix Version/s: 2.0.3 2.1.1 1.4.8 2.2.0 1.5.0 3.0.0 > Add log4j.properties for hbase-rsgroup tests > > > Key: HBASE-21261 > URL: https://issues.apache.org/jira/browse/HBASE-21261 > Project: HBase > Issue Type: Test >Reporter: Ted Yu >Assignee: Andrew Purtell >Priority: Trivial > Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.8, 2.1.1, 2.0.3 > > > When I tried to debug TestRSGroups, at first I couldn't find any DEBUG log. > Turns out that under hbase-rsgroup/src/test/resources there is no > log4j.properties > This issue adds log4j.properties for hbase-rsgroup tests. > This would be useful when finding root cause for hbase-rsgroup test > failure(s). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21263) Mention compression algorithm along with other storefile details
Andrew Purtell created HBASE-21263: -- Summary: Mention compression algorithm along with other storefile details Key: HBASE-21263 URL: https://issues.apache.org/jira/browse/HBASE-21263 Project: HBase Issue Type: Improvement Reporter: Andrew Purtell Where we log storefile details we should also log the compression algorithm used to compress blocks on disk, if any. For example, here's a log line out of compaction: 2018-10-02 21:59:47,594 DEBUG [regionserver/host/1.1.1.1:8120-longCompactions-1538517461152] compactions.Compactor: Compacting hdfs://namenode:8020/hbase/data/default/TestTable/86037c19117a46b5b8148439ea55753b/tiny/3d04a7c28d6343ceb773737dbb192533, keycount=3335873, bloomtype=ROW, size=107.5 M, encoding=ROW_INDEX_V1, seqNum=154199, earliestPutTs=1538516084915 Aside from bloom type, block encoding, and filename, it would be good to know compression type in this type of DEBUG or INFO level logging. A minor omission of information that could be helpful during debugging. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-19444) RSGroups test units cannot be concurrently executed
[ https://issues.apache.org/jira/browse/HBASE-19444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-19444. Resolution: Duplicate Fix Version/s: (was: 1.4.9) (was: 2.2.0) (was: 1.5.0) (was: 3.0.0) Duping this out. Replacing with a task issue to break up TestRSGroups into smaller units. Current run time is ~240 seconds / 4 minutes and the test is only stable when run by itself. > RSGroups test units cannot be concurrently executed > --- > > Key: HBASE-19444 > URL: https://issues.apache.org/jira/browse/HBASE-19444 > Project: HBase > Issue Type: Bug >Affects Versions: 1.4.0 >Reporter: Andrew Purtell >Priority: Minor > > TestRSGroups and friends cannot be concurrently executed or they are very > likely to flake, failing with constraint exceptions. If executed serially all > units pass. Fix for concurrent execution. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21265) Split up TestRSGroups
Andrew Purtell created HBASE-21265: -- Summary: Split up TestRSGroups Key: HBASE-21265 URL: https://issues.apache.org/jira/browse/HBASE-21265 Project: HBase Issue Type: Task Components: rsgroup, test Affects Versions: 1.4.8 Reporter: Andrew Purtell Fix For: 3.0.0, 1.5.0, 2.2.0 TestRSGroups is flaky. It is stable when run in isolation but when run as part of the suite with concurrent executors it can fail. The current running time of this unit on my dev box is ~240 seconds (4 minutes), which is far too much time. This unit should be broken up 5 to 8 ways, grouped by functionality under test. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21266) Not running balancer because processing dead regionservers, but empty rs list, and state does not recover
Andrew Purtell created HBASE-21266: -- Summary: Not running balancer because processing dead regionservers, but empty rs list, and state does not recover Key: HBASE-21266 URL: https://issues.apache.org/jira/browse/HBASE-21266 Project: HBase Issue Type: Bug Affects Versions: 1.4.8 Reporter: Andrew Purtell Fix For: 1.5.0, 1.4.9 Found during ITBLL testing. AM in master gets into a state where manual attempts from the shell to run the balancer always return false and this is printed in the master log: 2018-10-03 19:17:14,892 DEBUG [RpcServer.default.FPBQ.Fifo.handler=21,queue=0,port=8100] master.HMaster: Not running balancer because processing dead regionserver(s): Note the empty list. This errant state did not recover without intervention by way of master restart, but the test environment was chaotic so needs investigation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
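The errant state above suggests the balancer gate keeps refusing even though its dead-server list is empty. A hypothetical sketch of the kind of guard that would avoid this (all names invented, not the HMaster implementation):

```java
import java.util.Collections;
import java.util.List;

public class BalancerGate {
    // Refuse to balance only when there are actually dead regionservers
    // still being processed; an empty list should not block the balancer.
    static boolean shouldBlockBalancer(List<String> deadServersInProgress) {
        return deadServersInProgress != null && !deadServersInProgress.isEmpty();
    }

    public static void main(String[] args) {
        // The bug report's situation: the list is empty, yet balancing is refused.
        System.out.println(shouldBlockBalancer(Collections.<String>emptyList()));
        System.out.println(shouldBlockBalancer(Collections.singletonList("rs1,8120")));
    }
}
```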
[jira] [Created] (HBASE-21283) Add new shell command 'rit' for listing regions in transition
Andrew Purtell created HBASE-21283: -- Summary: Add new shell command 'rit' for listing regions in transition Key: HBASE-21283 URL: https://issues.apache.org/jira/browse/HBASE-21283 Project: HBase Issue Type: Improvement Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 3.0.0, 1.5.0, 2.2.0 The 'status' shell command shows regions in transition but sometimes an operator may want to retrieve a simple list of regions in transition. Here's a patch that adds a new 'rit' command to the TOOLS group that does just that. No test, because it seems hard to mock RITs from the ruby test code, but I have run TestShell and it passes, so the command is verified to meet minimum requirements, like help text, and manually verified with branch-1 (shell in branch-2 and up doesn't return until TransitRegionProcedure has completed so by that time no RIT): {noformat} HBase Shell Use "help" to get list of supported commands. Use "exit" to quit this interactive shell. Version 1.5.0-SNAPSHOT, r9bb6d2fa8b760f16cd046657240ebd4ad91cb6de, Mon Oct 8 21:05:50 UTC 2018 hbase(main):001:0> help 'rit' List all regions in transition. Examples: hbase> rit hbase(main):002:0> create ... 0 row(s) in 2.5150 seconds => Hbase::Table - IntegrationTestBigLinkedList hbase(main):003:0> rit 0 row(s) in 0.0340 seconds hbase(main):004:0> unassign '56f0c38c81ae453d19906ce156a2d6a1' 0 row(s) in 0.0540 seconds hbase(main):005:0> rit IntegrationTestBigLinkedList,L\xCC\xCC\xCC\xCC\xCC\xCC\xCB,1539117183224.56f0c38c81ae453d19906ce156a2d6a1. state=PENDING_CLOSE, ts=Tue Oct 09 20:33:34 UTC 2018 (0s ago), server=null 1 row(s) in 0.0170 seconds {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21284) Forward port HBASE-21000 to branch-2
Andrew Purtell created HBASE-21284: -- Summary: Forward port HBASE-21000 to branch-2 Key: HBASE-21284 URL: https://issues.apache.org/jira/browse/HBASE-21284 Project: HBase Issue Type: Sub-task Reporter: Andrew Purtell See parent for details. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21346) Update release procedure and website publishing docs in the book
Andrew Purtell created HBASE-21346: -- Summary: Update release procedure and website publishing docs in the book Key: HBASE-21346 URL: https://issues.apache.org/jira/browse/HBASE-21346 Project: HBase Issue Type: Task Components: documentation, website Reporter: Andrew Purtell Now as part of the release process the RM must manually update the download page (hbase.apache.org/downloads/). To accomplish that [~mdrob] says {quote} To update the download links, on master branch edit src/site/xdoc/downloads.xml After you commit and push, jenkins will build the site and publish it for you. {quote} New code lines also need a fork of the API documentation. To accomplish that: {quote} To update the API Docs and version specific reference guide, update src/site/site.xml with a new section to link to the docs in the drop down list. (only necessary the first time, but it hasn't been done yet for 1.4.x) Then git clone [https://git-wip-us.apache.org/repos/asf/hbase-site.git] and make a 1.4 directory there. Copy contents of the docs/ directory from the release tarball to the version directory. Copy target/site/devapidocs and testapidocs from a local build of the tag, since those don't get published in the release tarball. Commit your changes, then do an empty commit with message "INFRA-10751 Empty commit". Push your changes {quote} Try this out. Take notes. Update the release instructions and website publish instructions in the book accordingly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21358) Snapshot procedure fails but SnapshotManager thinks it is still running
Andrew Purtell created HBASE-21358: -- Summary: Snapshot procedure fails but SnapshotManager thinks it is still running Key: HBASE-21358 URL: https://issues.apache.org/jira/browse/HBASE-21358 Project: HBase Issue Type: Bug Components: snapshots Affects Versions: 1.3.2 Reporter: Andrew Purtell A snapshot procedure fails due to a chaos test action, but the snapshot manager still thinks it is running. The test client spins needlessly checking for something that will never actually complete. We give up eventually, but we could fail a lot faster. On the integration client we are checking and re-checking: 2018-10-20 01:06:11,718 DEBUG [ChaosMonkeyThread] client.HBaseAdmin: Getting current status of snapshot from master... 2018-10-20 01:06:11,719 DEBUG [ChaosMonkeyThread] client.HBaseAdmin: (#40) Sleeping: 8571ms while waiting for snapshot completion. This is what it looks like on the master side each time the client checks in: 2018-10-20 01:04:54,565 DEBUG [RpcServer.FifoWFPBQ.default.handler=29,queue=2,port=8100] master.MasterRpcServices: Checking to see if snapshot from request:{ ss=IntegrationTestBigLinkedList-it-1539997289258 table=IntegrationTestBigLinkedList type=FLUSH } is done 2018-10-20 01:04:54,565 DEBUG [RpcServer.FifoWFPBQ.default.handler=29,queue=2,port=8100] snapshot.SnapshotManager: Snapshoting '{ ss=IntegrationTestBigLinkedList-it-1539997289258 table=IntegrationTestBigLinkedList type=FLUSH }' is still in progress! There is no running procedure for the snapshot. The procedure has failed. The snapshot manager does not take any useful action afterward but still believes the snapshot to be in progress. I see related complaints from the hfile archiver task afterward: empty directories, failures to parse protobuf in descriptor files... Seems like there was junk in the filesystem left over from the failed snapshot. 
The master was soon restarted by chaos action, and now I don't see these complaints, so that partially complete snapshot may have been cleaned up. This is with 1.3.2, but patched to include the multithreaded hfile archiving improvements from later versions. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
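A hypothetical sketch of the "fail faster" idea: before reporting a snapshot as in progress, cross-check that a live procedure actually backs it, so the client can stop polling immediately. All names here are invented, not the SnapshotManager API:

```java
import java.util.HashMap;
import java.util.Map;

public class SnapshotStatus {
    enum State { RUNNING, FAILED, DONE }

    private final Map<String, State> procedures = new HashMap<>();

    void register(String snapshot, State s) { procedures.put(snapshot, s); }

    // Report "in progress" only when a running procedure backs the snapshot;
    // a failed procedure surfaces immediately instead of spinning the client.
    String check(String snapshot) {
        State s = procedures.get(snapshot);
        if (s == State.RUNNING) return "in progress";
        if (s == State.FAILED) return "failed";
        return "done";
    }

    public static void main(String[] args) {
        SnapshotStatus m = new SnapshotStatus();
        m.register("snap-1", State.FAILED); // procedure killed, e.g. by chaos action
        System.out.println(m.check("snap-1"));
    }
}
```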
[jira] [Created] (HBASE-21359) Fix build problem against Hadoop 2.8.5
Andrew Purtell created HBASE-21359: -- Summary: Fix build problem against Hadoop 2.8.5 Key: HBASE-21359 URL: https://issues.apache.org/jira/browse/HBASE-21359 Project: HBase Issue Type: Bug Components: build Affects Versions: 1.4.8 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 1.5.0, 1.4.9 1.4.8 build fails against Hadoop 2.8.5. The fix is an easy change to supplemental-models.xml. -- This message was sent by Atlassian JIRA (v7.6.3#76005)