[jira] [Created] (HBASE-28567) Race condition causes MetaRegionLocationCache to never set watcher to populate meta location
Vincent Poon created HBASE-28567:

Summary: Race condition causes MetaRegionLocationCache to never set watcher to populate meta location
Key: HBASE-28567
URL: https://issues.apache.org/jira/browse/HBASE-28567
Project: HBase
Issue Type: Bug
Affects Versions: 2.5.8, 3.0.0
Reporter: Vincent Poon
Assignee: Vincent Poon

{{ZKWatcher#getMetaReplicaNodesAndWatchChildren()}} attempts to set a watch on the children of the base /hbase znode using {{ZKUtil.listChildrenAndWatchForNewChildren()}}, but if the node does not exist, no watch gets set, so the cache is never notified when the node is created later.

We've seen this in the test container Trino uses over at [trino/21569|https://github.com/trinodb/trino/pull/21569], where ZK, master, and RS are all run in the same container.

The fix is to throw if the node does not exist so that {{MetaRegionLocationCache}} can retry until the node gets created.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
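The retry-on-missing-node idea can be sketched as follows. This is only an illustration under stated assumptions: {{FakeZk}}, {{NoNodeException}}, and {{loadWithRetry}} are hypothetical stand-ins written for this example, not HBase or ZooKeeper APIs, and the actual patch may look quite different.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Sketch only: in-memory stand-ins for the ZooKeeper pieces; none of this is HBase code. */
class WatchRetrySketch {

  /** Hypothetical stand-in for a "znode does not exist" error. */
  static class NoNodeException extends Exception {
    NoNodeException(String path) {
      super("znode missing: " + path);
    }
  }

  /** Hypothetical fake ZooKeeper: the set of existing paths plus the paths with a child watch. */
  static class FakeZk {
    final Set<String> nodes = ConcurrentHashMap.newKeySet();
    final Set<String> childWatches = ConcurrentHashMap.newKeySet();

    /** Lists direct children and registers a watch; throws rather than silently skipping the watch. */
    List<String> listChildrenAndWatch(String path) throws NoNodeException {
      if (!nodes.contains(path)) {
        throw new NoNodeException(path);
      }
      childWatches.add(path);
      List<String> children = new ArrayList<>();
      String prefix = path + "/";
      for (String node : nodes) {
        if (node.startsWith(prefix) && node.indexOf('/', prefix.length()) < 0) {
          children.add(node.substring(prefix.length()));
        }
      }
      return children;
    }
  }

  /** Retries until the base node exists, so a watch is always registered before a normal return. */
  static List<String> loadWithRetry(FakeZk zk, String path, int maxAttempts, long sleepMs) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return zk.listChildrenAndWatch(path);
      } catch (NoNodeException e) {
        try {
          Thread.sleep(sleepMs); // node not created yet: wait and try again
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw new IllegalStateException("interrupted while waiting for " + path, ie);
        }
      }
    }
    throw new IllegalStateException("gave up waiting for " + path);
  }
}
```

The point of the sketch is that the caller can tell "node missing, no watch set" apart from "node present, no children", which is what makes a retry possible.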
[jira] [Created] (HBASE-20034) Make periodic flusher delay configurable
Vincent Poon created HBASE-20034:

Summary: Make periodic flusher delay configurable
Key: HBASE-20034
URL: https://issues.apache.org/jira/browse/HBASE-20034
Project: HBase
Issue Type: Improvement
Components: regionserver
Affects Versions: 3.0.0
Reporter: Vincent Poon
Assignee: Vincent Poon

PeriodicMemstoreFlusher is currently configured to flush with a random delay of up to 5 minutes. Make this configurable.
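A minimal sketch of what a configurable delay range could look like, assuming a plain {{java.util.Properties}} object in place of the HBase configuration. The property name {{example.flusher.delay.range.seconds}} is made up for this illustration and is not the key HBase uses.

```java
import java.util.Properties;
import java.util.concurrent.ThreadLocalRandom;

/** Sketch only: reads an upper bound for the random flush delay, defaulting to 5 minutes. */
class FlushDelaySketch {

  /** Hypothetical property name, used only for this illustration. */
  static final String RANGE_KEY = "example.flusher.delay.range.seconds";

  /** The hard-coded value described in the issue: up to 5 minutes. */
  static final int DEFAULT_RANGE_SECONDS = 5 * 60;

  /** Returns the configured delay range in seconds, or the default when the key is unset. */
  static int delayRangeSeconds(Properties conf) {
    String raw = conf.getProperty(RANGE_KEY);
    if (raw == null) {
      return DEFAULT_RANGE_SECONDS;
    }
    int value = Integer.parseInt(raw.trim());
    if (value < 0) {
      throw new IllegalArgumentException(RANGE_KEY + " must be >= 0, got " + value);
    }
    return value;
  }

  /** Picks a random delay in milliseconds between 0 and the configured range, inclusive. */
  static long randomDelayMillis(Properties conf) {
    long rangeMillis = delayRangeSeconds(conf) * 1000L;
    return ThreadLocalRandom.current().nextLong(rangeMillis + 1);
  }
}
```

Setting the range to 0 in this sketch disables the random delay entirely, which may be handy in tests.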
[jira] [Created] (HBASE-18060) Backport to branch-1 HBASE-9774 HBase native metrics and metric collection for coprocessors
Vincent Poon created HBASE-18060:

Summary: Backport to branch-1 HBASE-9774 HBase native metrics and metric collection for coprocessors
Key: HBASE-18060
URL: https://issues.apache.org/jira/browse/HBASE-18060
Project: HBase
Issue Type: New Feature
Affects Versions: 1.4.0, 1.3.2, 1.5.0
Reporter: Vincent Poon
Assignee: Vincent Poon

I'd like to explore backporting HBASE-9774 to branch-1, as the ability for coprocessors to report custom metrics through HBase is useful for us, and if we have coprocessors use the native API, a rewrite won't be necessary after an upgrade to 2.0.

The main issues I see so far are:
- The usage of Java 8 language features. It seems we can work around this, as most of it is syntactic sugar.
- dropwizard 3.1.2 in master. branch-1 is still on yammer metrics 2.2. Not sure if these can coexist just for this feature.
[jira] [Created] (HBASE-18026) ProtobufUtil seems to do extra array copying
Vincent Poon created HBASE-18026:

Summary: ProtobufUtil seems to do extra array copying
Key: HBASE-18026
URL: https://issues.apache.org/jira/browse/HBASE-18026
Project: HBase
Issue Type: Bug
Affects Versions: 2.0.0, 1.3.2
Reporter: Vincent Poon
Priority: Minor

In ProtobufUtil, the protobuf fields are copied into an array using toByteArray(). These are then passed into the KeyValue constructor, which does another copy. It seems like we can avoid a copy here by using HBaseZeroCopyByteString#zeroCopyGetBytes()?
[jira] [Created] (HBASE-17341) Add a timeout during replication endpoint termination
Vincent Poon created HBASE-17341:

Summary: Add a timeout during replication endpoint termination
Key: HBASE-17341
URL: https://issues.apache.org/jira/browse/HBASE-17341
Project: HBase
Issue Type: Bug
Affects Versions: 1.2.4, 0.98.23, 1.1.7, 2.0.0, 1.3.0, 1.4.0
Reporter: Vincent Poon
Priority: Critical

In ReplicationSource#terminate(), a Future is obtained from ReplicationEndpoint#stop(). Future.get() is then called, but can potentially hang there if something went wrong in the endpoint stop().

Hanging there has serious implications, because the thread could potentially be the ZK event thread (e.g. watcher calls ReplicationSourceManager#removePeer() -> ReplicationSource#terminate() -> blocked). This means no other events in the ZK event queue will get processed, which for HBase means other ZK watches such as replication watch notifications, snapshot watch notifications, even RegionServer shutdown will all get blocked.

The short-term fix addressed here is to simply add a timeout for Future.get(). But the severe consequences seen here perhaps suggest that a broader refactoring of the ZKWatcher usage in HBase is in order, to protect against situations like this.
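The short-term fix amounts to bounding the wait on the Future. A minimal sketch using only {{java.util.concurrent}} follows; the class and method names ({{TerminateWithTimeoutSketch}}, {{awaitStop}}) are invented for this example, and the real change in ReplicationSource may handle the failure cases differently.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Sketch only: waits for an endpoint's stop Future without risking an unbounded block. */
class TerminateWithTimeoutSketch {

  /**
   * Waits up to the given timeout for the stop Future to complete.
   * Returns true if it completed normally, false on timeout, failure, or interruption.
   */
  static boolean awaitStop(Future<?> stopFuture, long timeout, TimeUnit unit) {
    try {
      stopFuture.get(timeout, unit); // bounded wait instead of a bare get()
      return true;
    } catch (TimeoutException e) {
      stopFuture.cancel(true); // give up so the calling thread can move on
      return false;
    } catch (ExecutionException e) {
      return false; // stop() itself failed
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for the caller
      return false;
    }
  }
}
```

With a bounded wait, a stuck endpoint costs the calling thread at most the timeout, so a shared event thread can go on to process the next event.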
[jira] [Created] (HBASE-17328) Properly dispose of looped replication peers
Vincent Poon created HBASE-17328:

Summary: Properly dispose of looped replication peers
Key: HBASE-17328
URL: https://issues.apache.org/jira/browse/HBASE-17328
Project: HBase
Issue Type: Bug
Components: Replication
Affects Versions: 0.98.23, 2.0.0, 1.4.0
Reporter: Vincent Poon

When adding a looped replication peer (clusterId == peerClusterId), the following code terminates the replication source thread. But since the source manager still holds a reference, WALs continue to get enqueued and never get cleaned because they're stuck in the queue, leading to an unsustainable buildup. Furthermore, the replication statistics thread will continue to print statistics for the terminated source.

{code}
if (clusterId.equals(peerClusterId) && !replicationEndpoint.canReplicateToSameCluster()) {
  this.terminate("ClusterId " + clusterId + " is replicating to itself: peerClusterId "
      + peerClusterId + " which is not allowed by ReplicationEndpoint:"
      + replicationEndpoint.getClass().getName(), null, false);
}
{code}
[jira] [Created] (HBASE-15995) Separate replication WAL reading from shipping
Vincent Poon created HBASE-15995:

Summary: Separate replication WAL reading from shipping
Key: HBASE-15995
URL: https://issues.apache.org/jira/browse/HBASE-15995
Project: HBase
Issue Type: Sub-task
Components: Replication
Affects Versions: 2.0.0
Reporter: Vincent Poon

Currently ReplicationSource reads edits from the WAL and ships them in the same thread. By breaking out the reading from the shipping, we can introduce greater parallelism and lay the foundation for further refactoring to a pipelined, streaming model.
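The reader/shipper split can be pictured as a producer-consumer pair joined by a bounded queue. The sketch below is an illustration only: it uses strings in place of WAL edits, and {{ReadShipPipelineSketch}} and its {{run}} method are names invented here, not classes from the HBase replication code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Sketch only: one thread batches "edits" while another ships them, linked by a bounded queue. */
class ReadShipPipelineSketch {

  /** Sentinel batch, compared by identity, that tells the shipper no more batches are coming. */
  private static final List<String> END = new ArrayList<>();

  /** Reads edits in batches on one thread, ships them on another, and returns what was shipped. */
  static List<String> run(List<String> walEdits, int batchSize, int queueCapacity) {
    BlockingQueue<List<String>> queue = new ArrayBlockingQueue<>(queueCapacity);
    List<String> shipped = new ArrayList<>();

    Thread reader = new Thread(() -> {
      try {
        for (int i = 0; i < walEdits.size(); i += batchSize) {
          int end = Math.min(i + batchSize, walEdits.size());
          queue.put(new ArrayList<>(walEdits.subList(i, end))); // blocks when the shipper lags
        }
        queue.put(END);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }, "reader");

    Thread shipper = new Thread(() -> {
      try {
        for (List<String> batch = queue.take(); batch != END; batch = queue.take()) {
          shipped.addAll(batch); // stand-in for sending the batch to the peer cluster
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }, "shipper");

    reader.start();
    shipper.start();
    try {
      reader.join();
      shipper.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for the pipeline", e);
    }
    return shipped;
  }
}
```

The bounded queue gives back-pressure: the reader can run ahead of the shipper only by {{queueCapacity}} batches, which caps the memory held by edits that have been read but not yet shipped.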