[jira] [Commented] (HBASE-15867) Move HBase replication tracking from ZooKeeper to HBase
[ https://issues.apache.org/jira/browse/HBASE-15867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17722810#comment-17722810 ]

Duo Zhang commented on HBASE-15867:
---
HBASE-27109 and HBASE-27110 are both resolved. If there are no objections, I will close all the sub-tasks as won't fix, and resolve this issue as implemented.

Thanks.

> Move HBase replication tracking from ZooKeeper to HBase
> ---
>
> Key: HBASE-15867
> URL: https://issues.apache.org/jira/browse/HBASE-15867
> Project: HBase
> Issue Type: New Feature
> Components: Replication
> Affects Versions: 2.1.0
> Reporter: Joseph
> Assignee: Zheng Hu
> Priority: Major
>
> Move the WAL file and offset tracking out of ZooKeeper and into an HBase
> table called hbase:replication.
> The three largest changes will be the new classes ReplicationTableBase,
> TableBasedReplicationQueues, and TableBasedReplicationQueuesClient. As of now
> ReplicationPeers and HFileRef's tracking will not be implemented. Subtasks
> have been filed for these two jobs.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (HBASE-15867) Move HBase replication tracking from ZooKeeper to HBase
[ https://issues.apache.org/jira/browse/HBASE-15867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553265#comment-17553265 ]

Duo Zhang commented on HBASE-15867:
---
Created two new issues to track the replication peer storage and replication queue storage. Let's continue the work in HBASE-27109 and HBASE-27110.

Let's keep this issue open as it is still the root of all the related issues. And once HBASE-27109 and HBASE-27110 are both resolved, let's get back here to see if it is OK to resolve this issue too.
[jira] [Commented] (HBASE-15867) Move HBase replication tracking from ZooKeeper to HBase
[ https://issues.apache.org/jira/browse/HBASE-15867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17550230#comment-17550230 ]

Duo Zhang commented on HBASE-15867:
---
Oh, when trying to implement a prototype for ReplicationLogCleaner, I found that it is not as easy as expected.

The basic idea of the solution proposed above is to get the wal group of a wal file, and check whether the file is before or after the replication offset, to determine whether we can delete it. And if there is no offset for the group, we keep the file.

There are basically two problems:
1. Every peer has its own queue, so it is not a simple 'no offset for the group'. We need to know whether a queue is missing for a peer; if so, we should not delete the file.
2. For a recovered replication queue, we delete the queue once we finish replicating all the remaining wal files. So for a dead region server, if there is no queue for a wal file, it usually means we could delete it, not 'should not delete it'.

Anyway, I think it is still possible to implement the cleaner logic, as we can know all the replication peers, and we can also know whether a region server is dead. But the timing will be more complicated, as we need to get information from different places, and we may hit races that cause us to make a wrong decision on whether to delete a file.

Will think more about whether we could have simpler solutions.

Thanks.
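The cleaner decision discussed in this comment can be sketched roughly as follows. This is only an illustration under stated assumptions, not the HBase implementation: the class `WalCleanerSketch`, its string keys, and the use of a numeric timestamp to order WAL files are all hypothetical, and the races mentioned in the comment are ignored.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the per-peer WAL cleaner decision; all names are illustrative. */
class WalCleanerSketch {
  // Key: peerId/serverName/walGroup. Value: timestamp of the WAL file currently being replicated.
  final Map<String, Long> offsets = new HashMap<>();
  // Region servers known to be dead.
  final Set<String> deadServers = new HashSet<>();
  // All known replication peers.
  final List<String> peers;

  WalCleanerSketch(List<String> peers) {
    this.peers = peers;
  }

  static String key(String peerId, String server, String group) {
    return peerId + "/" + server + "/" + group;
  }

  /** Returns true only if every peer has finished with the given WAL file. */
  boolean canDelete(String server, String group, long walTimestamp) {
    for (String peer : peers) {
      Long current = offsets.get(key(peer, server, group));
      if (current == null) {
        // No queue for this peer. On a live server the queue may simply not exist yet,
        // so keep the file. On a dead server a recovered queue is removed once it is
        // fully replicated, so a missing queue usually means this peer is done.
        if (!deadServers.contains(server)) {
          return false;
        }
      } else if (walTimestamp >= current) {
        // The file is the one being replicated, or a later one: still needed.
        return false;
      }
    }
    return true;
  }
}
```

The per-peer loop reflects problem 1 (a file is deletable only if no peer still needs it), and the dead-server branch reflects problem 2.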
[jira] [Commented] (HBASE-15867) Move HBase replication tracking from ZooKeeper to HBase
[ https://issues.apache.org/jira/browse/HBASE-15867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17548578#comment-17548578 ]

Duo Zhang commented on HBASE-15867:
---
The progress for the POC is good.

https://github.com/apache9/hbase/tree/HBASE-15867

Of course there are still lots of compile errors, especially in the test code.

There are some changes in the concepts mentioned above.

This is the new ReplicationQueueStorage interface:
https://github.com/Apache9/hbase/blob/HBASE-15867/hbase-replication/src/main/java/org/apache/hadoop/hbase/replication/ReplicationQueueStorage.java

This is the most important part.
{code}
  /**
   * Store the replication offset for the specific group of a replication queue.
   *
   * If the current replication queue does not exist yet, will create it automatically.
   * @param queueId the id of the replication queue
   * @param walGroup the wal group
   * @param offset the offset for a group
   * @param lastSeqIds map with {encodedRegionName, sequenceId} pairs for serial replication
   */
  void setOffset(ReplicationQueueId queueId, String walGroup, ReplicationGroupOffset offset,
    Map<String, Long> lastSeqIds) throws ReplicationException;

  /**
   * Get the replication offset for all the groups of a replication queue.
   *
   * Usually used when setting up a recovered replication queue.
   * @param queueId the id of the replication queue
   * @return the offset for all the groups of the given replication queue
   */
  Map<String, ReplicationGroupOffset> getOffsets(ReplicationQueueId queueId)
    throws ReplicationException;

  /**
   * Get a list of all queues for the specified region server.
   * @param serverName the server name of the region server that owns the set of queues
   * @return a list of queueIds
   */
  List<ReplicationQueueId> listAllQueueIds(ServerName serverName) throws ReplicationException;

  /**
   * Get a list of all region servers that have outstanding replication queues. These servers could
   * be alive, dead or from a previous run of the cluster.
   * @return a list of server names
   */
  List<ServerName> listAllReplicators() throws ReplicationException;

  /**
   * Change ownership for the queue identified by queueId and belongs to a dead region server.
   * @param peerId the id of the replication peer
   * @param queueId the id of the replication queue
   * @param targetServerName the name of the target region server
   * @return the offset for all the groups of the claimed replication queue, null means someone else
   *         has already claimed the queue.
   */
  Map<String, ReplicationGroupOffset> claimQueue(String peerId, ReplicationQueueId queueId,
    ServerName targetServerName) throws ReplicationException;

  /**
   * Remove a replication queue.
   * @param queueId the id for the replication queue
   */
  void removeQueue(ReplicationQueueId queueId) throws ReplicationException;
{code}

There are some design changes here. One important thing is that there is no group in the replication queue id; it is still constructed from a server name and a peer id. When getting the replication offsets, we return the replication offsets of all the groups. The replication offsets for a queue will be stored in one row.

The ReplicationSyncUp tool will be broken. It breaks a lot of assumptions in our design, so it is not easy to make it work for now. Will consider it later.

And the implementation of ClaimReplicationQueuesProcedure is not as expected. In general, maybe we do not need to list all the old wal files, but we need to list all the replication peers, and create a replication queue for each of them if not present. I was thinking of adding a replication queue when adding the peer, but if you consider a region server starting up, you will find that you need to add a queue, i.e., write something to the replication queue storage, for all the existing peers when starting a region server. This will introduce a cyclic dependency between starting a region server and bringing the replication queue table online...

Not sure if this will cause some problems, for example, a replication queue is added after the region server has crashed, but we still create a replication queue for it and cause the wals to be replicated to the peer cluster... Will consider more when implementing the POC.

Thanks.
[jira] [Commented] (HBASE-15867) Move HBase replication tracking from ZooKeeper to HBase
[ https://issues.apache.org/jira/browse/HBASE-15867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17547932#comment-17547932 ]

Duo Zhang commented on HBASE-15867:
---
Ah, it seems our current MultiRowMutate operation cannot fully support the claimQueue operation. Here we not only need to remove a row and add a row atomically, but also need to use CAS to make sure that only one process can remove the row.

Anyway, this is not a blocker for implementing a POC. We can add the support later; it should be a general enough requirement.
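The semantics required of claimQueue (move a row atomically, and let exactly one claimer win) can be illustrated with a small in-memory stand-in. This is a sketch under assumptions, not HBase code: `ClaimQueueSketch`, its row key format, and the use of a `synchronized` method in place of a server-side check-and-mutate are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for the replication queue table. */
class ClaimQueueSketch {
  // Row key -> (walGroup -> offset). One row per replication queue.
  private final Map<String, Map<String, Long>> rows = new HashMap<>();

  /** Row key of a normal queue: built from the peer id and the owning server. */
  static String rowKey(String peerId, String owner) {
    return peerId + "-" + owner;
  }

  /** Row key of a claimed queue: also remembers the dead source server. */
  static String claimedRowKey(String peerId, String owner, String sourceServer) {
    return peerId + "-" + owner + "/" + sourceServer;
  }

  synchronized void setOffset(String peerId, String server, String group, long offset) {
    rows.computeIfAbsent(rowKey(peerId, server), k -> new HashMap<>()).put(group, offset);
  }

  synchronized boolean hasRow(String key) {
    return rows.containsKey(key);
  }

  /**
   * Moves the queue of a dead server to the target server. The check (row still exists),
   * the delete and the put happen under one lock, standing in for an atomic
   * check-and-mutate across two rows.
   * @return the offsets of the claimed queue, or null if someone else already claimed it
   */
  synchronized Map<String, Long> claimQueue(String peerId, String deadServer, String target) {
    Map<String, Long> offsets = rows.remove(rowKey(peerId, deadServer));
    if (offsets == null) {
      return null; // lost the race: the row is already gone
    }
    rows.put(claimedRowKey(peerId, target, deadServer), offsets);
    return new HashMap<>(offsets);
  }
}
```

With a real table the lock would have to be replaced by a conditional multi-row mutation, which is the missing support the comment refers to.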
[jira] [Commented] (HBASE-15867) Move HBase replication tracking from ZooKeeper to HBase
[ https://issues.apache.org/jira/browse/HBASE-15867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17546010#comment-17546010 ]

Duo Zhang commented on HBASE-15867:
---
There are two other things which are handled by replication queue storage.

The first is the lastSequenceIds, which is used by serial replication. It needs to be updated together with the replication offset, atomically, so we need to store it with the replication offset in the same table. The key is basically a (peerId, encodedRegionName) pair, and the value is a sequence id.

The second is hfile refs, for replicating bulk load hfiles. In fact, it is only used to prevent the hfiles from being deleted by the HFileCleaner before being replicated, and the update to hfile refs does not need to be atomic with the replication offset. But anyway, storing it in the same table but in a separate family seems to do no harm.

Reviewing the code, for both lastSequenceIds and hfile refs, one of the requirements is to delete them all when deleting the peer, and for hfile refs, we also need to list all the refs atomically. So the idea is to just store them in one row, with different qualifiers.

To be more specific, introduce two new families, maybe called replicated_seq_id and hfile_ref. For a peer, there is only one row, where the row key is the peer id. In replicated_seq_id, the qualifier is the encodedRegionName and the value is the sequence id; in hfile_ref, the qualifier is the hfile name and the value is just empty. In this way, a single delete families call can remove them all at once, and a simple get can fetch all the hfile refs for a replication peer at once.
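The one-row-per-peer layout described in this comment can be modelled with nested maps, as a rough sketch. The family names come from the comment; the class `PeerRowLayoutSketch` and its methods are hypothetical and only mimic row, family and qualifier semantics in memory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory model of the per-peer row: row key -> family -> qualifier -> value. */
class PeerRowLayoutSketch {
  static final String SEQ_ID_FAMILY = "replicated_seq_id";
  static final String HFILE_REF_FAMILY = "hfile_ref";

  private final Map<String, Map<String, Map<String, String>>> table = new TreeMap<>();

  private Map<String, String> family(String peerId, String family) {
    return table.computeIfAbsent(peerId, k -> new TreeMap<>())
      .computeIfAbsent(family, k -> new TreeMap<>());
  }

  /** Qualifier is the encoded region name, value is the sequence id. */
  void setLastSeqId(String peerId, String encodedRegionName, long seqId) {
    family(peerId, SEQ_ID_FAMILY).put(encodedRegionName, Long.toString(seqId));
  }

  /** Returns the stored sequence id, or -1 if there is none. */
  long getLastSeqId(String peerId, String encodedRegionName) {
    Map<String, Map<String, String>> row = table.get(peerId);
    if (row == null || !row.containsKey(SEQ_ID_FAMILY)) {
      return -1L;
    }
    String value = row.get(SEQ_ID_FAMILY).get(encodedRegionName);
    return value == null ? -1L : Long.parseLong(value);
  }

  /** Qualifier is the hfile name, value is empty. */
  void addHFileRef(String peerId, String hfileName) {
    family(peerId, HFILE_REF_FAMILY).put(hfileName, "");
  }

  /** One read of a single row returns every hfile ref of the peer. */
  List<String> getHFileRefs(String peerId) {
    Map<String, Map<String, String>> row = table.get(peerId);
    if (row == null || !row.containsKey(HFILE_REF_FAMILY)) {
      return new ArrayList<>();
    }
    return new ArrayList<>(row.get(HFILE_REF_FAMILY).keySet());
  }

  /** Removing the single row drops the sequence ids and the hfile refs together. */
  void removePeer(String peerId) {
    table.remove(peerId);
  }
}
```

Because everything for a peer lives in one row, both the "delete all on peer removal" and the "list all refs atomically" requirements reduce to single-row operations.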
[jira] [Commented] (HBASE-15867) Move HBase replication tracking from ZooKeeper to HBase
[ https://issues.apache.org/jira/browse/HBASE-15867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17523108#comment-17523108 ]

Duo Zhang commented on HBASE-15867:
---
OK, now claimQueue is driven by master: we have a ClaimReplicationQueuesProcedure, which is the last step of SCP. Then I think a possible solution is to also list all the WAL files of the dead region server, so we can know the replication queues of these WAL files, and check whether we have these queues in ReplicationQueueStorage. If not, we insert the initial replication offsets into the queue, so the region server can claim the queue. I think this is a possible way to solve the problem.

Of course this will introduce a dependency on the replication queue table for SCP, but FWIW, it is already there... Theoretically, when we get to this step in SCP, all the regions on the dead region server are already online, so there will be no cyclic dependency. Practically, if the replication queue table is not online and we have a bunch of SCPs which hang at the last claim queue step, the system may hang for a long time. So we'd better add a state check method to the ReplicationQueueStorage interface: if the replication queue table is not online, then let's suspend the SCP for a while, to give other SCPs the chance to bring the replication queue table online. We should also use a small rpc and operation timeout when accessing the ReplicationQueueStorage in ClaimReplicationQueuesProcedure, so we will not block a PEWorker for a long time.
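The step proposed in this comment (derive the queues from the dead server's WAL files and insert an initial offset where none is recorded) might look roughly like this. It is a sketch with several assumptions: WAL names are taken to have the form `<group>.<timestamp>`, the storage is a plain map keyed by `peerId/serverName/walGroup`, and `0` is used as a placeholder for the initial offset. None of these names come from HBase.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of making sure a queue entry exists before it is claimed. */
class EnsureQueuesSketch {
  /** Assumes a WAL name of the form group.timestamp and returns the group part. */
  static String walGroup(String walName) {
    int dot = walName.lastIndexOf('.');
    return dot < 0 ? walName : walName.substring(0, dot);
  }

  /**
   * Inserts a placeholder initial offset for every (peer, group) of the dead server
   * that has no entry yet, and leaves existing offsets untouched.
   * @return the keys that were newly created
   */
  static List<String> ensureInitialOffsets(Map<String, Long> storage, String deadServer,
    List<String> peers, List<String> walNames) {
    List<String> created = new ArrayList<>();
    for (String wal : walNames) {
      String group = walGroup(wal);
      for (String peer : peers) {
        String key = peer + "/" + deadServer + "/" + group;
        // putIfAbsent returns null only when no offset was recorded before.
        if (storage.putIfAbsent(key, 0L) == null) {
          created.add(key);
        }
      }
    }
    return created;
  }
}
```

The state check and short timeouts mentioned in the comment are not modelled here; they would wrap the storage access in the procedure itself.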
[jira] [Commented] (HBASE-15867) Move HBase replication tracking from ZooKeeper to HBase
[ https://issues.apache.org/jira/browse/HBASE-15867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17523101#comment-17523101 ]

Duo Zhang commented on HBASE-15867:
---
OK, when reviewing the implementation of ZKReplicationQueueStorage, I think the above approach still misses a part, which is the claimQueue operation.

When claiming a replication queue, first we need to know there is a replication queue for the dead region server, then we can do a claim operation. So the problem here is that we do not record the WAL files in ReplicationQueueStorage when creating the file. If we crash before actually replicating some edits out and updating the replication offset, we cannot get the information of this replication queue through the ReplicationQueueStorage.

An idea is to add a replication offset record when creating the first WAL file in a replication group, but it will introduce the cyclic dependency back... Let me think if there are other possible ways to solve the problem.
[jira] [Commented] (HBASE-15867) Move HBase replication tracking from ZooKeeper to HBase
[ https://issues.apache.org/jira/browse/HBASE-15867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17512711#comment-17512711 ]

Duo Zhang commented on HBASE-15867:
---
I've been thinking of this for a long time.

Basically there are two parts of replication storage: one is for storing replication peers, the other is for storing replication queues. There are several problems here. The first is that we need to load all replication peers in HRegionServer.setupWALAndReplication; the second is that when creating a new WAL writer, we need to record the new wal file before actually writing to it.

The first one introduces a cyclic dependency between RS start up and assigning a region. If we want to store peer information in a region, then we need to have a RS to which regions can be assigned. But anyway, since the replication peer information is not very large, and also has a low qps, I think we can store it in the master local region, and let the RS request it from master through rpc, i.e., introduce a MasterReplicationPeerStorage. We need to communicate with master when starting a RS, so it does not add a new dependency for region server start up. And when we want to touch the replication peer storage, it usually means we want to add/remove/modify a peer or enable/disable a peer; all these operations need master to be up first, as we need to send the request to master first, so it is not a big deal to let the RS rely on master when doing these operations. And maybe we could just store the replication peer as a file on the DFS, since we will not have too many replication peers. Anyway, it is easier to fix compared to the second problem.

The second one introduces a cyclic dependency between assigning a region and creating a WAL for the region. This could be solved by storing the replication queue information in a region which will never be replicated. For example, in HBASE-22938 we proposed to fold all system tables into hbase:meta. Or we could introduce a separate WAL instance for system tables, just like what we have done for hbase:meta.

But I'm still not satisfied with the above approach; that's why I have still not actually started to work on this issue. ZooKeeper is designed to be HA and its failover is pretty fast. But for HBase, if the region server which holds the region which stores the replication queue information crashes, we will hang the WAL rolling for the whole cluster for a 'long' time (usually tens of seconds or even several minutes). I think it will hurt the availability of the HBase cluster. So I spent a lot of time thinking about whether it is possible to not rely on replication queue storage when rolling a WAL.

Recently I've gotten a rough idea on how to remove the dependency, so let me put it here first. I think writing a solution out could help you polish your idea and also let you know if it really works. And I also want others in the community to consider whether it works.

Besides WAL rolling, we use replication queue storage in two places. One is in replication, where we get the files which need to be replicated, and also record the replication progress, i.e., the offset in a file where all the entries before it have been replicated. The other is in ReplicationHFileCleaner, where we check whether a HFile is recorded in the replication queue storage; if it is, then we should not delete it.

The basic idea to avoid recording every file here is that we could sort the WAL files of a regionserver by their name; the order is exactly the order in which the files were written. If multi-WAL is enabled, we will have several groups, but in each group, we could still sort the files. So for deleting, we only need to know which file we are currently replicating; then we know that all the files before this file can be deleted, and all files after this file (including this file) cannot be deleted. And for replicating, we also know which file should be replicated next after finishing replicating a file.

So the solution here is that we will consider the replication offset per queue, not per file. The replication offset will be a (peer_id, regionserver, group (can be empty if no multi-WAL), file, offset_in_file) tuple. In ReplicationQueueStorage, we will only record the replication offset, without recording the actual files which need to be replicated. In this way, when rolling a WAL we do not need to use ReplicationQueueStorage any more.

For replicating a normal replication queue, the files to be replicated are always maintained in memory, so there is no problem in just recording the replication offset while replicating. For a recovered replication queue, the ReplicationQueueStorage still needs to provide a claimQueue method, and for getting the actual files to be replicated, we need to list all WAL files of the given regionserver, and filter out the special group if needed, i.e., when multi-WAL is enabled. The WAL files should all be
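The per-queue offset idea in this comment can be illustrated with a short sketch. It assumes that a WAL name ends in `.<timestamp>` and orders files by that numeric suffix, which is a stand-in for the "sort by name" rule in the comment. `ReplicationOffsetSketch` and `WalOrderSketch` are hypothetical names, not HBase classes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical form of the (peer_id, regionserver, group, file, offset_in_file) tuple. */
final class ReplicationOffsetSketch {
  final String peerId;
  final String regionServer;
  final String group; // may be empty when multi-WAL is not enabled
  final String file;
  final long offsetInFile;

  ReplicationOffsetSketch(String peerId, String regionServer, String group, String file,
    long offsetInFile) {
    this.peerId = peerId;
    this.regionServer = regionServer;
    this.group = group;
    this.file = file;
    this.offsetInFile = offsetInFile;
  }
}

/** Derives deletable and pending WAL files of one group from a single recorded offset. */
class WalOrderSketch {
  /** Assumes the WAL name ends with a numeric creation timestamp after the last dot. */
  static long timestamp(String walName) {
    return Long.parseLong(walName.substring(walName.lastIndexOf('.') + 1));
  }

  /** Returns a copy of the names ordered by creation time. */
  static List<String> sorted(List<String> walNames) {
    List<String> copy = new ArrayList<>(walNames);
    copy.sort(Comparator.comparingLong(WalOrderSketch::timestamp));
    return copy;
  }

  /** Files written before the file currently being replicated can be deleted. */
  static List<String> deletable(List<String> walNames, ReplicationOffsetSketch offset) {
    long current = timestamp(offset.file);
    List<String> result = new ArrayList<>();
    for (String wal : sorted(walNames)) {
      if (timestamp(wal) < current) {
        result.add(wal);
      }
    }
    return result;
  }

  /** The next file to replicate after the current one, or null if there is none. */
  static String nextToReplicate(List<String> walNames, ReplicationOffsetSketch offset) {
    long current = timestamp(offset.file);
    for (String wal : sorted(walNames)) {
      if (timestamp(wal) > current) {
        return wal;
      }
    }
    return null;
  }
}
```

Only the offset tuple needs to be persisted; the file list is obtained by listing the WAL directory, which is why WAL rolling no longer has to touch the queue storage in this design.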
[jira] [Commented] (HBASE-15867) Move HBase replication tracking from ZooKeeper to HBase
[ https://issues.apache.org/jira/browse/HBASE-15867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17022483#comment-17022483 ]

Michael Stack commented on HBASE-15867:
---
Unscheduling stalled, nice feature. Any progress here?
[jira] [Commented] (HBASE-15867) Move HBase replication tracking from ZooKeeper to HBase
[ https://issues.apache.org/jira/browse/HBASE-15867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16917362#comment-16917362 ]

stack commented on HBASE-15867:
---
bq. By default I will always answer like this: the rowkey is in a different pattern and it will mess up hbase:meta

Let's open an issue to discuss. We can list the meta info we want to keep -- acls, peer info -- and then the various ideas on implementation.

> Move HBase replication tracking from ZooKeeper to HBase
> ---
>
> Key: HBASE-15867
> URL: https://issues.apache.org/jira/browse/HBASE-15867
> Project: HBase
> Issue Type: New Feature
> Components: Replication
> Affects Versions: 2.1.0
> Reporter: Joseph
> Assignee: Zheng Hu
> Priority: Major
> Fix For: 2.3.0
[jira] [Commented] (HBASE-15867) Move HBase replication tracking from ZooKeeper to HBase
[ https://issues.apache.org/jira/browse/HBASE-15867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16917302#comment-16917302 ]

Duo Zhang commented on HBASE-15867:
---
{quote}
For history, could be a column family that keeps data for a week or two? In hbase:meta?
{quote}
By default I will always answer like this: the rowkey is in a different pattern and it will mess up hbase:meta.

But considering splittable meta, it seems that I may find a way to deal with this. We could add a special prefix in the row key for different system tables, and make a special family for it. For example, for all the records in hbase:acl, we could introduce a prefix like ':::acl:::'. Since we do not allow ':' in either a namespace or a table name, it will not conflict with the existing table related records. And the family could be named 'acl'. And we could make a special split policy that only splits at these special prefixes, so it will not break any assumptions so far, as all the records for a 'system table' are in the same region.
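The prefixing scheme suggested in this comment can be sketched in a few lines. The `:::acl:::` prefix is taken from the comment; the class `SystemRowKeySketch`, its methods, and the way split points are checked are hypothetical and do not correspond to an existing HBase split policy.

```java
import java.util.List;

/** Hypothetical sketch of prefixed row keys for system-table records folded into hbase:meta. */
class SystemRowKeySketch {
  /** Builds the special prefix for a folded system table, for example :::acl::: for hbase:acl. */
  static String prefix(String systemTable) {
    return ":::" + systemTable + ":::";
  }

  /** Prepends the prefix to the key the record had in its own table. */
  static String rowKey(String systemTable, String originalKey) {
    return prefix(systemTable) + originalKey;
  }

  /** Namespaces and table names cannot contain ':', so a leading ":::" marks a folded record. */
  static boolean isSystemRow(String rowKey) {
    return rowKey.startsWith(":::");
  }

  /** Allows a split only exactly at a prefix, so one system table never spans two regions. */
  static boolean isAllowedSplitPoint(String splitKey, List<String> systemTables) {
    for (String table : systemTables) {
      if (splitKey.equals(prefix(table))) {
        return true;
      }
    }
    return false;
  }
}
```

Restricting split points to the prefixes is what keeps all records of one folded table in a single region, as the comment requires.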
[jira] [Commented] (HBASE-15867) Move HBase replication tracking from ZooKeeper to HBase
[ https://issues.apache.org/jira/browse/HBASE-15867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16917256#comment-16917256 ]

stack commented on HBASE-15867:
---
For history, could be a column family that keeps data for a week or two? In hbase:meta?

bq. ...what's the real issue we want to solve by moving it from ZK to HBase table?

We used to have a dictum that you could erase zk content and hbase would pick up and continue w/o issue. Keeping replication peer data in zk violates this principle; permanent data belongs in hbase.

We've also had a general tendency to want to rely less on zk for services -- especially as it seemed like we'd gone overboard. ZK is an alien store on the other side of an RPC. All sorts of things can go wrong. We know the properties of our system better than those of zk, and 'storage' is the name of our game; we shouldn't need to go elsewhere for it.

Thanks.
[jira] [Commented] (HBASE-15867) Move HBase replication tracking from ZooKeeper to HBase
[ https://issues.apache.org/jira/browse/HBASE-15867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905686#comment-16905686 ]

Xu Cang commented on HBASE-15867:
---
Since this Jira is blocked by a [dead loop issue|https://issues.apache.org/jira/browse/HBASE-20166], should we take one step back and ask, what's *the real issue we want to solve* by moving it from ZK to an HBase table? (I guess we want to remove the dependency on ZK for this use case and save that info in a more reliable medium?) Can we tackle those issues in different ways?

I tried to read through the Jira descriptions and comments to find my answer but failed. Can you shed some light? Thanks [~openinx]
[jira] [Commented] (HBASE-15867) Move HBase replication tracking from ZooKeeper to HBase
[ https://issues.apache.org/jira/browse/HBASE-15867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16850305#comment-16850305 ]

Xu Cang commented on HBASE-15867:
---
thank you [~openinx] very helpful!
[jira] [Commented] (HBASE-15867) Move HBase replication tracking from ZooKeeper to HBase
[ https://issues.apache.org/jira/browse/HBASE-15867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16850297#comment-16850297 ]

Zheng Hu commented on HBASE-15867:
---
bq. will we track peer changes in table or we only keep the current peer information?

In our original design, we didn't plan to store the peer changes. I guess the motivation for tracking the change history is that you want to debug peer related problems; for that, maybe it's better for us to enable the debug log or enhance the peer related log info. Of course, once we change to the hbase:replication table we can keep all versions in it, and then all history changes can be accessed.

Thanks.
[jira] [Commented] (HBASE-15867) Move HBase replication tracking from ZooKeeper to HBase
[ https://issues.apache.org/jira/browse/HBASE-15867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16850008#comment-16850008 ] Xu Cang commented on HBASE-15867: - thanks [~openinx]! Good info! One thing I want to discuss is, if we finish all the tasks mentioned above, will we track peer changes in table or we only keep the current peer information? If not, should we also consider adding this? Peer changes are not that frequent, and storing all peer changes in a DB is very useful for understanding certain system behavior. Lacking this information is a big disadvantage, since we don't write peer changes to the logs either. So, when someone asks "what peers did this cluster have on a certain date", we just cannot answer it with proof, which is not good. I'd like to know what you think about this, and whether you have a solution in mind. [~openinx] thanks again. > Move HBase replication tracking from ZooKeeper to HBase > --- > > Key: HBASE-15867 > URL: https://issues.apache.org/jira/browse/HBASE-15867 > Project: HBase > Issue Type: New Feature > Components: Replication >Affects Versions: 2.1.0 >Reporter: Joseph >Assignee: Zheng Hu >Priority: Major > Fix For: 2.3.0 > > > Move the WAL file and offset tracking out of ZooKeeper and into an HBase > table called hbase:replication. > The largest three new changes will be two classes ReplicationTableBase, > TableBasedReplicationQueues, and TableBasedReplicationQueuesClient. As of now > ReplicationPeers and HFileRef's tracking will not be implemented. Subtasks > have been filed for these two jobs. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-15867) Move HBase replication tracking from ZooKeeper to HBase
[ https://issues.apache.org/jira/browse/HBASE-15867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16848020#comment-16848020 ] Zheng Hu commented on HBASE-15867: -- [~xucang], Thanks for your attention. For this feature, the core obstacle is the problem described in HBASE-20166. We also had a discussion on the mailing list with the title *[DISCUSS] A Problem When Start HBase Cluster Using Table Based Replication*; you can take a look at it. When we thought about the solution before, it seemed we would need to refactor the master/RS startup procedure, which is a lot of change. After some evaluation, we thought that HBase 2.x stability and performance were worth spending more time on at that point, so we lowered the priority of this feature. Maybe now is the time :-) If you have some time now, yeah, you can resume this effort. As for myself, I'm mainly absorbed in HBASE-21879. Once all those subtasks get resolved, maybe I can give a hand with this feature. Thanks. > Move HBase replication tracking from ZooKeeper to HBase > --- > > Key: HBASE-15867 > URL: https://issues.apache.org/jira/browse/HBASE-15867 > Project: HBase > Issue Type: New Feature > Components: Replication >Affects Versions: 2.1.0 >Reporter: Joseph >Assignee: Zheng Hu >Priority: Major > Fix For: 2.3.0 > > > Move the WAL file and offset tracking out of ZooKeeper and into an HBase > table called hbase:replication. > The largest three new changes will be two classes ReplicationTableBase, > TableBasedReplicationQueues, and TableBasedReplicationQueuesClient. As of now > ReplicationPeers and HFileRef's tracking will not be implemented. Subtasks > have been filed for these two jobs. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-15867) Move HBase replication tracking from ZooKeeper to HBase
[ https://issues.apache.org/jira/browse/HBASE-15867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16847916#comment-16847916 ] Xu Cang commented on HBASE-15867: - Can someone briefly summarize the current state of this JIRA? Are there any architectural blockers or big concerns? Thank you! I am very interested in this Jira since it's extremely useful. If I want to resume this effort, do you have any suggestions or tips? [~Apache9] [~openinx] > Move HBase replication tracking from ZooKeeper to HBase > --- > > Key: HBASE-15867 > URL: https://issues.apache.org/jira/browse/HBASE-15867 > Project: HBase > Issue Type: New Feature > Components: Replication >Affects Versions: 2.1.0 >Reporter: Joseph >Assignee: Zheng Hu >Priority: Major > Fix For: 2.3.0 > > > Move the WAL file and offset tracking out of ZooKeeper and into an HBase > table called hbase:replication. > The largest three new changes will be two classes ReplicationTableBase, > TableBasedReplicationQueues, and TableBasedReplicationQueuesClient. As of now > ReplicationPeers and HFileRef's tracking will not be implemented. Subtasks > have been filed for these two jobs. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-15867) Move HBase replication tracking from ZooKeeper to HBase
[ https://issues.apache.org/jira/browse/HBASE-15867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16404335#comment-16404335 ] Zheng Hu commented on HBASE-15867: -- Thanks [~Apache9] for the revert and for opening a new branch for this feature. It may take some time to fix this problem, as per the discussion. > Move HBase replication tracking from ZooKeeper to HBase > --- > > Key: HBASE-15867 > URL: https://issues.apache.org/jira/browse/HBASE-15867 > Project: HBase > Issue Type: New Feature > Components: Replication >Affects Versions: 2.1.0 >Reporter: Joseph >Assignee: Zheng Hu >Priority: Major > Fix For: 2.1.0 > > > Move the WAL file and offset tracking out of ZooKeeper and into an HBase > table called hbase:replication. > The largest three new changes will be two classes ReplicationTableBase, > TableBasedReplicationQueues, and TableBasedReplicationQueuesClient. As of now > ReplicationPeers and HFileRef's tracking will not be implemented. Subtasks > have been filed for these two jobs. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-15867) Move HBase replication tracking from ZooKeeper to HBase
[ https://issues.apache.org/jira/browse/HBASE-15867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16403414#comment-16403414 ] Duo Zhang commented on HBASE-15867: --- As per this discussion https://lists.apache.org/thread.html/483ee387d80e9d2d914733db44e8236ca74065905efbacdb21cec615@%3Cdev.hbase.apache.org%3E I've reverted HBASE-19665 on master and created a feature branch HBASE-15867 for this feature. FYI [~openinx]. > Move HBase replication tracking from ZooKeeper to HBase > --- > > Key: HBASE-15867 > URL: https://issues.apache.org/jira/browse/HBASE-15867 > Project: HBase > Issue Type: New Feature > Components: Replication >Affects Versions: 2.1.0 >Reporter: Joseph >Assignee: Zheng Hu >Priority: Major > Fix For: 2.1.0 > > > Move the WAL file and offset tracking out of ZooKeeper and into an HBase > table called hbase:replication. > The largest three new changes will be two classes ReplicationTableBase, > TableBasedReplicationQueues, and TableBasedReplicationQueuesClient. As of now > ReplicationPeers and HFileRef's tracking will not be implemented. Subtasks > have been filed for these two jobs. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-15867) Move HBase replication tracking from ZooKeeper to HBase
[ https://issues.apache.org/jira/browse/HBASE-15867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385977#comment-16385977 ] Zheng Hu commented on HBASE-15867: -- After HBASE-19397, this issue is quite a different story now... we introduced a storage layer, which is a set of abstract interfaces for the upper layer to access replication-related metadata, and the current implementation is ZooKeeper storage. Later, I'll implement this storage interface for table storage. I expect this issue to be included in the 2.1.0 release. Besides, the previous issues we created may be out of date now. I'll clean those issues up. Thanks. > Move HBase replication tracking from ZooKeeper to HBase > --- > > Key: HBASE-15867 > URL: https://issues.apache.org/jira/browse/HBASE-15867 > Project: HBase > Issue Type: New Feature > Components: Replication >Reporter: Joseph >Assignee: Zheng Hu >Priority: Major > > Move the WAL file and offset tracking out of ZooKeeper and into an HBase > table called hbase:replication. > The largest three new changes will be two classes ReplicationTableBase, > TableBasedReplicationQueues, and TableBasedReplicationQueuesClient. As of now > ReplicationPeers and HFileRef's tracking will not be implemented. Subtasks > have been filed for these two jobs. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
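The storage-layer idea in the comment above can be illustrated with a rough sketch: the upper layer talks only to an interface, so a ZooKeeper-backed or table-backed implementation could be swapped in behind it. Every name here (QueueOffsetStorage, InMemoryQueueOffsetStorage, ReplicationSourceSketch) is hypothetical and is not the interface introduced by HBASE-19397; the in-memory map merely stands in for a real backend.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical storage abstraction: callers never see which backend is used.
interface QueueOffsetStorage {
    void setOffset(String peerId, String walName, long offset);

    long getOffset(String peerId, String walName);
}

// Stand-in backend for illustration only. A ZooKeeper-backed or table-backed
// implementation would implement the same two methods.
final class InMemoryQueueOffsetStorage implements QueueOffsetStorage {
    private final Map<String, Long> offsets = new HashMap<>();

    @Override
    public void setOffset(String peerId, String walName, long offset) {
        offsets.put(peerId + "/" + walName, offset);
    }

    @Override
    public long getOffset(String peerId, String walName) {
        // Assumption: an untracked WAL reports offset 0.
        return offsets.getOrDefault(peerId + "/" + walName, 0L);
    }
}

// Hypothetical upper layer: depends only on the interface.
final class ReplicationSourceSketch {
    private final QueueOffsetStorage storage;

    ReplicationSourceSketch(QueueOffsetStorage storage) {
        this.storage = storage;
    }

    // Record progress, never moving the stored offset backwards.
    void advance(String peerId, String walName, long newOffset) {
        if (newOffset > storage.getOffset(peerId, walName)) {
            storage.setOffset(peerId, walName, newOffset);
        }
    }
}
```

With this shape, switching backends is a matter of passing a different QueueOffsetStorage to the constructor; the calling code does not change.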
[jira] [Commented] (HBASE-15867) Move HBase replication tracking from ZooKeeper to HBase
[ https://issues.apache.org/jira/browse/HBASE-15867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16180459#comment-16180459 ] Ashu Pachauri commented on HBASE-15867: --- Sure [~stack] sir :) I have been busy with other stuff and don't have time right now to work on this. But, this is a very useful feature; [~openinx] thanks for picking it up and do reach out to me if you need any help. > Move HBase replication tracking from ZooKeeper to HBase > --- > > Key: HBASE-15867 > URL: https://issues.apache.org/jira/browse/HBASE-15867 > Project: HBase > Issue Type: New Feature > Components: Replication >Reporter: Joseph >Assignee: Zheng Hu > > Move the WAL file and offset tracking out of ZooKeeper and into an HBase > table called hbase:replication. > The largest three new changes will be two classes ReplicationTableBase, > TableBasedReplicationQueues, and TableBasedReplicationQueuesClient. As of now > ReplicationPeers and HFileRef's tracking will not be implemented. Subtasks > have been filed for these two jobs. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-15867) Move HBase replication tracking from ZooKeeper to HBase
[ https://issues.apache.org/jira/browse/HBASE-15867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16180239#comment-16180239 ] stack commented on HBASE-15867: --- Be our guest [~openinx] Shout if we can help. > Move HBase replication tracking from ZooKeeper to HBase > --- > > Key: HBASE-15867 > URL: https://issues.apache.org/jira/browse/HBASE-15867 > Project: HBase > Issue Type: New Feature > Components: Replication >Reporter: Joseph > > Move the WAL file and offset tracking out of ZooKeeper and into an HBase > table called hbase:replication. > The largest three new changes will be two classes ReplicationTableBase, > TableBasedReplicationQueues, and TableBasedReplicationQueuesClient. As of now > ReplicationPeers and HFileRef's tracking will not be implemented. Subtasks > have been filed for these two jobs. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-15867) Move HBase replication tracking from ZooKeeper to HBase
[ https://issues.apache.org/jira/browse/HBASE-15867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16180199#comment-16180199 ] Zheng Hu commented on HBASE-15867: -- [~stack], May I have this issue ? > Move HBase replication tracking from ZooKeeper to HBase > --- > > Key: HBASE-15867 > URL: https://issues.apache.org/jira/browse/HBASE-15867 > Project: HBase > Issue Type: New Feature > Components: Replication >Reporter: Joseph > > Move the WAL file and offset tracking out of ZooKeeper and into an HBase > table called hbase:replication. > The largest three new changes will be two classes ReplicationTableBase, > TableBasedReplicationQueues, and TableBasedReplicationQueuesClient. As of now > ReplicationPeers and HFileRef's tracking will not be implemented. Subtasks > have been filed for these two jobs. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-15867) Move HBase replication tracking from ZooKeeper to HBase
[ https://issues.apache.org/jira/browse/HBASE-15867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16179507#comment-16179507 ] stack commented on HBASE-15867: --- Unassigned. We need this but no progress being made. Unassigning to better convey this. > Move HBase replication tracking from ZooKeeper to HBase > --- > > Key: HBASE-15867 > URL: https://issues.apache.org/jira/browse/HBASE-15867 > Project: HBase > Issue Type: New Feature > Components: Replication >Reporter: Joseph > > Move the WAL file and offset tracking out of ZooKeeper and into an HBase > table called hbase:replication. > The largest three new changes will be two classes ReplicationTableBase, > TableBasedReplicationQueues, and TableBasedReplicationQueuesClient. As of now > ReplicationPeers and HFileRef's tracking will not be implemented. Subtasks > have been filed for these two jobs. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-15867) Move HBase replication tracking from ZooKeeper to HBase
[ https://issues.apache.org/jira/browse/HBASE-15867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15408115#comment-15408115 ] Joseph commented on HBASE-15867: Hello, If you check out the latest master code, there should be a brief description of the table in the Javadocs for ReplicationTableBase. But in general, we are using an HBase table "hbase:replication" that uses a row key of the concatenated peer-id and servername. We keep two special columns, an owner column that describes which regionserver currently owns the peer and a history column that describes who has adopted the queue in the past. Finally there are just a bunch of columns mapping WAL files to their bit offsets. These columns are all under a single column family. I would assume that the HRef's would just be another column family with a structure similar to the WAL's? I think looking back, it might make sense to move the "history" and "owner" columns into their own family. Feel free to ping me if you have any other questions :) > Move HBase replication tracking from ZooKeeper to HBase > --- > > Key: HBASE-15867 > URL: https://issues.apache.org/jira/browse/HBASE-15867 > Project: HBase > Issue Type: New Feature > Components: Replication >Reporter: Joseph >Assignee: Joseph > > Move the WAL file and offset tracking out of ZooKeeper and into an HBase > table called hbase:replication. > The largest three new changes will be two classes ReplicationTableBase, > TableBasedReplicationQueues, and TableBasedReplicationQueuesClient. As of now > ReplicationPeers and HFileRef's tracking will not be implemented. Subtasks > have been filed for these two jobs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
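The row layout described in the comment above can be modeled roughly as follows. This is a plain-Java sketch, not the ReplicationTableBase code: the class name, the "-" separator in the row key, the "|" delimiter in the history column, and the rule that adoption appends the previous owner to history are all assumptions made for illustration. Only the overall shape (row key built from peer-id and servername, an owner column, a history column, and one column per WAL holding its offset, all in one family) comes from the description.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory model of one row of the replication table.
final class ReplicationRowSketch {
    static final String SEPARATOR = "-";   // assumed row-key separator
    static final String OWNER = "owner";   // special column: current owner
    static final String HISTORY = "history"; // special column: past owners

    final String rowKey;
    // All columns live under a single column family in the description,
    // so one map is enough for this sketch.
    final Map<String, String> columns = new LinkedHashMap<>();

    ReplicationRowSketch(String peerId, String serverName) {
        this.rowKey = peerId + SEPARATOR + serverName;
        columns.put(OWNER, serverName);
        columns.put(HISTORY, "");
    }

    // One column per WAL file, mapping the file name to its offset.
    void setWalOffset(String walName, long offset) {
        columns.put(walName, Long.toString(offset));
    }

    long getWalOffset(String walName) {
        return Long.parseLong(columns.get(walName));
    }

    // Assumed adoption rule: remember the previous owner, then switch owner.
    void adoptBy(String newOwner) {
        String previous = columns.get(OWNER);
        String history = columns.get(HISTORY);
        columns.put(HISTORY, history.isEmpty() ? previous : history + "|" + previous);
        columns.put(OWNER, newOwner);
    }
}
```

Moving "owner" and "history" into their own family, as suggested in the comment, would correspond to splitting the single map into two.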
[jira] [Commented] (HBASE-15867) Move HBase replication tracking from ZooKeeper to HBase
[ https://issues.apache.org/jira/browse/HBASE-15867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15407232#comment-15407232 ] Ashish Singhi commented on HBASE-15867: --- Hi [~Vegetable26]. Can you add a note about the structure of the replication table and how the WALs are being tracked? I would like to take up HBASE-16605, but it would be helpful to have a brief overview before I start checking the code. If it's already been noted, please point me there. Thanks. > Move HBase replication tracking from ZooKeeper to HBase > --- > > Key: HBASE-15867 > URL: https://issues.apache.org/jira/browse/HBASE-15867 > Project: HBase > Issue Type: New Feature > Components: Replication >Reporter: Joseph >Assignee: Joseph > > Move the WAL file and offset tracking out of ZooKeeper and into an HBase > table called hbase:replication. > The largest three new changes will be two classes ReplicationTableBase, > TableBasedReplicationQueues, and TableBasedReplicationQueuesClient. As of now > ReplicationPeers and HFileRef's tracking will not be implemented. Subtasks > have been filed for these two jobs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15867) Move HBase replication tracking from ZooKeeper to HBase
[ https://issues.apache.org/jira/browse/HBASE-15867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15312787#comment-15312787 ] Andrew Purtell commented on HBASE-15867: Please feel free to reassign HBASE-13773 and proceed unless [~sukuna...@gmail.com] responds immediately. [~ghelmling] [~chenheng] > Move HBase replication tracking from ZooKeeper to HBase > --- > > Key: HBASE-15867 > URL: https://issues.apache.org/jira/browse/HBASE-15867 > Project: HBase > Issue Type: New Feature > Components: Replication >Reporter: Joseph >Assignee: Joseph > > Move the WAL file and offset tracking from ZooKeeper into an HBase table for > replication. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15867) Move HBase replication tracking from ZooKeeper to HBase
[ https://issues.apache.org/jira/browse/HBASE-15867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15297716#comment-15297716 ] Heng Chen commented on HBASE-15867: --- Not sure about the progress of HBASE-13773 by [~sukuna...@gmail.com] > Move HBase replication tracking from ZooKeeper to HBase > --- > > Key: HBASE-15867 > URL: https://issues.apache.org/jira/browse/HBASE-15867 > Project: HBase > Issue Type: New Feature > Components: Replication >Reporter: Joseph >Assignee: Joseph > > Move the WAL file and offset tracking from ZooKeeper into an HBase table for > replication. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15867) Move HBase replication tracking from ZooKeeper to HBase
[ https://issues.apache.org/jira/browse/HBASE-15867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15297434#comment-15297434 ] Gary Helmling commented on HBASE-15867: --- [~chenheng], [~sukuna...@gmail.com], the last update I see to HBASE-13773 is from October. Is anyone actively working on this? > Move HBase replication tracking from ZooKeeper to HBase > --- > > Key: HBASE-15867 > URL: https://issues.apache.org/jira/browse/HBASE-15867 > Project: HBase > Issue Type: New Feature > Components: Replication >Reporter: Joseph >Assignee: Joseph > > Move the WAL file and offset tracking from ZooKeeper into an HBase table for > replication. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15867) Move HBase replication tracking from ZooKeeper to HBase
[ https://issues.apache.org/jira/browse/HBASE-15867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15297362#comment-15297362 ] Joseph commented on HBASE-15867: Oh ok, sorry I am pretty new to how this works, but what do you mean by go on / work with the other issue? [~eclark] what do you think? > Move HBase replication tracking from ZooKeeper to HBase > --- > > Key: HBASE-15867 > URL: https://issues.apache.org/jira/browse/HBASE-15867 > Project: HBase > Issue Type: New Feature > Components: Replication >Reporter: Joseph >Assignee: Joseph > > Move the WAL file and offset tracking from ZooKeeper into an HBase table for > replication. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15867) Move HBase replication tracking from ZooKeeper to HBase
[ https://issues.apache.org/jira/browse/HBASE-15867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15295868#comment-15295868 ] Heng Chen commented on HBASE-15867: --- [~sukuna...@gmail.com] has said that he was halfway done. Should we go on with HBASE-13773 or work with this issue? > Move HBase replication tracking from ZooKeeper to HBase > --- > > Key: HBASE-15867 > URL: https://issues.apache.org/jira/browse/HBASE-15867 > Project: HBase > Issue Type: New Feature > Components: Replication >Reporter: Joseph >Assignee: Joseph > > Move the WAL file and offset tracking from ZooKeeper into an HBase table for > replication. -- This message was sent by Atlassian JIRA (v6.3.4#6332)