[jira] [Commented] (HBASE-16415) Replication in different namespace
[ https://issues.apache.org/jira/browse/HBASE-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16164410#comment-16164410 ]

Jan Kunigk commented on HBASE-16415:

Hello,

Thanks [~busbey] for kicking off the precommit job. Since I am new to the process: are any further steps required from me at this point? Would the reviewers like me to perform additional tests? Are there more questions, or questions from earlier that I should answer in more detail?

Best,
Jan

> Replication in different namespace
> --
>
> Key: HBASE-16415
> URL: https://issues.apache.org/jira/browse/HBASE-16415
> Project: HBase
> Issue Type: New Feature
> Components: Replication
> Reporter: Christian Guegi
> Assignee: Jan Kunigk
> Attachments: HBASE-16415-#2.patch, HBASE-16415.patch
>
>
> It would be nice to replicate tables from one namespace to another namespace.
> Example:
> Master cluster, namespace=default, table=bar
> Slave cluster, namespace=dr, table=bar
> Replication happens in class ReplicationSink:
> public void replicateEntries(List<WALEntry> entries, final CellScanner cells, ...){
> ...
> TableName table = TableName.valueOf(entry.getKey().getTableName().toByteArray());
> ...
> addToHashMultiMap(rowMap, table, clusterIds, m);
> ...
> for (Entry<TableName, Map<List<UUID>, List<Row>>> entry : rowMap.entrySet()) {
>   batch(entry.getKey(), entry.getValue().values());
> }
> }

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Updated] (HBASE-16415) Replication in different namespace
[ https://issues.apache.org/jira/browse/HBASE-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Kunigk updated HBASE-16415:
---
Attachment: HBASE-16415-#2.patch

Hello,

I am adding a new patch after feedback from an initial review by Ted. The new patch is also available at https://reviews.apache.org/r/60896/

Summary of changes relative to the original patch:
* tableRedirectionsMap is now populated in public void peerConfigUpdated(ReplicationPeerConfig rpc)
* The log level is now WARN when an unknown configuration key is found in the ReplicationPeerConfig
* Added an audience annotation for class RedirectingInterClusterReplicationEndpoint

Best,
Jan
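A minimal Java sketch of the first change listed above (populating the map when the peer configuration is updated), assuming the redirection rules have already been extracted from the peer configuration into a Map<String, String> of source to destination "namespace:table" names. RedirectionState and destinationFor() are hypothetical names, not the patch's API. Rebuilding the map on every configuration update keeps the endpoint current when the peer config changes after init(), and publishing a fresh copy through a single volatile write means the thread doing the redirection never sees a half-populated map.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the endpoint's redirection state (not the patch's actual code).
class RedirectionState {
  // volatile: the thread reading the map sees the most recently published instance.
  private volatile Map<String, String> tableRedirectionsMap = Collections.emptyMap();

  // Analogous to peerConfigUpdated(ReplicationPeerConfig rpc). 'rules' stands in for the
  // source -> destination pairs already extracted from the peer configuration.
  void peerConfigUpdated(Map<String, String> rules) {
    // Build a private copy first, then publish it with a single write.
    Map<String, String> rebuilt = new HashMap<>(rules);
    tableRedirectionsMap = Collections.unmodifiableMap(rebuilt);
  }

  // Destination for an edit on sourceTable; tables without a rule keep their name.
  String destinationFor(String sourceTable) {
    return tableRedirectionsMap.getOrDefault(sourceTable, sourceTable);
  }
}
```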
[jira] [Commented] (HBASE-16415) Replication in different namespace
[ https://issues.apache.org/jira/browse/HBASE-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088616#comment-16088616 ]

Jan Kunigk commented on HBASE-16415:

Hello, sorry for the delay, and thanks for the feedback. I have created https://reviews.apache.org/r/60896/

Regarding the questions above (not yet reflected in the posting on Review Board):

> please move the javadoc starting with...
Changed.

> Please add annotation for audience for RedirectingInterClusterReplicationEndpoint (Private).
Done.

> For redirectEntries(), tableRedirectionsMap is populated. This action can be lifted outside redirectEntries()
Can you point me to a good place where it should go? Are you thinking of overriding init(), calling super.init(), and populating the map there? As I understand it, init() may not suffice, because the mapping may change in the meantime. Is that correct?

> IllegalArgumentException can come from parsing val - adjust message accordingly. This shouldn't be at INFO level.
Agreed, changed to LOG.warn.
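The last review point (an IllegalArgumentException raised while parsing a configured value should be reported at WARN rather than INFO, with a message that says which entry failed) can be sketched as follows. This assumes a simplified "namespace:table" pattern in place of HBase's own table-name validation and uses java.util.logging as a stand-in for the endpoint's LOG; RedirectionConfigParser, parseTableSpec() and parseRules() are hypothetical. Skipping the bad rule and continuing is one possible policy, not necessarily what the patch does.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.logging.Logger;
import java.util.regex.Pattern;

// Hypothetical parser; the pattern is a simplification of HBase's table-name rules.
class RedirectionConfigParser {
  private static final Logger LOG = Logger.getLogger(RedirectionConfigParser.class.getName());
  private static final Pattern TABLE_SPEC = Pattern.compile("[A-Za-z0-9_]+:[A-Za-z0-9_.-]+");

  // Throws IllegalArgumentException for anything that is not "namespace:table".
  static String parseTableSpec(String spec) {
    if (spec == null || !TABLE_SPEC.matcher(spec).matches()) {
      throw new IllegalArgumentException("not a namespace:table spec: '" + spec + "'");
    }
    return spec;
  }

  // Keeps the valid rules; an invalid key or value is logged at WARNING and skipped.
  static Map<String, String> parseRules(Map<String, String> config) {
    Map<String, String> rules = new LinkedHashMap<>();
    for (Map.Entry<String, String> e : config.entrySet()) {
      String key = e.getKey();
      String val = e.getValue();
      try {
        rules.put(parseTableSpec(key), parseTableSpec(val));
      } catch (IllegalArgumentException iae) {
        // The exception may stem from the key or from val, so the message names both.
        LOG.warning("Ignoring redirection rule " + key + " => " + val + ": " + iae.getMessage());
      }
    }
    return rules;
  }
}
```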
[jira] [Commented] (HBASE-16415) Replication in different namespace
[ https://issues.apache.org/jira/browse/HBASE-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076870#comment-16076870 ]

Jan Kunigk commented on HBASE-16415:

I should also mention that, so far, this has been tested with the aforementioned test class and from the HBase shell (in pseudo-distributed mode).

J
[jira] [Updated] (HBASE-16415) Replication in different namespace
[ https://issues.apache.org/jira/browse/HBASE-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Kunigk updated HBASE-16415:
---
Status: Patch Available (was: In Progress)

Hi,

I have submitted a patch in line with what has been discussed on this JIRA. Along with RedirectingInterClusterReplicationEndpoint.java, I have implemented a test class, TestRedirectedReplication.java. Some other (mostly minor or cosmetic) changes to additional classes were necessary to provide the functionality as discussed.

It is possible to redirect from one table to another, from one namespace to another, or both at the same time. Multiple such redirection rules can be supplied for a given peer. Incidentally, this also means that two source tables can be replicated into a single destination table; the opposite (one source table into two destination tables) is of course not possible.

I am looking forward to feedback,
best,
Jan
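To make the rule semantics above concrete, here is a small sketch assuming each rule maps a source "namespace:table" name to a destination "namespace:table" name (RedirectionExample and destinationFor() are hypothetical). One rule can change the namespace, the table, or both, and several rules may share a destination.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of table-level rules: source "ns:table" -> destination "ns:table".
class RedirectionExample {
  static Map<String, String> exampleRules() {
    Map<String, String> rules = new HashMap<>();
    rules.put("default:bar", "dr:bar"); // namespace change only
    rules.put("ns1:tA", "ns1:tB");      // table change only
    rules.put("ns1:t1", "ns2:t2");      // namespace and table change
    rules.put("ns1:t3", "ns2:t2");      // a second source into the same destination
    return rules;
  }

  // Tables without a rule replicate under their own name, as without redirection.
  static String destinationFor(Map<String, String> rules, String source) {
    return rules.getOrDefault(source, source);
  }
}
```

Because the rules are keyed by source, a single source can never map to two destinations, which matches the direction the comment rules out.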
[jira] [Updated] (HBASE-16415) Replication in different namespace
[ https://issues.apache.org/jira/browse/HBASE-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Kunigk updated HBASE-16415: --- Attachment: HBASE-16415.patch
[jira] [Work started] (HBASE-16415) Replication in different namespace
[ https://issues.apache.org/jira/browse/HBASE-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-16415 started by Jan Kunigk.
[jira] [Commented] (HBASE-16415) Replication in different namespace
[ https://issues.apache.org/jira/browse/HBASE-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16071292#comment-16071292 ]

Jan Kunigk commented on HBASE-16415:

Thanks. I also just realised that with Option 2, a user could configure two redirections with the same source, which the redirection code in this initial patch does not support. With Option 1, such a misconfiguration is ruled out by the invariant of the Map interface that it never contains duplicate keys.

J
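A sketch of the point above, assuming the Option 2 syntax "src>>dst,src>>dst" (Option2Parser is hypothetical). With Option 1, the configuration Map cannot hold two rules for the same source at all (a repeated key simply overwrites the earlier value), whereas a parser for the Option 2 string has to detect the duplicate itself; here it rejects it with an IllegalArgumentException.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for the Option 2 format: "src1>>dst1,src2>>dst2".
class Option2Parser {
  static Map<String, String> parse(String redirections) {
    Map<String, String> rules = new LinkedHashMap<>();
    for (String rule : redirections.split(",")) {
      String[] parts = rule.trim().split(">>", -1);
      if (parts.length != 2 || parts[0].isEmpty() || parts[1].isEmpty()) {
        throw new IllegalArgumentException("Malformed redirection rule: " + rule);
      }
      // The string can name the same source twice; Map.put would silently keep the
      // last one, so putIfAbsent is used to detect the duplicate and reject it.
      String previous = rules.putIfAbsent(parts[0], parts[1]);
      if (previous != null) {
        throw new IllegalArgumentException("Duplicate redirection source: " + parts[0]);
      }
    }
    return rules;
  }
}
```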
[jira] [Commented] (HBASE-16415) Replication in different namespace
[ https://issues.apache.org/jira/browse/HBASE-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16064699#comment-16064699 ]

Jan Kunigk commented on HBASE-16415:

Thanks Guanghao,

I have a working implementation that successfully redirects replication in pseudo-distributed mode, albeit with a hardcoded (in-code) redirection configuration. Now I am implementing proper configuration for the redirection. I can think of two options for providing an external configuration:

Option 1) Simply pass the redirections themselves as config keys, and have RedirectingInterClusterReplicationEndpoint attempt to (also) parse each config key as a ns:tablename spec:

Map<String, String> configuration = new HashMap<>();
configuration.put("ns1:t1", "ns2:t1");
configuration.put("ns1:tA", "ns1:tB");

This would translate into the following option to the add_peer command in the HBase shell:
CONFIG => { "ns1:t1" => "ns2:t1", "ns1:tA" => "ns1:tB" }

Option 2) Name the redirections config item explicitly:

Map<String, String> configuration = new HashMap<>();
configuration.put("redirections", "ns1:t1>>ns2:t1,ns1:tA>>ns1:tB");

This would translate into the following option to the add_peer command in the HBase shell:
CONFIG => { "redirections" => "ns1:t1>>ns2:t1,ns1:tA>>ns1:tB" }

Option 1) has the advantage of avoiding special parsing and special characters in the configuration specification.
Option 2) has the advantage of unambiguous configuration keys. I am not sure that "namespacexyz:tableabc" (Option 1) is a good choice for a configuration key: it implies that anything that is not a well-defined key is a redirection rule... That doesn't sound wise, but I did not want to rule it out.

Opinions? Other proposals?

J
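The ambiguity of Option 1 can be seen in a short sketch, assuming the endpoint knows the set of configuration keys it understands and treats any other key of the form "x:y" as a redirection rule (Option1Classifier, extractRules() and the knownKeys parameter are hypothetical). A mistyped key that happens to contain a colon would silently become a rule.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch of Option 1: every config key that is not a known setting
// and has the shape "namespace:table" is read as a redirection rule.
class Option1Classifier {
  static Map<String, String> extractRules(Map<String, String> config, Set<String> knownKeys) {
    Map<String, String> rules = new TreeMap<>();
    for (Map.Entry<String, String> e : config.entrySet()) {
      String key = e.getKey();
      if (knownKeys.contains(key)) {
        continue; // a regular endpoint setting, not a rule
      }
      if (key.matches("[^:]+:[^:]+")) {
        // Interpreted as a redirection rule, including any misspelled
        // setting name that happens to contain exactly one colon.
        rules.put(key, e.getValue());
      }
    }
    return rules;
  }
}
```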
[jira] [Comment Edited] (HBASE-16415) Replication in different namespace
[ https://issues.apache.org/jira/browse/HBASE-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16055630#comment-16055630 ]

Jan Kunigk edited comment on HBASE-16415 at 6/20/17 12:05 PM:
--

I mean that adding the method to ReplicationEndpoint would define a general interface for redirecting entries. Other endpoints (e.g. future subclasses of RegionReplicaReplicationEndpoint) could implement a redirectEntries() method too if they want to provide this, possibly with slightly different semantics in their own implementation. If they do not provide an implementation, the empty default implementation from the interface applies.

If we implement redirection within RedirectingInterClusterReplicationEndpoint.createBatches(), the concept of redirection would not be exposed beyond RedirectingInterClusterReplicationEndpoint.

I do not have enough experience with the HBase codebase to weigh the merits of making this a general interface versus encapsulating this functionality entirely in the new endpoint, and will rely on your guidance. I hope this clarifies my idea.

J
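For comparison with the general interface, the encapsulated alternative (redirecting inside RedirectingInterClusterReplicationEndpoint.createBatches()) might look roughly like this. The types are simplified, hypothetical stand-ins: SketchEntry carries only a table name and a row, and the base class groups edits by table purely for illustration; it is not HBase's real batching logic.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified entry type; the real endpoint batches WAL entries.
class SketchEntry {
  final String table;
  final String row;
  SketchEntry(String table, String row) { this.table = table; this.row = row; }
}

// Stand-in base endpoint: groups entries into one batch per table (illustration only).
class SketchInterClusterEndpoint {
  protected List<List<SketchEntry>> createBatches(List<SketchEntry> entries) {
    Map<String, List<SketchEntry>> byTable = new LinkedHashMap<>();
    for (SketchEntry e : entries) {
      byTable.computeIfAbsent(e.table, t -> new ArrayList<>()).add(e);
    }
    return new ArrayList<>(byTable.values());
  }
}

// Encapsulated variant: redirection exists only inside this subclass.
class SketchRedirectingEndpoint extends SketchInterClusterEndpoint {
  private final Map<String, String> redirections;
  SketchRedirectingEndpoint(Map<String, String> redirections) { this.redirections = redirections; }

  @Override
  protected List<List<SketchEntry>> createBatches(List<SketchEntry> entries) {
    List<SketchEntry> rewritten = new ArrayList<>(entries.size());
    for (SketchEntry e : entries) {
      rewritten.add(new SketchEntry(redirections.getOrDefault(e.table, e.table), e.row));
    }
    // Batching runs on the rewritten names, so two sources redirected to one table share a batch.
    return super.createBatches(rewritten);
  }
}
```

Nothing outside the subclass knows that redirection exists, which is the trade-off the comment describes.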
[jira] [Commented] (HBASE-16415) Replication in different namespace
[ https://issues.apache.org/jira/browse/HBASE-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16055630#comment-16055630 ]

Jan Kunigk commented on HBASE-16415:

I mean that adding the method to ReplicationEndpoint would define a general interface for redirecting entries. Other endpoints (RegionReplicaReplicationEndpoint, or new future ones) could implement a redirectEntries() method too if they want to provide this, possibly with slightly different semantics in their own implementation. If they do not provide an implementation, the empty default implementation from the interface applies.

If we implement redirection within RedirectingInterClusterReplicationEndpoint.createBatches(), the concept of redirection would not be exposed beyond RedirectingInterClusterReplicationEndpoint.

I do not have enough experience with the HBase codebase to weigh the merits of making this a general interface versus encapsulating this functionality entirely in the new endpoint, and will rely on your guidance. I hope this clarifies my idea.

J
[jira] [Commented] (HBASE-16415) Replication in different namespace
[ https://issues.apache.org/jira/browse/HBASE-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16055420#comment-16055420 ]

Jan Kunigk commented on HBASE-16415:

Hi Guanghao, thanks for the comment.

I'm happy to put it into createBatches() as well; however, the notion of redirection would then be locked into RedirectingInterClusterReplicationEndpoint. Maybe redirecting entries is functionality that is fairly common to a range of endpoints?

Currently I have actually added a
{code}
default int redirectEntries(ReplicateContext context) {
  // This method should be overridden by subclasses which support redirection of source
  // TableNames in the target
  ...
{code}
to ReplicationEndpoint.java.

The invocation would not have to occur in a parent class, but could be a general "trait" of the replication logic, as part of the shipper thread shortly before replicate() is called:
{code}
// Redirect the edits to another table in the target if the endpoint implements it and if any
// redirections are configured for the current batch of edits
// Returns zero if no implementation is provided
int redirected = source.getReplicationEndpoint().redirectEntries(replicateContext);
{code}
This could give (possibly future) endpoints other than "InterCluster"-type endpoints the ability to implement the same behaviour in a different context.

I'd be interested in your opinion on the above points. Again, I'm happy to override createBatches() as well.

Either way, it looks like we're very close to finalizing the discussion before I implement a patch :)

Best,
J
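A minimal sketch of the proposed default method and its call site, using simplified, hypothetical stand-ins: SketchEndpoint plays the role of ReplicationEndpoint, and SketchContext plays the role of ReplicateContext, holding just a list of destination table names. The default implementation redirects nothing and returns zero, so endpoints that never override it behave exactly as before; the shipper always calls redirectEntries() immediately before replicate().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for ReplicateContext: the destination table of each edit in a batch.
class SketchContext {
  final List<String> tables = new ArrayList<>();
}

// Hypothetical stand-in for ReplicationEndpoint.
interface SketchEndpoint {
  boolean replicate(SketchContext context);

  // Default: no redirection support, so report zero redirected edits.
  default int redirectEntries(SketchContext context) {
    return 0;
  }
}

class SketchRedirectingEndpointImpl implements SketchEndpoint {
  private final Map<String, String> redirections;
  final List<String> shipped = new ArrayList<>();
  SketchRedirectingEndpointImpl(Map<String, String> redirections) { this.redirections = redirections; }

  @Override
  public int redirectEntries(SketchContext context) {
    int redirected = 0;
    for (int i = 0; i < context.tables.size(); i++) {
      String target = redirections.get(context.tables.get(i));
      if (target != null) {
        context.tables.set(i, target);
        redirected++;
      }
    }
    return redirected;
  }

  @Override
  public boolean replicate(SketchContext context) {
    shipped.addAll(context.tables); // records what would be sent to the peer
    return true;
  }
}

class SketchShipper {
  // Proposed call order: redirect first, then replicate.
  static int ship(SketchEndpoint endpoint, SketchContext context) {
    int redirected = endpoint.redirectEntries(context);
    endpoint.replicate(context);
    return redirected;
  }
}
```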
[jira] [Commented] (HBASE-16415) Replication in different namespace
[ https://issues.apache.org/jira/browse/HBASE-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16053870#comment-16053870 ]

Jan Kunigk commented on HBASE-16415:

Thanks Ted.

I am in the process of implementing the logic of entry redirection in RedirectingInterClusterReplicationEndpoint, but I wonder how I should combine that class's replicate() implementation with redirection. I would very much appreciate the community's input once more, as I see several options:

1. Embed the logic of redirection into the replicate() method, which would mean
 a) overriding replicate() of HBaseInterClusterReplicationEndpoint, but also
 b) duplicating the actual code of HBaseInterClusterReplicationEndpoint.replicate(), which I don't think is a good implementation.

2. Implement the logic of redirection in a completely new method, redirect(), which needs to be called from within instances of HBaseInterClusterReplicationEndpoint or RedirectingInterClusterReplicationEndpoint. We could wrap the content of replicate() into a doReplicate() method in HBaseInterClusterReplicationEndpoint:
 * HBaseInterClusterReplicationEndpoint would then only call doReplicate() in replicate().
 * RedirectingInterClusterReplicationEndpoint would call redirect(); doReplicate() as part of replicate().
 However, this solution pushes the redirection functionality down into HBaseInterClusterReplicationEndpoint, which may not be a good idea.

3. Implement redirect() as an abstract method in HBaseReplicationEndpoint (the abstract base class of HBaseInterClusterReplicationEndpoint and RegionReplicaReplicationEndpoint). The redirect() implementations of both HBaseInterClusterReplicationEndpoint and RegionReplicaReplicationEndpoint would do nothing (optionally logging this). RedirectingInterClusterReplicationEndpoint overrides the empty implementation with one that actually redirects entries. While this would lead to "empty" implementations of the abstract method, it appears to me to be the cleanest and most straightforward way to make the redirection logic happen in the ReplicationEndpoint.

4. Copy the behaviour of getWALEntryfilter() in BaseReplicationEndpoint (this is more involved, but avoids empty abstract method implementations):
 return filters.isEmpty() ? null : new ChainWALEntryFilter(filters);
 1. Implement a getRedirector() function in BaseReplicationEndpoint, which returns an instance of a novel Redirector class implementing the redirection logic.
 2. If no explicit Redirectors are set in the endpoint, return an empty redirector (NullRedirector, etc.).
 3. Provide a redirectEntries() function in BaseReplicationEndpoint, which invokes getRedirector() and Redirector.redirect() _if_ getRedirector() != null.
 4. The version in BaseReplicationEndpoint returns an empty redirector (or null) from getRedirector(), which means that its direct subclasses (HBaseInterClusterReplicationEndpoint or RegionReplicaReplicationEndpoint) also return null and do not redirect anything.
 5. The version in RedirectingInterClusterReplicationEndpoint overrides getRedirector() and returns a Redirector that implements the redirection logic.
 6. Embed redirectEntries() into the replicate() method of HBaseInterClusterReplicationEndpoint (or of any other class that wants to retain the option of redirection).

Again, I appreciate your feedback,
J
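Option 4 can be sketched as follows. Redirector is the class name proposed in the comment, but its shape here (a single redirect(String) method on table names) is an assumption, as are SketchBaseEndpoint and SketchRedirectingSubclass. As with getWALEntryfilter(), the base class returns null, and redirectEntries() only does work when a subclass supplies a redirector.

```java
import java.util.List;
import java.util.Map;

// Hypothetical shape of the proposed Redirector: maps a source table to its destination.
interface Redirector {
  String redirect(String sourceTable);
}

class SketchBaseEndpoint {
  // Base behaviour: no redirector, like getWALEntryfilter() returning null without filters.
  protected Redirector getRedirector() {
    return null;
  }

  // Rewrites destination tables in place and returns how many were changed.
  final int redirectEntries(List<String> tables) {
    Redirector redirector = getRedirector();
    if (redirector == null) {
      return 0; // direct subclasses that do not override getRedirector() redirect nothing
    }
    int changed = 0;
    for (int i = 0; i < tables.size(); i++) {
      String source = tables.get(i);
      String target = redirector.redirect(source);
      if (!target.equals(source)) {
        tables.set(i, target);
        changed++;
      }
    }
    return changed;
  }
}

class SketchRedirectingSubclass extends SketchBaseEndpoint {
  private final Map<String, String> rules;

  SketchRedirectingSubclass(Map<String, String> rules) {
    this.rules = rules;
  }

  @Override
  protected Redirector getRedirector() {
    if (rules.isEmpty()) {
      return null;
    }
    return table -> rules.getOrDefault(table, table);
  }
}
```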
[jira] [Commented] (HBASE-16415) Replication in different namespace
[ https://issues.apache.org/jira/browse/HBASE-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16050980#comment-16050980 ]

Jan Kunigk commented on HBASE-16415:

Yes, that makes sense, thanks for clarifying. I would extend HBaseInterClusterReplicationEndpoint. Any restrictions or guidance on the naming? I would use HBaseRedirectingInterClusterReplicationEndpoint for the new subclass.

J
[jira] [Comment Edited] (HBASE-16415) Replication in different namespace
[ https://issues.apache.org/jira/browse/HBASE-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16050466#comment-16050466 ]

Jan Kunigk edited comment on HBASE-16415 at 6/15/17 1:23 PM:
-

Hi Guanghao, thanks for your feedback. Yes, I agree it would be better to use a new ReplicationEndpoint.

The ReplicationEndpoint is passed into the run() method of ReplicationSource:
{code}
// get the WALEntryFilter from ReplicationEndpoint and add it to default filters
ArrayList<WALEntryFilter> filters = Lists.newArrayList(
    (WALEntryFilter)new SystemTableWALEntryFilter());
WALEntryFilter filterFromEndpoint = this.replicationEndpoint.getWALEntryfilter();
{code}
Then, at the end of this method, an attempt is made to start a new shipper thread (i.e. ReplicationSourceWALReaderThread):
{code}
tryStartNewShipperThread(walGroupId, queue);
{code}
tryStartNewShipperThread() invokes startNewWALReaderThread(), which returns a ReplicationSourceWALReaderThread and also applies all filters to it:
{code}
ChainWALEntryFilter readerFilter = new ChainWALEntryFilter(filters);
ReplicationSourceWALReaderThread walReader = new ReplicationSourceWALReaderThread(manager,
    replicationQueueInfo, queue, startPosition, fs, conf, readerFilter, metrics);
{code}
I agree that embedding the redirection information into a new ReplicationEndpoint, just as we do with the filters, makes sense. However, the inspection of the individual WAL entries would still have to occur in the shipper threads themselves, e.g.:
{code}
entry = filterEntry(entry);
entry = redirectEntry(entry);
{code}
Do you agree? Or am I missing something?
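The per-entry order proposed above (filter first, then redirect) might look like this in a reader or shipper loop. PipelineEntry and EntryPipeline are hypothetical, simplified names; the filter follows the WALEntryFilter convention of returning null to drop an entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical, simplified WAL entry: destination table plus row.
final class PipelineEntry {
  final String table;
  final String row;

  PipelineEntry(String table, String row) {
    this.table = table;
    this.row = row;
  }
}

class EntryPipeline {
  static List<PipelineEntry> process(List<PipelineEntry> entries,
      UnaryOperator<PipelineEntry> filter, Map<String, String> redirections) {
    List<PipelineEntry> out = new ArrayList<>();
    for (PipelineEntry entry : entries) {
      entry = filter.apply(entry); // entry = filterEntry(entry);
      if (entry == null) {
        continue; // dropped by the filter, nothing to redirect
      }
      String target = redirections.getOrDefault(entry.table, entry.table);
      entry = new PipelineEntry(target, entry.row); // entry = redirectEntry(entry);
      out.add(entry);
    }
    return out;
  }
}
```

Filtering before redirecting means the filters still see the source table name, so existing table-based filters need no knowledge of the redirection rules.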
[jira] [Comment Edited] (HBASE-16415) Replication in different namespace
[ https://issues.apache.org/jira/browse/HBASE-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16050466#comment-16050466 ] Jan Kunigk edited comment on HBASE-16415 at 6/15/17 1:23 PM:
Hi Guanghao, thanks for your feedback. Yes, I agree it would be better to use a new ReplicationEndpoint.
The ReplicationEndpoint is passed into the run() method of ReplicationSource:
{code}
// get the WALEntryFilter from ReplicationEndpoint and add it to default filters
ArrayList<WALEntryFilter> filters = Lists.newArrayList(
    (WALEntryFilter) new SystemTableWALEntryFilter());
WALEntryFilter filterFromEndpoint = this.replicationEndpoint.getWALEntryfilter();
{code}
Then, at the end of this method, a new attempt to start the shipper thread (i.e. ReplicationSourceWALReaderThread) is launched:
{code}
tryStartNewShipperThread(walGroupId, queue);
{code}
tryStartNewShipperThread() invokes startNewWALReaderThread(), which returns a ReplicationSourceWALReaderThread and also applies all filters to it:
{code}
ChainWALEntryFilter readerFilter = new ChainWALEntryFilter(filters);
ReplicationSourceWALReaderThread walReader =
    new ReplicationSourceWALReaderThread(manager, replicationQueueInfo, queue,
        startPosition, fs, conf, readerFilter, metrics);
{code}
I agree that embedding the redirection information into a new ReplicationEndpoint, just as we do with the filters, makes sense. However, the inspection of the individual WAL entries would still have to occur in the shipper threads themselves, like:
{code}
entry = filterEntry(entry);
entry = redirectEntry(entry);
{code}
Do you agree? Or am I missing something?
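A minimal sketch of carrying the redirections on the endpoint next to its filter: the source collects the default filter plus the endpoint's optional filter, reads a redirection map from the same endpoint, and hands both to the reader. RedirectingEndpoint, getTableRedirections() and ReaderConfig are hypothetical names invented for this illustration; only the getWALEntryfilter() name echoes the snippet above, and all types are simplified stand-ins (table names as "namespace:qualifier" Strings), not HBase classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration only: simplified stand-ins, not HBase's ReplicationSource/ReplicationEndpoint.
class EndpointPlumbingSketch {

  interface WALEntryFilter {
    String filter(String tableName); // null means "drop"
  }

  /** Hypothetical endpoint: an optional filter plus, new here, a redirection map. */
  interface RedirectingEndpoint {
    WALEntryFilter getWALEntryfilter();          // may be null
    Map<String, String> getTableRedirections();  // hypothetical accessor, may be null
  }

  /** What the source would hand to a new reader thread. */
  static final class ReaderConfig {
    final List<WALEntryFilter> filters;
    final Map<String, String> redirections;

    ReaderConfig(List<WALEntryFilter> filters, Map<String, String> redirections) {
      // Defensive, read-only copies: later changes on the endpoint side do not leak in.
      this.filters = Collections.unmodifiableList(new ArrayList<>(filters));
      this.redirections = Collections.unmodifiableMap(new HashMap<>(redirections));
    }
  }

  /** Default filter first, then the endpoint's filter if it has one. */
  static ReaderConfig buildReaderConfig(RedirectingEndpoint endpoint,
                                        WALEntryFilter systemTableFilter) {
    List<WALEntryFilter> filters = new ArrayList<>();
    filters.add(systemTableFilter);
    WALEntryFilter fromEndpoint = endpoint.getWALEntryfilter();
    if (fromEndpoint != null) {
      filters.add(fromEndpoint);
    }
    Map<String, String> redirections = endpoint.getTableRedirections();
    return new ReaderConfig(filters,
        redirections == null ? Collections.<String, String>emptyMap() : redirections);
  }
}
```

The per-entry work (filtering, then redirecting) would still happen in the reader, which is the point raised in the comment; the endpoint only supplies the configuration.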
[jira] [Commented] (HBASE-16415) Replication in different namespace
[ https://issues.apache.org/jira/browse/HBASE-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16050466#comment-16050466 ] Jan Kunigk commented on HBASE-16415:
Hi Guanghao, thanks for your feedback. Yes, I agree it would be better to use a new ReplicationEndpoint.
The ReplicationEndpoint is passed into the run() method of ReplicationSource:
```
// get the WALEntryFilter from ReplicationEndpoint and add it to default filters
ArrayList<WALEntryFilter> filters = Lists.newArrayList(
    (WALEntryFilter) new SystemTableWALEntryFilter());
WALEntryFilter filterFromEndpoint = this.replicationEndpoint.getWALEntryfilter();
```
Then, at the end of this method, a new attempt to start the shipper thread (i.e. ReplicationSourceWALReaderThread) is launched:
```
tryStartNewShipperThread(walGroupId, queue);
```
tryStartNewShipperThread() invokes startNewWALReaderThread(), which returns a ReplicationSourceWALReaderThread and also applies all filters to it:
```
ChainWALEntryFilter readerFilter = new ChainWALEntryFilter(filters);
ReplicationSourceWALReaderThread walReader =
    new ReplicationSourceWALReaderThread(manager, replicationQueueInfo, queue,
        startPosition, fs, conf, readerFilter, metrics);
```
I agree that embedding the redirection information into a new ReplicationEndpoint, just as we do with the filters, makes sense. However, the inspection of the individual WAL entries would still have to occur in the shipper threads themselves, like:
```
entry = filterEntry(entry);
entry = redirectEntry(entry);
```
Do you agree? Or am I missing something?
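To illustrate what "a new attempt" to start a shipper can mean, here is a sketch assuming one worker per WAL group, registered through ConcurrentMap.putIfAbsent() so that a second attempt for the same walGroupId is a no-op. ShipperStartSketch and Worker are hypothetical stand-ins; this is not HBase's ReplicationSource code, and the real method also creates the reader thread with its filter chain at the marked point.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustration only: starts at most one worker per WAL group. Not HBase code.
class ShipperStartSketch {

  static final class Worker {
    final String walGroupId;
    Worker(String walGroupId) { this.walGroupId = walGroupId; }
  }

  private final ConcurrentMap<String, Worker> workers = new ConcurrentHashMap<>();

  /** Returns true if this call registered (and would start) a new worker for the group. */
  boolean tryStartNewShipper(String walGroupId) {
    Worker candidate = new Worker(walGroupId);
    // Atomic: only one caller wins for a given walGroupId.
    Worker existing = workers.putIfAbsent(walGroupId, candidate);
    if (existing != null) {
      return false; // another caller already started a worker for this group
    }
    // Here the reader thread (with its chained filters) would be created and started.
    return true;
  }

  int workerCount() {
    return workers.size();
  }
}
```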
[jira] [Commented] (HBASE-16415) Replication in different namespace
[ https://issues.apache.org/jira/browse/HBASE-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044353#comment-16044353 ] Jan Kunigk commented on HBASE-16415:
Can someone assign me please? I do not seem to have that privilege.
[jira] [Commented] (HBASE-16415) Replication in different namespace
[ https://issues.apache.org/jira/browse/HBASE-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044348#comment-16044348 ] Jan Kunigk commented on HBASE-16415:
Hi,
I would like to make the following contribution to this and am very much looking forward to feedback and comments:

ReplicationSourceWALReaderThread continuously follows the WAL entries to be replicated for a given WAL via the next() method of WAL.Reader and adds them to WALEntryBatches. As far as I can see, those WAL entries are copies of the originally persisted local WALs.

In order to direct these entries to table names different from those on the source, I propose to intercept the copied WAL entries on the source cluster and check whether they belong to a TableName that is to be rewritten. If so, the WALKey of any such WAL entry needs to be changed accordingly. WALKey provides a getTableName() method, but currently no setTableName() method; one would simply have to be added to change the private TableName member.

I propose to intercept the entries via a new method, redirectEntry(), which is invoked immediately after the entry has been filtered by filterEntry() and shortly before it is added to its WALEntryBatch, like so:

    Entry entry = entryStream.next();
    if (updateSerialReplPos(batch, entry)) {
      batch.lastWalPosition = entryStream.getPosition();
      break;
    }
    entry = filterEntry(entry);
    entry = redirectEntry(entry); // <--
    if (entry != null) {
      WALEdit edit = entry.getEdit();
      if (edit != null && !edit.isEmpty()) {
        long entrySize = getEntrySize(entry);
        batch.addEntry(entry);
        // ...

redirectEntry() bases its decisions on a Map<TableName, TableName> redirections, where the keys are the source table names and the values the destination table names. The map would be included in the ReplicationPeerConfig, which can be obtained from within ReplicationSourceWALReaderThread via the ReplicationSourceManager instance that is passed as an argument to both available constructors.
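A minimal sketch of redirectEntry() as described here, assuming table names are handled as "namespace:qualifier" Strings and that WALKey gains the proposed setTableName(). WALKey and Entry below are stand-ins, not HBase's classes, and RedirectSketch is a hypothetical name. Mutating the key in place relies on the observation above that the reader works on copies of the persisted WAL entries.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: stand-ins for WALKey/Entry. setTableName() is the method proposed
// in this comment; it is not assumed to exist on HBase's WALKey.
class RedirectSketch {

  static final class WALKey {
    private String tableName; // "namespace:qualifier"
    WALKey(String tableName) { this.tableName = tableName; }
    String getTableName() { return tableName; }
    void setTableName(String tableName) { this.tableName = tableName; } // proposed addition
  }

  static final class Entry {
    final WALKey key;
    Entry(WALKey key) { this.key = key; }
  }

  private final Map<String, String> redirections; // source table -> destination table

  RedirectSketch(Map<String, String> redirections) {
    this.redirections = new HashMap<>(redirections);
  }

  /** Runs after filterEntry(): an entry that was filtered out (null) passes through. */
  Entry redirectEntry(Entry entry) {
    if (entry == null) {
      return null;
    }
    String destination = redirections.get(entry.key.getTableName());
    if (destination != null) {
      // Safe only because the reader operates on its own copy of the WAL entry.
      entry.key.setTableName(destination);
    }
    return entry; // tables without a mapping are shipped unchanged
  }
}
```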
When the TableName of a WALKey from the WALEntryStream matches the key of any entry in the redirections map, that WALKey's TableName is replaced by the value of that entry.

The rationale for intercepting on the sending side is that setup and peer management are already performed on the source today, and I see no mechanism that would carry the redirection rules themselves across to the sink.

Similar to the way the HBase shell allows specifying the tables and column families to be replicated (set_peer_tableCFs), I propose a new command, also on the sending side, 'set_peer_table_redirections', which accepts a map of Strings corresponding to the required final specification of the redirections as TableNames:

    set_peer_table_redirections['ns_source1:table_source1' : 'ns_dest1:table_dest1',
                                'ns_source2:table_source2' : 'ns_dest2:table_dest2',
                                ...
                                'ns_sourcen:table_sourcen' : 'ns_destn:table_destn',
                               ]

Best,
J
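Before such a map is stored in the peer config, the String pairs given to the proposed shell command would need to be turned into fully qualified table names. The sketch below assumes that a name without a namespace refers to HBase's "default" namespace and rejects duplicate sources; RedirectionSpecSketch, qualify() and parse() are hypothetical names, and the check is only structural (HBase's own TableName validation of allowed characters is stricter).

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: validating/normalizing the String pairs a command like the proposed
// 'set_peer_table_redirections' might receive. Not HBase code.
class RedirectionSpecSketch {

  static final String DEFAULT_NAMESPACE = "default";

  /** "bar" becomes "default:bar"; "ns:bar" is kept; anything else is rejected. */
  static String qualify(String name) {
    String[] parts = name.split(":", -1); // -1 keeps empty parts such as in ":bar"
    if (parts.length == 1 && !parts[0].isEmpty()) {
      return DEFAULT_NAMESPACE + ":" + parts[0];
    }
    if (parts.length == 2 && !parts[0].isEmpty() && !parts[1].isEmpty()) {
      return name;
    }
    throw new IllegalArgumentException("Not a table name: '" + name + "'");
  }

  /** Builds source -> destination with fully qualified names on both sides. */
  static Map<String, String> parse(Map<String, String> spec) {
    Map<String, String> result = new LinkedHashMap<>();
    for (Map.Entry<String, String> e : spec.entrySet()) {
      String source = qualify(e.getKey());
      String destination = qualify(e.getValue());
      // "bar" and "default:bar" name the same table, so they must not both appear.
      if (result.put(source, destination) != null) {
        throw new IllegalArgumentException("Duplicate source table: " + source);
      }
    }
    return Collections.unmodifiableMap(result);
  }
}
```

Normalizing both sides up front means the lookup in the reader can be a plain map lookup on the fully qualified name.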