[jira] [Commented] (HBASE-16415) Replication in different namespace

2017-09-13 Thread Jan Kunigk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16164410#comment-16164410
 ] 

Jan Kunigk commented on HBASE-16415:


Hello,

Thanks [~busbey] for kicking off the precommit job.

Since I am new to the process: are there further steps required of me at this 
point?
Would the reviewers like me to perform additional tests?
Are there any further questions, or questions from earlier that I should answer 
in more detail?

Best, Jan 

> Replication in different namespace
> --
>
> Key: HBASE-16415
> URL: https://issues.apache.org/jira/browse/HBASE-16415
> Project: HBase
>  Issue Type: New Feature
>  Components: Replication
>Reporter: Christian Guegi
>Assignee: Jan Kunigk
> Attachments: HBASE-16415-#2.patch, HBASE-16415.patch
>
>
> It would be nice to replicate tables from one namespace to another namespace.
> Example:
> Master cluster, namespace=default, table=bar
> Slave cluster, namespace=dr, table=bar
> Replication happens in class ReplicationSink:
>   public void replicateEntries(List<WALEntry> entries, final CellScanner cells, ...) {
>     ...
>     TableName table = TableName.valueOf(entry.getKey().getTableName().toByteArray());
>     ...
>     addToHashMultiMap(rowMap, table, clusterIds, m);
>     ...
>     for (Entry<TableName, Map<List<UUID>, List<Row>>> entry : rowMap.entrySet()) {
>       batch(entry.getKey(), entry.getValue().values());
>     }
>   }



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-16415) Replication in different namespace

2017-08-18 Thread Jan Kunigk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Kunigk updated HBASE-16415:
---
Attachment: HBASE-16415-#2.patch

Hello, I am adding a new patch after feedback from an initial review by Ted.
The new patch is also available on https://reviews.apache.org/r/60896/

Summary of changes to the original patch:
  * tableRedirectionsMap is now populated in public void 
peerConfigUpdated(ReplicationPeerConfig rpc)
  * Changed the log level to WARN when an unknown configuration key is found in 
the ReplicationPeerConfig
  * Added the audience annotation (Private) to the class 
RedirectingInterClusterReplicationEndpoint
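A minimal sketch of the first point, assuming Option 1-style entries ("ns1:t1" => "ns2:t1") in the peer configuration. It is not the patch's code: plain strings stand in for HBase's TableName and ReplicationPeerConfig, and the class RedirectionRules with its methods is hypothetical. Rebuilding the map from scratch on every update means rules changed after init() are picked up, and publishing the new map through a volatile field keeps readers from seeing a half-built map.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical sketch: plain strings stand in for TableName / ReplicationPeerConfig.
class RedirectionRules {
  private static final Logger LOG = Logger.getLogger(RedirectionRules.class.getName());

  // Replaced as a whole on every update; volatile makes the new map visible
  // to threads calling targetFor() concurrently.
  private volatile Map<String, String> tableRedirectionsMap = Collections.emptyMap();

  // Analogue of peerConfigUpdated(ReplicationPeerConfig rpc): rebuild, then publish.
  void peerConfigUpdated(Map<String, String> peerConfiguration) {
    Map<String, String> rebuilt = new HashMap<>();
    for (Map.Entry<String, String> e : peerConfiguration.entrySet()) {
      if (isTableSpec(e.getKey()) && isTableSpec(e.getValue())) {
        rebuilt.put(e.getKey(), e.getValue());
      } else {
        LOG.warning("Ignoring peer configuration entry that is not a redirection rule: "
            + e.getKey() + " => " + e.getValue());
      }
    }
    tableRedirectionsMap = Collections.unmodifiableMap(rebuilt);
  }

  // Destination for a source table; tables without a rule are not redirected.
  String targetFor(String sourceTable) {
    return tableRedirectionsMap.getOrDefault(sourceTable, sourceTable);
  }

  // Loose "namespace:qualifier" check: exactly one ':' with text on both sides.
  static boolean isTableSpec(String s) {
    if (s == null) {
      return false;
    }
    int colon = s.indexOf(':');
    return colon > 0 && colon == s.lastIndexOf(':') && colon < s.length() - 1;
  }
}
```

Because the whole map is replaced rather than mutated, the reading side needs no locking; the trade-off is that every update rebuilds all rules, which is cheap for a handful of entries.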

Best, Jan




[jira] [Commented] (HBASE-16415) Replication in different namespace

2017-07-15 Thread Jan Kunigk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088616#comment-16088616
 ] 

Jan Kunigk commented on HBASE-16415:


Hello, sorry for the delay and thanks for the feedback. I have created 
https://reviews.apache.org/r/60896/

Regarding the questions above (I have not yet addressed the comments posted on 
Review Board):

> please move the javadoc starting with...
Changed

> Please add annotation for audience for 
> RedirectingInterClusterReplicationEndpoint (Private).
done

> For redirectEntries(), tableRedirectionsMap is populated. This action can be 
> lifted outside redirectEntries() 
Could you point me to a good place for it? Are you thinking of overriding 
init(), calling super.init() and populating the map there? As I understand it, 
init() may not suffice, because the mapping may change in between. Is that 
correct?

> IllegalArgumentException can come from parsing val - adjust message 
> accordingly. This shouldn't be at INFO level.
Agreed, changed to LOG.warn.





[jira] [Commented] (HBASE-16415) Replication in different namespace

2017-07-06 Thread Jan Kunigk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076870#comment-16076870
 ] 

Jan Kunigk commented on HBASE-16415:


I should also mention that, so far, this has been tested with the aforementioned 
test class and from the HBase shell (in pseudo-distributed mode).

J 



[jira] [Updated] (HBASE-16415) Replication in different namespace

2017-07-06 Thread Jan Kunigk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Kunigk updated HBASE-16415:
---
Status: Patch Available  (was: In Progress)

Hi, I have submitted a patch in line with what has been discussed on this JIRA.
Along with RedirectingInterClusterReplicationEndpoint.java,
I have implemented a test class TestRedirectedReplication.java.

Some other (mostly minor or cosmetic) changes to additional classes were 
necessary to provide the functionality as discussed.

It is possible to redirect from one table to another, from one namespace to 
another, or both at the same time. Multiple such redirection rules can be 
supplied for a given peer.
This also means that two source tables can be replicated into a single 
destination table; the opposite (one source table into several destination 
tables) is not possible.
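Why two sources can share a destination but one source cannot fan out follows from keying the rules by source table. A hypothetical sketch (plain strings instead of TableName, a minimal Edit class instead of a WAL entry) that regroups edits by destination table after applying the rules, similar in spirit to the rowMap grouping in ReplicationSink quoted in the issue description:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; Edit is a stand-in for a WAL entry (table + row key only).
class RedirectionGrouping {
  static final class Edit {
    final String table; // "namespace:qualifier"
    final String row;

    Edit(String table, String row) {
      this.table = table;
      this.row = row;
    }
  }

  // Applies the rules and groups row keys by destination table, keeping input order.
  // A Map holds one value per key, so each source table has exactly one destination,
  // while any number of source tables may point to the same destination.
  static Map<String, List<String>> groupByDestination(List<Edit> edits,
      Map<String, String> rules) {
    Map<String, List<String>> byDestination = new LinkedHashMap<>();
    for (Edit edit : edits) {
      String destination = rules.getOrDefault(edit.table, edit.table);
      byDestination.computeIfAbsent(destination, k -> new ArrayList<>()).add(edit.row);
    }
    return byDestination;
  }
}
```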

I am looking forward to feedback, best, Jan 

 



[jira] [Updated] (HBASE-16415) Replication in different namespace

2017-07-06 Thread Jan Kunigk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Kunigk updated HBASE-16415:
---
Attachment: HBASE-16415.patch



[jira] [Work started] (HBASE-16415) Replication in different namespace

2017-07-06 Thread Jan Kunigk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-16415 started by Jan Kunigk.
--


[jira] [Commented] (HBASE-16415) Replication in different namespace

2017-07-01 Thread Jan Kunigk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16071292#comment-16071292
 ] 

Jan Kunigk commented on HBASE-16415:


Thanks. I also just realised that with Option 2 a user could configure two 
redirections with the same source, which the redirection code in this initial 
patch does not support.

With Option 1, we can rule out such a misconfiguration via the invariant of the 
Map interface that a map never contains duplicate keys.
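With Option 2, the same guarantee would have to be enforced by hand while parsing the single "redirections" value. A minimal sketch, assuming the comma-separated source>>destination format proposed for Option 2 on this issue; RedirectionsParser is a hypothetical name, and the duplicate check relies on java.util.Map.put returning the previous value for the key:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical parser for an Option 2 value such as "ns1:t1>>ns2:t1,ns1:tA>>ns1:tB".
class RedirectionsParser {
  static Map<String, String> parse(String value) {
    Map<String, String> rules = new HashMap<>();
    for (String rule : value.split(",")) {
      String[] parts = rule.split(">>", -1);
      String source = parts.length == 2 ? parts[0].trim() : "";
      String target = parts.length == 2 ? parts[1].trim() : "";
      if (source.isEmpty() || target.isEmpty()) {
        throw new IllegalArgumentException("Malformed redirection rule: '" + rule + "'");
      }
      // put() returns the previous mapping, so a non-null result means the same
      // source table was listed twice: the misconfiguration Option 1 rules out.
      if (rules.put(source, target) != null) {
        throw new IllegalArgumentException("Duplicate redirection source: " + source);
      }
    }
    return rules;
  }
}
```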

J 



[jira] [Commented] (HBASE-16415) Replication in different namespace

2017-06-27 Thread Jan Kunigk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16064699#comment-16064699
 ] 

Jan Kunigk commented on HBASE-16415:


Thanks Guanghao,

I have a working implementation that successfully redirects replication in 
pseudo-distributed mode, albeit with a redirection configuration hardcoded in 
the source.
Now I am implementing proper configuration for the redirection.
I can think of two options to provide an external configuration for this:

Option 1) is to simply pass the redirections themselves as config keys and have 
the RedirectingInterClusterReplicationEndpoint attempt to (also) parse each 
config key as a ns:tablename spec. 
Map<String, String> configuration = new HashMap<>();
configuration.put("ns1:t1", "ns2:t1");
configuration.put("ns1:tA", "ns1:tB");
This would translate into the following option to the add_peer command from the 
HBase shell: CONFIG => { "ns1:t1" => "ns2:t1", "ns1:tA" => "ns1:tB" }

Option 2) is to put all redirections into a single, explicitly named config item:
Map<String, String> configuration = new HashMap<>();
configuration.put("redirections", "ns1:t1>>ns2:t1,ns1:tA>>ns1:tB");
This would translate into the following option to the add_peer command from the 
HBase shell: CONFIG => { "redirections" => "ns1:t1>>ns2:t1,ns1:tA>>ns1:tB"}

Option 1) has the advantage of avoiding special parsing and special characters 
in the configuration value.
Option 2) has the advantage of an unambiguous configuration key. I am not sure 
that "namespacexyz:tableabc" (Option 1) is a good choice for a configuration 
key: it implies that any key that is not otherwise well-defined is a redirection 
rule. That does not sound wise, but I did not want to rule it out.

Opinions? Other proposals?

J




[jira] [Comment Edited] (HBASE-16415) Replication in different namespace

2017-06-20 Thread Jan Kunigk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16055630#comment-16055630
 ] 

Jan Kunigk edited comment on HBASE-16415 at 6/20/17 12:05 PM:
--

What I mean is that adding the method to ReplicationEndpoint would define a 
general interface for redirecting entries.
Other endpoints (e.g. future subclasses of RegionReplicaReplicationEndpoint) 
could implement redirectEntries() too if they want to provide this feature, 
possibly with slightly different semantics of their own.
If they do not provide an implementation, the empty default implementation 
from the interface applies.
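The idea of an interface method with an empty default can be sketched in plain Java 8. All types here are hypothetical, simplified stand-ins (SketchEndpoint for ReplicationEndpoint, a list of table names for ReplicateContext); only the int-returning shape follows the redirectEntries(ReplicateContext) draft posted on this issue. The caller invokes redirectEntries() unconditionally before replicate(), and endpoints without an override simply inherit the no-op:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for ReplicationEndpoint.
interface SketchEndpoint {
  // Empty default: endpoints that do not support redirection inherit this no-op.
  default int redirectEntries(List<String> entryTables) {
    return 0;
  }

  void replicate(List<String> entryTables);
}

// Plays the role of a plain inter-cluster endpoint: no override.
class PlainEndpoint implements SketchEndpoint {
  final List<String> shipped = new ArrayList<>();

  @Override
  public void replicate(List<String> entryTables) {
    shipped.addAll(entryTables);
  }
}

// Plays the role of a redirecting endpoint: overrides the default in place.
class RedirectingEndpoint extends PlainEndpoint {
  private final Map<String, String> rules;

  RedirectingEndpoint(Map<String, String> rules) {
    this.rules = rules;
  }

  @Override
  public int redirectEntries(List<String> entryTables) {
    int redirected = 0;
    for (int i = 0; i < entryTables.size(); i++) {
      String target = rules.get(entryTables.get(i));
      if (target != null) {
        entryTables.set(i, target);
        redirected++;
      }
    }
    return redirected;
  }
}

// The caller (e.g. a shipper) does not need to know which kind of endpoint it has.
class Shipper {
  static int ship(SketchEndpoint endpoint, List<String> entryTables) {
    int redirected = endpoint.redirectEntries(entryTables);
    endpoint.replicate(entryTables);
    return redirected;
  }
}
```

The design choice is that redirection-unaware endpoints need no code changes at all; the cost is that the interface gains a method most implementations never use.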

If we implement redirection within 
RedirectingInterClusterReplicationEndpoint.createBatches(), the concept of 
redirection would not be exposed beyond 
RedirectingInterClusterReplicationEndpoint.

I do not have enough experience with the HBase codebase to weigh the merits of 
making this a general interface vs. encapsulating this functionality in the new 
endpoint entirely and will rely on your guidance.

I hope this clarifies my idea, though.

J







[jira] [Commented] (HBASE-16415) Replication in different namespace

2017-06-20 Thread Jan Kunigk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16055630#comment-16055630
 ] 

Jan Kunigk commented on HBASE-16415:


What I mean is that adding the method to ReplicationEndpoint would define a 
general interface for redirecting entries.
Other endpoints (RegionReplicaReplicationEndpoint, or new future ones) could 
implement redirectEntries() too if they want to provide this feature, possibly 
with slightly different semantics of their own.
If they do not provide an implementation, the empty default implementation 
from the interface applies.

If we implement redirection within 
RedirectingInterClusterReplicationEndpoint.createBatches(), the concept of 
redirection would not be exposed beyond 
RedirectingInterClusterReplicationEndpoint.

I do not have enough experience with the HBase codebase to weigh the merits of 
making this a general interface vs. encapsulating this functionality in the new 
endpoint entirely and will rely on your guidance.

I hope this clarifies my idea, though.

J






[jira] [Commented] (HBASE-16415) Replication in different namespace

2017-06-20 Thread Jan Kunigk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16055420#comment-16055420
 ] 

Jan Kunigk commented on HBASE-16415:


Hi Guanghao,

thanks for the comment. I am happy to put it into createBatches() as well; 
however, the notion of redirection would then be locked into 
RedirectingInterClusterReplicationEndpoint.
Maybe redirecting entries is functionality that is common to a range of 
endpoints?

Currently, I have actually added the following default method to 
ReplicationEndpoint.java:
{code}
  default int redirectEntries(ReplicateContext context) {
    // This method should be overridden by subclasses which support redirection of source
    // TableNames in the target
    ...
{code}

The invocation would not have to occur in the endpoint's parent class; instead, 
it could be a general "trait" of the replication logic, performed in the shipper 
thread shortly before replicate() is called:
{code}
// Redirect the edits to another table in the target if the endpoint implements it and if any
// redirections are configured for the current batch of edits
// Returns zero if no implementation is provided
int redirected =
    source.getReplicationEndpoint().redirectEntries(replicateContext);
{code}

This could give other (possibly future) endpoints, not only "InterCluster" ones, 
the ability to implement the same behaviour in a different context.

I'd be interested in your opinion on the points above.
Again, I am happy to override createBatches() as well.
Either way, it looks like we are very close to finalizing the discussion before I 
implement a patch :)

Best, J







[jira] [Commented] (HBASE-16415) Replication in different namespace

2017-06-19 Thread Jan Kunigk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16053870#comment-16053870
 ] 

Jan Kunigk commented on HBASE-16415:


Thanks Ted.

I am in the process of implementing the entry redirection logic in 
RedirectingInterClusterReplicationEndpoint, but I wonder how I should combine 
that class's replicate() implementation with redirection.
I would very much appreciate the community's input once more; I see several 
options:

1. Embed the redirection logic into the replicate() method, which would mean
a) overriding replicate() of HBaseInterClusterReplicationEndpoint, but also
b) duplicating the actual code of 
HBaseInterClusterReplicationEndpoint.replicate(), which I don't think is a good 
implementation.

2. Implement the redirection logic in a completely new method redirect(), 
which needs to be called from within instances of 
HBaseInterClusterReplicationEndpoint or 
RedirectingInterClusterReplicationEndpoint.
We could wrap the content of replicate() into a doReplicate() method in 
HBaseInterClusterReplicationEndpoint.
HBaseInterClusterReplicationEndpoint would then only call doReplicate() in 
replicate().
RedirectingInterClusterReplicationEndpoint would call redirect() and then 
doReplicate() as part of replicate().
However, this solution pushes the redirection functionality down into 
HBaseInterClusterReplicationEndpoint, which may not be a good idea.

3. Implement redirect() as an abstract method in HBaseReplicationEndpoint 
(which is the abstract base class for HBaseInterClusterReplicationEndpoint and 
RegionReplicaReplicationEndpoint).
The redirect() implementations of both HBaseInterClusterReplicationEndpoint and 
RegionReplicaReplicationEndpoint would not do anything (optionally logging this). 
RedirectingInterClusterReplicationEndpoint overrides the empty implementation 
with one that actually redirects entries.
While this would lead to "empty" implementations of the abstract method, it 
appears to me to be the cleanest and most straightforward way to have the 
redirection logic occur in the ReplicationEndpoint.

4. Copy the behaviour of getWALEntryfilter() in BaseReplicationEndpoint (this 
is more involved, but avoids empty abstract method implementations):
return filters.isEmpty() ? null : new ChainWALEntryFilter(filters);
   1. Implement a getRedirector() method in BaseReplicationEndpoint, which 
   returns an instance of a novel Redirector class implementing the redirection 
   logic.
   2. If no explicit Redirectors are set in the endpoint, return an empty 
   redirector (NullRedirector, etc.).
   3. Provide a redirectEntries() method in BaseReplicationEndpoint, which will 
   invoke getRedirector() and Redirector.redirect() _if_ getRedirector() != 
   null.
   4. The version in BaseReplicationEndpoint will return an empty redirector 
   (or null) from getRedirector(), which means that direct subclasses 
   (HBaseInterClusterReplicationEndpoint or RegionReplicaReplicationEndpoint) 
   will also return null and not redirect anything.
   5. The version in RedirectingInterClusterReplicationEndpoint will override 
   getRedirector() and return a Redirector, which implements 
   6. Embed redirectEntries() into the replicate() method of 
   HBaseInterClusterReplicationEndpoint (or of any other class that wants to 
   reserve the option of redirection).
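Option 4 could look roughly like this. The sketch is hypothetical: Redirector, getRedirector() and redirectEntries() are the names proposed in the list, the endpoint classes are simplified stand-ins for the HBase hierarchy, and a table name string stands in for a WAL entry. As with getWALEntryfilter(), a null return means there is nothing to apply:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of Option 4; a table name string stands in for a WAL entry.
interface Redirector {
  String redirect(String table);
}

abstract class SketchBaseEndpoint {
  // Base behaviour: no redirector, like getWALEntryfilter() returning null
  // when there are no filters.
  protected Redirector getRedirector() {
    return null;
  }

  // Called from replicate(); a no-op unless a subclass supplies a Redirector.
  final List<String> redirectEntries(List<String> tables) {
    Redirector redirector = getRedirector();
    if (redirector == null) {
      return tables;
    }
    List<String> result = new ArrayList<>(tables.size());
    for (String table : tables) {
      result.add(redirector.redirect(table));
    }
    return result;
  }
}

// Stand-in for a direct subclass that inherits the null redirector.
class SketchInterClusterEndpoint extends SketchBaseEndpoint {
}

// Stand-in for the redirecting endpoint: supplies a rule-based Redirector.
class SketchRedirectingEndpoint extends SketchInterClusterEndpoint {
  private final Map<String, String> rules;

  SketchRedirectingEndpoint(Map<String, String> rules) {
    this.rules = rules;
  }

  @Override
  protected Redirector getRedirector() {
    return table -> rules.getOrDefault(table, table);
  }
}
```

Compared with a default method on the interface, this keeps the redirection rules in a separate object that could be swapped or chained, at the price of one more class.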


Again, appreciate your feedback, J



[jira] [Commented] (HBASE-16415) Replication in different namespace

2017-06-15 Thread Jan Kunigk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16050980#comment-16050980
 ] 

Jan Kunigk commented on HBASE-16415:


Yes, that makes sense, thanks for clarifying.
I would extend {code} HBaseInterClusterReplicationEndpoint {code}.

Any restriction / guidance on the naming? I would use {code} 
HBaseRedirectingInterClusterReplicationEndpoint {code} for the new subclass.

J



[jira] [Comment Edited] (HBASE-16415) Replication in different namespace

2017-06-15 Thread Jan Kunigk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16050466#comment-16050466
 ] 

Jan Kunigk edited comment on HBASE-16415 at 6/15/17 1:23 PM:
-

Hi Guanghao, thanks for your feedback.
Yes, I agree it would be better to use a new ReplicationEndpoint.

The ReplicationEndpoint is passed into the run() method of ReplicationSource:
{code}
// get the WALEntryFilter from ReplicationEndpoint and add it to default filters
ArrayList<WALEntryFilter> filters = Lists.newArrayList(
  (WALEntryFilter) new SystemTableWALEntryFilter());
WALEntryFilter filterFromEndpoint = this.replicationEndpoint.getWALEntryfilter();
{code}

Then, at the end of this method, an attempt is made to start a new shipper 
thread (i.e. ReplicationSourceWALReaderThread):
{code}
  tryStartNewShipperThread(walGroupId, queue);
{code}
tryStartNewShipperThread() invokes startNewWALReaderThread(), which returns a 
ReplicationSourceWALReaderThread and also applies all filters to it:
{code}
ChainWALEntryFilter readerFilter = new ChainWALEntryFilter(filters);

ReplicationSourceWALReaderThread walReader = new ReplicationSourceWALReaderThread(manager,
    replicationQueueInfo, queue, startPosition, fs, conf, readerFilter, metrics);
{code}
I agree that embedding the redirection information into a new 
ReplicationEndpoint, just as we do with the filters, makes sense.
However, the inspection of the individual WAL entries would still have to occur 
in the shipper threads themselves, e.g.:
{code}
entry = filterEntry(entry);
entry = redirectEntry(entry);
{code}
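A sketch of that per-entry step, assuming the convention of HBase's WALEntryFilter that returning null drops an entry. Edit is a minimal hypothetical stand-in for a WAL entry, and filterEntry()/redirectEntry() from the comment are illustrative names, not existing HBase methods; here the filter is passed in as a UnaryOperator. Filtering runs first so that filters, which are configured in terms of source tables, see the original table names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical sketch; Edit is a minimal stand-in for a WAL entry.
class EntryPipeline {
  static final class Edit {
    final String table; // "namespace:qualifier"
    final String row;

    Edit(String table, String row) {
      this.table = table;
      this.row = row;
    }
  }

  private final UnaryOperator<Edit> filter; // returns null to drop an edit
  private final Map<String, String> rules;  // source table -> destination table

  EntryPipeline(UnaryOperator<Edit> filter, Map<String, String> rules) {
    this.filter = filter;
    this.rules = rules;
  }

  // Rewrites the table name if a rule matches; otherwise returns the edit unchanged.
  private Edit redirectEntry(Edit edit) {
    String target = rules.get(edit.table);
    return target == null ? edit : new Edit(target, edit.row);
  }

  // Filter first (on source table names), then redirect whatever survives.
  List<Edit> process(List<Edit> batch) {
    List<Edit> out = new ArrayList<>();
    for (Edit edit : batch) {
      Edit kept = filter.apply(edit);
      if (kept != null) {
        out.add(redirectEntry(kept));
      }
    }
    return out;
  }
}
```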

Do you agree? Or am I missing something? 



> Replication in different namespace
> --
>
> Key: HBASE-16415
> URL: https://issues.apache.org/jira/browse/HBASE-16415
> Project: HBase
>  Issue Type: New Feature
>  Components: Replication
>Reporter: Christian Guegi
>Assignee: Jan Kunigk
>
> It would be nice to replicate tables from one namespace to another namespace.
> Example:
> Master cluster, namespace=default, table=bar
> Slave cluster, namespace=dr, table=bar
> Replication happens in class ReplicationSink:
>   public void replicateEntries(List<WALEntry> entries, final CellScanner cells, ...) {
>     ...
>     TableName table = TableName.valueOf(entry.getKey().getTableName().toByteArray());
>     ...
>     addToHashMultiMap(rowMap, table, clusterIds, m);
>     ...
>     for (Entry<TableName, Map<List<UUID>, List<Row>>> entry : rowMap.entrySet()) {
>       batch(entry.getKey(), entry.getValue().values());
>     }
>   }



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HBASE-16415) Replication in different namespace

2017-06-15 Thread Jan Kunigk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16050466#comment-16050466
 ] 

Jan Kunigk edited comment on HBASE-16415 at 6/15/17 1:23 PM:
-

Hi Guanghao, thanks for your feedback.
Yes, I agree it would be better to use a new ReplicationEndpoint.

The ReplicationEndpoint is passed into the run() method of ReplicationSource:
{code}
// get the WALEntryFilter from ReplicationEndpoint and add it to the default filters
ArrayList<WALEntryFilter> filters = Lists.newArrayList(
  (WALEntryFilter) new SystemTableWALEntryFilter());
WALEntryFilter filterFromEndpoint = this.replicationEndpoint.getWALEntryfilter();
if (filterFromEndpoint != null) {
  filters.add(filterFromEndpoint);
}
{code}

Then, at the end of this method, an attempt is made to start a new shipper
thread (i.e. ReplicationSourceWALReaderThread):
{code}
  tryStartNewShipperThread(walGroupId, queue);
{code}
tryStartNewShipperThread() invokes startNewWALReaderThread(), which returns a
ReplicationSourceWALReaderThread and also passes the chained filters to it:
{code}
ChainWALEntryFilter readerFilter = new ChainWALEntryFilter(filters);

ReplicationSourceWALReaderThread walReader = new ReplicationSourceWALReaderThread(manager,
    replicationQueueInfo, queue, startPosition, fs, conf, readerFilter, metrics);
{code}
I agree that embedding the redirection information into a new
ReplicationEndpoint, just as we do with the filters, makes sense.
However, the individual WAL entries would still have to be inspected in the
shipper threads themselves, for example:
{code}
entry = filterEntry(entry);
entry = redirectEntry(entry);
{code}

Do you agree? Or am I missing something? 








[jira] [Commented] (HBASE-16415) Replication in different namespace

2017-06-15 Thread Jan Kunigk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16050466#comment-16050466
 ] 

Jan Kunigk commented on HBASE-16415:


Hi Guanghao, thanks for your feedback.
Yes, I agree it would be better to use a new ReplicationEndpoint.

The ReplicationEndpoint is passed into the run() method of ReplicationSource:
```
// get the WALEntryFilter from ReplicationEndpoint and add it to the default filters
ArrayList<WALEntryFilter> filters = Lists.newArrayList(
  (WALEntryFilter) new SystemTableWALEntryFilter());
WALEntryFilter filterFromEndpoint = this.replicationEndpoint.getWALEntryfilter();
if (filterFromEndpoint != null) {
  filters.add(filterFromEndpoint);
}
```
Then, at the end of this method, an attempt is made to start a new shipper
thread (i.e. ReplicationSourceWALReaderThread):
```
  tryStartNewShipperThread(walGroupId, queue);
```
tryStartNewShipperThread() invokes startNewWALReaderThread(), which returns a
ReplicationSourceWALReaderThread and also passes the chained filters to it:
```
ChainWALEntryFilter readerFilter = new ChainWALEntryFilter(filters);

ReplicationSourceWALReaderThread walReader = new ReplicationSourceWALReaderThread(manager,
    replicationQueueInfo, queue, startPosition, fs, conf, readerFilter, metrics);
```
I agree that embedding the redirection information into a new
ReplicationEndpoint, just as we do with the filters, makes sense.
However, the individual WAL entries would still have to be inspected in the
shipper threads themselves, for example:
```
entry = filterEntry(entry);
entry = redirectEntry(entry);
```

Do you agree? Or am I missing something? 






[jira] [Commented] (HBASE-16415) Replication in different namespace

2017-06-09 Thread Jan Kunigk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044353#comment-16044353
 ] 

Jan Kunigk commented on HBASE-16415:


Can someone assign me please? I do not seem to have that privilege.
 

> Replication in different namespace
> --
>
> Key: HBASE-16415
> URL: https://issues.apache.org/jira/browse/HBASE-16415
> Project: HBase
>  Issue Type: New Feature
>  Components: Replication
>Reporter: Christian Guegi
>
> It would be nice to replicate tables from one namespace to another namespace.
> Example:
> Master cluster, namespace=default, table=bar
> Slave cluster, namespace=dr, table=bar
> Replication happens in class ReplicationSink:
>   public void replicateEntries(List<WALEntry> entries, final CellScanner cells, ...) {
>     ...
>     TableName table = TableName.valueOf(entry.getKey().getTableName().toByteArray());
>     ...
>     addToHashMultiMap(rowMap, table, clusterIds, m);
>     ...
>     for (Entry<TableName, Map<List<UUID>, List<Row>>> entry : rowMap.entrySet()) {
>       batch(entry.getKey(), entry.getValue().values());
>     }
>   }





[jira] [Commented] (HBASE-16415) Replication in different namespace

2017-06-09 Thread Jan Kunigk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044348#comment-16044348
 ] 

Jan Kunigk commented on HBASE-16415:


Hi, I would like to make the following contribution to this issue and am
very much looking forward to feedback and comments:

ReplicationSourceWALReaderThread continuously reads the WAL entries to be
replicated from a given WAL via WAL.Reader's next() method and adds them
to WALEntryBatches.

As far as I can see, these WAL entries are copies of the entries in the
locally persisted WALs. To direct them to TableNames that differ from the
source, I propose intercepting the copied WAL entries on the source cluster
and checking whether they belong to a TableName that is to be rewritten.

If such a check succeeds, the WALKey of that entry needs to be changed
accordingly. WALKey provides a getTableName() method but currently no
setTableName() method; one would have to be added to change the private
TableName member.

I propose intercepting the entries via a new method, redirectEntry(), which
is invoked immediately after the entry has been filtered by filterEntry() and
just before it is added to its WALEntryBatch, like so:

```
Entry entry = entryStream.next();
if (updateSerialReplPos(batch, entry)) {
  batch.lastWalPosition = entryStream.getPosition();
  break;
}
entry = filterEntry(entry);
// <-- new: rewrite the table name if a redirection matches; must pass a null
// (filtered-out) entry through unchanged
entry = redirectEntry(entry);
if (entry != null) {
  WALEdit edit = entry.getEdit();
  if (edit != null && !edit.isEmpty()) {
    long entrySize = getEntrySize(entry);
    batch.addEntry(entry);
    // ... (remaining batch bookkeeping unchanged)
  }
}
```

redirectEntry() bases its decisions on a 'Map<TableName, TableName>
redirections', whose keys are the source table names and whose values are
the destination table names. The map would be included in the
ReplicationPeerConfig, which can be obtained from within
ReplicationSourceWALReaderThread via the ReplicationSourceManager instance
that is passed as an argument to both available constructors.

When the TableName in a WALKey read from the WALEntryStream matches a key in
the redirections map, that WALKey's TableName is replaced by the
corresponding value.
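A minimal sketch of the redirectEntry() lookup described above. SketchWalKey, SketchEntry and TableRedirector are hypothetical, simplified stand-ins for WALKey, WAL.Entry and the reader-side logic, and table names are plain "namespace:qualifier" strings rather than TableName objects. Instead of the proposed setTableName(), this sketch builds a new key, which avoids mutating an entry that other code might still reference:

```java
import java.util.Map;

// Hypothetical, simplified stand-in for WALKey; only the table name matters here.
final class SketchWalKey {
  final String tableName; // "namespace:qualifier"
  final long sequenceId;

  SketchWalKey(String tableName, long sequenceId) {
    this.tableName = tableName;
    this.sequenceId = sequenceId;
  }
}

// Hypothetical, simplified stand-in for WAL.Entry.
final class SketchEntry {
  final SketchWalKey key;
  final String edit;

  SketchEntry(SketchWalKey key, String edit) {
    this.key = key;
    this.edit = edit;
  }
}

final class TableRedirector {
  private final Map<String, String> redirections; // source table -> destination table

  TableRedirector(Map<String, String> redirections) {
    this.redirections = Map.copyOf(redirections);
  }

  /** Returns the entry with its table rewritten if a redirection matches; null stays null. */
  SketchEntry redirectEntry(SketchEntry entry) {
    if (entry == null) {
      return null; // already dropped by filterEntry()
    }
    String target = redirections.get(entry.key.tableName);
    if (target == null) {
      return entry; // no rule for this table: ship it unchanged
    }
    // Build a new key instead of mutating the existing one.
    SketchWalKey newKey = new SketchWalKey(target, entry.key.sequenceId);
    return new SketchEntry(newKey, entry.edit);
  }
}
```

A null input (an entry already dropped by filterEntry()) is returned unchanged, so the call can sit directly after filterEntry() as in the loop above.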

The rationale for intercepting on the sending side is that setup and peer
management are already performed on the source today, and I see no
existing mechanism that would carry the redirection rules themselves
across to the sink.

Similar to the way the HBase shell lets you specify the tables and column
families to be replicated (set_peer_tableCFs), I propose a new command, also
on the sending side, 'set_peer_table_redirections', which accepts a map of
Strings that correspond to the final specification of the redirections as
TableNames:

```
set_peer_table_redirections['ns_source1:table_source1' : 'ns_dest1:table_dest1',
'ns_source2:table_source2' : 'ns_dest2:table_dest2', ...
'ns_sourcen:table_sourcen' : 'ns_destn:table_destn']
```
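A sketch of how the string map accepted by the proposed set_peer_table_redirections command might be normalized before it is stored in the peer config, assuming that a name without a namespace means the "default" namespace (which is how HBase's TableName.valueOf(String) treats such names). RedirectionSpec is a hypothetical helper that uses plain strings so it runs standalone; two sources that normalize to the same table are rejected:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: turns the raw string map into normalized
// "namespace:qualifier" names, source table -> destination table.
final class RedirectionSpec {
  private static final String DEFAULT_NAMESPACE = "default";

  /** Adds the default namespace when none is given, e.g. "bar" -> "default:bar". */
  static String normalize(String name) {
    int colon = name.indexOf(':');
    String namespace = colon < 0 ? DEFAULT_NAMESPACE : name.substring(0, colon);
    String qualifier = colon < 0 ? name : name.substring(colon + 1);
    if (namespace.isEmpty() || qualifier.isEmpty() || qualifier.indexOf(':') >= 0) {
      throw new IllegalArgumentException("Invalid table name: " + name);
    }
    return namespace + ":" + qualifier;
  }

  /** Normalizes every pair and rejects duplicate source tables. */
  static Map<String, String> parse(Map<String, String> raw) {
    Map<String, String> result = new LinkedHashMap<>();
    for (Map.Entry<String, String> e : raw.entrySet()) {
      String source = normalize(e.getKey());
      String destination = normalize(e.getValue());
      if (result.putIfAbsent(source, destination) != null) {
        throw new IllegalArgumentException("Duplicate source table: " + source);
      }
    }
    return result;
  }
}
```

Normalizing up front means that "bar" and "default:bar" are recognized as the same source table, so the lookup in redirectEntry() only ever sees one canonical form.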

Best, J



