[jira] [Commented] (HBASE-10504) Define Replication Interface

2017-09-18 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16171133#comment-16171133
 ] 

stack commented on HBASE-10504:
---

All subtasks in this issue are done. A Replication Interface has been 
implemented. Resolving.

But first, on rereading the clean discussion above, this implementation has 
little for the lily/SEP/hbase-indexer "fake cluster" case. The lily-folks 
reasonably want to index in a separate process. They want to be able to scale 
up and down this consumer independent of RegionServer count and they don't want 
to have to implement our rpc, security or serde.

In "Gabriel Reid added a comment - 11/Feb/14 23:59", he states that 
hbase-indexer makes use of o.a.h.h.replication.regionserver.ReplicationSource 
and o.a.h.h.Server, as well as the generated AdminProtos. He continued: "The 
one thing I can think of right now that would be nice is a better hook into 
o.a.h.h.replication.regionserver.ReplicationSource#removeNonReplicableEdits. 
Right now we override that method to do additional filtering on HLog entries 
that we know we don't want to send to the indexer"

This latter asked-for-functionality, a sender-side filtering mechanism, was 
added in a subtask of this issue, HBASE-11367, where you can define 
implementations of WALEntryFilter when you define a ReplicationEndpoint (The 
current SEP has actually stopped filtering source-side because breakage was 
common).
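
For readers unfamiliar with sender-side filtering, here is a minimal sketch of the idea. It assumes a filter contract where returning null drops an entry before it is shipped. The types Entry and EntryFilter and the helper tableFilter are hypothetical stand-ins written for this illustration; they are not the HBase WALEntryFilter or ReplicationEndpoint classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT the HBase classes.
final class FilterSketch {
    /** Minimal stand-in for a WAL entry: a table name plus an opaque payload. */
    static final class Entry {
        final String table;
        final String payload;
        Entry(String table, String payload) {
            this.table = table;
            this.payload = payload;
        }
    }

    /** Returns the entry to ship, or null to drop it on the sender side. */
    interface EntryFilter {
        Entry filter(Entry entry);
    }

    /** Example filter (hypothetical): keep only entries that belong to one table. */
    static EntryFilter tableFilter(final String wantedTable) {
        return entry -> wantedTable.equals(entry.table) ? entry : null;
    }

    /** Applies filters in order; an entry is dropped as soon as any filter returns null. */
    static List<Entry> apply(List<Entry> batch, List<EntryFilter> filters) {
        List<Entry> kept = new ArrayList<>();
        for (Entry e : batch) {
            Entry current = e;
            for (EntryFilter f : filters) {
                current = f.filter(current);
                if (current == null) {
                    break;
                }
            }
            if (current != null) {
                kept.add(current);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Entry> batch = Arrays.asList(
            new Entry("users", "put:row1"),
            new Entry("audit", "put:row9"));
        List<Entry> kept = apply(batch, Arrays.asList(tableFilter("users")));
        System.out.println(kept.size() + " of " + batch.size() + " entries kept");
    }
}
```

Filtering before the send means unwanted edits never cross the wire, which is the benefit the hbase-indexer folks were after when they overrode removeNonReplicableEdits.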

That leaves all the rest: the standing up of an hbase rpcserver instance that 
floats an Admin Service doing a hokey register of the RegionServer ephemeral 
znode up in zk.

As suggested above, let's open a distinct issue for helping out the 
hbase-indexer case.

> Define Replication Interface
> 
>
> Key: HBASE-10504
> URL: https://issues.apache.org/jira/browse/HBASE-10504
> Project: HBase
>  Issue Type: Task
>  Components: Replication
>Reporter: stack
>Assignee: Enis Soztutar
>Priority: Blocker
> Attachments: hbase-10504_v1.patch, hbase-10504_wip1.patch
>
>
> HBase has replication.  Fellas have been hijacking the replication apis to do 
> all kinds of perverse stuff like indexing hbase content (hbase-indexer 
> https://github.com/NGDATA/hbase-indexer) and our [~toffer] just showed up w/ 
> overrides that replicate via an alternate channel (over a secure thrift 
> channel between dcs over on HBASE-9360).  This issue is about surfacing these 
> APIs as public with guarantees to downstreamers similar to those we have on 
> our public client-facing APIs (and so we don't break them for downstreamers).
> Any input [~phunt] or [~gabriel.reid] or [~toffer]?
> Thanks.
>  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-10504) Define Replication Interface

2017-08-31 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16149335#comment-16149335
 ] 

stack commented on HBASE-10504:
---

No movement. Moving out. Talked to [~whoschek] offline. It doesn't look like he 
has time to help out here.



[jira] [Commented] (HBASE-10504) Define Replication Interface

2017-08-22 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16137649#comment-16137649
 ] 

stack commented on HBASE-10504:
---

What are we going to do here? ReplicationEndpoint is broken in hbase-2.0 so if 
we want to mess with Replication Interfaces, now is the time.



[jira] [Commented] (HBASE-10504) Define Replication Interface

2014-09-08 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126573#comment-14126573
 ] 

Enis Soztutar commented on HBASE-10504:
---

Unscheduling this for 1.x series. Put it back if you want to work on it. I 
don't think we will have another kind of process for providing isolation to 
replication endpoints as discussed above. Otherwise HBASE-11367 provides the 
pluggable endpoint similar to our current co-processors. 



[jira] [Commented] (HBASE-10504) Define Replication Interface

2014-06-16 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033203#comment-14033203
 ] 

Enis Soztutar commented on HBASE-10504:
---

bq. As long as the RE daemons are disjoint from the region servers that would 
work, since you need to be able to have as many/few of them as you want, plus 
to be able to have them on different machines.
bq. I could even see this as being useful for certain use cases that want 
inter-HBase replication to be done through specific gateways.
Sounds good to me. I think it is ok to maintain such a gateway inside hbase 
proper. 

[~jdcryans] mind giving the patch a review? Do you want to do this in a 
subtask? 



[jira] [Commented] (HBASE-10504) Define Replication Interface

2014-06-16 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033216#comment-14033216
 ] 

Jean-Daniel Cryans commented on HBASE-10504:


bq.  Do you want to do this in a subtask?

Yes, please.



[jira] [Commented] (HBASE-10504) Define Replication Interface

2014-06-16 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033267#comment-14033267
 ] 

Enis Soztutar commented on HBASE-10504:
---

Ok, moved the patch to subtask HBASE-11367. Thanks JD. 



[jira] [Commented] (HBASE-10504) Define Replication Interface

2014-06-12 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029275#comment-14029275
 ] 

Jean-Daniel Cryans commented on HBASE-10504:


[~enis], as per the discussion above, I think what you are proposing is good 
but it doesn't solve the problem that this jira is about. How about we open a 
new jira, titled something like "Make ReplicationSource extendable", and do the 
review/commit over there?



[jira] [Commented] (HBASE-10504) Define Replication Interface

2014-06-12 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029534#comment-14029534
 ] 

Enis Soztutar commented on HBASE-10504:
---

bq. as per the discussion above, I think what you are proposing is good but it 
doesn't solve the problem that this jira is about. 
My understanding of the issue description and discussions is that the new 
interface ReplicationEndpoint *is* the supported replication interface as per 
the title. What do you have in mind otherwise? We should not expose any of our 
RPC or zookeeper details to replication plugins. 



[jira] [Commented] (HBASE-10504) Define Replication Interface

2014-06-12 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029791#comment-14029791
 ] 

Enis Soztutar commented on HBASE-10504:
---

Thinking a bit more about this, if we want to provide isolation for the 
replication services in hbase, we can have a simple host as another daemon 
which hosts the ReplicationEndpoint implementation. RS's will use a built-in RE 
to send the edits to this layer, and the host will delegate it to the RE 
implementation. 
The flow would be something like: 
RS -- RE inside RS -- Host daemon for RE -- Actual RE implementation -- 
third party system
But I am not sure whether we will have this daemon host layer implemented by 
1.0. 
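
The proposed flow can be sketched in-process as a chain of delegating objects. This is only an illustration under stated assumptions: the names Endpoint, ForwardingEndpoint, HostDaemon and IndexingEndpoint are hypothetical, the "third party system" is faked as a list, and the hop between the RegionServer and the host daemon is a plain method call where a real version would make a network call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of: RS -- RE inside RS -- Host daemon -- Actual RE -- third party.
final class HostedEndpointSketch {
    /** Hypothetical endpoint contract: returns true once the batch is accepted. */
    interface Endpoint {
        boolean replicate(List<String> edits);
    }

    /** "Actual RE implementation": writes to a third-party system, faked here as a list. */
    static final class IndexingEndpoint implements Endpoint {
        final List<String> thirdPartySystem = new ArrayList<>();

        @Override
        public boolean replicate(List<String> edits) {
            thirdPartySystem.addAll(edits);
            return true;
        }
    }

    /** "Host daemon for RE": a separate process in the proposal; here just an object. */
    static final class HostDaemon {
        private final Endpoint hosted;

        HostDaemon(Endpoint hosted) {
            this.hosted = hosted;
        }

        boolean receive(List<String> edits) {
            return hosted.replicate(edits);
        }
    }

    /** "RE inside RS": the built-in endpoint that only forwards to the host daemon. */
    static final class ForwardingEndpoint implements Endpoint {
        private final HostDaemon daemon;

        ForwardingEndpoint(HostDaemon daemon) {
            this.daemon = daemon;
        }

        @Override
        public boolean replicate(List<String> edits) {
            // A real version would cross a process boundary here.
            return daemon.receive(edits);
        }
    }

    public static void main(String[] args) {
        IndexingEndpoint actual = new IndexingEndpoint();
        Endpoint insideRegionServer = new ForwardingEndpoint(new HostDaemon(actual));
        boolean acked = insideRegionServer.replicate(Arrays.asList("edit-1", "edit-2"));
        System.out.println("acked=" + acked + ", delivered=" + actual.thirdPartySystem);
    }
}
```

The point of the extra hop is that the third-party code and its dependencies live only in the host daemon, so the RegionServer loads nothing beyond the forwarding endpoint.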



[jira] [Commented] (HBASE-10504) Define Replication Interface

2014-06-12 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029806#comment-14029806
 ] 

Jean-Daniel Cryans commented on HBASE-10504:


As long as the RE daemons are disjoint from the region servers that would work, 
since you need to be able to have as many/few of them as you want, plus to be 
able to have them on different machines.

I could even see this as being useful for certain use cases that want 
inter-HBase replication to be done through specific gateways.

We then have to architect a way to discover those endpoints, etc... kind of 
what the current replication code already does with sinks.



[jira] [Commented] (HBASE-10504) Define Replication Interface

2014-06-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028779#comment-14028779
 ] 

Hadoop QA commented on HBASE-10504:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12649979/hbase-10504_v1.patch
  against trunk revision .
  ATTACHMENT ID: 12649979

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 15 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 7 
warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
+this.replicationPeers.addPeer(id, new ReplicationPeerConfig().setClusterKey(clusterKey), tableCFs);
+  public ReplicationPeerZKImpl(Configuration conf, String id, ReplicationPeerConfig peerConfig) throws ReplicationException {
+  void addPeer(String peerId, ReplicationPeerConfig peerConfig, String tableCFs) throws ReplicationException;
+  public void addPeer(String id, ReplicationPeerConfig peerConfig, String tableCFs) throws ReplicationException {
+  LOG.warn("Failed to get replication peer configuration of clusterid=" + id + " znode content, continuing.");
+  private static ReplicationPeerConfig parsePeerFrom(final byte[] bytes) throws DeserializationException {
+  new java.lang.String[] { "Clusterkey", "ReplicationEndpointImpl", "Data", "Configuration", });
+ * class rather than implementing {@link ReplicationEndpoint} directly for better backwards compatibility.
+public abstract class HBaseReplicationEndpoint extends BaseReplicationEndpoint implements Abortable {
+// send the edits to the endpoint. Will block until the edits are actually sent and acknowledged

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9751//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9751//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9751//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9751//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9751//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9751//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9751//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9751//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9751//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9751//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9751//console

This message is automatically generated.


[jira] [Commented] (HBASE-10504) Define Replication Interface

2014-05-16 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13999301#comment-13999301
 ] 

Enis Soztutar commented on HBASE-10504:
---

bq. This seems to me like work that is difficult to justify for the SEP because 
it makes use of existing HBase RPC and replication failover handling, and 
currently works
Of course RPC and RS discovery etc are internal to HBase. The main advantage 
would be that you will be protected against changes there. 
bq. Obviously these comments are purely from the standpoint of SEP maintenance. 
As I mentioned in a previous comment, I think that this proposal would be a 
great addition to HBase and very useful -- I just don't see its usefulness to 
the SEP yet.
Fair enough. HBase's own inner workings for inter-cluster replication are not 
changing with this patch, so you should be able to have SEP as it is. I think I 
don't fully understand why would you want to have a proxy layer as a separate 
service (other than isolation issues).  



[jira] [Commented] (HBASE-10504) Define Replication Interface

2014-05-16 Thread Gabriel Reid (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998525#comment-13998525
 ] 

Gabriel Reid commented on HBASE-10504:
--

{quote}Even in that case, the server-client implementation will be much simpler 
than what you have in SEP I think, which does hbase RPC + zookeeper + zk 
mockup. Only a server discovery + simple one method thrift / PB RPC is what is 
needed from my understanding. It would be great to have a skeleton for this 
living in HBase proper as well.{quote}

The SEP does indeed make use of HBase RPC and ZK, and does masquerade in ZK as 
a region server, but I don't see that much of an advantage of doing a separate 
thrift or PB-based RPC. I think that this would just be exchanging one existing 
RPC mechanism (hbase) for another (thrift/PB). The service discovery would 
still be required, which would also involve ZK. Not having to do the setup of 
masquerading as a regionserver would indeed be a plus.

Taking this road for the SEP would then require:
* defining/setting up RPC (which is currently already provided by hbase)
* implementing a ReplicationConsumer (and deploying and configuring it in hbase 
RS) that does service discovery, RPC endpoint selection, and properly deals 
with failing/flaky RPC endpoints (also already provided in hbase replication)
* reworking the SEP RPC server to use the new RPC mechanism
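
To give a feel for the endpoint-selection and flaky-endpoint handling mentioned in the list above, here is a minimal sketch. It assumes the set of endpoints has already been discovered (for example from a registry in ZK) and is handed in as plain strings. The class EndpointSelectorSketch, its methods, and the failure threshold are all hypothetical; this is not how HBase's own sink selection is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: round-robin endpoint selection that skips endpoints with too many failures.
final class EndpointSelectorSketch {
    private final List<String> discovered;                       // addresses obtained by service discovery
    private final Map<String, Integer> failures = new HashMap<>();
    private final int maxFailures;
    private int next = 0;

    EndpointSelectorSketch(List<String> discovered, int maxFailures) {
        this.discovered = new ArrayList<>(discovered);
        this.maxFailures = maxFailures;
    }

    /** Round-robin over endpoints below the failure threshold; returns null if none qualify. */
    String choose() {
        for (int i = 0; i < discovered.size(); i++) {
            String candidate = discovered.get((next + i) % discovered.size());
            if (failures.getOrDefault(candidate, 0) < maxFailures) {
                next = (next + i + 1) % discovered.size();
                return candidate;
            }
        }
        return null;
    }

    /** Records one failed call against an endpoint. */
    void reportFailure(String endpoint) {
        failures.merge(endpoint, 1, Integer::sum);
    }

    /** Clears the failure count after a successful call. */
    void reportSuccess(String endpoint) {
        failures.remove(endpoint);
    }

    public static void main(String[] args) {
        EndpointSelectorSketch selector =
            new EndpointSelectorSketch(Arrays.asList("host-a:9000", "host-b:9000"), 2);
        selector.reportFailure("host-a:9000");
        selector.reportFailure("host-a:9000");
        // host-a has reached the threshold, so both picks fall through to host-b.
        System.out.println(selector.choose());
        System.out.println(selector.choose());
    }
}
```

Even this small version needs discovery input, selection state, and failure bookkeeping, which is the kind of work the comment argues the SEP already gets for free from HBase replication.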

This seems to me like work that is difficult to justify for the SEP because it 
makes use of existing HBase RPC and replication failover handling, and 
currently works. I may very well be missing something, but I don't see how 
reimplementing this stuff in the SEP would simplify things very much.

The one big potential advantage that I do see is that it would protect the SEP 
from changes in the internal HBase replication interfaces.

Obviously these comments are purely from the standpoint of SEP maintenance. As 
I mentioned in a previous comment, I think that this proposal would be a great 
addition to HBase and very useful -- I just don't see its usefulness to the 
SEP yet.



[jira] [Commented] (HBASE-10504) Define Replication Interface

2014-05-16 Thread Gabriel Reid (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13999804#comment-13999804
 ] 

Gabriel Reid commented on HBASE-10504:
--

{quote}I think I don't fully understand why would you want to have a proxy 
layer as a separate service (other than isolation issues).{quote}

Process isolation and deployment isolation are indeed the two main issues. The 
process isolation side of things probably speaks for itself. Deployment 
isolation is another issue however. Keeping dependencies of hbase-indexer in 
line with all deps in hbase is an issue (as all hbase-indexer jars would need 
to be deployed into the HBase lib for this approach to work), and all 
configuration, config changes, and restarting would also become directly linked 
to HBase.



[jira] [Commented] (HBASE-10504) Define Replication Interface

2014-05-16 Thread wolfgang hoschek (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14000121#comment-14000121
 ] 

wolfgang hoschek commented on HBASE-10504:
--

+1 to what [~greid] already said. 

To help illustrate some of the benefits of process isolation take a look at the 
large and complex dependency tree of hbase-indexer below. I can't imagine 
having something like this run inside of the primary region server process 
without causing ongoing stability, scalability, reliability, flexibility and 
resource management issues for HBase.

In addition, being able to decouple the lifecycle of the hbase-indexer 
processes and their configuration from the primary region server processes 
further contributes to reliability and flexibility and ease of management. It 
also means we can scale out the Indexer tier more independently from the HBase 
tier without violating resource management SLAs. For example, indexing can suck 
quite some CPU.

When this JIRA was originally filed we were hoping for it to help ease some of 
the hbase-indexer pain, but at the moment it looks like it's going in a 
different direction, a direction that doesn't apply to our use case even though 
it may well be great for other use cases, and, at least from our perspective, 
I'm afraid it would be more like a step backwards.

Thoughts?

{code}
[INFO] --- maven-dependency-plugin:2.1:tree (default-cli) @ 
hbase-indexer-server ---
[INFO] com.ngdata:hbase-indexer-server:jar:1.5-SNAPSHOT
[INFO] +- com.ngdata:hbase-indexer-engine:jar:1.5-SNAPSHOT:compile
[INFO] |  +- com.ngdata:hbase-indexer-common:jar:1.5-SNAPSHOT:compile
[INFO] |  |  +- com.ngdata:hbase-sep-api:jar:1.5-SNAPSHOT:compile
[INFO] |  |  +- com.ngdata:hbase-sep-impl:jar:1.5-hbase0.98-SNAPSHOT:compile
[INFO] |  |  |  \- com.ngdata:hbase-sep-impl-common:jar:1.5-SNAPSHOT:compile
[INFO] |  |  \- org.apache.solr:solr:war:4.4.0:compile
[INFO] |  +- com.ngdata:hbase-indexer-model:jar:1.5-SNAPSHOT:compile
[INFO] |  |  \- net.iharder:base64:jar:2.3.8:compile
[INFO] |  +- org.apache.solr:solr-core:jar:4.4.0:compile
[INFO] |  |  +- org.apache.lucene:lucene-core:jar:4.4.0:compile
[INFO] |  |  +- org.apache.lucene:lucene-codecs:jar:4.4.0:compile
[INFO] |  |  +- org.apache.lucene:lucene-analyzers-common:jar:4.4.0:compile
[INFO] |  |  +- org.apache.lucene:lucene-analyzers-kuromoji:jar:4.4.0:compile
[INFO] |  |  +- org.apache.lucene:lucene-analyzers-phonetic:jar:4.4.0:compile
[INFO] |  |  +- org.apache.lucene:lucene-highlighter:jar:4.4.0:compile
[INFO] |  |  +- org.apache.lucene:lucene-memory:jar:4.4.0:compile
[INFO] |  |  +- org.apache.lucene:lucene-misc:jar:4.4.0:compile
[INFO] |  |  +- org.apache.lucene:lucene-queryparser:jar:4.4.0:compile
[INFO] |  |  +- org.apache.lucene:lucene-spatial:jar:4.4.0:compile
[INFO] |  |  |  \- com.spatial4j:spatial4j:jar:0.3:compile
[INFO] |  |  +- org.apache.lucene:lucene-suggest:jar:4.4.0:compile
[INFO] |  |  +- org.apache.lucene:lucene-grouping:jar:4.4.0:compile
[INFO] |  |  +- org.apache.lucene:lucene-queries:jar:4.4.0:compile
[INFO] |  |  +- com.googlecode.concurrentlinkedhashmap:concurrentlinkedhashmap-lru:jar:1.2:compile
[INFO] |  |  +- commons-fileupload:commons-fileupload:jar:1.2.1:compile
[INFO] |  |  +- joda-time:joda-time:jar:1.6:compile (version managed from 2.2)
[INFO] |  |  +- org.restlet.jee:org.restlet:jar:2.1.1:compile
[INFO] |  |  +- org.restlet.jee:org.restlet.ext.servlet:jar:2.1.1:compile
[INFO] |  |  +- org.codehaus.woodstox:wstx-asl:jar:3.2.7:runtime
[INFO] |  |  +- org.apache.httpcomponents:httpclient:jar:4.2.5:compile
[INFO] |  |  |  \- org.apache.httpcomponents:httpcore:jar:4.2.5:compile (version managed from 4.2.4)
[INFO] |  |  \- org.apache.httpcomponents:httpmime:jar:4.2.5:compile
[INFO] |  +- org.apache.solr:solr-solrj:jar:4.4.0:compile
[INFO] |  |  \- org.noggit:noggit:jar:0.5:compile
[INFO] |  +- org.apache.solr:solr-cell:jar:4.4.0:compile
[INFO] |  |  \- org.apache.tika:tika-parsers:jar:1.4:compile
[INFO] |  | +- org.apache.tika:tika-core:jar:1.4:compile
[INFO] |  | +- org.gagravarr:vorbis-java-tika:jar:0.1:compile
[INFO] |  | +- edu.ucar:netcdf:jar:4.2-min:compile
[INFO] |  | +- org.apache.james:apache-mime4j-core:jar:0.7.2:compile
[INFO] |  | +- org.apache.james:apache-mime4j-dom:jar:0.7.2:compile
[INFO] |  | +- org.apache.pdfbox:pdfbox:jar:1.8.1:compile
[INFO] |  | |  +- org.apache.pdfbox:fontbox:jar:1.8.1:compile
[INFO] |  | |  \- org.apache.pdfbox:jempbox:jar:1.8.1:compile
[INFO] |  | +- org.bouncycastle:bcmail-jdk15:jar:1.45:compile
[INFO] |  | +- org.bouncycastle:bcprov-jdk15:jar:1.45:compile
[INFO] |  | +- org.apache.poi:poi:jar:3.9:compile
[INFO] |  | +- org.apache.poi:poi-scratchpad:jar:3.9:compile
[INFO] |  | +- org.apache.poi:poi-ooxml:jar:3.9:compile
[INFO] |  | |  +- org.apache.poi:poi-ooxml-schemas:jar:3.9:compile
[INFO] |  | |  |  \- 

[jira] [Commented] (HBASE-10504) Define Replication Interface

2014-05-15 Thread Gabriel Reid (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997936#comment-13997936
 ] 

Gabriel Reid commented on HBASE-10504:
--

[~enis] [~jdcryans] I think that this would make a great addition to HBase in 
general -- if I'm reading it correctly, it basically works out as a kind of 
RegionObserver.postPut that runs in process, but off of the write path. 
However, I'm less sure of how well this general approach would work for 
hbase-indexer.

Although this approach would simplify a lot of things (e.g. removing the need to 
act as a slave cluster), it also introduces a number of important changes in 
the execution model of something that runs via the SepConsumer (i.e. reacting 
to HBase updates):
* the number of replication endpoints becomes directly linked to the number of 
regionservers -- as far as I see, it would no longer be possible to have more 
or fewer replication endpoints than regionservers
* the replication (ReplicationConsumer) runs within the same JVM as the 
regionserver, which brings some important changes: all of the dependencies then 
need to be present in the classpath of the regionserver (and compatible with 
the dependencies of the regionserver), it has an effect on the heap, and it is 
no longer possible to tune the JVM settings for the replication process on its own
* from what I see of the current patch (although I'm sure this could be 
changed), it looks like there's only a single ReplicationConsumer, 
meaning that it wouldn't be possible to have real replication running next to 
a custom ReplicationSource

The tighter-coupling issues that I outlined above could be worked around by 
having a lightweight ReplicationConsumer that just passes data to an external 
process via RPC, but that's basically exactly what the replication code in 
HBase is doing right now, so it would be a shame to re-implement something like 
that.

 Define Replication Interface
 

 Key: HBASE-10504
 URL: https://issues.apache.org/jira/browse/HBASE-10504
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: stack
Priority: Blocker
 Fix For: 0.99.0

 Attachments: hbase-10504_wip1.patch


 HBase has replication.  Fellas have been hijacking the replication apis to do 
 all kinds of perverse stuff like indexing hbase content (hbase-indexer 
 https://github.com/NGDATA/hbase-indexer) and our [~toffer] just showed up w/ 
 overrides that replicate via an alternate channel (over a secure thrift 
 channel between dcs over on HBASE-9360).  This issue is about surfacing these 
 APIs as public with guarantees to downstreamers similar to those we have on 
 our public client-facing APIs (and so we don't break them for downstreamers).
 Any input [~phunt] or [~gabriel.reid] or [~toffer]?
 Thanks.
  





[jira] [Commented] (HBASE-10504) Define Replication Interface

2014-05-15 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998272#comment-13998272
 ] 

Enis Soztutar commented on HBASE-10504:
---

bq. the number of replication endpoints becomes directly linked to the number 
of regionservers – as far as I see, it would no longer be possible to have more 
or fewer replication endpoints than regionservers
As you said, if you want to still have a proxy layer to have a different 
configuration then you can implement your own server + client layer and plug 
only the client in. Run the servers in whatever conf / jvm you want.
bq. the replication (ReplicationConsumer) runs within the same JVM as the 
regionserver, which brings some important changes: all of the dependencies then 
need to be present in the classpath of the regionserver (and compatible with 
the dependencies of the regionserver), it has an effect on the heap, is no 
longer possible to tune the JVM settings for the replication process on its own
Same as above, you can have a thin client layer and thick server layer of your 
own if you prefer. Otherwise, this will simplify deployments a lot. 
bq. from what I see of the current patch (although I'm sure this could be 
changed) is that it looks like there's only a single ReplicationConsumer, 
meaning that it wouldn't be possible to have real replication running next to 
a custom ReplicationSource
It is a single ReplicationConsumer per peer. HBaseClusterReplicator implements 
this interface. You can have one peer definition replicating to a different 
cluster and a peer replicating to a third party. 
bq. The tighter-coupling issues that I outlined above could be worked around by 
having a lightweight ReplicationConsumer that just passes data to an external 
process via RPC, but that's basically exactly what the replication code in 
HBase is doing right now, so it would be a shame to re-implement something like 
that.
Even in that case, the server-client implementation will be much simpler than 
what you have in SEP I think, which does hbase RPC + zookeeper + zk mockup. 
Only a server discovery + simple one method thrift / PB RPC is what is needed 
from my understanding. It would be great to have a skeleton for this living in 
HBase proper as well. 
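
The thin-client idea above (server discovery plus a one-method RPC) can be sketched roughly as follows. This is only an illustration under stated assumptions: `IndexerEndpoint` and `ForwardingClient` are hypothetical names, the "RPC" is a plain Java interface rather than thrift or PB, and discovery is reduced to a list handed to the constructor.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical one-method RPC stub for an external (out-of-process) server.
// In a real deployment this would be a thrift or protobuf client.
interface IndexerEndpoint {
    boolean send(List<String> edits);
}

// Thin client that would be plugged into the regionserver. It holds no
// indexing logic; it only picks a discovered server and forwards the batch.
final class ForwardingClient {
    private final List<IndexerEndpoint> servers;
    private int next = 0;

    ForwardingClient(List<IndexerEndpoint> discovered) {
        if (discovered.isEmpty()) {
            throw new IllegalArgumentException("no servers discovered");
        }
        this.servers = new ArrayList<>(discovered);
    }

    // Round-robin over the servers, trying each at most once per batch.
    // Returns false if every server refused the batch.
    boolean ship(List<String> edits) {
        for (int i = 0; i < servers.size(); i++) {
            IndexerEndpoint target = servers.get(next);
            next = (next + 1) % servers.size();
            if (target.send(edits)) {
                return true;
            }
        }
        return false;
    }
}
```

Because the server list is independent of the regionserver count, the external tier can be scaled on its own, which is the property the hbase-indexer folks asked for.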



[jira] [Commented] (HBASE-10504) Define Replication Interface

2014-05-15 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997727#comment-13997727
 ] 

Jean-Daniel Cryans commented on HBASE-10504:


[~enis], sounds like a plan. Oh and something we should not forget is setting 
the right InterfaceAudience. 

[~gabriel.reid] you down with this? It means that this class (1) will need a 
few changes just to continue working like it currently does, and eventually a 
lot more will need to change to use the replication consumer but it should 
simplify the code a lot. It will also have an impact on the deployment since 
you won't need a fake slave cluster anymore.

1. 
https://github.com/NGDATA/hbase-indexer/blob/master/hbase-sep/hbase-sep-impl-0.98/src/main/java/com/ngdata/sep/impl/SepReplicationSource.java



[jira] [Commented] (HBASE-10504) Define Replication Interface

2014-05-13 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13996979#comment-13996979
 ] 

Enis Soztutar commented on HBASE-10504:
---

bq. So IIUC you are going a completely different way than what was discussed 
before, right? I don't mind it but I want to make sure that we're on the same 
track.
Indeed. If you want to replicate to SOLR for example, there is no need to add 
another proxy layer. You can have SOLR clients do the transformations and send 
the updates directly. The pluggable-sink approach, which has to mock a cluster, 
requires implementing some hbase-specific details (zookeeper interactions, 
meta, rpc layer etc). 
bq. Might still be nice to offer some tools like removeNonReplicableEdits()
Agreed. I was thinking of doing a Filter interface for that. You can either 
pass a filter, or you can implement your own filter and invoke it. I have to 
dig a bit more for that. 
bq. the basic metrics the handling of a disabled peer, etc, for the other 
consumers.
In the POC patch, I keep the peer definition, but peers can have pluggable 
ReplicationConsumer. You can still enable/disable a peer, get the metrics for 
the peer coming from the ReplicationSource for that peer. We should allow the 
ReplicationConsumer to put up its own metrics with the peer name as well.
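
As a rough illustration of "peers keep their definition but carry a pluggable consumer", the sketch below keeps one consumer per peer, honors enable/disable, and reports a metric keyed by peer name. All names here (`PeerConsumer`, `Peer`, `PeerRegistry`, the `peer.<id>.shippedOps` key) are hypothetical and are not taken from the patch.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-peer plugin; edits are plain strings to keep the sketch small.
interface PeerConsumer {
    boolean shipEdits(List<String> edits);
}

// A peer definition: an id, its consumer, an enabled flag and one counter.
final class Peer {
    final String id;
    final PeerConsumer consumer;
    boolean enabled = true;
    long shippedOps = 0;

    Peer(String id, PeerConsumer consumer) {
        this.id = id;
        this.consumer = consumer;
    }
}

final class PeerRegistry {
    private final Map<String, Peer> peers = new LinkedHashMap<>();

    void addPeer(String id, PeerConsumer consumer) {
        peers.put(id, new Peer(id, consumer));
    }

    void setEnabled(String id, boolean enabled) {
        Peer p = peers.get(id);
        if (p == null) {
            throw new IllegalArgumentException("unknown peer: " + id);
        }
        p.enabled = enabled;
    }

    // Offers the batch to every enabled peer. Simplification: a disabled peer
    // is skipped here, whereas real replication would keep its edits queued.
    void replicate(List<String> edits) {
        for (Peer p : peers.values()) {
            if (p.enabled && p.consumer.shipEdits(edits)) {
                p.shippedOps += edits.size();
            }
        }
    }

    // Metrics are namespaced by peer id so each consumer is reported separately.
    Map<String, Long> metrics() {
        Map<String, Long> out = new LinkedHashMap<>();
        for (Peer p : peers.values()) {
            out.put("peer." + p.id + ".shippedOps", p.shippedOps);
        }
        return out;
    }
}
```

One peer could wrap the default cluster-to-cluster shipper while another wraps a third-party consumer; both are managed through the same peer operations.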



[jira] [Commented] (HBASE-10504) Define Replication Interface

2014-05-12 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995685#comment-13995685
 ] 

Jean-Daniel Cryans commented on HBASE-10504:


[~enis] So IIUC you are going a completely different way than what was 
discussed before, right? I don't mind it but I want to make sure that we're on 
the same track.

So instead of having people define their sinks, they instead put that code in 
the ReplicationConsumer. Seems like a good idea, you don't have to mess around 
getting a fake cluster up and running, among other issues. Might still be nice 
to offer some tools like removeNonReplicableEdits(), the basic metrics, the 
handling of a disabled peer, etc, for the other consumers.



[jira] [Commented] (HBASE-10504) Define Replication Interface

2014-05-10 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13992995#comment-13992995
 ] 

Enis Soztutar commented on HBASE-10504:
---

I have a very basic prototype patch that we can consider as a direction for 
this. The idea is to break up ReplicationSource into two parts, the part that 
tails the WAL and listens to the replication queue, and the part that talks to 
the ReplicationSink in the peer cluster to ship the edits. The new interface 
ReplicationConsumer (we can change the names) is the plugin point. 

There is still one ReplicationSource per peer in every RS. The 
ReplicationSource tails the logs for the peer, and holds a ReplicationConsumer 
pointer. It calls ReplicationConsumer.shipEdits() method for actual shipping 
logic. A default HBaseClusterReplicator implements ReplicationConsumer and 
talks to the ReplicationSink on the other cluster etc. 

For each peer, there is now a class implementing ReplicationConsumer, and 
possibly some configuration. This allows one to implement a ReplicationConsumer 
for SOLR (or any other data store) and directly plug it in to the RS without 
having to do a proxy layer and mocking zookeeper / RS RPC etc. 
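
The split described above might look roughly like this. It is a minimal sketch, not the patch itself: only the names ReplicationConsumer and shipEdits() come from the comment, while the `Entry` type, the method signature, `CollectingConsumer` and `PeerSource` are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a WAL entry; the real HBase type is richer.
final class Entry {
    final String table;
    final String row;

    Entry(String table, String row) {
        this.table = table;
        this.row = row;
    }
}

// The plugin point. The parameter and return type are assumed.
interface ReplicationConsumer {
    boolean shipEdits(List<Entry> batch);
}

// Example third-party consumer: records rows as if it were indexing them.
final class CollectingConsumer implements ReplicationConsumer {
    final List<String> indexed = new ArrayList<>();

    @Override
    public boolean shipEdits(List<Entry> batch) {
        for (Entry e : batch) {
            indexed.add(e.table + "/" + e.row);
        }
        return true;
    }
}

// Stand-in for the per-peer ReplicationSource: it owns the queue of tailed
// entries and delegates the actual shipping to the configured consumer.
final class PeerSource {
    private final ReplicationConsumer consumer;
    private final List<Entry> pending = new ArrayList<>();
    private int shipped = 0;

    PeerSource(ReplicationConsumer consumer) {
        this.consumer = consumer;
    }

    void append(Entry e) {
        pending.add(e);
    }

    // Ships the pending batch; entries stay queued if the consumer fails.
    boolean flush() {
        if (pending.isEmpty()) {
            return true;
        }
        if (!consumer.shipEdits(new ArrayList<>(pending))) {
            return false;
        }
        shipped += pending.size();
        pending.clear();
        return true;
    }

    int shippedCount() {
        return shipped;
    }

    int pendingCount() {
        return pending.size();
    }
}
```

A default implementation that talks to the ReplicationSink on the peer cluster would implement the same interface, so the source side does not need to know which kind of consumer it holds.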

Let me know what you think. Early feedback will help shape up the patch. 







[jira] [Commented] (HBASE-10504) Define Replication Interface

2014-04-08 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963281#comment-13963281
 ] 

Jean-Daniel Cryans commented on HBASE-10504:


bq. It doesn't have to be 'our' RPC, right JD?

Right, I guess I meant RPC protocol instead of mechanism.

bq. You just have to write the znode data in the same format that hbase expects 
it to be in

You're right, so we just have to make it explicit. And also you need to be 
authenticated if using security.

bq. The above statement makes it appear as though RPC protocol and zk 
serialization are more brittle than they actually are.

Sorry, I was referring to the code that I had just linked to where they use our 
internal APIs like ServerName and ZKUtil. Considering your two previous 
comments, that part of my writeup is irrelevant.



[jira] [Commented] (HBASE-10504) Define Replication Interface

2014-04-07 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13962351#comment-13962351
 ] 

Jean-Daniel Cryans commented on HBASE-10504:


It does seem that people want to be able to listen to row operations in HBase. 
I'm not sure how we can fully support this use case if bulk loads and edits 
that aren't hitting the WALs are mixed in. The current contract is that we 
replicate everything that's sent to the WAL and that got sync'd.

Regarding the actual interfaces, here's how I see it (I'm sure I'm missing a 
few things):

h5. Replication source

h6. Filtering WALEdits
We need to formalize what {{ReplicationSource#removeNonReplicableEdits}} is 
currently doing, maybe it could be done as a chain à la {{FileCleanerDelegate}}.
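
One possible shape for such a chain, sketched under assumptions: `WalEntry`, `EntryFilter` and `ChainEntryFilter` are hypothetical names, and the "return null to drop" contract is a choice made for this example rather than something stated in the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified WAL entry.
final class WalEntry {
    final String table;
    final boolean replicable;

    WalEntry(String table, boolean replicable) {
        this.table = table;
        this.replicable = replicable;
    }
}

// Assumed contract: return the entry (possibly rewritten) to keep it,
// or null to drop it.
interface EntryFilter {
    WalEntry filter(WalEntry entry);
}

// Runs the delegates in order and stops as soon as one of them drops the entry.
final class ChainEntryFilter implements EntryFilter {
    private final List<EntryFilter> filters;

    ChainEntryFilter(EntryFilter... filters) {
        this.filters = Arrays.asList(filters);
    }

    @Override
    public WalEntry filter(WalEntry entry) {
        for (EntryFilter f : filters) {
            if (entry == null) {
                return null;
            }
            entry = f.filter(entry);
        }
        return entry;
    }

    // Convenience for filtering a whole batch before it is shipped.
    List<WalEntry> apply(List<WalEntry> batch) {
        List<WalEntry> kept = new ArrayList<>();
        for (WalEntry e : batch) {
            WalEntry out = filter(e);
            if (out != null) {
                kept.add(out);
            }
        }
        return kept;
    }
}
```

With this shape, the built-in "non-replicable" check becomes one delegate, and a downstream project can append its own delegate instead of overriding a method on ReplicationSource.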

h6. Replication management
Currently, enabling/disabling/adding/removing peers is all done via ZK, which 
we're trying not to use as a permanent data store. This jira goes into more 
detail: [HBASE-10295|https://issues.apache.org/jira/browse/HBASE-10295]. If the 
master is going to be in charge of it, then it means we need to define a new 
protobuf service that the RS will implement. It should be separate from 
AdminProtos. 

h5. Replication sink

h6. Re-creating a region server
Replication is currently done via our RPC mechanism, so you need to start 
{{RpcServer}} in order to receive requests. The next part of the contract 
that replication relies on is that the sinks are discoverable via ZooKeeper, 
basically piggybacking on the RS discovery process. This means setting up a 
{{ZooKeeperWatcher}}, crafting a server name and then creating the znode. A 
good example of this can be found in SEP: 
https://github.com/NGDATA/hbase-indexer/blob/master/hbase-sep/hbase-sep-impl-0.95/src/main/java/com/ngdata/sep/impl/SepConsumer.java

It may not seem like a whole lot of code but it's code that can easily be broken 
with a few signature changes since those interfaces aren't clearly marked.

h6. ReplicateWALEntry
{{ReplicateWALEntry}} is a service offered as part of AdminProtos so it needs 
to move out. It should be a separate service from the previous one I described 
in Replication management. The unfortunate thing here is that {{Replay}} 
relies on the same messages: 
https://github.com/apache/hbase/blob/trunk/hbase-protocol/src/main/protobuf/Admin.proto#L260.
 To extract {{ReplicateWALEntry}} in a compatible way we'll have to deprecate 
it and maybe also deprecate {{Replay}}'s current signature to give it its own 
appropriately-named messages (or not, not a big deal).



[jira] [Commented] (HBASE-10504) Define Replication Interface

2014-04-07 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13962579#comment-13962579
 ] 

stack commented on HBASE-10504:
---

Just a few comments...

bq. ...via our RPC mechanism...

It doesn't have to be 'our' RPC, right JD?  You don't have to have the hbase 
RPC classes on the implementors side.  An RPC just needs to implement the 
protocol described here http://hbase.apache.org/book.html#d248e15581  It can be 
'your' RPC --whether an instance of the hbase classes instantiated and set 
running or a python implementation of the protocol... -- if you want, right?

bq.  This means setting up a ZooKeeperWatcher, crafting a server name and 
then creating the znode. 

Similarly, we don't care how you write the znode.  It doesn't have to be our 
wacky ZKW that does the writing.  You just have to write the znode data in the 
same format that hbase expects it to be in (We could do folks a favor and 
describe it in the Interface).

bq. It may not seem as whole lot of code but it's code that can easily be 
broken with a few signature changes since those interfaces aren't clearly 
marked.

We can't change the RPC protocol, not in an incompatible way.  That would undo 
old and new clients and servers being able to communicate.  Ditto for how we 
serialize znodes.  The above statement makes it appear as though RPC protocol 
and zk serialization are more brittle than they actually are.

Otherwise, good stuff.



[jira] [Commented] (HBASE-10504) Define Replication Interface

2014-02-21 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13908678#comment-13908678
 ] 

Enis Soztutar commented on HBASE-10504:
---

bq. Making blocker so we deal w/ it now.
Agreed. 



[jira] [Commented] (HBASE-10504) Define Replication Interface

2014-02-14 Thread Gabriel Reid (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13902005#comment-13902005
 ] 

Gabriel Reid commented on HBASE-10504:
--

Making the NRT listening of changes on HBase into a first-class API as 
[~toffer] suggested sounds like an interesting idea, if there's interest in 
going that far in actively supporting this use case. 

If there is interest in doing that, the current hbase-sep library (which is the 
current library used for latching in via replication and responding to changes 
for the hbase-indexer project) could be a possible starting point. The current 
API interfaces[1] as well as the impl are within sub-modules[2] of the 
hbase-indexer project on GitHub.

[1] 
https://github.com/NGDATA/hbase-indexer/tree/master/hbase-sep/hbase-sep-api/src/main/java/com/ngdata/sep
[2] https://github.com/NGDATA/hbase-indexer/tree/master/hbase-sep



[jira] [Commented] (HBASE-10504) Define Replication Interface

2014-02-13 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13900879#comment-13900879
 ] 

Jonathan Hsieh commented on HBASE-10504:


Let's mark the apis used as @InterfaceAudience.LimitedPrivate(replication) 
(or something like that) so that reviewers and mods start keeping compat 
considerations in mind.



[jira] [Commented] (HBASE-10504) Define Replication Interface

2014-02-12 Thread Gabriel Reid (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898904#comment-13898904
 ] 

Gabriel Reid commented on HBASE-10504:
--

I would echo [~phunt]'s points -- having these interfaces public and stable 
would be great.

I think that from the point of view of the hbase-indexer project, the main 
contact points that are involved are 
o.a.h.h.replication.regionserver.ReplicationSource and o.a.h.h.Server, as well 
as the generated AdminProtos, although I assume that the scope of this ticket 
will be a fair bit wider than that.

If we're going to move to having a published stable API, now is also probably 
the time to make any changes that may be needed. The one thing I can think of 
right now that would be nice is a better hook into 
o.a.h.h.replication.regionserver.ReplicationSource#removeNonReplicableEdits. 
Right now we override that method to do additional filtering on HLog entries 
that we know we don't want do send to the indexer, but it would be good to make 
that into a public method with some kind of filter interface.

 Define Replication Interface
 

 Key: HBASE-10504
 URL: https://issues.apache.org/jira/browse/HBASE-10504
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: stack
 Fix For: 0.99.0


 HBase has replication.  Fellas have been hijacking the replication apis to do 
 all kinds of perverse stuff like indexing hbase content (hbase-indexer 
 https://github.com/NGDATA/hbase-indexer) and our [~toffer] just showed up w/ 
 overrides that replicate via an alternate channel (over a secure thrift 
 channel between dcs over on HBASE-9360).  This issue is about surfacing these 
 APIs as public with guarantees to downstreamers similar to those we have on 
 our public client-facing APIs (and so we don't break them for downstreamers).
 Any input [~phunt] or [~gabriel.reid] or [~toffer]?
 Thanks.
  





[jira] [Commented] (HBASE-10504) Define Replication Interface

2014-02-12 Thread Francis Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13899573#comment-13899573
 ] 

Francis Liu commented on HBASE-10504:
-

+1, this is sorely needed. In our case we needed to make changes to 
ReplicationSourceManager, ReplicationSource, ReplicationSink and 
ReplicationZookeeper(?). I'll try to put up the changes we had to make.

For the use case of NRT listening on HBase changes, maybe we should consider 
making it a first-class API instead of looking at replication interface changes 
to support it. That way WALEdit would still be private-ish?
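
As a rough illustration of the first-class API idea, the sketch below shows a 
listener that receives a small public event type instead of WALEdit. All names 
here (RowChange, ChangeListener, ChangeNotifier) are hypothetical and exist 
only for this example; nothing like them is claimed to exist in HBase. The 
assumption is that some internal code would translate private WAL entries into 
the public event type before publishing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical public event type, so that WALEdit itself need not be exposed.
final class RowChange {
    final String table;
    final String row;
    final long timestamp;

    RowChange(String table, String row, long timestamp) {
        this.table = table;
        this.row = row;
        this.timestamp = timestamp;
    }
}

// Hypothetical callback that downstream consumers would implement.
interface ChangeListener {
    void onChange(RowChange change);
}

// Hypothetical dispatcher; in a real system the caller of publish() would be
// internal code that has already converted a private WAL entry to a RowChange.
final class ChangeNotifier {
    private final List<ChangeListener> listeners = new ArrayList<>();

    void register(ChangeListener listener) {
        listeners.add(listener);
    }

    void publish(RowChange change) {
        for (ChangeListener listener : listeners) {
            listener.onChange(change);
        }
    }
}
```

The design choice being sketched is that only RowChange and ChangeListener 
would need stability guarantees, while the replication internals stay free to 
change.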



[jira] [Commented] (HBASE-10504) Define Replication Interface

2014-02-11 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898504#comment-13898504
 ] 

stack commented on HBASE-10504:
---

Is HBASE-9507 related?



[jira] [Commented] (HBASE-10504) Define Replication Interface

2014-02-11 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898507#comment-13898507
 ] 

Nick Dimiduk commented on HBASE-10504:
--

bq. Is HBASE-9507 related?

I think so, yes, but that's based on code study and catching changes along the 
0.94 line that broke the above tool.



[jira] [Commented] (HBASE-10504) Define Replication Interface

2014-02-11 Thread Patrick Hunt (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898543#comment-13898543
 ] 

Patrick Hunt commented on HBASE-10504:
--

My 0.02. Being able to receive notifications in NRT when things change in HBase 
is useful. Having a stable API for this would be great. The hbase-indexer 
service that [~stack] linked to in the description does indeed use the 
replication API to get notifications and update a Solr/Lucene-based index; this 
provides NRT secondary index(es) of sorts on HBase. In our case (indexer) we 
are using the interface to be notified of changes in real time, which then 
update the index immediately. This has worked extremely well (reliable, secure, 
allows filtering, etc.). However, we have hit issues because the interface is 
private. Each release tends to have some changes to this interface (or its 
supporting code) that impact the indexer. We get around this by updating the 
code, but it's non-optimal. It would be great if this interface could be made 
public/stable.
