On Sat, May 28, 2016 at 9:23 PM, Lex Toumbourou <[email protected]> wrote:

> Hi all,
>
> I'm trying to run a large CopyTable job between clusters in totally
> different datacenters and I'm trying to determine what network connectivity
> is required here.
>
> As per the Cloudera blog post about Copytable, I understand that the
> network should be such that "MR TaskTrackers can access all the HBase and
> ZK nodes in the destination cluster." So in practice that means that the
> source task trackers should be able to access:
>
> * ZooKeeper on port 2181
> * the Master on its RPC port (16000)
> * the RegionServers on their RPC ports (16020)
>
>
You'd have access to the UIs?


> Anything else I need to configure here? Does Hadoop on the source need to
> talk directly to the destination Hadoop, etc.?
>
>
Looking at the code, it looks like it is just the source MR tasks doing bulk
mutations against the remote cluster.



> Also, what's unclear to me is what I should be doing with DNS. I'm guessing
> that the source cluster needs to be able to resolve the hostnames of remote
> RegionServers and Master nodes as stored in Zookeeper. Anything else I need
> to configure here?
>
>
Yeah. The source HBase client is doing puts against the remote cluster, so
that means being able to read the remote meta table and then being able to
reach whatever destination-cluster regionserver it finds listed there.

St.Ack



> Thanks for your time!
>
> --
> Lex Toumbourou
> Lead engineer at scrunch.com <http://scrunch.com/>
>
