Great. Thank you, St.Ack.

On 30 May 2016 at 08:57, Stack <[email protected]> wrote:
> On Sat, May 28, 2016 at 9:23 PM, Lex Toumbourou <[email protected]> wrote:
>
> > Hi all,
> >
> > I'm trying to run a large CopyTable job between clusters in totally
> > different datacenters, and I'm trying to determine what network
> > connectivity is required here.
> >
> > As per the Cloudera blog post about CopyTable, I understand that the
> > network should be such that "MR TaskTrackers can access all the HBase and
> > ZK nodes in the destination cluster." So in practice that means the
> > source task trackers should be able to access:
> >
> > * ZooKeeper on port 2181
> > * the Master on its RPC port (16000)
> > * the RegionServers on their RPC ports (16020)
>
> You'd have access to the UIs?
>
> > Anything else I need to configure here? Does Hadoop on the source need to
> > talk directly with the destination Hadoop, etc.?
>
> Looking at the code, it looks like it is just the source MR task doing bulk
> mutations against the remote cluster.
>
> > Also, what's unclear to me is what I should be doing with DNS. I'm guessing
> > that the source cluster needs to be able to resolve the hostnames of remote
> > RegionServers and Master nodes as stored in ZooKeeper. Anything else I need
> > to configure here?
>
> Yeah. The source HBase client is doing puts against the remote cluster, so
> that means being able to read the remote hbase:meta table and then being
> able to address whatever RegionServer it finds there in the destination
> cluster.
>
> St.Ack
>
> > Thanks for your time!
> >
> > --
> > Lex Toumbourou
> > Lead engineer at scrunch.com <http://scrunch.com/>

--
Lex Toumbourou
Lead engineer at scrunch.com <http://scrunch.com/>
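
To make the connectivity requirements above concrete, here is a minimal sketch
of what each source-side task is effectively doing against the destination
cluster: opening an HBase client connection to the destination's ZooKeeper
quorum and issuing Puts. The hostnames and table name are placeholders, not
anything from this thread.

// Sketch of the source-side client path CopyTable exercises against the
// destination cluster. Hostnames and table name are illustrative only.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class RemotePutSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Point the client at the destination cluster's ZK ensemble (port 2181).
        // From ZK the client locates hbase:meta, then talks directly to whichever
        // RegionServers it finds there (RPC port 16020), so those hostnames must
        // resolve from the source task nodes.
        conf.set("hbase.zookeeper.quorum",
                 "zk1.dest.example,zk2.dest.example,zk3.dest.example");
        conf.set("hbase.zookeeper.property.clientPort", "2181");

        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("my_table"))) {
            // Each CopyTable map task writes mutations like this to the remote table.
            Put put = new Put(Bytes.toBytes("row1"));
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value"));
            table.put(put);
        }
    }
}

Because only the HBase client path is exercised, the source cluster does not
need to reach the destination HDFS directly; it needs the destination
ZooKeeper quorum, the Master and RegionServer RPC ports, and DNS resolution
for the hostnames registered in ZooKeeper, which matches St.Ack's point that
it is just the source MR task doing bulk mutations against the remote cluster.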
