Hi all, I'm trying to run a large CopyTable job between clusters in totally different datacenters, and I need to determine what network connectivity is required.
As per the Cloudera blog post about CopyTable, I understand that the network should be such that "MR TaskTrackers can access all the HBase and ZK nodes in the destination cluster." So in practice that means the source TaskTrackers should be able to access:

* ZooKeeper on port 2181
* the Master on its RPC port (16000)
* the RegionServers on their RPC ports (16020)

Anything else I need to configure here? Does Hadoop on the source need to talk directly to the destination Hadoop, etc.?

Also, what's unclear to me is what I should be doing with DNS. I'm guessing that the source cluster needs to be able to resolve the hostnames of the remote RegionServers and Master nodes as stored in ZooKeeper.

Thanks for your time!

--
Lex Toumbourou
Lead engineer at scrunch.com <http://scrunch.com/>
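P.S. In case it helps frame the question, here is a rough sketch of the check I have in mind to run from each source TaskTracker. It only tests the two things I listed above: that a hostname resolves, and that a TCP connection to the given port succeeds. The hostnames in TARGETS are made-up placeholders, and the ports are just the ones mentioned above, so this is an assumption about what needs checking rather than a confirmed list.

```python
import socket

# Hypothetical destination hosts (placeholders, not real machines).
# Ports are the ones mentioned in this post; adjust to your configuration.
TARGETS = [
    ("zk1.dest.example.com", 2181),       # ZooKeeper client port
    ("master1.dest.example.com", 16000),  # HBase Master RPC
    ("rs1.dest.example.com", 16020),      # RegionServer RPC
]


def check_endpoint(host, port, timeout=3.0):
    """Return 'ok', 'dns-failure' or 'unreachable' for host:port."""
    try:
        # Step 1: can this machine resolve the hostname at all?
        candidates = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns-failure"

    # Step 2: try each resolved address until one TCP connect succeeds.
    for family, socktype, proto, _canonname, sockaddr in candidates:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return "ok"
        except OSError:
            continue
    return "unreachable"


def report(targets):
    """Build one human-readable status line per (host, port) pair."""
    return [f"{host}:{port} -> {check_endpoint(host, port)}"
            for host, port in targets]

# Usage on a source node: print("\n".join(report(TARGETS)))
```

A successful TCP connect only shows that the port is open through the firewall; it says nothing about whether the HBase client can actually complete an RPC, so I'd treat this as a first-pass sanity check only.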
