Hi Michael,

The reason is that cluster B is a production environment with jobs running on it non-stop, and I do not want to take resources away from it. Secondly, the "destination" cluster A is a much less powerful test environment, so even if the job ran on B, the slow HBase sink on cluster A would be a bottleneck.

What I did in the end was run a regular job on cluster A with input path set to a file on cluster B.

/David

On Mon, Mar 25, 2013 at 5:12 PM, Michael Segel <[email protected]> wrote:

> Just out of curiosity...
>
> Why do you want to run the job on Cluster A that reads from Cluster B but
> writes to Cluster A?
>
> Wouldn't it be easier to run the job on Cluster B and inside the
> Mapper.setup() you create your own configuration for your second cluster
> for output?
>
>
> On Mar 24, 2013, at 7:49 AM, David Koch <[email protected]> wrote:
>
> > Hello J-D,
> >
> > Thanks, it was instructive to look at the source. However, I am now stuck
> > with getting HBase to honor the "hbase.mapred.output.quorum" setting. I
> > opened a separate topic for this.
> >
> > Regards,
> >
> > /David
> >
> >
> > On Mon, Mar 18, 2013 at 11:26 PM, Jean-Daniel Cryans <[email protected]> wrote:
> >
> >> Checkout how CopyTable does it:
> >>
> >> https://github.com/apache/hbase/blob/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/CopyTable.java
> >>
> >> J-D
> >>
> >> On Mon, Mar 18, 2013 at 3:09 PM, David Koch <[email protected]> wrote:
> >>> Hello,
> >>>
> >>> Is it possible to run a M/R on cluster A over a table that resides on
> >>> cluster B with output to a table on cluster A? If so, how?
> >>>
> >>> I am interested in doing this for the purpose of copying part of a table
> >>> from B to A. Cluster B is a production environment, cluster A is a slow
> >>> test platform. I do not want the M/R to run on B since it would block
> >>> precious slots on this cluster. Otherwise I could just run CopyTable on
> >>> cluster B and specify cluster A as output quorum.
> >>>
> >>> Could this work by pointing the client configuration at the mapred-site.xml
> >>> of cluster A and the hdfs-site.xml and hbase-site.xml of cluster B? In this
> >>> scenario - in order to output to cluster A I guess I'd have to set
> >>> TableOutputFormat.QUORUM_ADDRESS to cluster A.
> >>>
> >>> I use a client configuration generated by CDH4 and there are some other
> >>> files floating around - such as core-site.xml, not sure what to do with
> >>> that.
> >>>
> >>> Thank you,
> >>>
> >>> /David
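
For readers following the thread, here is a minimal sketch of the two settings the original question describes: the job's HBase client configuration points at the source cluster (B), and "hbase.mapred.output.quorum" (the value of TableOutputFormat.QUORUM_ADDRESS) names the sink cluster (A). It only builds strings and does not touch HBase. The class name CrossClusterSettings, both helper methods, and all host names are hypothetical. The value format (ZooKeeper quorum, client port, znode parent, separated by colons) is the one CopyTable accepts for its peer address. David reports above that he had trouble getting HBase to honor this setting, so treat this as an illustration of the intended configuration and not as a verified recipe.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CrossClusterSettings {
    // Same string as TableOutputFormat.QUORUM_ADDRESS, as quoted in the thread.
    public static final String OUTPUT_QUORUM_KEY = "hbase.mapred.output.quorum";

    // Builds "<zk quorum>:<zk client port>:<znode parent>",
    // for example "zk1,zk2:2181:/hbase".
    public static String clusterKey(String zkQuorum, int zkClientPort, String znodeParent) {
        if (zkQuorum.isEmpty() || !znodeParent.startsWith("/")) {
            throw new IllegalArgumentException(
                "need a non-empty quorum and an absolute znode parent");
        }
        return zkQuorum + ":" + zkClientPort + ":" + znodeParent;
    }

    // Collects the properties one would copy onto the job configuration:
    // the source cluster is read through the normal client setting,
    // the sink cluster through the output quorum key.
    public static Map<String, String> jobSettings(String sourceZkQuorum, String sinkClusterKey) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("hbase.zookeeper.quorum", sourceZkQuorum);
        settings.put(OUTPUT_QUORUM_KEY, sinkClusterKey);
        return settings;
    }

    public static void main(String[] args) {
        // Hypothetical host names for the test cluster (A) and production cluster (B).
        String sink = clusterKey("zk1.cluster-a.example,zk2.cluster-a.example", 2181, "/hbase");
        jobSettings("zk1.cluster-b.example", sink)
            .forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

In a real job these pairs would be set on the Hadoop Configuration before submitting to cluster A's JobTracker; whether the output quorum is picked up depends on how the job wires in TableOutputFormat, which is what the CopyTable source linked above shows.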
