It really depends on your requirements for the format of the data.

The easiest way I can think of is to "stream" batches of data into a pub/sub
(publish-subscribe) system that the target cluster can access, and have the
target consume them from there.
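A rough sketch of the batching side, just to make the idea concrete. It assumes a `queue.Queue` as a stand-in for the real pub/sub topic (in practice that would be a broker both zones can reach), and the function names `make_batches`, `publish_batches` and `consume_batches` are made up for illustration:

```python
import queue
from typing import Iterable, Iterator, List


def make_batches(records: Iterable[bytes], batch_size: int) -> Iterator[List[bytes]]:
    """Group records into lists of at most batch_size items."""
    batch: List[bytes] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly short, batch
        yield batch


def publish_batches(topic: queue.Queue, records: Iterable[bytes], batch_size: int) -> int:
    """Source side (secured zone): push each batch to the topic, then an end marker."""
    count = 0
    for batch in make_batches(records, batch_size):
        topic.put(batch)
        count += 1
    topic.put(None)  # hypothetical end-of-stream sentinel
    return count


def consume_batches(topic: queue.Queue) -> List[bytes]:
    """Target side: pull batches until the end marker arrives."""
    received: List[bytes] = []
    while True:
        batch = topic.get()
        if batch is None:
            break
        received.extend(batch)
    return received
```

With ten records and `batch_size=4`, the source publishes three batches (4, 4 and 2 records) and the target reassembles all ten in order.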

Verify each batch on the receiving side, then discard it.
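One way to do the "verify, then ditch" step, as a sketch only: this assumes the producer attaches a SHA-256 digest to every batch and that pending batches are kept in a dict keyed by a batch id. `digest`, `package`, `verify_and_apply` and `drain` are hypothetical helpers, not part of any Hadoop or broker API:

```python
import hashlib
from typing import Dict, List, Tuple

Message = Tuple[str, List[bytes]]


def digest(batch: List[bytes]) -> str:
    """SHA-256 over the batch; length prefixes keep record boundaries unambiguous."""
    h = hashlib.sha256()
    for record in batch:
        h.update(len(record).to_bytes(8, "big"))
        h.update(record)
    return h.hexdigest()


def package(batch: List[bytes]) -> Message:
    """Producer side: ship the batch together with its digest."""
    return digest(batch), batch


def verify_and_apply(message: Message, sink: List[bytes]) -> bool:
    """Consumer side: write the batch only if its digest matches."""
    expected, batch = message
    if digest(batch) != expected:
        return False  # corrupted; leave it for a resend
    sink.extend(batch)
    return True


def drain(pending: Dict[int, Message], sink: List[bytes]) -> List[int]:
    """Verify every pending batch; discard the good ones, report the bad ids."""
    failed: List[int] = []
    for batch_id in sorted(pending):  # sorted() copies the keys, so deleting is safe
        if verify_and_apply(pending[batch_id], sink):
            del pending[batch_id]  # verified and written: ditch it
        else:
            failed.append(batch_id)
    return failed
```

Only the batches that fail verification stay around, so the intermediary never accumulates data that has already landed on the target.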

You can size (and throttle) the intermediary infrastructure according to your
batch size, since it only needs to hold the batches currently in flight.
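To illustrate the throttling point, here is a sketch that assumes a bounded in-memory queue standing in for the intermediary. With `maxsize=max_in_flight`, the producer blocks whenever the queue is full, so at most `max_in_flight * batch_size` records sit in the middle at any time. `run_pipeline` and `max_buffered_records` are names made up for this example:

```python
import queue
import threading
from typing import List


def max_buffered_records(batch_size: int, max_in_flight: int) -> int:
    """Upper bound on records held by the intermediary at once."""
    return batch_size * max_in_flight


def run_pipeline(records: List[bytes], batch_size: int, max_in_flight: int) -> List[bytes]:
    # Bounded queue: put() blocks while max_in_flight batches are waiting.
    topic: queue.Queue = queue.Queue(maxsize=max_in_flight)
    received: List[bytes] = []

    def producer() -> None:
        for start in range(0, len(records), batch_size):
            topic.put(records[start:start + batch_size])  # blocks when full
        topic.put(None)  # end-of-stream sentinel

    def consumer() -> None:
        while True:
            batch = topic.get()
            if batch is None:
                break
            received.extend(batch)

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received
```

For example, with `batch_size=7` and `max_in_flight=2` the middle never holds more than 14 records, however large the source data set is.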

This seems like the most efficient approach.

On Thursday, June 18, 2015, Divya Gehlot <[email protected]> wrote:

> Hi,
> I need to copy data from the first Hadoop cluster to the second Hadoop cluster.
> I can't access the second Hadoop cluster from the first one due to a
> security issue.
> Can anyone point me to how I can do this, apart from the distcp command?
> For instance
> Cluster 1 (secured zone) -> copy HDFS data to -> cluster 2 (non-secured
> zone)
>
> Thanks,
> Divya
