We're using Trumpet (http://verisign.github.io/trumpet/), an iNotify-like system for HDFS, as the foundation of this kind of inter-cluster replication. In a nutshell, every new file created in Cluster A notifies a replication system, which copies the file to Cluster B (see https://github.com/verisign/trumpet/blob/master/examples/src/main/java/com/verisign/vscc/hdfs/trumpet/client/example/TestApp.java for an example). For keeping Hive partitions in sync, https://github.com/daplab/hive-auto-partitioner should do it (it also relies on Trumpet).
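The notify-then-copy flow can be sketched roughly as below. To be clear, this is not Trumpet's actual client API (see the linked TestApp for that) -- it is a minimal Python illustration of the pattern, where the event stream, the `copy_to_b` callback, and the event fields are all hypothetical stand-ins:

```python
# Sketch of an event-driven replicator: every CLOSE event for a file
# written on Cluster A triggers a copy to Cluster B.
# The event source and copy callback are hypothetical stand-ins for
# Trumpet's client and an HDFS copy; all names are illustrative.

def replicate(events, copy_to_b, copied_log):
    """Consume inotify-style events and mirror closed files to Cluster B."""
    for event in events:
        # Act only once the writer has closed the file, so we never
        # ship a half-written file to the backup cluster.
        if event["type"] == "CLOSE":
            copy_to_b(event["path"])
            copied_log.append(event["path"])

# Simulated event stream standing in for Trumpet notifications.
fake_events = [
    {"type": "CREATE", "path": "/data/events/part-0000"},
    {"type": "CLOSE",  "path": "/data/events/part-0000"},
    {"type": "CLOSE",  "path": "/data/events/part-0001"},
]
copied = []
replicate(fake_events, lambda path: None, copied)
print(copied)  # ['/data/events/part-0000', '/data/events/part-0001']
```

In a real deployment the callback would shell out to distcp or use the HDFS FileSystem API; the point is only that replication is driven by per-file events rather than by periodic full scans.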
Benoit

On Wed, Feb 10, 2016 at 7:37 PM David Whitmore <[email protected]> wrote:

> Vivek,
>
> You are correct, distcp will overwrite a file if it has changed or is new.
> As for running this in real time (i.e. as soon as data is deposited on the
> source cluster), you will have to handle that yourself.
>
> Please be aware that if you are talking about Hive tables, you will also
> need the Hive metastore.
>
> We copy our critical data from a Production Cluster to another Production
> Cluster and to a Test Cluster on a daily basis, along with the contents of
> the Hive Metastore database.
>
> Be aware that if you restore the Hive Metastore database on the destination
> cluster, any tables created solely on the destination cluster may disappear.
>
> David
>
> *From:* Vivek Singh Raghuwanshi [mailto:[email protected]]
> *Sent:* Wednesday, February 10, 2016 1:28 PM
> *To:* [email protected]
> *Subject:* Re: Hadoop Backup and Archival Cluster
>
> Thanks David,
>
> I want to replicate the data once it has reached the cluster, and delete it
> from the source cluster after one year. I want Cluster B to work as a hot
> backup and archive, with Cluster A holding only the latest data.
>
> As per my information, distcp copies all the data and overwrites. Please
> correct me if I am wrong.
>
> On Wed, Feb 10, 2016 at 12:21 PM, David Whitmore
> <[email protected]> wrote:
>
> Yes, you can run a distcp to copy data from one cluster to another. distcp
> also has an option that tells it to delete files on the destination if they
> are NOT on the source.
>
> *From:* Vivek Singh Raghuwanshi [mailto:[email protected]]
> *Sent:* Wednesday, February 10, 2016 1:16 PM
> *To:* [email protected]
> *Subject:* Hadoop Backup and Archival Cluster
>
> Hi Friends,
>
> I am planning to set up a Hadoop Cluster (A) with cluster replication (B),
> so that once data reaches Cluster A it is replicated to Cluster B.
> I have one question: if I delete data from Cluster A on the basis of time
> (e.g. data older than one month), is it also removed from Cluster B? If
> yes, how can I avoid this?
>
> What I want to achieve:
>
> 1. Once data reaches Cluster A, it is automatically replicated to
> Cluster B.
>
> 2. After one year, old data is removed from Cluster A automatically, but
> not from Cluster B.
>
> 3. If anyone wants to query the latest data, Cluster A is available; for
> older data, Cluster B is available.
>
> Regards
>
> --
> ViVek Raghuwanshi
> Mobile - +91-09595950504
> Skype - vivek_raghuwanshi
> IRC - vivekraghuwanshi
> http://vivekraghuwanshi.wordpress.com/
> http://in.linkedin.com/in/vivekraghuwanshi
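The three requirements quoted above amount to replicate-without-delete plus a one-sided retention policy: sync A to B with `distcp -update` while omitting the `-delete` option (so purged files survive on B), then remove files older than one year from Cluster A only. A minimal Python sketch of that retention decision follows; the paths and timestamps are made up for illustration, and in practice the creation times would come from an HDFS listing:

```python
from datetime import datetime, timedelta

ONE_YEAR = timedelta(days=365)

def purge_candidates(cluster_a_files, now):
    """Return paths on Cluster A whose data is older than one year.

    cluster_a_files maps path -> creation time. Only Cluster A is ever
    purged; Cluster B keeps its copies, so old data stays queryable there.
    """
    return [path for path, created in cluster_a_files.items()
            if now - created > ONE_YEAR]

now = datetime(2016, 2, 10)
files_a = {
    "/data/2015-01-01/part-0000": datetime(2015, 1, 1),   # older than 1 year
    "/data/2015-12-01/part-0000": datetime(2015, 12, 1),  # recent, kept on A
}
to_delete = purge_candidates(files_a, now)
print(to_delete)  # ['/data/2015-01-01/part-0000']
```

Because the purge only ever runs against Cluster A and the sync never propagates deletions, Cluster B accumulates the full history, which satisfies the hot-backup-plus-archive split described above.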
