Hi Juan,

To add to Binglin Chang's reply: whether you snapshot or manually copy 
the data, you need to understand a little about how Hive works to be able to 
do a correct restore.
Hive keeps its metadata in a separate database. For example, if you have a table 
with a date partition, Hive uses the metadata to know which partitions exist. 
Say you have these partitions on HDFS:
/user/hive/warehouse/table/logindate=2013-11-26
/user/hive/warehouse/table/logindate=2013-11-27
/user/hive/warehouse/table/logindate=2013-11-28

If you drop partition "2013-11-27", Hive also removes the metadata reference. 
So if you restore the data, the partition will exist on HDFS, but you still need 
to run some "add partition" commands before Hive knows the partition exists.
It's usually a good idea to snapshot the metadata at the same time you snapshot 
the HDFS data, so you get one consistent view that you can trust to be correct.
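
As a rough sketch of the "add partition" step: the helper below only prints one 
ALTER TABLE statement per restored partition directory, so you can review the 
statements before running them. The function name partition_ddl and the table 
name mytable are made up for illustration, and it assumes a single partition 
key encoded as key=value in the last path component, as in the listing above.

```shell
#!/bin/sh
# Print one ALTER TABLE ... ADD PARTITION statement per partition directory.
# Usage: partition_ddl TABLE DIR...
# Assumption: each DIR ends in key=value, e.g. .../logindate=2013-11-27
partition_ddl() {
    table=$1
    shift
    for dir in "$@"; do
        spec=${dir##*/}     # last path component, e.g. logindate=2013-11-27
        key=${spec%%=*}     # partition column, e.g. logindate
        value=${spec#*=}    # partition value, e.g. 2013-11-27
        printf "ALTER TABLE %s ADD IF NOT EXISTS PARTITION (%s='%s') LOCATION '%s';\n" \
            "$table" "$key" "$value" "$dir"
    done
}

# Review the output first, then hand it to Hive (mytable is a placeholder):
#   hive -e "$(partition_ddl mytable /user/hive/warehouse/table/logindate=2013-11-27)"
```

Printing the statements instead of executing them directly keeps the restore 
step easy to check against what is actually on HDFS.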

Bennie.

From: Binglin Chang [mailto:[email protected]]
Sent: Thursday, November 28, 2013 4:27 PM
To: [email protected]
Subject: Re: HDFS snapshots restore

Snapshot restore is not implemented yet. For now you can use distcp 
to copy the snapshot dir to your new cluster. Suppose your hive dir is /user/hive/ 
and the snapshot dir is /user/hive/.snapshot/sn0; then you can run:
 hadoop distcp hdfs://oldcluster:8020/user/hive/.snapshot/sn0 hdfs://newcluster:8020/somedir
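
For the whole flow, including taking the snapshot first, a minimal sketch could 
look like the following. The helper names run and snapshot_and_copy and the 
DRY_RUN switch are invented for this example; the cluster addresses and paths 
are the placeholder ones from above. Enabling snapshots on a directory 
normally needs HDFS admin rights.

```shell
#!/bin/sh
# Sketch: take a snapshot of a directory and copy it to another cluster.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: snapshot_and_copy SRC_NAMENODE DIR SNAPSHOT_NAME DEST_URI
snapshot_and_copy() {
    src_nn=$1
    dir=$2
    name=$3
    dest=$4
    # One-time admin step: mark the directory as snapshottable.
    run hdfs dfsadmin -allowSnapshot "$dir" || return
    # Create the read-only snapshot under DIR/.snapshot/NAME.
    run hdfs dfs -createSnapshot "$dir" "$name" || return
    # Copy the snapshot contents to the destination cluster.
    run hadoop distcp "$src_nn$dir/.snapshot/$name" "$dest"
}

# Dry run with the names used in this thread:
#   DRY_RUN=1 snapshot_and_copy hdfs://oldcluster:8020 /user/hive sn0 hdfs://newcluster:8020/somedir
```

Within the same cluster, the snapshot path can be read like any other 
directory, so individual files can also be copied back out of 
/user/hive/.snapshot/sn0 with a plain hadoop fs -cp.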


On Thu, Nov 28, 2013 at 9:47 PM, Juan Martin Pampliega 
<[email protected]> wrote:
Hi,

I have read the documentation about HDFS snapshots for Hadoop 2 
(http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsSnapshots.html)
 but it is still not clear to me how to use these snapshots to restore the data.

Let's say I have a directory with the data corresponding to a Hive table that I 
want to back up. I take a snapshot today, and tomorrow I find out that the 
modifications made to the table/directory after the snapshot are wrong and I 
want to revert the directory to the snapshot state. How do I achieve this?

Also, can I extract the snapshot from HDFS, save it in external storage, 
and later use it to restore this directory in a new, empty cluster? Or what is 
the recommended way to do this?


Thanks,
Juan.
