Hello Alain,

Thank you for your answer.

Basically, having a tool to check all sstables in a folder using the checksum would be nice. But in the end I can get the same result with a shasum tool.

The goal is to verify the integrity of files copied back from an external backup tool. The question came up because their backup system corrupted some files in the past, and they are thinking with their current backup process in mind.

I will insist on the snapshot on truncate, which has already saved me, and that other checks should be done by the backup tool if one is used.

Cheers,

-- 
Jérôme Mainaud
jer...@mainaud.com

2017-01-12 14:05 GMT+01:00 Alain RODRIGUEZ <arodr...@gmail.com>:

> Hi Jérôme,
>
> About this concern:
>
>> But my Op holds me back and asks: "Are you sure that the snapshot is safe
>> and will be restored before truncating data we have?"
>
> Make sure to enable snapshot on truncate (cassandra.yaml) or do it
> manually. This way, if the restored dataset is worse than the current one
> (the one you plan to truncate), you can always roll back this truncate /
> restore action. You can then tell your "Op" that this is perfectly safe
> anyway: no data would be lost, even in the worst-case scenario (not
> considering the downtime that would be induced). Plus, this snapshot is
> cheap (hard links) and does not need to be moved around or kept once you
> are sure the old backup fits your needs.
>
> Truncate is definitely the way to go before restoring a backup. Parsing
> the data to delete it all is not really an option imho.
>
> Then, about the technical question "how to know that a snapshot is clean",
> it would be good to define "clean". You can make sure the backup is
> readable, consistent enough and corresponds to what you want by inserting
> all the sstables into a testing cluster and performing some reads there
> before doing it in production. For example, you can use AWS EC2 machines
> with large EBS volumes attached, or whatever you have, and use
> sstableloader to load data into them.
>
> If you are just worried about SSTable format validity, I am not aware of
> any tool that checks whether sstables are well formed, but one might exist
> or be doable. Another option might be to compute a checksum of each
> sstable before uploading it elsewhere and make sure it matches when
> downloaded back. That's the first thing that comes to my mind.
>
> Hope that is helpful. Hopefully, someone else will be able to point you to
> an existing tool to do this work.
>
> Cheers,
> -----------------------
> Alain Rodriguez - @arodream - al...@thelastpickle.com
> France
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> 2017-01-12 11:33 GMT+01:00 Jérôme Mainaud <jer...@mainaud.com>:
>
>> Hello,
>>
>> Is there any tool to test the integrity of a snapshot?
>>
>> Suppose I have a snapshot-based backup stored in an external low-cost
>> storage system that I want to restore to a database after someone deleted
>> important data by mistake.
>>
>> Before restoring the files, I will truncate the table to remove the
>> problematic tombstones.
>>
>> But my Op holds me back and asks: "Are you sure that the snapshot is
>> safe and will be restored before truncating data we have?"
>>
>> Even if this scenario is theoretical, the question is good. How can I
>> verify that a snapshot is clean?
>>
>> Thank you,
>>
>> --
>> Jérôme Mainaud
>> jer...@mainaud.com
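
For reference, the checksum idea discussed in this thread (hash each sstable file before it leaves the node, then compare after it is copied back) could look roughly like the sketch below. It is only an illustration under assumptions: the function names `sha256_of`, `build_manifest` and `verify_manifest` are made up for this example, SHA-256 stands in for whatever a shasum tool would compute, and the code only detects files that changed or went missing after the manifest was built. It does not check that an sstable was well formed in the first place.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(snapshot_dir):
    """Map each file path (relative to snapshot_dir) to its digest.

    Meant to be run on the snapshot folder before the backup tool copies it.
    """
    root = Path(snapshot_dir)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(snapshot_dir, manifest):
    """Compare a restored folder against a manifest.

    Returns two lists of relative paths: (missing, corrupted).
    """
    root = Path(snapshot_dir)
    missing, corrupted = [], []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file():
            missing.append(rel)
        elif sha256_of(target) != expected:
            corrupted.append(rel)
    return missing, corrupted
```

In practice the manifest would be stored next to the backup (for instance dumped with `json.dump`) and `verify_manifest` would be run on the restored folder before truncating anything; both lists being empty means every file matches what was hashed at snapshot time.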