Re: Backups w/ rsync
Goswin von Brederlow wrote:
: Thanks, should have looked at --link-dest before replying. I wonder how long rsync has had that option. I wrote my own rsync script years ago. Maybe it predates this.

According to the NEWS file, since ~2002-9, so quite a while.

- To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Backups w/ rsync
Dear Bill,

in message [EMAIL PROTECTED] you wrote:
: Be aware that rsync is useful for making a *copy* of your files, which isn't always the best backup. If the goal is to preserve data and be able to recover in time of disaster, it's probably not optimal, while if you need frequent access to old or deleted files it's fine.

If you want to do real backups you should use real tools, like bacula etc.

: Now you can do an incremental (since last full or incremental) or partial (since last full):
:
:   touch bkup_incr_new
:   timestamp=$(date +%Y%m%d-%T)
:   find /home -cnewer bkup_incr | cpio -o -Hcrc | gzip -3 >/mnt/USBbkup/incr-$timestamp
:   mv -f bkup_incr_new bkup_incr
:
:   timestamp=$(date +%Y%m%d-%T)
:   find /home -cnewer bkup_full | cpio -o -Hcrc | gzip -3 >/mnt/USBbkup/part-$timestamp

Now have Johnny Loser downloading some stuff, say:

  $ wget -N ftp://ftp.kernel.org/pub/linux/kernel/v2.6/linux-2.6.12.tar.gz

Are you aware that this file will never be backed up by your script? Also, what about permission / owner changes etc.? A backup tool should never work based on timestamps alone.

Best regards,

Wolfgang Denk

--
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: [EMAIL PROTECTED]
All he had was nothing, but that was something, and now it had been taken away. - Terry Pratchett, _Sourcery_
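Whether such a file is picked up depends on which timestamp the selection looks at. `wget -N` sets the downloaded file's modification time (mtime) to the server's, which may well be older than the backup marker; but changing a file's times, permissions or owner updates its status-change time (ctime) to the current time. The sketch below lets you compare `find -newer` (mtime) with `find -cnewer` (ctime) on your own system. The scratch directory and file names are hypothetical, and `touch -d` stands in for what `wget -N` does. Neither test can see files that were deleted.

```shell
#!/bin/sh
# Compare mtime-based (-newer) and ctime-based (-cnewer) file selection.
# Scratch directory and file names are hypothetical.
set -e

demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir"

echo old > chmodded.txt        # exists before the backup marker
sleep 1
touch bkup_marker              # stands in for bkup_incr / bkup_full
sleep 1                        # later changes get a strictly newer timestamp

# A "downloaded" file whose mtime is set into the past, as wget -N does.
# touch -d changes mtime, and the kernel sets ctime to the current time.
echo data > downloaded.tar.gz
touch -d '2005-06-17 12:00' downloaded.tar.gz

# A permission change updates ctime but leaves mtime alone.
chmod 640 chmodded.txt

echo "selected by mtime (-newer):"
find . -type f -newer bkup_marker
echo "selected by ctime (-cnewer):"
find . -type f -cnewer bkup_marker
```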
Re: Backups w/ rsync
Goswin von Brederlow wrote:
: I was thinking Michal Soltys meant it this way. You can probably replace the cp invocation with an rsync one but that hardly changes things. I don't think you can do this in a single rsync call. Please correct me if I'm wrong.

Something along these lines:

  rsync <other options> --link-dest /backup/2007-01-01/ \
      rsync://[EMAIL PROTECTED]/module /backup/2007-01-02/

It will create a backup of .../module in ...-02, hard-linking to ...-01 (where possible). So there's no need for cp -l. There's a similar example in the rsync man page. Also, multiple --link-dest options are supported.
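A minimal sketch of the single-call approach, assuming a local source directory in place of the `rsync://` module and one directory per day under a backup root. The function name `snapshot` and the directory layout are hypothetical. Two details from the rsync man page matter here: a relative `--link-dest` path is interpreted relative to the destination directory, so absolute paths are the safer choice, and a file is only hard-linked when it matches the previous copy in all preserved attributes. The sketch passes `--link-dest` only when the previous snapshot exists.

```shell
#!/bin/sh
# Daily hard-linked snapshots, one rsync call per run.
# Usage (hypothetical): snapshot SRC BACKUP_ROOT NEW_NAME [PREV_NAME]
# BACKUP_ROOT should be absolute: a relative --link-dest is resolved
# relative to the destination directory.
set -e

snapshot() {
    src=$1 root=$2 new=$3 prev=${4:-}
    if [ -n "$prev" ] && [ -d "$root/$prev" ]; then
        # Files identical to the previous snapshot become hard links to it;
        # new or changed files are transferred as real copies.
        rsync -a --delete --link-dest="$root/$prev/" "$src/" "$root/$new/"
    else
        # First run (or previous snapshot missing): plain full copy.
        rsync -a --delete "$src/" "$root/$new/"
    fi
}
```

You can confirm the sharing with `ls -i` or `stat -c %i` on a file that did not change between two days: both snapshots should report the same inode number.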
Re: Backups w/ rsync
Please note: I'm having trouble w/gmail's formatting... so please forgive this if it looks horrible. :-|

On 9/28/07, Bill Davidsen [EMAIL PROTECTED] wrote:
: Dean S. Messing wrote:
: : It has been some time since I read the rsync man page. I see that there is (among the bazillion and one switches) a --link-dest=DIR switch which I suppose does what you describe. I'll have to experiment with this and think things through. Thanks, Michal.
:
: Be aware that rsync is useful for making a *copy* of your files, which isn't always the best backup. If the goal is to preserve data and be able to recover in time of disaster, it's probably not optimal, while if you need frequent access to old or deleted files it's fine.

You are absolutely right when you say it isn't always the best backup. There IS no 'best' backup.

: For example, full and incremental backup methods such as dump and restore are usually faster to take and restore than a copy, and allow easy incremental backups.

If "copy" meant a full data copy and not a hard link where possible, I'd agree with you. However...

I use a nightly rsync (with --link-dest) to back up more than 40 GiB to a drbd-backed drive. I'll explain why I use drbd in just a moment. Technically, I have a 3-disk RAID-5 (Linux software RAID) which is the primary store for the data. Then I have a second drive (non-RAID) that is used as a drbd backing store, which I rsync *to* from filesystems built on top of the RAID. I keep *30 days* of nightly backups on the drbd volume.

The average difference between nightly backups is about 45MB, or a bit less than 10%. The total disk usage is (on average) about 10% more than a single backup. On an AMD x86-64 dual core (3600, de-clocked to run at 1GHz) the entire process takes between 1 and 2 minutes, from start to finish. Using hard links means I can snapshot ~175,000 files, about 40GiB, in under 2 minutes - something I'd have a hard time doing with dump+restore.
I could easily make incremental or differential copies, maybe even in that time frame, but I'm not sure I'd gain much from that. Furthermore, as you state, dump+restore does *not* handle the removal of files, which for some scenarios is a huge deal.

The long and short of it is this: using hard links (via rsync or cp or whatever) to do snapshot backups can be really, really fast and has significant advantages, but there are, as with all things, some downsides. Those downsides are fairly easily mitigated, however. In my case, I can lose 1 drive of the RAID and I'm OK. If I lose 2, then the other drive (not part of the RAID) has the data I care about. If I lose the entire machine, the *other* machine (the other end of the drbd, only woken up every other day or so) has the data, going back 30 days. And a bare-metal restore is as fast as your I/O is. I back my /really/ important stuff up on DLT.

Again thanks to drbd, when the secondary comes up it communicates with the primary, figures out which blocks have changed, and copies only those. On a nightly basis that is usually a couple of hundred megabytes, and at 12MiB/s that doesn't take terribly long.

--
Jon
Re: Backups w/ rsync
On 9/28/07, Bill Davidsen [EMAIL PROTECTED] wrote:
: What I don't understand is how you use hard links... because a hard link needs to be in the same filesystem, and because a hard link is just another pointer to the inode and doesn't make a physical copy of the data to another device or to anywhere, really.

Yes, I know how hard links work. There is (one) physical copy of the data when it goes from the filesystem on the RAID to the filesystem on the drbd. Subsequent copies of the same file, assuming the file has not changed, are all hard links on the drbd-backed filesystem. Thus, I have one *physical* copy of the data and a whole bunch of hard links. Now, since I'm using drbd I actually have *two* physical copies (for a total of three if you include the original) because the *other* machine has a block-for-block copy of the drbd device (or it did, as of a few days ago).

--link-dest basically works like this. Assume we are going to copy (using that word loosely here) file A from /source to /dest/backup.tmp/, and we've told rsync that /dest/backup.1/A might exist:

- If /dest/backup.1/A does not exist, or is not identical, make a physical copy from /source/A to /dest/backup.tmp/A.
- If it does exist, and the two files are considered identical, simply hard-link /dest/backup.tmp/A to /dest/backup.1/A.

When all files are copied:

- Move every /dest/backup.N (N is a number) to /dest/backup.N+1.
- If /dest/backup.31 exists, delete it.
- Move /dest/backup.tmp to /dest/backup.1 (the old backup.1 was just renamed to /dest/backup.2).

I can do all of this, for 175K files (40G), in under 2 minutes on modest hardware. I end up with 1+1 physical copies of the data (local drbd copy and remote drbd copy).

There is more, but if I may suggest: if you want more details, contact me off-list; I'm pretty sure the linux-raid folks couldn't care less about rsync and drbd.
--
Jon
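The steps Jon lists can be sketched in plain shell to make the per-file decision and the rotation concrete. This is an illustration under stated assumptions, not what rsync does internally: it decides "identical" with a byte-for-byte `cmp` (rsync's default quick check compares size and modification time instead), handles only regular files and directories, and assumes file names without newlines. The function names `link_or_copy` and `rotate` and the `KEEP` variable are hypothetical; the `backup.N` layout and the 30-snapshot limit follow Jon's description.

```shell
#!/bin/sh
# Sketch of the nightly procedure using coreutils instead of rsync.
# Layout: $dest/backup.1 (newest) ... $dest/backup.$KEEP (oldest).
set -e
KEEP=30

# link_or_copy SRC DEST: build $dest/backup.tmp from $src, hard-linking
# files that are byte-identical to their copy in $dest/backup.1.
link_or_copy() {
    src=$1 dest=$2
    rm -rf "$dest/backup.tmp"
    mkdir -p "$dest/backup.tmp"
    ( cd "$src" && find . -type d ) | while IFS= read -r d; do
        mkdir -p "$dest/backup.tmp/$d"
    done
    ( cd "$src" && find . -type f ) | while IFS= read -r f; do
        prev="$dest/backup.1/$f"
        if [ -f "$prev" ] && cmp -s "$src/$f" "$prev"; then
            ln "$prev" "$dest/backup.tmp/$f"        # unchanged: hard link
        else
            cp -p "$src/$f" "$dest/backup.tmp/$f"   # new or changed: real copy
        fi
    done
}

# rotate DEST: shift backup.N to backup.N+1 (oldest first), drop the one
# that falls off the end, then move backup.tmp into place as backup.1.
rotate() {
    dest=$1
    rm -rf "$dest/backup.$((KEEP + 1))"   # leftover from an interrupted run
    n=$KEEP
    while [ "$n" -ge 1 ]; do
        if [ -d "$dest/backup.$n" ]; then
            mv "$dest/backup.$n" "$dest/backup.$((n + 1))"
        fi
        n=$((n - 1))
    done
    rm -rf "$dest/backup.$((KEEP + 1))"
    mv "$dest/backup.tmp" "$dest/backup.1"
}
```

A nightly run is then `link_or_copy /source /dest && rotate /dest`. Only changed files cost disk space; every unchanged file in backup.1 shares its inode with the same file in backup.2 and older.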
Re: Backups w/ rsync
Dean S. Messing wrote:
: It has been some time since I read the rsync man page. I see that there is (among the bazillion and one switches) a --link-dest=DIR switch which I suppose does what you describe. I'll have to experiment with this and think things through. Thanks, Michal.

Be aware that rsync is useful for making a *copy* of your files, which isn't always the best backup. If the goal is to preserve data and be able to recover in time of disaster, it's probably not optimal, while if you need frequent access to old or deleted files it's fine.

For example, full and incremental backup methods such as dump and restore are usually faster to take and restore than a copy, and allow easy incremental backups. Consider:

  touch bkup_full_new
  timestamp=$(date +%Y%m%d-%T)
  find /home -depth | cpio -o -Hcrc | gzip -3 >/mnt/USBbkup/full-$timestamp
  mv -f bkup_full_new bkup_full
  touch bkup_incr

Now you can do an incremental (since last full or incremental) or partial (since last full):

  touch bkup_incr_new
  timestamp=$(date +%Y%m%d-%T)
  find /home -cnewer bkup_incr | cpio -o -Hcrc | gzip -3 >/mnt/USBbkup/incr-$timestamp
  mv -f bkup_incr_new bkup_incr

  timestamp=$(date +%Y%m%d-%T)
  find /home -cnewer bkup_full | cpio -o -Hcrc | gzip -3 >/mnt/USBbkup/part-$timestamp

The advantage of the incremental is that the archives are smaller, the advantage of the partial is that you only restore full+part (two archives total), and the advantage of rsync is that deleted files will really be deleted (that's why I call it a copy, not a backup).

Hope this is useful.

--
bill davidsen [EMAIL PROTECTED]
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979
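The restore side of the full + partial scheme might look like the sketch below. Several assumptions are not from Bill's mail: archives are created from inside the tree (`cd`, then `find .`) so they restore relative to the current directory, archive and marker names are fixed rather than timestamped, all paths passed in are absolute, and a cpio supporting `-i -d -m -u` is installed. The function names are hypothetical. The restore unpacks the full archive and then the partial one on top of it; `-u` lets the later archive overwrite files from the earlier one. As the thread notes, files deleted since the full backup reappear after such a restore.

```shell
#!/bin/sh
# Full + partial backups with find | cpio, and the two-step restore.
# Arguments must be absolute paths (the functions cd into the tree).
set -e

# full_backup TREE OUTDIR STATEDIR
full_backup() {
    touch "$3/bkup_full_new"            # marker taken *before* the scan
    ( cd "$1" && find . -depth | cpio -o -H crc ) | gzip -3 > "$2/full.cpio.gz"
    mv -f "$3/bkup_full_new" "$3/bkup_full"
}

# partial_backup TREE OUTDIR STATEDIR: everything whose ctime is newer
# than the last full backup's marker
partial_backup() {
    ( cd "$1" && find . -cnewer "$3/bkup_full" | cpio -o -H crc ) \
        | gzip -3 > "$2/part.cpio.gz"
}

# restore OUTDIR TARGET: full archive first, then the partial on top
restore() {
    mkdir -p "$2"
    for archive in "$1/full.cpio.gz" "$1/part.cpio.gz"; do
        [ -f "$archive" ] || continue
        # -i extract, -d create directories, -m keep mtimes, -u overwrite
        ( cd "$2" && gzip -dc "$archive" | cpio -i -d -m -u )
    done
}
```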
Re: Backups w/ rsync
Michael Tokarev [EMAIL PROTECTED] writes:
: Dean S. Messing wrote:
: : Michal Soltys writes:
: : []
: : : Rsync is a fantastic tool for incremental backups. Everything that didn't change can be hardlinked to the previous entry. And the time of performing the backup is pretty much negligible. Essentially, you have the equivalent of full backups at almost the minimal time and space cost possible.
: :
: : It has been some time since I read the rsync man page. I see that there is (among the bazillion and one switches) a --link-dest=DIR switch which I suppose does what you describe. I'll have to experiment with this and think things through. Thanks, Michal.
:
: I haven't actually read the rsync manpage to this detail, but I do use rsync for backups this way, but a bit differently - yet more understandable without referring to manpages... ;)
:
: The procedure is something like this:
:
:   cd /backups
:   rm -rf tmp/
:   cp -al $yesterday tmp/
:   rsync -r --delete -t ... /filesystem tmp
:   mv tmp $today
:
: That is, link the previous backup to temp (which takes no space except directories), rsync current files to there (rsync will break links for changed files), and rename temp to $today.

I was thinking Michal Soltys meant it this way. You can probably replace the cp invocation with an rsync one but that hardly changes things. I don't think you can do this in a single rsync call. Please correct me if I'm wrong.

Regards,
Goswin
Re: Help: very slow software RAID 5.
: Dean S. Messing wrote:
: : I have also discovered smartctl and have read that if the short smartctl tests are run daily and the long test weekly, the chances of being caught with my pants down are quite low, even in a two-disk RAID-0 config. What is your opinion?
:
: There's a good paper on using smartctl to predict the health of disks, and if you can't find it I probably have a copy somewhere, since I gave a presentation on RAID issues which included it. But the basic premise was that if you see errors of certain types, the drives are likely to fail soon. It did *not* say that absent these warnings the drives were unlikely to fail; in fact most drives which did fail did so without warning. So for about 90% of the failures there is no warning.
:
: I had servers a few years ago, running 6TB/server, on lots of small fast drives, and I concluded that the predictive value of SMART was so small that it didn't justify looking at the reports. Take that as my opinion: assume that drives fail without warning.

From what you and another poster said (about the false-alarm rate of smartctl) I'll put my trust in backups alone. I agree: if it predicts such a low % of failures, there's no point wasting time reading the reports and having a false sense of security.

: I'm getting around to replying to several things you have said in various posts, so that people who are threading answers will be happy...

I'll look forward to your comments, especially on my misconceptions. I've learned a great deal already.

Dean
Re: Help: very slow software RAID 5.
Bill Davidsen wrote:
: Dean S. Messing wrote:
: : Again, I don't get these speeds. Seq. reads are about 170% of the average of my three physical drives if I turn up the look-ahead. Then random access reads drop to slightly less than my slowest drive.
:
: As nearly as I can tell, Dean was talking about RAID-10 at that point (I also suggested that) which you haven't tried.

I was talking about the three-drive RAID-5 on which I ran bonnie++ measurements. I have not (yet) tried RAID-10.

: For small numbers of drives, assume the read speed will be (N - 1) * S for large sequential reads, using RAID-10, where S is the speed of a single drive. Random read depends on so many things I can't begin to quantify them in anything less than a full white paper, but for a single thread assume somewhere around S, and aggregate (N - 1) * S again. Writes depend a lot on system tuning, stripe size, stripe_cache_size, chunk size, etc. Fortunately the best way to boost write speed is to have lots of memory and let the kernel buffer.

How does one let the kernel buffer? (I have plenty of memory for most things.) I know about write-back vs. write-through to reduce the write asymmetry of RAID-5. Is this what you mean by a kernel buffer?

: Finally, when you create your ext filesystem, think of:
: - ext2 - no journal
: - noatime mounts to avoid journal writes
: - manually make the journal file *large* to spread head motion over drives
: - consider moving the journal file to a dedicated device (that old 20GB PATA drive?)
: - use the ext3 stride tuning stuff (I'm quantifying that in the next ten days).
:
: Or just make a RAID-10 far array and stop agonizing over this stuff. There is no config which is best for everything; you must realize that "fast, cheap, reliable - pick two" is the design paradigm of RAID, and the more you optimize for one usage pattern the more you impact some other.
Dean
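To put numbers on two of Bill's points, here is a back-of-the-envelope sketch: his (N - 1) * S rule of thumb for large sequential reads on RAID-10, and the ext3 stride, which is the RAID chunk size divided by the filesystem block size. Every value below (drive count, drive speed, 64 KiB chunk, 4 KiB blocks) is an assumed example, the function names are hypothetical, and the mke2fs command is only printed, never run; `/dev/md0` is a placeholder. The `stripe-width` extended option exists only in newer e2fsprogs releases, and the N - 1 data-disk count used for it assumes a RAID-5 like Dean's.

```shell
#!/bin/sh
# Back-of-the-envelope numbers for a small md array.
# All example values are assumptions; nothing touches a real device.

# seq_read_estimate N S: Bill's RAID-10 rule of thumb, (N - 1) * S
seq_read_estimate() {
    echo $(( ($1 - 1) * $2 ))
}

# ext3_stride CHUNK_KIB BLOCK_KIB: filesystem blocks per RAID chunk
ext3_stride() {
    echo $(( $1 / $2 ))
}

# stripe_width STRIDE DATA_DISKS: filesystem blocks per full data stripe
stripe_width() {
    echo $(( $1 * $2 ))
}

drives=3          # hypothetical three-drive array
drive_mb_s=60     # assumed single-drive sequential speed, MB/s
chunk_kib=64      # assumed md chunk size
block_kib=4       # ext3 block size

echo "rough RAID-10 large sequential read: $(seq_read_estimate "$drives" "$drive_mb_s") MB/s"

stride=$(ext3_stride "$chunk_kib" "$block_kib")
# RAID-5: each stripe holds one parity chunk, so N - 1 data disks.
width=$(stripe_width "$stride" $((drives - 1)))

# Printed for reference only.
echo "mke2fs -j -b $((block_kib * 1024)) -E stride=$stride,stripe-width=$width /dev/md0"
```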
Re: Backups w/ rsync
Michael Tokarev writes:
: Dean S. Messing wrote:
: : Michal Soltys writes:
: : []
: : : Rsync is a fantastic tool for incremental backups. Everything that didn't change can be hardlinked to the previous entry. And the time of performing the backup is pretty much negligible. Essentially, you have the equivalent of full backups at almost the minimal time and space cost possible.
: :
: : It has been some time since I read the rsync man page. I see that there is (among the bazillion and one switches) a --link-dest=DIR switch which I suppose does what you describe. I'll have to experiment with this and think things through. Thanks, Michal.
:
: I haven't actually read the rsync manpage to this detail, but I do use rsync for backups this way, but a bit differently - yet more understandable without referring to manpages... ;)
:
: The procedure is something like this:
:
:   cd /backups
:   rm -rf tmp/
:   cp -al $yesterday tmp/
:   rsync -r --delete -t ... /filesystem tmp
:   mv tmp $today
:
: That is, link the previous backup to temp (which takes no space except directories), rsync current files to there (rsync will break links for changed files), and rename temp to $today.

Very nice. The breaking of the hard link is the key. I wondered about this when Michal mentioned using rsync yesterday. I just tested the idea. It does indeed work.

One question: why do you not use -a instead of -r -t? It would seem that one would want to preserve permissions, and group and user ownerships. Also, is there a reason _not_ to preserve symlinks in the backup? Your script appears to copy the referent.

Dean

P.S. I think this thread has wandered from the topic of linux-raid. I'm happy to cease and desist if this off-topic discussion offends.
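Dean's question can be tried out with a variant of Michael's procedure that uses `-a`, which implies `-r` and `-t` plus `-l` (symlinks copied as symlinks), permissions, group, devices, and owner when run as root. In the sketch below, `daily_backup` and the directory names are hypothetical, and the source is synced with a trailing slash so its contents land directly in `tmp`. rsync normally writes a changed file to a temporary name and renames it over the old one, which is what breaks the link and leaves yesterday's copy untouched; `--inplace` would defeat that. Note also that when only permissions or ownership change, rsync adjusts the existing file, and with `cp -al` that inode is shared with yesterday's snapshot, whereas `--link-dest` only links files whose attributes already match.

```shell
#!/bin/sh
# Michael's cp -al + rsync procedure, with -a instead of -r -t.
# Usage (hypothetical): daily_backup BACKUPS SOURCE YESTERDAY TODAY
set -e

# Subshell body so the cd does not leak into the caller.
daily_backup() (
    backups=$1 source=$2 yesterday=$3 today=$4
    cd "$backups"
    rm -rf tmp
    if [ -d "$yesterday" ]; then
        cp -al "$yesterday" tmp       # hard-link copy: costs only directories
    fi
    # Changed files are written to a temporary name and renamed over the
    # link, breaking it; unchanged files stay shared with $yesterday.
    rsync -a --delete "$source/" tmp/
    mv tmp "$today"
)
```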