Re: rsync script for snapshot backups
On 19 June 2016 at 10:27, Simon Hobson wrote:
> Dennis Steinkamp wrote:
>
>> I tried to create a simple rsync script that should create daily backups
>> from a ZFS storage and put them into a timestamp folder.
>> After creating the initial full backup, the following backups should only
>> contain "new data" and the rest will be referenced via hardlinks (--link-dest)
>> ...
>> Well, it works but there is a huge flaw with this approach and I am not able
>> to solve it on my own unfortunately.
>> As long as the backups are finishing properly, everything is fine but as soon
>> as one backup job couldn't be finished for some reason (like it gets aborted
>> accidentally or a power cut occurs), the whole backup chain is messed up and
>> usually the script creates a new full backup which fills up my backup storage.
>
> Yes indeed, this is a typical flaw with many systems - you often need to
> throw away the partial backup.
> One option that comes to mind is this: create the new backup in a directory
> called (for example) "new" or "in-progress". If, and only if, the backup
> completes, rename this to a timestamp. If, when you start a new backup, the
> in-progress folder exists, then use that and it'll be freshened to the
> current source state.

I have an extremely similar script for my backups and that's exactly what I do to deal with backups that are stopped mid-way, either by power failures or by me. I rsync to a .tmp-$target directory, where $target is what I'm backing up (I have separate backups for my rootfs and /home). I also start the whole thing under ionice so that my computer doesn't get slowed down by all this I/O.

Lastly, before renaming .tmp-$target to the final directory I run `sync -f`, because rsync doesn't seem to call fsync() when copying files, and without that you can end up with a bad backup if a power failure happens right after the rename().

Here is my script:

#!/bin/bash

set -o errexit
set -o pipefail

target=$1
case "$target" in
    home) source=/home ;;
    root) source=/ ;;
esac

PATHTOBACKUP=/root/backup
date=$(date --utc "+%Y-%m-%dT%H:%M:%S")

ionice --class 3 rsync \
    --archive \
    --verbose \
    --one-file-system \
    --sparse \
    --delete \
    --compress \
    --log-file=$PATHTOBACKUP/.tmp-$target.log \
    --link-dest=$PATHTOBACKUP/$target-current \
    $source $PATHTOBACKUP/.tmp-$target

sync -f $PATHTOBACKUP/.tmp-$target

mv $PATHTOBACKUP/.tmp-$target.log $PATHTOBACKUP/$target-$date.log
mv $PATHTOBACKUP/.tmp-$target $PATHTOBACKUP/$target-$date
ln --symbolic --force --no-dereference $target-$date $PATHTOBACKUP/$target-current
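A possible way to run a script like this, assuming it is saved as /root/backup.sh and scheduled from root's crontab (the file name and the times are assumptions - the poster doesn't say how they invoke it):

# Hypothetical crontab entries; the argument selects the case branch above.
# Note that any argument other than "home" or "root" would leave $source empty.
30 2 * * * /root/backup.sh root
30 3 * * * /root/backup.sh home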
Re: rsync script for snapshot backups
On 20.06.2016 at 22:01, Larry Irwin (gmail) wrote:
> The scripts I use analyze the rsync log after it completes and then sftp a
> summary to the root of the just completed rsync. If no summary is found, or
> the summary says it failed, the folder rotation for that set is skipped and
> that folder is re-used on the subsequent rsync.
> The key here is that the folder rotation script runs separately from the
> rsync script(s).
> For each entity I want to rsync, I create a named folder to identify it, and
> the rsync'd data is held in sub-folders: daily.[1-7] and monthly.[1-3]
> When I rsync, I rsync into daily.0 using daily.1 as the link-dest.
> Then the rotation script checks daily.0/rsync.summary - and if it worked, it
> removes daily.7 and renames the daily folders.
> On the first of the month, the rotation script removes monthly.3, renames
> the other 2 and makes a complete hard-link copy of daily.1 to monthly.1
> It's been running now for about 4 years and, in my environment, the 10
> copies take about 4 times the space of a single copy. (We do complete
> copies of linux servers - starting from /.)
> If there's a good spot to post the scripts, I'd be glad to put them up.

Hi Larry,

that is something I couldn't do with my current scripting skills, but it sounds very interesting and I would really like to know how you did it - if you don't mind showing me your script, of course.

As for my script, this is what I came up with:

#!/bin/sh
# rsync copy script v2 for rsync pull from FreeNAS to BackupNAS

# Set dates
B_DATE=$(date +"%d-%m-%Y-%H%M")
EXPIRED=`date +"%d-%m-%Y" -d "14 days ago"`

# Create directory if it doesn't exist already
if ! [ -d /volume1/Backup_Test/in_progress ] ; then
    mkdir -p /volume1/Backup_Test/in_progress
fi

# rsync command
if [ -f /volume1/rsync/Test/linkdest.txt ] ; then
    rsync -aqzh \
        --delete --stats --exclude-from=/volume1/rsync/Test/exclude.txt \
        --log-file=/volume1/Backup_Test/logs/rsync-$B_DATE.log \
        --link-dest=/volume1/Backup_Test/`cat /volume1/rsync/Test/linkdest.txt` \
        Test@192.168.2.2::Test /volume1/Backup_Test/in_progress
else
    rsync -aqzh \
        --delete --stats --exclude-from=/volume1/rsync/Test/exclude.txt \
        --log-file=/volume1/Backup_Test/logs/rsync-$B_DATE.log \
        Test@192.168.2.2::Test /volume1/Backup_Test/in_progress
fi

# Check return value
if [ $? = 24 -o $? = 0 ] ; then
    mv /volume1/Backup_Test/in_progress /volume1/Backup_Test/$B_DATE
    echo $B_DATE > /volume1/rsync/Test/linkdest.txt
fi

# Delete expired snapshots (2 weeks old)
if [ -d /volume1/Backup_Test/$EXPIRED-* ]
then
    rm -Rf /volume1/Backup_Test/$EXPIRED-*
fi

Keep in mind I am not very good at this, so if something can be improved or you see a big flaw in it, I would be grateful if you let me know. So far it seems to do the trick.

I would also like to improve it so that the logfile is mailed to a specific e-mail address after rsync completes successfully. Unfortunately the logfiles grow very big when I have lots of data to back up, and I couldn't figure out how to send only a specific part of the logfile or to customize the logfile somehow.
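Since the script already passes --stats, rsync appends a short statistics summary at the end of the log, so one option is to mail just the tail of the log rather than the whole file. A rough sketch of a fragment that could sit inside the existing return-value check (it assumes a working mail/mailx command on the Synology and a placeholder recipient address):

# Sketch only: mail just the --stats summary (the last ~20 lines of the log)
# instead of the full logfile. "mail" and the recipient address are assumptions.
LOG=/volume1/Backup_Test/logs/rsync-$B_DATE.log
tail -n 20 "$LOG" | mail -s "Backup_Test $B_DATE finished" backup-admin@example.com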
Re: rsync script for snapshot backups
The scripts I use analyze the rsync log after it completes and then sftp a summary to the root of the just completed rsync. If no summary is found, or the summary says it failed, the folder rotation for that set is skipped and that folder is re-used on the subsequent rsync.

The key here is that the folder rotation script runs separately from the rsync script(s).

For each entity I want to rsync, I create a named folder to identify it, and the rsync'd data is held in sub-folders: daily.[1-7] and monthly.[1-3]

When I rsync, I rsync into daily.0 using daily.1 as the link-dest. Then the rotation script checks daily.0/rsync.summary - and if it worked, it removes daily.7 and renames the daily folders. On the first of the month, the rotation script removes monthly.3, renames the other 2 and makes a complete hard-link copy of daily.1 to monthly.1

It's been running now for about 4 years and, in my environment, the 10 copies take about 4 times the space of a single copy. (We do complete copies of linux servers - starting from /.)

If there's a good spot to post the scripts, I'd be glad to put them up.

--
Larry Irwin
Cell: 864-525-1322
Email: lrir...@alum.wustl.edu
Skype: larry_irwin
About: http://about.me/larry_irwin

On 06/19/2016 01:27 PM, Simon Hobson wrote:
> Dennis Steinkamp wrote:
>> I tried to create a simple rsync script that should create daily backups
>> from a ZFS storage and put them into a timestamp folder.
>> After creating the initial full backup, the following backups should only
>> contain "new data" and the rest will be referenced via hardlinks (--link-dest)
>> ...
>> Well, it works but there is a huge flaw with this approach and I am not able
>> to solve it on my own unfortunately.
>> As long as the backups are finishing properly, everything is fine but as soon
>> as one backup job couldn't be finished for some reason (like it gets aborted
>> accidentally or a power cut occurs), the whole backup chain is messed up and
>> usually the script creates a new full backup which fills up my backup storage.
>
> Yes indeed, this is a typical flaw with many systems - you often need to
> throw away the partial backup.
> One option that comes to mind is this: create the new backup in a directory
> called (for example) "new" or "in-progress". If, and only if, the backup
> completes, rename this to a timestamp. If, when you start a new backup, the
> in-progress folder exists, then use that and it'll be freshened to the
> current source state.
>
> Also, have you looked at StoreBackup? http://storebackup.org
> It does most of this automagically, keeps a definable history (eg one/day for
> 14 days, one/week for x weeks, one/30d for y years), plus it keeps file
> hashes so it can detect bit-rot in your backups.
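Larry's actual scripts are not posted in this thread, but the rotation he describes could look roughly like the sketch below (the backup root, the "SUCCESS" marker inside rsync.summary, and the use of cp -al for the hard-link copy are all assumptions, not taken from his scripts):

#!/bin/sh
# Sketch of the rotation described above - not Larry's actual script.
# Assumes one backup root per rsync'd entity, e.g. /backups/server1, and that
# rsync.summary contains the (hypothetical) word SUCCESS when the last rsync
# into daily.0 completed cleanly.
ROOT=/backups/server1

grep -q "SUCCESS" "$ROOT/daily.0/rsync.summary" || exit 0   # otherwise skip rotation

# Age the daily snapshots: daily.7 is dropped, daily.N becomes daily.N+1.
rm -rf "$ROOT/daily.7"
for i in 6 5 4 3 2 1; do
    [ -d "$ROOT/daily.$i" ] && mv "$ROOT/daily.$i" "$ROOT/daily.$((i+1))"
done
mv "$ROOT/daily.0" "$ROOT/daily.1"

# On the first of the month, age the monthlies and take a new hard-link
# copy of daily.1 (cp -al links the files instead of duplicating their data).
if [ "$(date +%d)" = "01" ]; then
    rm -rf "$ROOT/monthly.3"
    [ -d "$ROOT/monthly.2" ] && mv "$ROOT/monthly.2" "$ROOT/monthly.3"
    [ -d "$ROOT/monthly.1" ] && mv "$ROOT/monthly.1" "$ROOT/monthly.2"
    cp -al "$ROOT/daily.1" "$ROOT/monthly.1"
fi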
Re: rsync script for snapshot backups
Rely on the other answers here as to how to do it right. I just want to mention a few things in your script.

    yes | cp /volume1/rsync/Buero/timenow.txt /volume1/rsync/Buero/timeold.txt

yes is a program which puts out "y" (or whatever you tell it to) forever - not what you want - and cp does not accept input from a pipe unless the first argument is "-" or some similar, fancier construction. You can probably just leave off the "yes | " and have the statement work exactly as it does now.

It looks like your EXPIRED logic will only find a directory which *exactly* matches that date. You might look at using something like a find command to find directories older than 14 days. Some find options which might help:

    -ctime +14         finds things whose status changed more than 14 days ago
    -type d            finds only directories
    -maxdepth 1        finds things only one level below the path find starts at
    -exec ls -l {} \;  runs a command on every result which is returned - in this
                       case an ls, which can't hurt anything. You can replace ls
                       with something like rm -rf {} when you're *very* sure the
                       command is finding *exactly* what you want it to.

I didn't put the whole command together because until you understand how it works, you don't want to try something that might delete a bunch of things beyond what you actually want deleted. (A sketch of a complete command follows after the quoted message below.)

Joe

On 06/19/2016 08:22 AM, Dennis Steinkamp wrote:
> Hey guys,
>
> I tried to create a simple rsync script that should create daily backups from
> a ZFS storage and put them into a timestamp folder. After creating the initial
> full backup, the following backups should only contain "new data" and the rest
> will be referenced via hardlinks (--link-dest). This was at least a simple
> enough scenario to achieve with my pathetic scripting skills.
>
> This is what I came up with:
>
> #!/bin/sh
> # rsync copy script for rsync pull from FreeNAS to BackupNAS for Buero dataset
>
> # Set variables
> EXPIRED=`date +"%d-%m-%Y" -d "14 days ago"`
>
> # Copy previous timefile to timeold.txt if it exists
> if [ -f "/volume1/rsync/Buero/timenow.txt" ]
> then
>     yes | cp /volume1/rsync/Buero/timenow.txt /volume1/rsync/Buero/timeold.txt
> fi
>
> # Create current timefile
> echo `date +"%d-%m-%Y-%H%M"` > /volume1/rsync/Buero/timenow.txt
>
> # rsync command
> if [ -f "/volume1/rsync/Buero/timeold.txt" ]
> then
>     rsync -aqzh \
>         --delete --stats --exclude-from=/volume1/rsync/Buero/exclude.txt \
>         --log-file=/volume1/Backup_Test/logs/rsync-`date +"%d-%m-%Y-%H%M"`.log \
>         --link-dest=/volume1/Backup_Test/`cat /volume1/rsync/Buero/timeold.txt` \
>         Test@192.168.2.2::Test /volume1/Backup_Test/`date +"%d-%m-%Y-%H%M"`
> else
>     rsync -aqzh \
>         --delete --stats --exclude-from=/volume1/rsync/Buero/exclude.txt \
>         --log-file=/volume1/Backup_Buero/logs/rsync-`date +"%d-%m-%Y-%H%M"`.log \
>         Test@192.168.2.2::Test /volume1/Backup_Test/`date +"%d-%m-%Y-%H%M"`
> fi
>
> # Delete expired snapshots (2 weeks old)
> if [ -d /volume1/Backup_Buero/$EXPIRED-* ]
> then
>     rm -Rf /volume1/Backup_Buero/$EXPIRED-*
> fi
>
> Well, it works but there is a huge flaw with this approach and I am not able
> to solve it on my own unfortunately. As long as the backups are finishing
> properly, everything is fine, but as soon as one backup job couldn't be
> finished for some reason (like it gets aborted accidentally or a power cut
> occurs), the whole backup chain is messed up and usually the script creates a
> new full backup which fills up my backup storage.
>
> What I would like to achieve is to improve the script so that a backup run
> that wasn't finished properly will be resumed the next time the script
> triggers.
> Only if that was successful should the next incremental backup be created, so
> that the files that didn't change from the previous backup can be hardlinked
> properly.
>
> I did a little bit of research and I am not sure if I am on the right track
> here, but apparently this can be done with return codes - I honestly don't
> know how to do this though.
>
> Thank you in advance for your help and sorry if this question may seem
> foolish to most of you people.
>
> Regards
> Dennis
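Putting Joe's options together, a possible shape for the expiry step - run the harmless ls version first and only switch to rm -rf once the output is exactly the set of directories that should go (the backup root matches the quoted script; -mindepth 1 is added here so find cannot match the backup root itself):

#!/bin/sh
# Sketch only: list (and later delete) snapshot directories whose status
# changed more than 14 days ago, one level below the backup root.
BACKUP_ROOT=/volume1/Backup_Buero

# Dry run - just list what would be removed.
find "$BACKUP_ROOT" -mindepth 1 -maxdepth 1 -type d -ctime +14 -exec ls -ld {} \;

# Destructive version, once the list above looks right:
# find "$BACKUP_ROOT" -mindepth 1 -maxdepth 1 -type d -ctime +14 -exec rm -rf {} \;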
Re: rsync script for snapshot backups
On 19.06.2016 at 19:27, Simon Hobson wrote:
> Dennis Steinkamp wrote:
>> I tried to create a simple rsync script that should create daily backups
>> from a ZFS storage and put them into a timestamp folder.
>> After creating the initial full backup, the following backups should only
>> contain "new data" and the rest will be referenced via hardlinks (--link-dest)
>> ...
>> Well, it works but there is a huge flaw with this approach and I am not able
>> to solve it on my own unfortunately.
>> As long as the backups are finishing properly, everything is fine but as soon
>> as one backup job couldn't be finished for some reason (like it gets aborted
>> accidentally or a power cut occurs), the whole backup chain is messed up and
>> usually the script creates a new full backup which fills up my backup storage.
>
> Yes indeed, this is a typical flaw with many systems - you often need to
> throw away the partial backup.
> One option that comes to mind is this: create the new backup in a directory
> called (for example) "new" or "in-progress". If, and only if, the backup
> completes, rename this to a timestamp. If, when you start a new backup, the
> in-progress folder exists, then use that and it'll be freshened to the
> current source state.
>
> Also, have you looked at StoreBackup? http://storebackup.org
> It does most of this automagically, keeps a definable history (eg one/day for
> 14 days, one/week for x weeks, one/30d for y years), plus it keeps file
> hashes so it can detect bit-rot in your backups.

Thank you for taking the time to answer me.

Your suggestion is what I also had in mind, but I wasn't sure if it would be "best practice". To build this idea into my script, I probably need to hardcode the target directory rsync writes to (e.g. "new" or "in-progress") and only rename that directory to a timestamp after rsync returned 0 - am I correct? (Or should return codes 0 and 24 both count?)

As for StoreBackup, it really does sound nice, but I have to do all of this from the console of a 2-bay Synology NAS, so it is not that easy to use third-party software that may have dependencies the Synology system doesn't meet.
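For reference, rsync exit code 24 means "partial transfer because some source files vanished while they were being transferred", which is usually harmless on a live filesystem, so treating both 0 and 24 as success is common. A minimal sketch of the rename-on-success step being discussed (the module, paths and date format are taken from the scripts in this thread; the "latest" symlink used as the --link-dest target is an assumption):

#!/bin/sh
# Sketch only: stage into a fixed directory, rename to a timestamp on success.
DEST=/volume1/Backup_Test
B_DATE=$(date +"%d-%m-%Y-%H%M")

rsync -aqzh --delete \
    --link-dest="$DEST/latest" \
    Test@192.168.2.2::Test "$DEST/in_progress"
RC=$?    # capture immediately - any later command would overwrite $?

# 0 = clean run, 24 = some source files vanished mid-transfer
if [ "$RC" -eq 0 ] || [ "$RC" -eq 24 ]; then
    mv "$DEST/in_progress" "$DEST/$B_DATE"
    ln -sfn "$B_DATE" "$DEST/latest"    # hypothetical "latest" pointer for the next run
fi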
Re: rsync script for snapshot backups
Dennis Steinkamp wrote:

> I tried to create a simple rsync script that should create daily backups from
> a ZFS storage and put them into a timestamp folder.
> After creating the initial full backup, the following backups should only
> contain "new data" and the rest will be referenced via hardlinks (--link-dest)
> ...
> Well, it works but there is a huge flaw with this approach and I am not able
> to solve it on my own unfortunately.
> As long as the backups are finishing properly, everything is fine but as soon
> as one backup job couldn't be finished for some reason (like it gets aborted
> accidentally or a power cut occurs), the whole backup chain is messed up and
> usually the script creates a new full backup which fills up my backup storage.

Yes indeed, this is a typical flaw with many systems - you often need to throw away the partial backup.

One option that comes to mind is this: create the new backup in a directory called (for example) "new" or "in-progress". If, and only if, the backup completes, rename this to a timestamp. If, when you start a new backup, the in-progress folder exists, then use that and it'll be freshened to the current source state.

Also, have you looked at StoreBackup? http://storebackup.org
It does most of this automagically, keeps a definable history (eg one/day for 14 days, one/week for x weeks, one/30d for y years), plus it keeps file hashes so it can detect bit-rot in your backups.
rsync script for snapshot backups
Hey guys,

I tried to create a simple rsync script that should create daily backups from a ZFS storage and put them into a timestamp folder. After creating the initial full backup, the following backups should only contain "new data" and the rest will be referenced via hardlinks (--link-dest). This was at least a simple enough scenario to achieve with my pathetic scripting skills.

This is what I came up with:

#!/bin/sh
# rsync copy script for rsync pull from FreeNAS to BackupNAS for Buero dataset

# Set variables
EXPIRED=`date +"%d-%m-%Y" -d "14 days ago"`

# Copy previous timefile to timeold.txt if it exists
if [ -f "/volume1/rsync/Buero/timenow.txt" ]
then
    yes | cp /volume1/rsync/Buero/timenow.txt /volume1/rsync/Buero/timeold.txt
fi

# Create current timefile
echo `date +"%d-%m-%Y-%H%M"` > /volume1/rsync/Buero/timenow.txt

# rsync command
if [ -f "/volume1/rsync/Buero/timeold.txt" ]
then
    rsync -aqzh \
        --delete --stats --exclude-from=/volume1/rsync/Buero/exclude.txt \
        --log-file=/volume1/Backup_Test/logs/rsync-`date +"%d-%m-%Y-%H%M"`.log \
        --link-dest=/volume1/Backup_Test/`cat /volume1/rsync/Buero/timeold.txt` \
        Test@192.168.2.2::Test /volume1/Backup_Test/`date +"%d-%m-%Y-%H%M"`
else
    rsync -aqzh \
        --delete --stats --exclude-from=/volume1/rsync/Buero/exclude.txt \
        --log-file=/volume1/Backup_Buero/logs/rsync-`date +"%d-%m-%Y-%H%M"`.log \
        Test@192.168.2.2::Test /volume1/Backup_Test/`date +"%d-%m-%Y-%H%M"`
fi

# Delete expired snapshots (2 weeks old)
if [ -d /volume1/Backup_Buero/$EXPIRED-* ]
then
    rm -Rf /volume1/Backup_Buero/$EXPIRED-*
fi

Well, it works but there is a huge flaw with this approach and I am not able to solve it on my own unfortunately. As long as the backups are finishing properly, everything is fine, but as soon as one backup job couldn't be finished for some reason (like it gets aborted accidentally or a power cut occurs), the whole backup chain is messed up and usually the script creates a new full backup which fills up my backup storage.

What I would like to achieve is to improve the script so that a backup run that wasn't finished properly will be resumed the next time the script triggers. Only if that was successful should the next incremental backup be created, so that the files that didn't change from the previous backup can be hardlinked properly.

I did a little bit of research and I am not sure if I am on the right track here, but apparently this can be done with return codes - I honestly don't know how to do this though.

Thank you in advance for your help and sorry if this question may seem foolish to most of you people.

Regards
Dennis
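The whole thread relies on the --link-dest mechanism: files that are unchanged compared to the previous snapshot are created as additional hard links to the old snapshot's files, so each extra snapshot only costs space for what actually changed. A minimal, self-contained illustration of that behaviour (all paths here are hypothetical scratch paths, not the ones from the scripts above):

#!/bin/sh
# Minimal --link-dest demonstration with throwaway paths.
mkdir -p /tmp/src /tmp/backups
echo "hello" > /tmp/src/file.txt

# First snapshot: a full copy.
rsync -a /tmp/src/ /tmp/backups/snap.1/

# Second snapshot: unchanged files become hard links into snap.1.
rsync -a --link-dest=/tmp/backups/snap.1 /tmp/src/ /tmp/backups/snap.2/

# Same inode number in both snapshots => only one copy of the data on disk.
ls -li /tmp/backups/snap.1/file.txt /tmp/backups/snap.2/file.txt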