Re: [BackupPC-users] Backing up large directories times out with signal=ALRM or PIPE
On 02/24 11:09, Les Mikesell wrote:
> > That means there is no meaningful way of deleting an older backup, as
> > the parent files may be lost, rendering future links useless?
>
> On unix filesystems, the contents are not removed until the last link is
> deleted and no process has the file open.

Looked at another way, a hardlink is just another name for the file, exactly
like the first name it had. A hardlink is a directory index entry, in exactly
the same way that the first name for the file was.

--
Carl Soderstrom
Systems Administrator
Real-Time Enterprises
www.real-time.com
Re: [BackupPC-users] Backing up large directories times out with signal=ALRM or PIPE
Jason B wrote:
> close to the same way as an incremental, except it's more useful, so to say?
>
> Incidentally, unrelated, but something that's been bugging me for a while:
> subsequent full backups hardlink to older ones that have the true copy of
> the file, correct? That means there is no meaningful way of deleting an
> older backup, as the parent files may be lost, rendering future links
> useless?

Not quite - if it were symlinks that would be true, but BackupPC uses hard
links. With a hard link the underlying inode (which describes the file data)
persists as long as there is at least one link to it. When old backups get
purged, what really happens is they get unlinked (doing rm -rf on a numbered
backup in the directory for an individual pc has the same effect). If all the
numbered backups that reference a file get removed, then only a single link
(from the pool tree) will remain. The nightly cleanup code looks for files
with one link and removes them.

So you can safely delete older backups knowing that only files that are
unique to that backup will disappear (and also knowing that you won't get the
disk space back until after the nightly cleanup runs).

John
Re: [BackupPC-users] Backing up large directories times out with signal=ALRM or PIPE
Jason B wrote:
> > 3.) Rsync(d) full backups go to more trouble to determine what has
> > changed, meaning they're more expensive in terms of CPU time and disk
> > I/O, but they'll catch changes incrementals may have missed. That means
> > they're vital every now and then, supposing you want a meaningful backup
> > of your data.
>
> In that case, though, what advantage is there to running incrementals vs
> fulls? The server load? To me, a full backup implies a complete re-transfer
> of all files, but you are saying an rsync(d) full backup, in effect,
> functions close to the same way as an incremental, except it's more useful,
> so to say?

There are two differences. One is that a full does a complete block checksum
comparison of the files at both ends, which may take a lot longer even though
it doesn't use much more bandwidth than an incremental, which skips files
where the timestamp and length match. The other is that it completely
populates the backup directory, which can then be used as the basis for
subsequent incrementals.

> Incidentally, unrelated, but something that's been bugging me for a while:
> subsequent full backups hardlink to older ones that have the true copy of
> the file, correct?

Not exactly. All of the copies end up linked to a common file in the cpool
directory. The fact that they are linked to each other is incidental - all
identical files will be linked, not just ones that match previous backups of
the same file. The filename in the cpool directory is a hash value used as a
quick way to find the matches.

> That means there is no meaningful way of deleting an older backup, as the
> parent files may be lost, rendering future links useless?

On unix filesystems, the contents are not removed until the last link is
deleted and no process has the file open. Thus you can remove any individual
backup without affecting any of the others. You don't get the space back
until the nightly cleanup job runs, removing the links in the cpool directory
where the link count is 1 (meaning no backups still use it).

--
Les Mikesell
[EMAIL PROTECTED]
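To make the link-count rule concrete, here is a minimal Perl sketch of that
kind of cleanup pass. It is not BackupPC's actual BackupPC_nightly code, and
the pool path used below is only an assumed example; it just illustrates the
idea Les describes, namely that a pool file whose link count has dropped to 1
is no longer referenced by any backup and can be removed.

    #!/usr/bin/perl
    # Conceptual sketch only -- NOT BackupPC's real nightly cleanup code.
    # A pool file with a link count of 1 is referenced only by the pool
    # itself, so no backup uses it any more.
    use strict;
    use warnings;
    use File::Find;

    my $pool = '/var/lib/backuppc/cpool';   # assumed pool location; adjust

    find(sub {
        return unless -f $_;                # only look at regular files
        my $nlink = (stat(_))[3];           # link count from the cached stat
        if ($nlink == 1) {
            print "would remove $File::Find::name (no backup references it)\n";
            # unlink $_;                    # a real cleanup would delete here
        }
    }, $pool);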
Re: [BackupPC-users] Backing up large directories times out with signal=ALRM or PIPE
On 24/02/07, Jason B [EMAIL PROTECTED] wrote:
> Incidentally, unrelated, but something that's been bugging me for a while:
> subsequent full backups hardlink to older ones that have the true copy of
> the file, correct? That means there is no meaningful way of deleting an
> older backup, as the parent files may be lost, rendering future links
> useless?

Not exactly. BackupPC always keeps the files hardlinked to a file in the
cpool directory, so the parent files should never get lost.

--
cheers,
-ambrose

Don't trust everything you read in Wikipedia.
Re: [BackupPC-users] Backing up large directories times out with signal=ALRM or PIPE
Les Mikesell wrote:
> > Apologies for the relatively long email, but I figure it's better to give
> > too much information than not enough. I've run into a bit of difficulty
> > backing up a large directory tree that has me not being able to do a
> > successful backup in over a month now. I'm attempting to back up about
> > 70GB over the Internet with a 1 MB/sec connection (the time it takes
> > doesn't really bother me, just want to do a full backup and then run
> > incrementals all the time). However, the transfer always times out with
> > signal=ALRM.
>
> ALRM should mean the server's $Conf{ClientTimeout} expired. You may need to
> make it much longer. The time is supposed to mean inactivity but some
> circumstances make it the total time for a transfer to complete.
> signal=PIPE means the connection broke or the client side quit
> unexpectedly.

Although the ALRM and PIPE signals are probably technically correct, it might
be clearer to use different terms/explanations in the interface. I have the
feeling not everyone understands these signals.

Nils Breunese.
Re: [BackupPC-users] Backing up large directories times out with signal=ALRM or PIPE
Nils Breunese (Lemonbit) wrote:
> > > Apologies for the relatively long email, but I figure it's better to
> > > give too much information than not enough. I've run into a bit of
> > > difficulty backing up a large directory tree that has me not being able
> > > to do a successful backup in over a month now. I'm attempting to back
> > > up about 70GB over the Internet with a 1 MB/sec connection (the time it
> > > takes doesn't really bother me, just want to do a full backup and then
> > > run incrementals all the time). However, the transfer always times out
> > > with signal=ALRM.
> >
> > ALRM should mean the server's $Conf{ClientTimeout} expired. You may need
> > to make it much longer. The time is supposed to mean inactivity but some
> > circumstances make it the total time for a transfer to complete.
> > signal=PIPE means the connection broke or the client side quit
> > unexpectedly.
>
> Although the ALRM and PIPE signals are probably technically correct it
> might be clearer to use different terms/explanations in the interface. I
> have the feeling not everyone understands these signals.

man signal will show all the possibilities. SIGPIPE isn't very clear because
it really just means a child process terminated while the parent is still
trying to communicate with it, but in this case the child is the ssh, rsync,
or smbclient that is doing the transfer from the remote, and the likely
reasons are either a network problem or that the remote side terminated.

--
Les Mikesell
[EMAIL PROTECTED]
Re: [BackupPC-users] Backing up large directories times out with signal=ALRM or PIPE
Les Mikesell wrote:
> > Although the ALRM and PIPE signals are probably technically correct it
> > might be clearer to use different terms/explanations in the interface. I
> > have the feeling not everyone understands these signals.
>
> man signal will show all the possibilities. SIGPIPE isn't very clear
> because it really just means a child process terminated while the parent is
> still trying to communicate with it, but in this case the child is the ssh,
> rsync, or smbclient that is doing the transfer from the remote and the
> likely reasons are either a network problem or that the remote side
> terminated.

I know I can take a look at the man pages, but I still think it would be
better for the web interface to display something a bit clearer than just the
signal name.

Nils Breunese.
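As a rough illustration of the friendlier wording Nils is asking for, a
hypothetical lookup table (this is not code from BackupPC's CGI, just a
sketch) could map the raw signal names onto hints along the lines of Les's
explanations above:

    # Hypothetical illustration only -- not BackupPC code.
    # Translate the bare signal name shown in the status page into a hint.
    my %signal_hint = (
        ALRM => 'timed out: no (apparent) activity within $Conf{ClientTimeout} seconds',
        PIPE => 'transfer child (ssh/rsync/smbclient) exited or the connection broke',
    );

    sub describe_signal {
        my ($sig) = @_;
        return $signal_hint{$sig} // "terminated by signal $sig";
    }

    print describe_signal('ALRM'), "\n";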
Re: [BackupPC-users] Backing up large directories times out with signal=ALRM or PIPE
Nils Breunese (Lemonbit) wrote:
> Les Mikesell wrote:
> > > Although the ALRM and PIPE signals are probably technically correct it
> > > might be clearer to use different terms/explanations in the interface.
> > > I have the feeling not everyone understands these signals.
> >
> > man signal will show all the possibilities. SIGPIPE isn't very clear
> > because it really just means a child process terminated while the parent
> > is still trying to communicate with it, but in this case the child is the
> > ssh, rsync, or smbclient that is doing the transfer from the remote and
> > the likely reasons are either a network problem or that the remote side
> > terminated.
>
> I know I can take a look at the man pages, but I still think it would be
> better for the web interface to display something a bit clearer than just
> the signal name.

I haven't looked at the code, but it probably just picks up the error or exit
status and its description as returned by the operating system. Things like
'no space on device' or 'permission denied' are a little more understandable,
but there are a lot of possibilities for failure.

--
Les Mikesell
[EMAIL PROTECTED]
Re: [BackupPC-users] Backing up large directories times out with signal=ALRM or PIPE
Hi,

Jason B wrote on 20.02.2007 at 20:28:59 [Re[2]: [BackupPC-users] Backing up
large directories times out with signal=ALRM or PIPE]:

> [...]
> > $Conf{ClientTimeout} will need to be at least 72000
> [...]
> I see. I must've been misunderstanding the meaning of that setting - my
> original impression was that it be the time that it would wait, at most, if
> nothing is happening before it times out - I assumed that if files are
> being transferred, that is sufficient activity for it to keep re-setting
> that timer.

[...] that is the way it would ideally be supposed to work. Unfortunately
that's not really easy to implement, as the instance (i.e. process)
*watching* the transfer is not the one *doing* the transfer. Apparently, the
tar and smb transfer methods are a bit better than rsync(d) in that the alarm
time is reset whenever (informational) output from the tar command is
received. This is not really an advantage, though, because you're then
dependent on the transfer time of the largest file instead of the total
backup, and file sizes probably vary more than total backup sizes.

> > You don't really want to do that, for various reasons.
>
> Would you suggest, in that case, to lower the frequency of incrementals,
> and raise the frequency of full backups? I was going on the idea of doing
> an incremental once every 2 days or so, and a full backup once a month
> (because of the size of the data and the persistent timeouts).

Well, you *wrote* you wanted no full backups at all. Whether one month is a
good interval for full backups or not really depends on your data, the
changes, your bandwidth, and your requirements. If you require an exact
backup that is at most a week old (meaning no missed changes are acceptable),
then you'll need a weekly full. If the same files change every day, your
incrementals won't grow as much as if different files change every day. If
the time a backup takes is unimportant, as long as it finishes within 24
hours, you can probably get away with longer intervals between full backups.
If bandwidth is more expensive than server load, you'll need shorter
intervals. You'll have to work out for yourself which interval best fits your
needs. I was just saying: no fulls and only incrementals won't work.

You can always configure monthly (automatic) full backups and then start one
by hand after a week. See how long that takes. Start the next one after a
further two weeks. See how much interval you can get away with. Or watch how
long your incrementals are taking. BackupPC provides you with a lot of
flexibility.

Concerning the incremental backups: if you need (or want) a backup every two
days, then you should do one every two days. If that turns out to be too
expensive in terms of network bandwidth, you'll have to change something.
Doing *each backup as a full backup* (using rsync(d)!) will probably minimize
network utilisation at the expense of (much!) server load. Again: there's no
one-size-fits-all answer.

> > Jason Hughes explained how to incrementally transfer such a structure
> > using $Conf{BackupFilesExclude}. The important thing is that you need
> > successful backups to avoid re-transferring data, even if these backups
> > at first comprise only part of your target data.
> [...]
> What I currently have is an rsyncd share for about 10 - 12 different
> subdirectories (I drilled down a level with the expectation that splitting
> into separate shares might help with the timeouts; I have not considered
> the possibility of backing up separately, though). By that token, I would
> imagine that I just comment out the shares I don't need at present, and
> re-activate them once the backups are done, right? And once I've gone
> through the entire tree, just enable them all and hope for the best?

I'm not sure I understand you correctly. The important thing seems to be:
define your share as you ultimately want it to be. Exclude parts of it at
first (with $Conf{BackupFilesExclude}) to get a successful backup. Altering
$Conf{BackupFilesExclude} will not change your backup definition, i.e. it
will appear as if the share started off with a few files and quickly grew
from backup to backup. You can start a full backup by hand every hour (after
changing $Conf{BackupFilesExclude}) to get your share populated, no need to
wait for your monthly full :). Each successful full backup (with fewer
excluded files) will bring you nearer to your goal. Each future full backup
will be similar to the last of these steps: most of the files are already on
the backup server, only a few percent need to be transferred.

If, in contrast, you start off with several distinct shares, you'll either
have to keep it that way forever, or re-transfer files, or do some other
magic to move them around the backups and hope everything goes well. It's
certainly possible, but it's not easy. Using $Conf{BackupFilesExclude} is,
and you can't do much wrong, as long as you finally end up excluding nothing
you don't want to exclude.

Regards,
Holger
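A minimal config.pl sketch of the staged-exclude procedure Holger describes,
using the directory names from the tree posted earlier in this thread. The
share name 'docroot' is an assumption; it should be whatever single share is
actually defined in rsyncd.conf.

    # Sketch of the staged-exclude approach (assumed share name 'docroot').
    # Pass 1: back up only articles/dir1, exclude everything else.
    $Conf{BackupFilesExclude} = {
        'docroot' => [
            '/articles/dir2', '/articles/dir3', '/articles/dir4',
            '/articles/dir5', '/articles/dir6', '/articles/dir7',
            '/articles/dir8', '/articles/dir9',
            '/images',
        ],
    };
    # After that backup completes, shorten the list (e.g. drop
    # '/articles/dir2'), start another full by hand, and repeat until
    # nothing is excluded any more:
    # $Conf{BackupFilesExclude} = { 'docroot' => [] };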
Re: [BackupPC-users] Backing up large directories times out with signal=ALRM or PIPE
Jason B wrote:
> However, the transfer always times out with signal=ALRM.
> [...]
> Somewhat unrelated, but of all these attempts, it hasn't ever kept a
> partial - so it transfers the files, fails, and removes them. I have one
> partial from 3 weeks ago that was miraculously kept, so it keeps coming
> back to it. Would anybody have any ideas on what I can do? I've set
> $Conf{ClientTimeout} = 7200; in the config.pl... enabled --checksum-seed...
> disabled compression to rsync... no other ideas. Running BackupPC-3.0.0
> final. I'm guessing the connection gets broken at some point (using
> rsyncd), but is there any way to make BackupPC attempt to reconnect and
> just continue from where it left off?

Not exactly. It's a gripe that has come up before. The way BackupPC works is
by completing a job. Anything incomplete is essentially thrown away the next
time it runs. You might try bumping up your ClientTimeout to a higher number,
but chances are, you're actually seeing the pipe break because the connection
is cut, or TCP errors occur that prevent routing, or who knows what. If you
think about it, larger transfers are much more susceptible to this: there is
a small chance the connection is cut at any given moment, so the longer the
transfer runs, the more likely it breaks... any unrecoverable transfer will
tend toward impossible to complete as the transfer time increases. :-(

> On a final note: interestingly, backups from the SAME physical host using a
> different hostname (to back up another, much smaller, virtualhost
> directory) work perfectly every day, never failed. So I'm guessing it's
> just having a problem with the size / # of files. What can I do?

I have a machine that has a lot of video (120gb) across a wifi WDS link (half
802.11g speed, at best). I could never get an initial backup to succeed,
because it could take 30-50 hours. What I did was set up excludes on tons of
directories, so the first backup was very short. I kicked it off manually and
waited until it completed. Then I removed one excluded directory and kicked
off another. BackupPC skips files that have been entered into the pool due to
a completed backup, so it is kind of like biting off smaller pieces of a
single larger backup. Repeat until all your files have made it into the pool.
At that point, your total backups will be very short and only include deltas.

Other people have had success with moving the server physically to the LAN of
the client and doing the backup over a fast, stable connection, to populate
the pool with files initially. That may not be an option for you.

Good luck,
JH
Re: [BackupPC-users] Backing up large directories times out with signal=ALRM or PIPE
Hi,

Jason B wrote on 20.02.2007 at 21:28:43 [[BackupPC-users] Backing up large
directories times out with signal=ALRM or PIPE]:

> I've run into a bit of difficulty backing up a large directory tree that
> has me not being able to do a successful backup in over a month now. I'm
> attempting to back up about 70GB over the Internet with a 1 MB/sec
> connection (the

if you really mean 8 MBit/s, your backup will need about 20 hours to
complete, meaning $Conf{ClientTimeout} will need to be at least 72000 (if you
meant 128KB/s, it's obviously 8 times as much). Setting it to this value or
more is no problem. It just means that if a backup happens to get somehow
stuck, BackupPC will need that long to recover, possibly blocking other
backups for the time due to $Conf{MaxBackups}. That may or may not be a
problem for you in the long run, so you'll probably want to adjust it once
you've got a feeling for how long your backups take in the worst case.

> time it takes doesn't really bother me, just want to do a full backup and
> then run incrementals all the time).

You don't really want to do that, for various reasons.

1.) An incremental is based on the last full backup (or incremental of lower
level, to be exact). That means everything changed since the last full backup
will be transferred on each incremental - more data from day to day.

2.) In contrast to this, an rsync(d) full backup will also only transfer
files changed since the last full backup (i.e. ideally not more than an
incremental), but it will give you a new reference point, meaning future
incrementals transfer less data.

3.) Rsync(d) full backups go to more trouble to determine what has changed,
meaning they're more expensive in terms of CPU time and disk I/O, but they'll
catch changes incrementals may have missed. That means they're vital every
now and then, supposing you want a meaningful backup of your data.

> The tree is approximately like this:
>
> - top level 1
>   - articles
>     - dir 1
>       - subdirs 1 through 9
>     - dir 2
>       - subdirs 1 through 9
>     etc until dir 9 (same subdir structure)
>   - images
>     - dir 1
>       - subdirs 1 through 9
>     - dir 2
>       - subdirs 1 through 9
>     etc until dir 9 (same subdir structure)
> - top level 4
>
> There are (on average) 5,000 files per directory (about 230,000 files in
> total).

Jason Hughes explained how to incrementally transfer such a structure using
$Conf{BackupFilesExclude}. The important thing is that you need successful
backups to avoid re-transferring data, even if these backups at first
comprise only part of your target data. It might be enough to split the
process into two parts by first excluding half of your toplevel directories
and then removing the excludes for the second run.

You might even be able to transfer everything at once by simply adjusting
your $Conf{ClientTimeout}. If in doubt, set the value way too high rather
than slightly too low. You can always adjust it after your first successful
backup.

Regards,
Holger
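Spelling out Holger's arithmetic as a config.pl sketch (assuming "1 MB/sec"
really means one megabyte per second; the exact value chosen here is just an
example of rounding up generously):

    # Back-of-the-envelope sizing for $Conf{ClientTimeout}, per Holger's figures.
    # ~70 GB at ~1 MB/s  ->  roughly 70,000 s, i.e. about 20 hours,
    # so the timeout needs to be at least ~72,000 seconds.
    $Conf{ClientTimeout} = 86400;   # 24 hours; tune down after the first full succeeds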