[Amanda-users] Bare metal restore problem
Jean-Louis Martineau wrote: What program do you use to do the backup? dump or tar? Jean-Louis khalil_noura wrote: Hi all, I am trying to do a bare metal restore. After I partionsed the disk and restored the client to holding. I rebooted to rescue mode created a /mnt/sysimage directory mounted /dev/sda1 to /mnt/sysimage then cd /mnt/sysimage. run this command: ssh amandaserver cat /holdingdisk/hostname._.0 | restore rf - I get an error that reads tape read error on first record any ideas? Thank you, +-- |This was sent by adarouic at curry.edu via Backup Central. |Forward SPAM to abuse at backupcentral.com. +-- I am using Dump +-- |This was sent by adaro...@curry.edu via Backup Central. |Forward SPAM to ab...@backupcentral.com. +--
Snapshots, redundant ?
Here is a fun thought question. Assuming that we are running daily snapshots for the purposes of easing file restores and providing recovery for users that accidently remove files they still need (or corrupt spreadsheets or whatever they do)... daily snapshots on a rolling basis. /usr1 has snapshots /usr1-monday... /usr1-friday and we remove and recreate one snapshot/day during the course of the week. What if any added benifit am I seeing by using zfs-snapshots from within the amanda run. I'm not suggesting that we don't need backups, but the snapshot portion of the dump. I don't think it costs me anything extra, but I don't know that there are benifits. thanks, Brian --- Brian R Cuttler brian.cutt...@wadsworth.org Computer Systems Support(v) 518 486-1697 Wadsworth Center(f) 518 473-6384 NYS Department of HealthHelp Desk 518 473-0773 IMPORTANT NOTICE: This e-mail and any attachments may contain confidential or sensitive information which is, or may be, legally privileged or otherwise protected by law from further disclosure. It is intended only for the addressee. If you received this in error or from someone who was not authorized to send it to you, please do not distribute, copy or use it or any attachments. Please notify the sender immediately by reply e-mail and delete this from your system. Thank you for your cooperation.
hard coded limit REP_TIMEOUT of 6hrs in amandad-src/amandad.c
Hi, I was hitting this hard coded limit a couple years ago running 2.5.2-20070523 and over the last month it started to show up again on a client having a lot of big DLEs (20 or so for a total of 11TB). Client is running 2.6.0p2 and server is at 2.6.1p1. Back then Jean-Louis provided me with a patch that reseted the timeout after each estimates but just perusing the source I see that amandad.c has been extensively rewritten. Anyone cares to have a look at this? thanks, jf -- ° Jean-François Malouin McConnell Brain Imaging Centre Systems/Network Administrator Montréal Neurological Institute 3801 Rue Université, Suite WB219 Montréal, Québec, H3A 2B4 Phone: 514-398-8924 Fax: 514-398-8948
Re: Backup issues with OpenBSD 4.5 machines
You get the error in the index pipe instead of the data path. The backup is correct but your index is empty. That's why you get a STRANGE result instead of a failure. Can you post a sendbackup.*.debug for a dle that failed? Jean-Louis Michael Burk wrote: Hello, I applied the patches to the 20090827 snapshot. I tried it on two OpenBSD 4.5 sparc64 systems, forcing both to do full backups. One system seemed to work on all 3 file systems, the other failed on all 4 file systems with the same errors as before (exactly like what Stan reported). I'm using bsdtcp auth. The system that worked, however, did give a strange result in the report on one filesystem. Here's the corresponding sendbackup debug file: 1251692700.529842: sendbackup: pid 30459 ruid 150 euid 150 version 2.6.2alpha: start at Sun Aug 30 2 2:25:00 2009 1251692700.530411: sendbackup: Version 2.6.2alpha 1251692700.553135: sendbackup: pid 30459 ruid 150 euid 150 version 2.6.2alpha: rename at Sun Aug 30 22:25:00 2009 1251692700.555394: sendbackup: Parsed request as: program `DUMP' 1251692700.555435: sendbackup: disk `/usr' 1251692700.555461: sendbackup: device `/usr' 1251692700.555484: sendbackup: level 0 1251692700.07: sendbackup: since NODATE 1251692700.30: sendbackup: options `' 1251692700.556750: sendbackup: start: zirconium.example.com:/usr lev 0 1251692700.559495: sendbackup: dumping device '/dev/rsd1d' with 'ffs' 1251692700.566094: sendbackup: Spawning /sbin/dump dump 0usf 1048576 - /dev/rsd1d in pipeline 1251692700.571185: sendbackup: Started backup 1251692700.572425: sendbackup: fd 3 is set with O_NONBLOCK before dup2 1251692700.597388: sendbackup: Started index creator: /sbin/restore -tvf - 21 | sed -e ' s/^leaf[]*[0-9]*[ ]*\.// t /^dir[ ]/ { s/^dir[ ]*[0-9]*[ ]*\.// s%$%/% t } d ' 1251692700.611526: sendbackup: 90: normal(|): DUMP: Date of this level 0 dump: Sun Aug 30 22:25: 00 2009 1251692700.669703: sendbackup: 90: normal(|): DUMP: Date of last level 0 dump: the epoch 1251692700.670724: sendbackup: 90: normal(|): DUMP: Dumping /dev/rsd1d (/usr) to standard output 1251692700.763600: sendbackup: 90: normal(|): DUMP: mapping (Pass I) [regular files] 1251692705.618345: sendbackup: 90: normal(|): DUMP: mapping (Pass II) [directories] 1251692705.620040: sendbackup: 90: normal(|): DUMP: estimated 1330085 tape blocks. 1251692705.623976: sendbackup: 90: normal(|): DUMP: Volume 1 started at: Sun Aug 30 22:25:05 2009 1251692705.650687: sendbackup: 90: normal(|): DUMP: dumping (Pass III) [directories] 1251692745.051547: sendbackup: 90: normal(|): DUMP: dumping (Pass IV) [regular files] 1251692747.915767: sendbackup: 115: strange(?): sed: stdout: Resource temporarily unavailable 1251692954.633083: sendbackup: 43:size(|): DUMP: 1437829 tape blocks 1251692954.637414: sendbackup: 90: normal(|): DUMP: Date of this level 0 dump: Sun Aug 30 22:25:00 2009 1251692954.641414: sendbackup: 90: normal(|): DUMP: Volume 1 completed at: Sun Aug 30 22:29:14 2009 1251692954.641858: sendbackup: Index pipe exited with status 1 1251692954.642719: sendbackup: 90: normal(|): DUMP: Volume 1 took 0:04:09 1251692954.643918: sendbackup: 90: normal(|): DUMP: Volume 1 transfer rate: 5774 KB/s 1251692954.645095: sendbackup: 90: normal(|): DUMP: Date this dump completed: Sun Aug 30 22:29:14 2009 1251692954.646239: sendbackup: 90: normal(|): DUMP: Average transfer rate: 5774 KB/s 1251692954.647430: sendbackup: 90: normal(|): DUMP: level 0 dump on Sun Aug 30 22:25:00 2009 1251692954.653480: sendbackup: 90: normal(|): DUMP: DUMP IS DONE 1251692954.653841: sendbackup: Parsed backup messages 1251692954.654032: sendbackup: pid 30459 finish time Sun Aug 30 22:29:14 2009 -- Michael On Fri, Aug 28, 2009 at 5:22 AM, stan st...@panix.com mailto:st...@panix.com wrote: On Thu, Aug 27, 2009 at 03:35:47PM -0600, Michael Burk wrote: Cool - can you send me the patch? Or is it already in the 0826 snapshot? FYI - I got 2.5.0p2 working, but could never get 2.5.1p3 working, even with the same config. No idea why. I started looking at the source code last night in the 0825 snapshot. I'll discontinue that search if there's a patch. This is interesting. I sent Jean-Louis Martineau an email, saying teaks for fixing things, and he replied that the patch he sent me just added debugging, it was not intended to fix anything. Bit, using this snapshot 20090813, and the patch, which I will send to you. I have it working on my 3 OpenBSD 4.5 machines that I have upgraded so far. I ran several test yesterday, and a full fledged backup run last night, and I have not seen any issues since installing this code. I am going to attach the patch to
Re: Backup issues with OpenBSD 4.5 machines
Here are the two sendbackup.*.debug files for the / fs. First: 1251693102.343436: sendbackup: pid 29087 ruid 150 euid 150 version 2.6.2alpha: start at Sun Aug 30 22:31:42 2009 1251693102.345326: sendbackup: Version 2.6.2alpha 1251693102.379331: sendbackup: pid 29087 ruid 150 euid 150 version 2.6.2alpha: rename at Sun Aug 30 22:31:42 2009 1251693102.382860: sendbackup: Parsed request as: program `DUMP' 1251693102.382940: sendbackup: disk `/' 1251693102.382991: sendbackup: device `/' 1251693102.383039: sendbackup: level 0 1251693102.383086: sendbackup: since NODATE 1251693102.383134: sendbackup: options `' 1251693102.388833: sendbackup: start: bromine.example.com:/ lev 0 1251693102.391475: sendbackup: dumping device '/dev/rsd0a' with 'ffs' 1251693102.398325: sendbackup: Spawning /sbin/dump dump 0usf 1048576 - /dev/rsd0a in pipeline 1251693102.403120: sendbackup: Started backup 1251693102.461663: sendbackup: 90: normal(|): DUMP: Date of this level 0 dump: Sun Aug 30 22:31:42 2009 1251693102.468669: sendbackup: 90: normal(|): DUMP: Date of last level 0 dump: the epoch 1251693102.487536: sendbackup: Started index creator: /sbin/restore -tvf - 21 | sed -e ' s/^leaf[]*[0-9]*[ ]*\.// t /^dir[ ]/ { s/^dir[ ]*[0-9]*[ ]*\.// s%$%/% t } d ' 1251693102.519939: sendbackup: 90: normal(|): DUMP: Dumping /dev/rsd0a (/) to standard output 1251693102.588086: sendbackup: 90: normal(|): DUMP: mapping (Pass I) [regular files] 1251693104.583392: sendbackup: 90: normal(|): DUMP: mapping (Pass II) [directories] 1251693104.585824: sendbackup: 90: normal(|): DUMP: estimated 35134 tape blocks. 1251693104.595314: sendbackup: 90: normal(|): DUMP: Volume 1 started at: Sun Aug 30 22:31:44 2009 1251693104.615444: sendbackup: 90: normal(|): DUMP: dumping (Pass III) [directories] 1251693104.919691: sendbackup: 90: normal(|): DUMP: dumping (Pass IV) [regular files] 1251693105.800795: sendbackup: critical (fatal): index tee cannot write [Resource temporarily unavailable] 1251693105.803486: sendbackup: 115: strange(?): sendbackup: index tee cannot write [Resource temporarily unavailable] 1251693105.827339: sendbackup: 90: normal(|): DUMP: Broken pipe 1251693105.832068: sendbackup: 90: normal(|): DUMP: The ENTIRE dump is aborted. 1251693105.839675: sendbackup: critical (fatal): error [dump (7257) /sbin/dump returned 3] Second: 1251693123.840560: sendbackup: pid 32144 ruid 150 euid 150 version 2.6.2alpha: start at Sun Aug 30 22:32:03 2009 1251693123.841613: sendbackup: Version 2.6.2alpha 1251693123.892773: sendbackup: pid 32144 ruid 150 euid 150 version 2.6.2alpha: rename at Sun Aug 30 22:32:03 2009 1251693123.896360: sendbackup: Parsed request as: program `DUMP' 1251693123.896437: sendbackup: disk `/' 1251693123.896487: sendbackup: device `/' 1251693123.896535: sendbackup: level 0 1251693123.896584: sendbackup: since NODATE 1251693123.896632: sendbackup: options `' 1251693123.902332: sendbackup: start: bromine.example.com:/ lev 0 1251693123.905012: sendbackup: dumping device '/dev/rsd0a' with 'ffs' 1251693123.911778: sendbackup: Spawning /sbin/dump dump 0usf 1048576 - /dev/rsd0a in pipeline 1251693123.916860: sendbackup: Started backup 1251693123.992261: sendbackup: Started index creator: /sbin/restore -tvf - 21 | sed -e ' s/^leaf[]*[0-9]*[ ]*\.// t /^dir[ ]/ { s/^dir[ ]*[0-9]*[ ]*\.// s%$%/% t } d ' 1251693123.995743: sendbackup: 90: normal(|): DUMP: Date of this level 0 dump: Sun Aug 30 22:32:03 2009 1251693124.009013: sendbackup: 90: normal(|): DUMP: Date of last level 0 dump: the epoch 1251693124.038977: sendbackup: 90: normal(|): DUMP: Dumping /dev/rsd0a (/) to standard output 1251693124.143454: sendbackup: 90: normal(|): DUMP: mapping (Pass I) [regular files] 1251693126.173916: sendbackup: 90: normal(|): DUMP: mapping (Pass II) [directories] 1251693126.176287: sendbackup: 90: normal(|): DUMP: estimated 35134 tape blocks. 1251693126.185677: sendbackup: 90: normal(|): DUMP: Volume 1 started at: Sun Aug 30 22:32:06 2009 1251693126.205305: sendbackup: 90: normal(|): DUMP: dumping (Pass III) [directories] 1251693126.505055: sendbackup: 90: normal(|): DUMP: dumping (Pass IV) [regular files] 1251693126.797271: sendbackup: critical (fatal): index tee cannot write [Resource temporarily unavailable] 1251693126.799929: sendbackup: 115: strange(?): sendbackup: index tee cannot write [Resource temporarily unavailable] 1251693126.821488: sendbackup: 90: normal(|): DUMP: Broken pipe 1251693126.825377: sendbackup: 90: normal(|): DUMP: The ENTIRE dump is aborted. 1251693126.833573: sendbackup: critical (fatal): error [dump (22791) /sbin/dump returned 3] Thanks, Michael On Mon, Aug 31, 2009 at
Re: Backup issues with OpenBSD 4.5 machines
Do you have the patch I sent to stan on this system? The patch check before and after the write if the pipe is in O_NONBLOCK or not and give an error if it is. I'm totally lost since it is in blocking mode and you get EAGAIN, which is impossible Jean-louis Michael Burk wrote: Here are the two sendbackup.*.debug files for the / fs. First: 1251693102.343436: sendbackup: pid 29087 ruid 150 euid 150 version 2.6.2alpha: start at Sun Aug 30 22:31:42 2009 1251693102.345326: sendbackup: Version 2.6.2alpha 1251693102.379331: sendbackup: pid 29087 ruid 150 euid 150 version 2.6.2alpha: rename at Sun Aug 30 22:31:42 2009 1251693102.382860: sendbackup: Parsed request as: program `DUMP' 1251693102.382940: sendbackup: disk `/' 1251693102.382991: sendbackup: device `/' 1251693102.383039: sendbackup: level 0 1251693102.383086: sendbackup: since NODATE 1251693102.383134: sendbackup: options `' 1251693102.388833: sendbackup: start: bromine.example.com:/ lev 0 1251693102.391475: sendbackup: dumping device '/dev/rsd0a' with 'ffs' 1251693102.398325: sendbackup: Spawning /sbin/dump dump 0usf 1048576 - /dev/rsd0a in pipeline 1251693102.403120: sendbackup: Started backup 1251693102.461663: sendbackup: 90: normal(|): DUMP: Date of this level 0 dump: Sun Aug 30 22:31:42 2009 1251693102.468669: sendbackup: 90: normal(|): DUMP: Date of last level 0 dump: the epoch 1251693102.487536: sendbackup: Started index creator: /sbin/restore -tvf - 21 | sed -e ' s/^leaf[]*[0-9]*[ ]*\.// t /^dir[ ]/ { s/^dir[ ]*[0-9]*[ ]*\.// s%$%/% t } d ' 1251693102.519939: sendbackup: 90: normal(|): DUMP: Dumping /dev/rsd0a (/) to standard output 1251693102.588086: sendbackup: 90: normal(|): DUMP: mapping (Pass I) [regular files] 1251693104.583392: sendbackup: 90: normal(|): DUMP: mapping (Pass II) [directories] 1251693104.585824: sendbackup: 90: normal(|): DUMP: estimated 35134 tape blocks. 1251693104.595314: sendbackup: 90: normal(|): DUMP: Volume 1 started at: Sun Aug 30 22:31:44 2009 1251693104.615444: sendbackup: 90: normal(|): DUMP: dumping (Pass III) [directories] 1251693104.919691: sendbackup: 90: normal(|): DUMP: dumping (Pass IV) [regular files] 1251693105.800795: sendbackup: critical (fatal): index tee cannot write [Resource temporarily unavailable] 1251693105.803486: sendbackup: 115: strange(?): sendbackup: index tee cannot write [Resource temporarily unavailable] 1251693105.827339: sendbackup: 90: normal(|): DUMP: Broken pipe 1251693105.832068: sendbackup: 90: normal(|): DUMP: The ENTIRE dump is aborted. 1251693105.839675: sendbackup: critical (fatal): error [dump (7257) /sbin/dump returned 3] Second: 1251693123.840560: sendbackup: pid 32144 ruid 150 euid 150 version 2.6.2alpha: start at Sun Aug 30 22:32:03 2009 1251693123.841613: sendbackup: Version 2.6.2alpha 1251693123.892773: sendbackup: pid 32144 ruid 150 euid 150 version 2.6.2alpha: rename at Sun Aug 30 22:32:03 2009 1251693123.896360: sendbackup: Parsed request as: program `DUMP' 1251693123.896437: sendbackup: disk `/' 1251693123.896487: sendbackup: device `/' 1251693123.896535: sendbackup: level 0 1251693123.896584: sendbackup: since NODATE 1251693123.896632: sendbackup: options `' 1251693123.902332: sendbackup: start: bromine.example.com:/ lev 0 1251693123.905012: sendbackup: dumping device '/dev/rsd0a' with 'ffs' 1251693123.911778: sendbackup: Spawning /sbin/dump dump 0usf 1048576 - /dev/rsd0a in pipeline 1251693123.916860: sendbackup: Started backup 1251693123.992261: sendbackup: Started index creator: /sbin/restore -tvf - 21 | sed -e ' s/^leaf[]*[0-9]*[ ]*\.// t /^dir[ ]/ { s/^dir[ ]*[0-9]*[ ]*\.// s%$%/% t } d ' 1251693123.995743: sendbackup: 90: normal(|): DUMP: Date of this level 0 dump: Sun Aug 30 22:32:03 2009 1251693124.009013: sendbackup: 90: normal(|): DUMP: Date of last level 0 dump: the epoch 1251693124.038977: sendbackup: 90: normal(|): DUMP: Dumping /dev/rsd0a (/) to standard output 1251693124.143454: sendbackup: 90: normal(|): DUMP: mapping (Pass I) [regular files] 1251693126.173916: sendbackup: 90: normal(|): DUMP: mapping (Pass II) [directories] 1251693126.176287: sendbackup: 90: normal(|): DUMP: estimated 35134 tape blocks. 1251693126.185677: sendbackup: 90: normal(|): DUMP: Volume 1 started at: Sun Aug 30 22:32:06 2009 1251693126.205305: sendbackup: 90: normal(|): DUMP: dumping (Pass III) [directories] 1251693126.505055: sendbackup: 90: normal(|): DUMP: dumping (Pass IV) [regular files] 1251693126.797271: sendbackup: critical (fatal): index tee cannot write [Resource temporarily unavailable] 1251693126.799929: sendbackup: 115: strange(?): sendbackup: index tee
Re: Backup issues with OpenBSD 4.5 machines
Hi Jean-Louis, I thought I had applied the patches on this machine also, but it turns out I didn't (sorry about that). I applied the patches and ran a new dump. This time all 4 file systems succeeded, though /usr got the same strange message as on the other machine: 1251752282.483677: sendbackup: pid 15574 ruid 150 euid 150 version 2.6.2alpha: start at Mon Aug 31 14:58:02 2009 1251752282.484788: sendbackup: Version 2.6.2alpha 1251752282.523952: sendbackup: pid 15574 ruid 150 euid 150 version 2.6.2alpha: rename at Mon Aug 31 14:58:02 2009 1251752282.527795: sendbackup: Parsed request as: program `DUMP' 1251752282.527880: sendbackup: disk `/usr' 1251752282.527931: sendbackup: device `/usr' 1251752282.527978: sendbackup: level 0 1251752282.528025: sendbackup: since NODATE 1251752282.528072: sendbackup: options `' 1251752282.533704: sendbackup: start: bromine.example.com:/usr lev 0 1251752282.537190: sendbackup: dumping device '/dev/rsd0d' with 'ffs' 1251752282.543664: sendbackup: Spawning /sbin/dump dump 0usf 1048576 - /dev/rsd0d in pipeline 1251752282.548532: sendbackup: Started backup 1251752282.561558: sendbackup: fd 3 is set with O_NONBLOCK before dup2 1251752282.625055: sendbackup: Started index creator: /sbin/restore -tvf - 21 | sed -e ' s/^leaf[]*[0-9]*[ ]*\.// t /^dir[ ]/ { s/^dir[ ]*[0-9]*[ ]*\.// s%$%/% t } d ' 1251752282.627853: sendbackup: 90: normal(|): DUMP: Date of this level 0 dump: Mon Aug 31 14:58:02 2009 1251752282.641040: sendbackup: 90: normal(|): DUMP: Date of last level 0 dump: the epoch 1251752282.754862: sendbackup: 90: normal(|): DUMP: Dumping /dev/rsd0d (/usr) to standard output 1251752282.792706: sendbackup: 90: normal(|): DUMP: mapping (Pass I) [regular files] 1251752286.777085: sendbackup: 90: normal(|): DUMP: mapping (Pass II) [directories] 1251752286.779538: sendbackup: 90: normal(|): DUMP: estimated 653358 tape blocks. 1251752286.789190: sendbackup: 90: normal(|): DUMP: Volume 1 started at: Mon Aug 31 14:58:06 2009 1251752286.827782: sendbackup: 90: normal(|): DUMP: dumping (Pass III) [directories] 1251752290.646004: sendbackup: 90: normal(|): DUMP: dumping (Pass IV) [regular files] 1251752290.822019: sendbackup: 115: strange(?): sed: stdout: Resource temporarily unavailable 1251752424.852658: sendbackup: 43:size(|): DUMP: 664333 tape blocks 1251752424.854957: sendbackup: 90: normal(|): DUMP: Date of this level 0 dump: Mon Aug 31 14:58:02 2009 1251752424.857083: sendbackup: 90: normal(|): DUMP: Volume 1 completed at: Mon Aug 31 15:00:24 2009 1251752424.859064: sendbackup: 90: normal(|): DUMP: Volume 1 took 0:02:18 1251752424.860994: sendbackup: 90: normal(|): DUMP: Volume 1 transfer rate: 4814 KB/s 1251752424.863042: sendbackup: 90: normal(|): DUMP: Date this dump completed: Mon Aug 31 15:00:24 2009 1251752424.864993: sendbackup: 90: normal(|): DUMP: Average transfer rate: 4814 KB/s 1251752424.869138: sendbackup: 90: normal(|): DUMP: level 0 dump on Mon Aug 31 14:58:02 2009 1251752424.878026: sendbackup: 90: normal(|): DUMP: DUMP IS DONE 1251752424.888324: sendbackup: Index pipe exited with status 1 1251752424.906986: sendbackup: Parsed backup messages 1251752424.907361: sendbackup: pid 15574 finish time Mon Aug 31 15:00:24 2009 So now I've seen the same behavior that Stan noted: applying the debug patch seems to fix the dump. Also, in case you didn't see it before, here is a snippet from an email thread Stan started on an OpenBSD list: I am trying to get Amanda of a recent vintage working on 4.5. The developrs don't understand how writing to a blocking pipe can return EAGAIN. Should the write just block? Yes, it should, and a quick, about-to-go-to-bed readthrough of the pipe write path shows that it shouldn't; is Amanda doing fcntls to the fd? If so, check those to be absolutely sure that they're not doing something hinkey. Thanks, Michael On Mon, Aug 31, 2009 at 2:41 PM, Jean-Louis Martineau martin...@zmanda.comwrote: Do you have the patch I sent to stan on this system? The patch check before and after the write if the pipe is in O_NONBLOCK or not and give an error if it is. I'm totally lost since it is in blocking mode and you get EAGAIN, which is impossible Jean-louis Michael Burk wrote: Here are the two sendbackup.*.debug files for the / fs. First: 1251693102.343436: sendbackup: pid 29087 ruid 150 euid 150 version 2.6.2alpha: start at Sun Aug 30 22:31:42 2009 1251693102.345326: sendbackup: Version 2.6.2alpha 1251693102.379331: sendbackup: pid 29087 ruid 150 euid 150 version 2.6.2alpha: rename at Sun Aug 30 22:31:42 2009 1251693102.382860: sendbackup: Parsed request as: program `DUMP' 1251693102.382940: sendbackup: disk `/' 1251693102.382991: sendbackup: device `/'
Re: Backup issues with OpenBSD 4.5 machines
amanda doesn't do fcntl on these file descriptor. But my patch do: fcntl(fd, F_GETFL, 0) to check if it have O_NONBLOCK set, the patch doesn't change it. Can you try the attached patch, it do the same trick for the index file descriptor (remove previous patch before applying). Jean-Louis Michael Burk wrote: Hi Jean-Louis, I thought I had applied the patches on this machine also, but it turns out I didn't (sorry about that). I applied the patches and ran a new dump. This time all 4 file systems succeeded, though /usr got the same strange message as on the other machine: 1251752282.483677: sendbackup: pid 15574 ruid 150 euid 150 version 2.6.2alpha: start at Mon Aug 31 14:58:02 2009 1251752282.484788: sendbackup: Version 2.6.2alpha 1251752282.523952: sendbackup: pid 15574 ruid 150 euid 150 version 2.6.2alpha: rename at Mon Aug 31 14:58:02 2009 1251752282.527795: sendbackup: Parsed request as: program `DUMP' 1251752282.527880: sendbackup: disk `/usr' 1251752282.527931: sendbackup: device `/usr' 1251752282.527978: sendbackup: level 0 1251752282.528025: sendbackup: since NODATE 1251752282.528072: sendbackup: options `' 1251752282.533704: sendbackup: start: bromine.example.com:/usr lev 0 1251752282.537190: sendbackup: dumping device '/dev/rsd0d' with 'ffs' 1251752282.543664: sendbackup: Spawning /sbin/dump dump 0usf 1048576 - /dev/rsd0d in pipeline 1251752282.548532: sendbackup: Started backup 1251752282.561558: sendbackup: fd 3 is set with O_NONBLOCK before dup2 1251752282.625055: sendbackup: Started index creator: /sbin/restore -tvf - 21 | sed -e ' s/^leaf[]*[0-9]*[ ]*\.// t /^dir[ ]/ { s/^dir[ ]*[0-9]*[ ]*\.// s%$%/% t } d ' 1251752282.627853: sendbackup: 90: normal(|): DUMP: Date of this level 0 dump: Mon Aug 31 14:58:02 2009 1251752282.641040: sendbackup: 90: normal(|): DUMP: Date of last level 0 dump: the epoch 1251752282.754862: sendbackup: 90: normal(|): DUMP: Dumping /dev/rsd0d (/usr) to standard output 1251752282.792706: sendbackup: 90: normal(|): DUMP: mapping (Pass I) [regular files] 1251752286.777085: sendbackup: 90: normal(|): DUMP: mapping (Pass II) [directories] 1251752286.779538: sendbackup: 90: normal(|): DUMP: estimated 653358 tape blocks. 1251752286.789190: sendbackup: 90: normal(|): DUMP: Volume 1 started at: Mon Aug 31 14:58:06 2009 1251752286.827782: sendbackup: 90: normal(|): DUMP: dumping (Pass III) [directories] 1251752290.646004: sendbackup: 90: normal(|): DUMP: dumping (Pass IV) [regular files] 1251752290.822019: sendbackup: 115: strange(?): sed: stdout: Resource temporarily unavailable 1251752424.852658: sendbackup: 43:size(|): DUMP: 664333 tape blocks 1251752424.854957: sendbackup: 90: normal(|): DUMP: Date of this level 0 dump: Mon Aug 31 14:58:02 2009 1251752424.857083: sendbackup: 90: normal(|): DUMP: Volume 1 completed at: Mon Aug 31 15:00:24 2009 1251752424.859064: sendbackup: 90: normal(|): DUMP: Volume 1 took 0:02:18 1251752424.860994: sendbackup: 90: normal(|): DUMP: Volume 1 transfer rate: 4814 KB/s 1251752424.863042: sendbackup: 90: normal(|): DUMP: Date this dump completed: Mon Aug 31 15:00:24 2009 1251752424.864993: sendbackup: 90: normal(|): DUMP: Average transfer rate: 4814 KB/s 1251752424.869138: sendbackup: 90: normal(|): DUMP: level 0 dump on Mon Aug 31 14:58:02 2009 1251752424.878026: sendbackup: 90: normal(|): DUMP: DUMP IS DONE 1251752424.888324: sendbackup: Index pipe exited with status 1 1251752424.906986: sendbackup: Parsed backup messages 1251752424.907361: sendbackup: pid 15574 finish time Mon Aug 31 15:00:24 2009 So now I've seen the same behavior that Stan noted: applying the debug patch seems to fix the dump. Also, in case you didn't see it before, here is a snippet from an email thread Stan started on an OpenBSD list: I am trying to get Amanda of a recent vintage working on 4.5. The developrs don't understand how writing to a blocking pipe can return EAGAIN. Should the write just block? Yes, it should, and a quick, about-to-go-to-bed readthrough of the pipe write path shows that it shouldn't; is Amanda doing fcntls to the fd? If so, check those to be absolutely sure that they're not doing something hinkey. Thanks, Michael On Mon, Aug 31, 2009 at 2:41 PM, Jean-Louis Martineau martin...@zmanda.com mailto:martin...@zmanda.com wrote: Do you have the patch I sent to stan on this system? The patch check before and after the write if the pipe is in O_NONBLOCK or not and give an error if it is. I'm totally lost since it is in blocking mode and you get EAGAIN, which is impossible Jean-louis Michael Burk wrote: Here are the two sendbackup.*.debug files for the / fs. First: 1251693102.343436: sendbackup: pid 29087 ruid 150 euid
Re: very slow dumper (42.7KB/s)
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Tom Robinson wrote: Hi, I'm running amanda (2.6.0p2-1) but have an older client running 2.4.2p2-1. On that client the full backup of a 4GB disk takes a very long time: DUMP SUMMARY: DUMPER STATS TAPER STATS HOSTNAME DISK L ORIG-KB OUT-KB COMP% MMM:SS KB/s MMM:SS KB/s -- --- host / 0 42567901819411 42.7 637:22 47.6 26:01 1165.9 I'm not sure where to start looking for this bottle-neck. Any clues would be appreciated. bump - -- Tom Robinson System Administrator MoTeC 121 Merrindale Drive Croydon South 3136 Victoria Australia T: +61 3 9761 5050 F: +61 3 9761 5051 M: +61 4 3268 7026 E: tom.robin...@motec.com.au -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.5 (GNU/Linux) Comment: Using GnuPG with CentOS - http://enigmail.mozdev.org/ iD8DBQFKnFXv+brnGjUTUjARAgcaAJ9mhqo4enmr/VSkjJ9k4n6lpUsU/ACg38ZL EHyS6r23vri4I7azpfEk2BY= =6ngI -END PGP SIGNATURE-
Re: very slow dumper (42.7KB/s)
Try looking on the client while the backup is running. Could be any of a lot of things. Network problems (check for errors on the NIC and the switch port), lack of CPU to run the compression, disk I/O contention, huge numbers of files (either in aggregate or in a single directory), or possibly even impending disk failure (lots of read retries or a degraded RAID). Looking at something like 'top' during the backup should give you an idea of whether your CPU is overloaded or if you are always waiting for disk, and if there is some other process(es) running that may also be trying to do a lot of disk I/O. Your system logs should show if you are seeing disk errors, and the output of ifconfig or similar will show the error counts on the NIC. If you don't see anything obvious at first, try running your dump program (dump or tar or whatever Amanda is configured to use) with the output directed to /dev/null and see how long that takes, if that is also slow then it is not the network or Amanda. Then try it without compression to see how much that speeds things up. Frank Tom Robinson wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Tom Robinson wrote: Hi, I'm running amanda (2.6.0p2-1) but have an older client running 2.4.2p2-1. On that client the full backup of a 4GB disk takes a very long time: DUMP SUMMARY: DUMPER STATS TAPER STATS HOSTNAME DISK L ORIG-KB OUT-KB COMP% MMM:SS KB/s MMM:SS KB/s -- --- host / 0 42567901819411 42.7 637:22 47.6 26:01 1165.9 I'm not sure where to start looking for this bottle-neck. Any clues would be appreciated. bump - -- Tom Robinson System Administrator MoTeC 121 Merrindale Drive Croydon South 3136 Victoria Australia T: +61 3 9761 5050 F: +61 3 9761 5051 M: +61 4 3268 7026 E: tom.robin...@motec.com.au -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.5 (GNU/Linux) Comment: Using GnuPG with CentOS - http://enigmail.mozdev.org/ iD8DBQFKnFXv+brnGjUTUjARAgcaAJ9mhqo4enmr/VSkjJ9k4n6lpUsU/ACg38ZL EHyS6r23vri4I7azpfEk2BY= =6ngI -END PGP SIGNATURE- -- Frank Smith fsm...@hoovers.com Sr. Systems Administrator Voice: 512-374-4673 Hoover's Online Fax: 512-374-4501
Re: very slow dumper (42.7KB/s)
Frank Smith wrote: Tom Robinson wrote: Hi, I'm running amanda (2.6.0p2-1) but have an older client running 2.4.2p2-1. On that client the full backup of a 4GB disk takes a very long time: DUMP SUMMARY: DUMPER STATS TAPER STATS HOSTNAME DISK L ORIG-KB OUT-KB COMP% MMM:SS KB/s MMM:SS KB/s -- --- host / 0 42567901819411 42.7 637:22 47.6 26:01 1165.9 I'm not sure where to start looking for this bottle-neck. Any clues would be appreciated. bump Try looking on the client while the backup is running. Could be any of a lot of things. Network problems (check for errors on the NIC and the switch port), lack of CPU to run the compression, disk I/O contention, huge numbers of files (either in aggregate or in a single directory), or possibly even impending disk failure (lots of read retries or a degraded RAID). Looking at something like 'top' during the backup should give you an idea of whether your CPU is overloaded or if you are always waiting for disk, and if there is some other process(es) running that may also be trying to do a lot of disk I/O. Your system logs should show if you are seeing disk errors, and the output of ifconfig or similar will show the error counts on the NIC. If you don't see anything obvious at first, try running your dump program (dump or tar or whatever Amanda is configured to use) with the output directed to /dev/null and see how long that takes, if that is also slow then it is not the network or Amanda. Then try it without compression to see how much that speeds things up. Hi, Thanks for the feedback Frank. I am running dump. After re-nicing the sendbackups and dumpers to first -1 and then -3 the load average still hovers at zero: load average: 0.00, 0.00, 0.00 Re-nicing again to 0, I looked at iostat -x and found the disk saturated (%util is frequently reaches 100 but drops quickly). The average queue size (avgqu-sz) and await are also astoundingly high: avg-cpu: %user %nice%sys %idle 0.000.000.00 100.00 Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util hda8.67 0.00 0.00 1.33 74.67 10.6764.00 14316554.320.00 7500.00 100.00 avg-cpu: %user %nice%sys %idle 0.000.000.33 99.67 Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util hda1.00 0.67 0.67 10.00 10.67 85.33 9.00 14316556.95 700.00 140.62 15.00 avg-cpu: %user %nice%sys %idle 1.670.006.00 92.33 Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util hda 10.00 0.00 0.67 0.00 85.330.00 128.00 1.93 28150.00 6600.00 44.00 avg-cpu: %user %nice%sys %idle 0.000.002.00 98.00 Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util hda 10.00 0.00 0.00 0.00 85.330.00 0.00 0.53 0.00 0.00 2.67 avg-cpu: %user %nice%sys %idle 0.330.005.33 94.33 Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util hda 10.00 0.00 1.00 1.33 85.33 10.6741.14 6.90 10128.57 4271.43 99.67 More concerning is the monitoring the network with tshark on the server-side I see a lost segment followed by a flurry of Dup ACK and TCP Retransmission every eight to ten seconds: 11.641313 10.0.225.2 - 192.168.0.31 TCP 53096 11003 [ACK] Seq=558073 Ack=1 Win=5392 Len=1348 TSV=519917696 TSER=427020080 11.670185 10.0.225.2 - 192.168.0.31 TCP [TCP Previous segment lost] 53096 11003 [ACK] Seq=594469 Ack=1 Win=5392 Len=1348 TSV=519917781 TSER=427020930 11.670211 192.168.0.31 - 10.0.225.2 TCP 11003 53096 [ACK] Seq=1 Ack=559421 Win=501 Len=0 TSV=427020990 TSER=519917696 SLE=594469 SRE=595817 11.699896 10.0.225.2 - 192.168.0.31 TCP 53096 11003 [ACK] Seq=595817 Ack=1 Win=5392 Len=1348 TSV=519917781 TSER=427020930 11.699916 192.168.0.31 - 10.0.225.2 TCP [TCP Dup ACK 657#1] 11003 53096 [ACK] Seq=1 Ack=559421 Win=501 Len=0 TSV=427021020 TSER=519917696 SLE=594469 SRE=597165 11.730662 10.0.225.2 - 192.168.0.31 TCP 53096 11003 [ACK] Seq=597165 Ack=1 Win=5392 Len=1348 TSV=519917787 TSER=427020990 11.730747 192.168.0.31 - 10.0.225.2 TCP [TCP Dup ACK 657#2] 11003 53096 [ACK] Seq=1 Ack=559421 Win=501 Len=0 TSV=427021050 TSER=519917696 SLE=594469 SRE=598513 11.730716 10.0.225.2 - 192.168.0.31 TCP 53096 11003 [ACK] Seq=598513 Ack=1 Win=5392 Len=1348 TSV=519917787 TSER=427020990 11.730761 192.168.0.31 - 10.0.225.2 TCP [TCP Dup ACK 657#3] 11003 53096 [ACK] Seq=1 Ack=559421 Win=501 Len=0 TSV=427021050 TSER=519917696
Re: very slow dumper (42.7KB/s)
On Mon, Aug 31, 2009 at 11:51 PM, Tom Robinsontom.robin...@motec.com.au wrote: While the disk is reaching saturation (and recovering quickly) I'm thinking that the all the retransmissions would be slowing things down more. I don't see any errors on the client interface but there are four on the server interface over the last four days. Hmm, the causation may be going the other way -- if the disk is generating too many IRQs for the CPU to handle, then network packets might get dropped. Alternately, perhaps the PCI bus is maxed out? Anyway, this sounds like a problem local to the client. Is there a way to slow down the disk IO so that it doesn't wedge the machine? Dustin -- Open Source Storage Engineer http://www.zmanda.com