[Amanda-users] Bare metal restore problem

2009-08-31 Thread khalil_noura


Jean-Louis Martineau wrote:
 What program do you use to do the backup? dump or tar?
 
 Jean-Louis
 
 khalil_noura wrote:
 
  Hi all,
  
  
  I am trying to do a bare metal restore. After I partionsed the disk and 
  restored the client to holding. I rebooted to rescue mode created a 
  /mnt/sysimage directory mounted /dev/sda1 to /mnt/sysimage then cd 
  /mnt/sysimage.
  run this command: ssh amandaserver cat /holdingdisk/hostname._.0 | restore 
  rf -
  
  I get an error that reads tape read error on first record
  
  any ideas?
  
  Thank you,
  
  +--
  |This was sent by adarouic  at  curry.edu via Backup Central.
  |Forward SPAM to abuse  at  backupcentral.com.
  +--
  
  
  
 



I am using Dump

+--
|This was sent by adaro...@curry.edu via Backup Central.
|Forward SPAM to ab...@backupcentral.com.
+--




Snapshots, redundant ?

2009-08-31 Thread Brian Cuttler

Here is a fun thought question.

Assuming that we are running daily snapshots for the purposes
of easing file restores and providing recovery for users that
accidently remove files they still need (or corrupt spreadsheets
or whatever they do)... daily snapshots on a rolling basis.

/usr1 has snapshots /usr1-monday... /usr1-friday and we remove
and recreate one snapshot/day during the course of the week.

What if any added benifit am I seeing by using zfs-snapshots
from within the amanda run. I'm not suggesting that we don't
need backups, but the snapshot portion of the dump.

I don't think it costs me anything extra, but I don't know that
there are benifits.

thanks,

Brian
---
   Brian R Cuttler brian.cutt...@wadsworth.org
   Computer Systems Support(v) 518 486-1697
   Wadsworth Center(f) 518 473-6384
   NYS Department of HealthHelp Desk 518 473-0773



IMPORTANT NOTICE: This e-mail and any attachments may contain
confidential or sensitive information which is, or may be, legally
privileged or otherwise protected by law from further disclosure.  It
is intended only for the addressee.  If you received this in error or
from someone who was not authorized to send it to you, please do not
distribute, copy or use it or any attachments.  Please notify the
sender immediately by reply e-mail and delete this from your
system. Thank you for your cooperation.




hard coded limit REP_TIMEOUT of 6hrs in amandad-src/amandad.c

2009-08-31 Thread Jean-Francois Malouin
Hi,

I was hitting this hard coded limit a couple years ago running
2.5.2-20070523 and over the last month it started to show up again on
a client having a lot of big DLEs (20 or so for a total of 11TB).
Client is running 2.6.0p2 and server is at 2.6.1p1. Back then
Jean-Louis provided me with a patch that reseted the timeout after
each estimates but just perusing the source I see that amandad.c
has been extensively rewritten. Anyone cares to have a look at this?

thanks,
jf
-- 
°  Jean-François Malouin  McConnell Brain Imaging Centre
Systems/Network Administrator   Montréal Neurological Institute
3801 Rue Université, Suite WB219  Montréal, Québec, H3A 2B4
Phone: 514-398-8924   Fax: 514-398-8948


Re: Backup issues with OpenBSD 4.5 machines

2009-08-31 Thread Jean-Louis Martineau
You get the error in the index pipe instead of the data path. The backup 
is correct but your index is empty. That's why you get a STRANGE result 
instead of a failure.


Can you post a sendbackup.*.debug for a dle that failed?

Jean-Louis

Michael Burk wrote:

Hello,

I applied the patches to the 20090827 snapshot. I tried it on two 
OpenBSD 4.5 sparc64 systems, forcing both to do full backups. One 
system seemed to work on all 3 file systems, the other failed on all 4 
file systems with the same errors as before (exactly like what Stan 
reported). I'm using bsdtcp auth.


The system that worked, however, did give a strange result in the 
report on one filesystem. Here's the corresponding sendbackup debug 
file:


1251692700.529842: sendbackup: pid 30459 ruid 150 euid 150 version 
2.6.2alpha: start at Sun Aug 30 2

2:25:00 2009
1251692700.530411: sendbackup: Version 2.6.2alpha
1251692700.553135: sendbackup: pid 30459 ruid 150 euid 150 version 
2.6.2alpha: rename at Sun Aug 30

22:25:00 2009
1251692700.555394: sendbackup:   Parsed request as: program `DUMP'
1251692700.555435: sendbackup:  disk `/usr'
1251692700.555461: sendbackup:  device `/usr'
1251692700.555484: sendbackup:  level 0
1251692700.07: sendbackup:  since NODATE
1251692700.30: sendbackup:  options `'
1251692700.556750: sendbackup: start: zirconium.example.com:/usr lev 0
1251692700.559495: sendbackup: dumping device '/dev/rsd1d' with 'ffs'
1251692700.566094: sendbackup: Spawning /sbin/dump dump 0usf 1048576 
- /dev/rsd1d in pipeline

1251692700.571185: sendbackup: Started backup
1251692700.572425: sendbackup: fd 3 is set with O_NONBLOCK before dup2
1251692700.597388: sendbackup: Started index creator: /sbin/restore 
-tvf - 21 | sed -e '

s/^leaf[]*[0-9]*[   ]*\.//
t
/^dir[  ]/ {
s/^dir[ ]*[0-9]*[   ]*\.//
s%$%/%
t
}
d
'
1251692700.611526: sendbackup:  90:  normal(|):   DUMP: Date of this 
level 0 dump: Sun Aug 30 22:25:

00 2009
1251692700.669703: sendbackup:  90:  normal(|):   DUMP: Date of last 
level 0 dump: the epoch
1251692700.670724: sendbackup:  90:  normal(|):   DUMP: Dumping 
/dev/rsd1d (/usr) to standard output
1251692700.763600: sendbackup:  90:  normal(|):   DUMP: mapping (Pass 
I) [regular files]
1251692705.618345: sendbackup:  90:  normal(|):   DUMP: mapping (Pass 
II) [directories]
1251692705.620040: sendbackup:  90:  normal(|):   DUMP: estimated 
1330085 tape blocks.
1251692705.623976: sendbackup:  90:  normal(|):   DUMP: Volume 1 
started at: Sun Aug 30 22:25:05 2009
1251692705.650687: sendbackup:  90:  normal(|):   DUMP: dumping (Pass 
III) [directories]
1251692745.051547: sendbackup:  90:  normal(|):   DUMP: dumping (Pass 
IV) [regular files]
1251692747.915767: sendbackup: 115: strange(?): sed: stdout: Resource 
temporarily unavailable
1251692954.633083: sendbackup:  43:size(|):   DUMP: 1437829 tape 
blocks
1251692954.637414: sendbackup:  90:  normal(|):   DUMP: Date of this 
level 0 dump: Sun Aug 30 22:25:00 2009
1251692954.641414: sendbackup:  90:  normal(|):   DUMP: Volume 1 
completed at: Sun Aug 30 22:29:14 2009

1251692954.641858: sendbackup: Index pipe exited with status 1
1251692954.642719: sendbackup:  90:  normal(|):   DUMP: Volume 1 took 
0:04:09
1251692954.643918: sendbackup:  90:  normal(|):   DUMP: Volume 1 
transfer rate: 5774 KB/s
1251692954.645095: sendbackup:  90:  normal(|):   DUMP: Date this dump 
completed:  Sun Aug 30 22:29:14 2009
1251692954.646239: sendbackup:  90:  normal(|):   DUMP: Average 
transfer rate: 5774 KB/s
1251692954.647430: sendbackup:  90:  normal(|):   DUMP: level 0 dump 
on Sun Aug 30 22:25:00 2009

1251692954.653480: sendbackup:  90:  normal(|):   DUMP: DUMP IS DONE
1251692954.653841: sendbackup: Parsed backup messages
1251692954.654032: sendbackup: pid 30459 finish time Sun Aug 30 
22:29:14 2009


-- Michael

On Fri, Aug 28, 2009 at 5:22 AM, stan st...@panix.com 
mailto:st...@panix.com wrote:


On Thu, Aug 27, 2009 at 03:35:47PM -0600, Michael Burk wrote:
 Cool - can you send me the patch? Or is it already in the 0826
snapshot?

 FYI - I got 2.5.0p2 working, but could never get 2.5.1p3
working, even with
 the same config. No idea why.
 I started looking at the source code last night in the 0825
snapshot. I'll
 discontinue that search if there's a patch.

This is interesting. I sent Jean-Louis Martineau an email, saying
teaks for
fixing things, and he replied that the patch he sent me just added
debugging,
it was not intended to fix anything.

Bit, using this snapshot 20090813, and the patch, which I will
send to you.
I have it working on my 3 OpenBSD 4.5 machines that I have
upgraded so far.
I ran several test yesterday, and a full fledged backup run last
night,
and I have not seen any issues since installing this code.

I am going to attach the patch to 

Re: Backup issues with OpenBSD 4.5 machines

2009-08-31 Thread Michael Burk
Here are the two sendbackup.*.debug files for the / fs.

First:

1251693102.343436: sendbackup: pid 29087 ruid 150 euid 150 version
2.6.2alpha: start at Sun Aug 30 22:31:42 2009
1251693102.345326: sendbackup: Version 2.6.2alpha
1251693102.379331: sendbackup: pid 29087 ruid 150 euid 150 version
2.6.2alpha: rename at Sun Aug 30 22:31:42 2009
1251693102.382860: sendbackup:   Parsed request as: program `DUMP'
1251693102.382940: sendbackup:  disk `/'
1251693102.382991: sendbackup:  device `/'
1251693102.383039: sendbackup:  level 0
1251693102.383086: sendbackup:  since NODATE
1251693102.383134: sendbackup:  options `'
1251693102.388833: sendbackup: start: bromine.example.com:/ lev 0
1251693102.391475: sendbackup: dumping device '/dev/rsd0a' with 'ffs'
1251693102.398325: sendbackup: Spawning /sbin/dump dump 0usf 1048576 -
/dev/rsd0a in pipeline
1251693102.403120: sendbackup: Started backup
1251693102.461663: sendbackup:  90:  normal(|):   DUMP: Date of this level 0
dump: Sun Aug 30 22:31:42 2009
1251693102.468669: sendbackup:  90:  normal(|):   DUMP: Date of last level 0
dump: the epoch
1251693102.487536: sendbackup: Started index creator: /sbin/restore -tvf -
21 | sed -e '
s/^leaf[]*[0-9]*[   ]*\.//
t
/^dir[  ]/ {
s/^dir[ ]*[0-9]*[   ]*\.//
s%$%/%
t
}
d
'
1251693102.519939: sendbackup:  90:  normal(|):   DUMP: Dumping /dev/rsd0a
(/) to standard output
1251693102.588086: sendbackup:  90:  normal(|):   DUMP: mapping (Pass I)
[regular files]
1251693104.583392: sendbackup:  90:  normal(|):   DUMP: mapping (Pass II)
[directories]
1251693104.585824: sendbackup:  90:  normal(|):   DUMP: estimated 35134 tape
blocks.
1251693104.595314: sendbackup:  90:  normal(|):   DUMP: Volume 1 started at:
Sun Aug 30 22:31:44 2009
1251693104.615444: sendbackup:  90:  normal(|):   DUMP: dumping (Pass III)
[directories]
1251693104.919691: sendbackup:  90:  normal(|):   DUMP: dumping (Pass IV)
[regular files]
1251693105.800795: sendbackup: critical (fatal): index tee cannot write
[Resource temporarily unavailable]
1251693105.803486: sendbackup: 115: strange(?): sendbackup: index tee cannot
write [Resource temporarily unavailable]
1251693105.827339: sendbackup:  90:  normal(|):   DUMP: Broken pipe
1251693105.832068: sendbackup:  90:  normal(|):   DUMP: The ENTIRE dump is
aborted.
1251693105.839675: sendbackup: critical (fatal): error [dump (7257)
/sbin/dump returned 3]

Second:

1251693123.840560: sendbackup: pid 32144 ruid 150 euid 150 version
2.6.2alpha: start at Sun Aug 30 22:32:03 2009
1251693123.841613: sendbackup: Version 2.6.2alpha
1251693123.892773: sendbackup: pid 32144 ruid 150 euid 150 version
2.6.2alpha: rename at Sun Aug 30 22:32:03 2009
1251693123.896360: sendbackup:   Parsed request as: program `DUMP'
1251693123.896437: sendbackup:  disk `/'
1251693123.896487: sendbackup:  device `/'
1251693123.896535: sendbackup:  level 0
1251693123.896584: sendbackup:  since NODATE
1251693123.896632: sendbackup:  options `'
1251693123.902332: sendbackup: start: bromine.example.com:/ lev 0
1251693123.905012: sendbackup: dumping device '/dev/rsd0a' with 'ffs'
1251693123.911778: sendbackup: Spawning /sbin/dump dump 0usf 1048576 -
/dev/rsd0a in pipeline
1251693123.916860: sendbackup: Started backup
1251693123.992261: sendbackup: Started index creator: /sbin/restore -tvf -
21 | sed -e '
s/^leaf[]*[0-9]*[   ]*\.//
t
/^dir[  ]/ {
s/^dir[ ]*[0-9]*[   ]*\.//
s%$%/%
t
}
d
'
1251693123.995743: sendbackup:  90:  normal(|):   DUMP: Date of this level 0
dump: Sun Aug 30 22:32:03 2009
1251693124.009013: sendbackup:  90:  normal(|):   DUMP: Date of last level 0
dump: the epoch
1251693124.038977: sendbackup:  90:  normal(|):   DUMP: Dumping /dev/rsd0a
(/) to standard output
1251693124.143454: sendbackup:  90:  normal(|):   DUMP: mapping (Pass I)
[regular files]
1251693126.173916: sendbackup:  90:  normal(|):   DUMP: mapping (Pass II)
[directories]
1251693126.176287: sendbackup:  90:  normal(|):   DUMP: estimated 35134 tape
blocks.
1251693126.185677: sendbackup:  90:  normal(|):   DUMP: Volume 1 started at:
Sun Aug 30 22:32:06 2009
1251693126.205305: sendbackup:  90:  normal(|):   DUMP: dumping (Pass III)
[directories]
1251693126.505055: sendbackup:  90:  normal(|):   DUMP: dumping (Pass IV)
[regular files]
1251693126.797271: sendbackup: critical (fatal): index tee cannot write
[Resource temporarily unavailable]
1251693126.799929: sendbackup: 115: strange(?): sendbackup: index tee cannot
write [Resource temporarily unavailable]
1251693126.821488: sendbackup:  90:  normal(|):   DUMP: Broken pipe
1251693126.825377: sendbackup:  90:  normal(|):   DUMP: The ENTIRE dump is
aborted.
1251693126.833573: sendbackup: critical (fatal): error [dump (22791)
/sbin/dump returned 3]

Thanks,
Michael

On Mon, Aug 31, 2009 at 

Re: Backup issues with OpenBSD 4.5 machines

2009-08-31 Thread Jean-Louis Martineau

Do you have the patch I sent to stan on this system?

The patch check before and after the write if the pipe is in O_NONBLOCK 
or not and give an error if it is.


I'm totally lost since it is in blocking mode and you get EAGAIN, which 
is impossible


Jean-louis

Michael Burk wrote:

Here are the two sendbackup.*.debug files for the / fs.

First:

1251693102.343436: sendbackup: pid 29087 ruid 150 euid 150 version 
2.6.2alpha: start at Sun Aug 30 22:31:42 2009

1251693102.345326: sendbackup: Version 2.6.2alpha
1251693102.379331: sendbackup: pid 29087 ruid 150 euid 150 version 
2.6.2alpha: rename at Sun Aug 30 22:31:42 2009

1251693102.382860: sendbackup:   Parsed request as: program `DUMP'
1251693102.382940: sendbackup:  disk `/'
1251693102.382991: sendbackup:  device `/'
1251693102.383039: sendbackup:  level 0
1251693102.383086: sendbackup:  since NODATE
1251693102.383134: sendbackup:  options `'
1251693102.388833: sendbackup: start: bromine.example.com:/ lev 0
1251693102.391475: sendbackup: dumping device '/dev/rsd0a' with 'ffs'
1251693102.398325: sendbackup: Spawning /sbin/dump dump 0usf 1048576 
- /dev/rsd0a in pipeline

1251693102.403120: sendbackup: Started backup
1251693102.461663: sendbackup:  90:  normal(|):   DUMP: Date of this 
level 0 dump: Sun Aug 30 22:31:42 2009
1251693102.468669: sendbackup:  90:  normal(|):   DUMP: Date of last 
level 0 dump: the epoch
1251693102.487536: sendbackup: Started index creator: /sbin/restore 
-tvf - 21 | sed -e '

s/^leaf[]*[0-9]*[   ]*\.//
t
/^dir[  ]/ {
s/^dir[ ]*[0-9]*[   ]*\.//
s%$%/%
t
}
d
'
1251693102.519939: sendbackup:  90:  normal(|):   DUMP: Dumping 
/dev/rsd0a (/) to standard output
1251693102.588086: sendbackup:  90:  normal(|):   DUMP: mapping (Pass 
I) [regular files]
1251693104.583392: sendbackup:  90:  normal(|):   DUMP: mapping (Pass 
II) [directories]
1251693104.585824: sendbackup:  90:  normal(|):   DUMP: estimated 
35134 tape blocks.
1251693104.595314: sendbackup:  90:  normal(|):   DUMP: Volume 1 
started at: Sun Aug 30 22:31:44 2009
1251693104.615444: sendbackup:  90:  normal(|):   DUMP: dumping (Pass 
III) [directories]
1251693104.919691: sendbackup:  90:  normal(|):   DUMP: dumping (Pass 
IV) [regular files]
1251693105.800795: sendbackup: critical (fatal): index tee cannot 
write [Resource temporarily unavailable]
1251693105.803486: sendbackup: 115: strange(?): sendbackup: index tee 
cannot write [Resource temporarily unavailable]

1251693105.827339: sendbackup:  90:  normal(|):   DUMP: Broken pipe
1251693105.832068: sendbackup:  90:  normal(|):   DUMP: The ENTIRE 
dump is aborted.
1251693105.839675: sendbackup: critical (fatal): error [dump (7257) 
/sbin/dump returned 3]


Second:

1251693123.840560: sendbackup: pid 32144 ruid 150 euid 150 version 
2.6.2alpha: start at Sun Aug 30 22:32:03 2009

1251693123.841613: sendbackup: Version 2.6.2alpha
1251693123.892773: sendbackup: pid 32144 ruid 150 euid 150 version 
2.6.2alpha: rename at Sun Aug 30 22:32:03 2009

1251693123.896360: sendbackup:   Parsed request as: program `DUMP'
1251693123.896437: sendbackup:  disk `/'
1251693123.896487: sendbackup:  device `/'
1251693123.896535: sendbackup:  level 0
1251693123.896584: sendbackup:  since NODATE
1251693123.896632: sendbackup:  options `'
1251693123.902332: sendbackup: start: bromine.example.com:/ lev 0
1251693123.905012: sendbackup: dumping device '/dev/rsd0a' with 'ffs'
1251693123.911778: sendbackup: Spawning /sbin/dump dump 0usf 1048576 
- /dev/rsd0a in pipeline

1251693123.916860: sendbackup: Started backup
1251693123.992261: sendbackup: Started index creator: /sbin/restore 
-tvf - 21 | sed -e '

s/^leaf[]*[0-9]*[   ]*\.//
t
/^dir[  ]/ {
s/^dir[ ]*[0-9]*[   ]*\.//
s%$%/%
t
}
d
'
1251693123.995743: sendbackup:  90:  normal(|):   DUMP: Date of this 
level 0 dump: Sun Aug 30 22:32:03 2009
1251693124.009013: sendbackup:  90:  normal(|):   DUMP: Date of last 
level 0 dump: the epoch
1251693124.038977: sendbackup:  90:  normal(|):   DUMP: Dumping 
/dev/rsd0a (/) to standard output
1251693124.143454: sendbackup:  90:  normal(|):   DUMP: mapping (Pass 
I) [regular files]
1251693126.173916: sendbackup:  90:  normal(|):   DUMP: mapping (Pass 
II) [directories]
1251693126.176287: sendbackup:  90:  normal(|):   DUMP: estimated 
35134 tape blocks.
1251693126.185677: sendbackup:  90:  normal(|):   DUMP: Volume 1 
started at: Sun Aug 30 22:32:06 2009
1251693126.205305: sendbackup:  90:  normal(|):   DUMP: dumping (Pass 
III) [directories]
1251693126.505055: sendbackup:  90:  normal(|):   DUMP: dumping (Pass 
IV) [regular files]
1251693126.797271: sendbackup: critical (fatal): index tee cannot 
write [Resource temporarily unavailable]
1251693126.799929: sendbackup: 115: strange(?): sendbackup: index tee 

Re: Backup issues with OpenBSD 4.5 machines

2009-08-31 Thread Michael Burk
Hi Jean-Louis,

I thought I had applied the patches on this machine also, but it turns out I
didn't (sorry about that). I applied the patches and ran a new dump. This
time all 4 file systems succeeded, though /usr got the same strange
message as on the other machine:

1251752282.483677: sendbackup: pid 15574 ruid 150 euid 150 version
2.6.2alpha: start at Mon Aug 31 14:58:02 2009
1251752282.484788: sendbackup: Version 2.6.2alpha
1251752282.523952: sendbackup: pid 15574 ruid 150 euid 150 version
2.6.2alpha: rename at Mon Aug 31 14:58:02 2009
1251752282.527795: sendbackup:   Parsed request as: program `DUMP'
1251752282.527880: sendbackup:  disk `/usr'
1251752282.527931: sendbackup:  device `/usr'
1251752282.527978: sendbackup:  level 0
1251752282.528025: sendbackup:  since NODATE
1251752282.528072: sendbackup:  options `'
1251752282.533704: sendbackup: start: bromine.example.com:/usr lev 0
1251752282.537190: sendbackup: dumping device '/dev/rsd0d' with 'ffs'
1251752282.543664: sendbackup: Spawning /sbin/dump dump 0usf 1048576 -
/dev/rsd0d in pipeline
1251752282.548532: sendbackup: Started backup
1251752282.561558: sendbackup: fd 3 is set with O_NONBLOCK before dup2
1251752282.625055: sendbackup: Started index creator: /sbin/restore -tvf -
21 | sed -e '
s/^leaf[]*[0-9]*[   ]*\.//
t
/^dir[  ]/ {
s/^dir[ ]*[0-9]*[   ]*\.//
s%$%/%
t
}
d
'
1251752282.627853: sendbackup:  90:  normal(|):   DUMP: Date of this level 0
dump: Mon Aug 31 14:58:02 2009
1251752282.641040: sendbackup:  90:  normal(|):   DUMP: Date of last level 0
dump: the epoch
1251752282.754862: sendbackup:  90:  normal(|):   DUMP: Dumping /dev/rsd0d
(/usr) to standard output
1251752282.792706: sendbackup:  90:  normal(|):   DUMP: mapping (Pass I)
[regular files]
1251752286.777085: sendbackup:  90:  normal(|):   DUMP: mapping (Pass II)
[directories]
1251752286.779538: sendbackup:  90:  normal(|):   DUMP: estimated 653358
tape blocks.
1251752286.789190: sendbackup:  90:  normal(|):   DUMP: Volume 1 started at:
Mon Aug 31 14:58:06 2009
1251752286.827782: sendbackup:  90:  normal(|):   DUMP: dumping (Pass III)
[directories]
1251752290.646004: sendbackup:  90:  normal(|):   DUMP: dumping (Pass IV)
[regular files]
1251752290.822019: sendbackup: 115: strange(?): sed: stdout: Resource
temporarily unavailable
1251752424.852658: sendbackup:  43:size(|):   DUMP: 664333 tape blocks
1251752424.854957: sendbackup:  90:  normal(|):   DUMP: Date of this level 0
dump: Mon Aug 31 14:58:02 2009
1251752424.857083: sendbackup:  90:  normal(|):   DUMP: Volume 1 completed
at: Mon Aug 31 15:00:24 2009
1251752424.859064: sendbackup:  90:  normal(|):   DUMP: Volume 1 took
0:02:18
1251752424.860994: sendbackup:  90:  normal(|):   DUMP: Volume 1 transfer
rate: 4814 KB/s
1251752424.863042: sendbackup:  90:  normal(|):   DUMP: Date this dump
completed:  Mon Aug 31 15:00:24 2009
1251752424.864993: sendbackup:  90:  normal(|):   DUMP: Average transfer
rate: 4814 KB/s
1251752424.869138: sendbackup:  90:  normal(|):   DUMP: level 0 dump on Mon
Aug 31 14:58:02 2009
1251752424.878026: sendbackup:  90:  normal(|):   DUMP: DUMP IS DONE
1251752424.888324: sendbackup: Index pipe exited with status 1
1251752424.906986: sendbackup: Parsed backup messages
1251752424.907361: sendbackup: pid 15574 finish time Mon Aug 31 15:00:24
2009

So now I've seen the same behavior that Stan noted: applying the debug patch
seems to fix the dump.

Also, in case you didn't see it before, here is a snippet from an email
thread Stan started on an OpenBSD list:
   I am trying to get Amanda of a recent vintage working on 4.5. The
developrs
   don't understand how writing to a blocking pipe can return EAGAIN.
   Should the write just block?

 Yes, it should, and a quick, about-to-go-to-bed readthrough of the
 pipe write path shows that it shouldn't; is Amanda doing fcntls to
 the fd? If so, check those to be absolutely sure that they're not
 doing something hinkey.

Thanks,
Michael

On Mon, Aug 31, 2009 at 2:41 PM, Jean-Louis Martineau
martin...@zmanda.comwrote:

 Do you have the patch I sent to stan on this system?

 The patch check before and after the write if the pipe is in O_NONBLOCK or
 not and give an error if it is.

 I'm totally lost since it is in blocking mode and you get EAGAIN, which is
 impossible

 Jean-louis

 Michael Burk wrote:

 Here are the two sendbackup.*.debug files for the / fs.

 First:

 1251693102.343436: sendbackup: pid 29087 ruid 150 euid 150 version
 2.6.2alpha: start at Sun Aug 30 22:31:42 2009
 1251693102.345326: sendbackup: Version 2.6.2alpha
 1251693102.379331: sendbackup: pid 29087 ruid 150 euid 150 version
 2.6.2alpha: rename at Sun Aug 30 22:31:42 2009
 1251693102.382860: sendbackup:   Parsed request as: program `DUMP'
 1251693102.382940: sendbackup:  disk `/'
 1251693102.382991: sendbackup:  device `/'
 

Re: Backup issues with OpenBSD 4.5 machines

2009-08-31 Thread Jean-Louis Martineau

amanda doesn't do fcntl on these file descriptor.
But my patch do: fcntl(fd, F_GETFL, 0) to check if it have O_NONBLOCK 
set, the patch doesn't change it.
Can you try the attached patch, it do the same trick for the index file 
descriptor (remove previous patch before applying).


Jean-Louis

Michael Burk wrote:

Hi Jean-Louis,

I thought I had applied the patches on this machine also, but it turns 
out I didn't (sorry about that). I applied the patches and ran a new 
dump. This time all 4 file systems succeeded, though /usr got the same 
strange message as on the other machine:


1251752282.483677: sendbackup: pid 15574 ruid 150 euid 150 version 
2.6.2alpha: start at Mon Aug 31 14:58:02 2009

1251752282.484788: sendbackup: Version 2.6.2alpha
1251752282.523952: sendbackup: pid 15574 ruid 150 euid 150 version 
2.6.2alpha: rename at Mon Aug 31 14:58:02 2009

1251752282.527795: sendbackup:   Parsed request as: program `DUMP'
1251752282.527880: sendbackup:  disk `/usr'
1251752282.527931: sendbackup:  device `/usr'
1251752282.527978: sendbackup:  level 0
1251752282.528025: sendbackup:  since NODATE
1251752282.528072: sendbackup:  options `'
1251752282.533704: sendbackup: start: bromine.example.com:/usr lev 0
1251752282.537190: sendbackup: dumping device '/dev/rsd0d' with 'ffs'
1251752282.543664: sendbackup: Spawning /sbin/dump dump 0usf 1048576 
- /dev/rsd0d in pipeline

1251752282.548532: sendbackup: Started backup
1251752282.561558: sendbackup: fd 3 is set with O_NONBLOCK before dup2
1251752282.625055: sendbackup: Started index creator: /sbin/restore 
-tvf - 21 | sed -e '

s/^leaf[]*[0-9]*[   ]*\.//
t
/^dir[  ]/ {
s/^dir[ ]*[0-9]*[   ]*\.//
s%$%/%
t
}
d
'
1251752282.627853: sendbackup:  90:  normal(|):   DUMP: Date of this 
level 0 dump: Mon Aug 31 14:58:02 2009
1251752282.641040: sendbackup:  90:  normal(|):   DUMP: Date of last 
level 0 dump: the epoch
1251752282.754862: sendbackup:  90:  normal(|):   DUMP: Dumping 
/dev/rsd0d (/usr) to standard output
1251752282.792706: sendbackup:  90:  normal(|):   DUMP: mapping (Pass 
I) [regular files]
1251752286.777085: sendbackup:  90:  normal(|):   DUMP: mapping (Pass 
II) [directories]
1251752286.779538: sendbackup:  90:  normal(|):   DUMP: estimated 
653358 tape blocks.
1251752286.789190: sendbackup:  90:  normal(|):   DUMP: Volume 1 
started at: Mon Aug 31 14:58:06 2009
1251752286.827782: sendbackup:  90:  normal(|):   DUMP: dumping (Pass 
III) [directories]
1251752290.646004: sendbackup:  90:  normal(|):   DUMP: dumping (Pass 
IV) [regular files]
1251752290.822019: sendbackup: 115: strange(?): sed: stdout: Resource 
temporarily unavailable

1251752424.852658: sendbackup:  43:size(|):   DUMP: 664333 tape blocks
1251752424.854957: sendbackup:  90:  normal(|):   DUMP: Date of this 
level 0 dump: Mon Aug 31 14:58:02 2009
1251752424.857083: sendbackup:  90:  normal(|):   DUMP: Volume 1 
completed at: Mon Aug 31 15:00:24 2009
1251752424.859064: sendbackup:  90:  normal(|):   DUMP: Volume 1 took 
0:02:18
1251752424.860994: sendbackup:  90:  normal(|):   DUMP: Volume 1 
transfer rate: 4814 KB/s
1251752424.863042: sendbackup:  90:  normal(|):   DUMP: Date this dump 
completed:  Mon Aug 31 15:00:24 2009
1251752424.864993: sendbackup:  90:  normal(|):   DUMP: Average 
transfer rate: 4814 KB/s
1251752424.869138: sendbackup:  90:  normal(|):   DUMP: level 0 dump 
on Mon Aug 31 14:58:02 2009

1251752424.878026: sendbackup:  90:  normal(|):   DUMP: DUMP IS DONE
1251752424.888324: sendbackup: Index pipe exited with status 1
1251752424.906986: sendbackup: Parsed backup messages
1251752424.907361: sendbackup: pid 15574 finish time Mon Aug 31 
15:00:24 2009


So now I've seen the same behavior that Stan noted: applying the debug 
patch seems to fix the dump.


Also, in case you didn't see it before, here is a snippet from an 
email thread Stan started on an OpenBSD list:
   I am trying to get Amanda of a recent vintage working on 4.5. The 
developrs

   don't understand how writing to a blocking pipe can return EAGAIN.
   Should the write just block?

 Yes, it should, and a quick, about-to-go-to-bed readthrough of the
 pipe write path shows that it shouldn't; is Amanda doing fcntls to
 the fd? If so, check those to be absolutely sure that they're not
 doing something hinkey.

Thanks,
Michael

On Mon, Aug 31, 2009 at 2:41 PM, Jean-Louis Martineau 
martin...@zmanda.com mailto:martin...@zmanda.com wrote:


Do you have the patch I sent to stan on this system?

The patch check before and after the write if the pipe is in
O_NONBLOCK or not and give an error if it is.

I'm totally lost since it is in blocking mode and you get EAGAIN,
which is impossible

Jean-louis

Michael Burk wrote:

Here are the two sendbackup.*.debug files for the / fs.

First:

1251693102.343436: sendbackup: pid 29087 ruid 150 euid 

Re: very slow dumper (42.7KB/s)

2009-08-31 Thread Tom Robinson
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Tom Robinson wrote:
 Hi,

 I'm running amanda (2.6.0p2-1) but have an older client running
 2.4.2p2-1. On that client the full backup of a 4GB disk takes a very
 long time:

 DUMP SUMMARY:
DUMPER
 STATS   TAPER STATS
 HOSTNAME DISK  L   ORIG-KB OUT-KB  COMP%
 MMM:SS   KB/s MMM:SS KB/s
 
 -- ---
 host / 0   42567901819411   42.7
 637:22   47.6  26:01   1165.9

 I'm not sure where to start looking for this bottle-neck.

 Any clues would be appreciated.
bump

- --

Tom Robinson
System Administrator

MoTeC

121 Merrindale Drive
Croydon South
3136 Victoria
Australia

T: +61 3 9761 5050
F: +61 3 9761 5051
M: +61 4 3268 7026
E: tom.robin...@motec.com.au
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with CentOS - http://enigmail.mozdev.org/

iD8DBQFKnFXv+brnGjUTUjARAgcaAJ9mhqo4enmr/VSkjJ9k4n6lpUsU/ACg38ZL
EHyS6r23vri4I7azpfEk2BY=
=6ngI
-END PGP SIGNATURE-



Re: very slow dumper (42.7KB/s)

2009-08-31 Thread Frank Smith
Try looking on the client while the backup is running. Could be
any of a lot of things.  Network problems (check for errors on
the NIC and the switch port), lack of CPU to run the compression,
disk I/O contention, huge numbers of files (either in aggregate
or in a single directory), or possibly even impending disk failure
(lots of read retries or a degraded RAID).
   Looking at something like 'top' during the backup should give
you an idea of whether your CPU is overloaded or if you are always
waiting for disk, and if there is some other process(es) running
that may also be trying to do a lot of disk I/O.  Your system logs
should show if you are seeing disk errors, and the output of ifconfig
or similar will show the error counts on the NIC.
   If you don't see anything obvious at first, try running your
dump program (dump or tar or whatever Amanda is configured to use)
with the output directed to /dev/null and see how long that takes,
if that is also slow then it is not the network or Amanda. Then
try it without compression to see how much that speeds things up.


Frank

Tom Robinson wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1
 
 Tom Robinson wrote:
 Hi,

 I'm running amanda (2.6.0p2-1) but have an older client running
 2.4.2p2-1. On that client the full backup of a 4GB disk takes a very
 long time:

 DUMP SUMMARY:
DUMPER
 STATS   TAPER STATS
 HOSTNAME DISK  L   ORIG-KB OUT-KB  COMP%
 MMM:SS   KB/s MMM:SS KB/s
 
 -- ---
 host / 0   42567901819411   42.7
 637:22   47.6  26:01   1165.9

 I'm not sure where to start looking for this bottle-neck.

 Any clues would be appreciated.
 bump
 
 - --
 
 Tom Robinson
 System Administrator
 
 MoTeC
 
 121 Merrindale Drive
 Croydon South
 3136 Victoria
 Australia
 
 T: +61 3 9761 5050
 F: +61 3 9761 5051
 M: +61 4 3268 7026
 E: tom.robin...@motec.com.au
 -BEGIN PGP SIGNATURE-
 Version: GnuPG v1.4.5 (GNU/Linux)
 Comment: Using GnuPG with CentOS - http://enigmail.mozdev.org/
 
 iD8DBQFKnFXv+brnGjUTUjARAgcaAJ9mhqo4enmr/VSkjJ9k4n6lpUsU/ACg38ZL
 EHyS6r23vri4I7azpfEk2BY=
 =6ngI
 -END PGP SIGNATURE-
 


-- 
Frank Smith  fsm...@hoovers.com
Sr. Systems Administrator   Voice: 512-374-4673
Hoover's Online   Fax: 512-374-4501


Re: very slow dumper (42.7KB/s)

2009-08-31 Thread Tom Robinson
Frank Smith wrote:
 Tom Robinson wrote:
  Hi,
 
  I'm running amanda (2.6.0p2-1) but have an older client running
  2.4.2p2-1. On that client the full backup of a 4GB disk takes a very
  long time:
 
  DUMP SUMMARY:
 DUMPER
  STATS   TAPER STATS
  HOSTNAME DISK  L   ORIG-KB OUT-KB  COMP%
  MMM:SS   KB/s MMM:SS KB/s
  
  -- ---
  host / 0   42567901819411   42.7
  637:22   47.6  26:01   1165.9
 
  I'm not sure where to start looking for this bottle-neck.
 
  Any clues would be appreciated.
 bump

 Try looking on the client while the backup is running. Could be
 any of a lot of things.  Network problems (check for errors on
 the NIC and the switch port), lack of CPU to run the compression,
 disk I/O contention, huge numbers of files (either in aggregate
 or in a single directory), or possibly even impending disk failure
 (lots of read retries or a degraded RAID).
Looking at something like 'top' during the backup should give
 you an idea of whether your CPU is overloaded or if you are always
 waiting for disk, and if there is some other process(es) running
 that may also be trying to do a lot of disk I/O.  Your system logs
 should show if you are seeing disk errors, and the output of ifconfig
 or similar will show the error counts on the NIC.
If you don't see anything obvious at first, try running your
 dump program (dump or tar or whatever Amanda is configured to use)
 with the output directed to /dev/null and see how long that takes,
 if that is also slow then it is not the network or Amanda. Then
 try it without compression to see how much that speeds things up.

Hi,

Thanks for the feedback Frank. I am running dump.

After re-nicing the sendbackups and dumpers to first -1 and then -3 the
load average still hovers at zero:

load average: 0.00, 0.00, 0.00

Re-nicing again to 0, I looked at iostat -x and found the disk saturated
(%util is frequently reaches 100 but drops quickly). The average queue
size (avgqu-sz) and await are also astoundingly high:

avg-cpu:  %user   %nice%sys   %idle
   0.000.000.00  100.00
Device:  rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s avgrq-sz avgqu-sz  
await  svctm  %util
hda8.67   0.00  0.00  1.33   74.67   10.6764.00
14316554.320.00 7500.00 100.00

avg-cpu:  %user   %nice%sys   %idle
   0.000.000.33   99.67
Device:  rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s avgrq-sz avgqu-sz  
await  svctm  %util
hda1.00   0.67  0.67 10.00   10.67   85.33 9.00 14316556.95 
700.00 140.62  15.00

avg-cpu:  %user   %nice%sys   %idle
   1.670.006.00   92.33
Device:  rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s avgrq-sz avgqu-sz  
await  svctm  %util
hda   10.00   0.00  0.67  0.00   85.330.00   128.00 1.93
28150.00 6600.00  44.00

avg-cpu:  %user   %nice%sys   %idle
   0.000.002.00   98.00
Device:  rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s avgrq-sz avgqu-sz  
await  svctm  %util
hda   10.00   0.00  0.00  0.00   85.330.00 0.00 0.53   
0.00   0.00   2.67

avg-cpu:  %user   %nice%sys   %idle
   0.330.005.33   94.33
Device:  rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s avgrq-sz avgqu-sz  
await  svctm  %util
hda   10.00   0.00  1.00  1.33   85.33   10.6741.14 6.90
10128.57 4271.43  99.67

More concerning is the monitoring the network with tshark on the
server-side I see a lost segment followed by a flurry of Dup ACK and
TCP Retransmission every eight to ten seconds:

 11.641313   10.0.225.2 - 192.168.0.31 TCP 53096  11003 [ACK]
Seq=558073 Ack=1 Win=5392 Len=1348 TSV=519917696 TSER=427020080
 11.670185   10.0.225.2 - 192.168.0.31 TCP [TCP Previous segment lost]
53096  11003 [ACK] Seq=594469 Ack=1 Win=5392 Len=1348 TSV=519917781
TSER=427020930
 11.670211 192.168.0.31 - 10.0.225.2   TCP 11003  53096 [ACK] Seq=1
Ack=559421 Win=501 Len=0 TSV=427020990 TSER=519917696 SLE=594469 SRE=595817
 11.699896   10.0.225.2 - 192.168.0.31 TCP 53096  11003 [ACK]
Seq=595817 Ack=1 Win=5392 Len=1348 TSV=519917781 TSER=427020930
 11.699916 192.168.0.31 - 10.0.225.2   TCP [TCP Dup ACK 657#1] 11003 
53096 [ACK] Seq=1 Ack=559421 Win=501 Len=0 TSV=427021020 TSER=519917696
SLE=594469 SRE=597165
 11.730662   10.0.225.2 - 192.168.0.31 TCP 53096  11003 [ACK]
Seq=597165 Ack=1 Win=5392 Len=1348 TSV=519917787 TSER=427020990
 11.730747 192.168.0.31 - 10.0.225.2   TCP [TCP Dup ACK 657#2] 11003 
53096 [ACK] Seq=1 Ack=559421 Win=501 Len=0 TSV=427021050 TSER=519917696
SLE=594469 SRE=598513
 11.730716   10.0.225.2 - 192.168.0.31 TCP 53096  11003 [ACK]
Seq=598513 Ack=1 Win=5392 Len=1348 TSV=519917787 TSER=427020990
 11.730761 192.168.0.31 - 10.0.225.2   TCP [TCP Dup ACK 657#3] 11003 
53096 [ACK] Seq=1 Ack=559421 Win=501 Len=0 TSV=427021050 TSER=519917696

Re: very slow dumper (42.7KB/s)

2009-08-31 Thread Dustin J. Mitchell
On Mon, Aug 31, 2009 at 11:51 PM, Tom Robinsontom.robin...@motec.com.au wrote:
 While the disk is reaching saturation (and recovering quickly) I'm
 thinking that the all the retransmissions would be slowing things down more.

 I don't see any errors on the client interface but there are four on the
 server interface over the last four days.

Hmm, the causation may be going the other way -- if the disk is
generating too many IRQs for the CPU to handle, then network packets
might get dropped.  Alternately, perhaps the PCI bus is maxed out?
Anyway, this sounds like a problem local to the client.  Is there a
way to slow down the disk IO so that it doesn't wedge the machine?

Dustin

-- 
Open Source Storage Engineer
http://www.zmanda.com