Re: amanda backup hanging consistently

2022-06-22 Thread Orion Poplawski

On 6/17/22 04:25, Jose M Calhariz wrote:

Hi,


If you run tar by hand on the DLE does it finish?  After how much time?


Thanks for the suggestion.  That made me notice that it was bogging down 
dealing with a *very* deep directory that had been created by a buggy 
process going into infinite recursion.  Removing that appears to have 
fixed the dump.


Still seems like there is an issue with amanda not cleaning up properly 
from the timeout condition though.




On Wed, Jun 15, 2022 at 09:50:11AM -0600, Orion Poplawski wrote:

On 6/15/22 09:38, Orion Poplawski wrote:

Recently the backup of a particular DLE has started hanging consistently.  I
don't see any recent changes on the server except possibly a change from
kernel 3.10.0-1160.62.1.el7 to 3.10.0-1160.66.1.el7 on Jun 2nd, but the
problem didn't start until June 6th.

The tar command is stuck trying to write out the name of the directory to 
amgtar:

root 11321 11315 33 Jun14 ?03:55:46 /usr/bin/tar --create
--verbose --verbose --block-number --file - --directory /export/backup/rufous
--no-check-device --listed-incremental
/var/lib/amanda/gnutar-lists/saga_export_backup_rufous_0.new
--ignore-failed-read --totals --exclude-from
/var/log/amanda/amgtar._export_backup_rufous.20220614211711.exclude .
root 11432 11427 42 Jun14 ?03:57:48 /usr/bin/tar --create
--verbose --verbose --block-number --file - --directory /export/backup/rufous
--no-check-device --listed-incremental
/var/lib/amanda/gnutar-lists/saga_export_backup_rufous_0.new
--ignore-failed-read --totals --exclude-from
/var/log/amanda/amgtar._export_backup_rufous.20220614234714.exclude .

# strace -f -s 500 -p 11321
strace: Process 11321 attached
write(2, " ./export/home/orion-admin/ansible-boulder/.git/objects/db/", 59

# strace -f -s 500 -p 11432
strace: Process 11432 attached
write(2, " ./export/home/orion-admin/ansible-boulder/.git/objects/db/", 59

amgtar is stuck writing output to amandad:

# strace -s 200 -p 11427
strace: Process 11427 attached
write(4, "e_pylibssh.libs/\ndrwxr-x--- orion-admin/orion-admin 0
2022-05-23 17:31
./export/home/orion-admin/.local/lib/python3.8/site-packages/bcrypt/\ndrwxr-x---
orion-admin/orion-admin 0 2022-05-23 17:3"..., 4096


amgtar debug has:

Tue Jun 14 23:47:14.104627682 2022: pid 11427: ??-amgtar   : pid 11427 ruid 0
euid 27120 ppid 11418 version 3.6.0: start at Tue Jun 14 23:47:14 2022
Tue Jun 14 23:47:14.104731978 2022: pid 11427: ??-amgtar   : version 3.6.0
Tue Jun 14 23:47:14.104957990 2022: pid 11427: ??-amgtar   : state_stream: 156
Tue Jun 14 23:47:14.110961812 2022: pid 11427: ??-amgtar   : pid 11427 ruid 0
euid 27120 ppid 11418 version 3.6.0: rename at Tue Jun 14 23:47:14 2022
Tue Jun 14 23:47:14.110991922 2022: pid 11427: ??-amgtar   : GNUTAR-PATH 
/bin/tar
Tue Jun 14 23:47:14.111004028 2022: pid 11427: ??-amgtar   : GNUTAR-LISTDIR
/var/lib/amanda/gnutar-lists
Tue Jun 14 23:47:14.111013955 2022: pid 11427: ??-amgtar   : ONE-FILE-SYSTEM no
Tue Jun 14 23:47:14.111023583 2022: pid 11427: ??-amgtar   : SPARSE yes
Tue Jun 14 23:47:14.111033039 2022: pid 11427: ??-amgtar   : NO-UNQUOTE no
Tue Jun 14 23:47:14.111042417 2022: pid 11427: ??-amgtar   : ATIME-PRESERVE no
Tue Jun 14 23:47:14.111051865 2022: pid 11427: ??-amgtar   : ACLS no
Tue Jun 14 23:47:14.111061481 2022: pid 11427: ??-amgtar   : SELINUX no
Tue Jun 14 23:47:14.21121 2022: pid 11427: ??-amgtar   : XATTRS no
Tue Jun 14 23:47:14.30900 2022: pid 11427: ??-amgtar   : CHECK-DEVICE no
Tue Jun 14 23:47:14.40474 2022: pid 11427: ??-amgtar   : SIZE ^ *Total
bytes written: [0-9][0-9]*
Tue Jun 14 23:47:14.50269 2022: pid 11427: ??-amgtar   : IGNORE :
Directory is new$
Tue Jun 14 23:47:14.59886 2022: pid 11427: ??-amgtar   : IGNORE :
Directory has been renamed
Tue Jun 14 23:47:14.69511 2022: pid 11427: ??-amgtar   : NORMAL ^could not
open conf file
Tue Jun 14 23:47:14.79212 2022: pid 11427: ??-amgtar   : NORMAL ^Elapsed 
time:
Tue Jun 14 23:47:14.111309287 2022: pid 11427: ??-amgtar   : NORMAL ^Throughput
Tue Jun 14 23:47:14.111321160 2022: pid 11427: ??-amgtar   : NORMAL :
directory is on a different filesystem; not dumped
Tue Jun 14 23:47:14.111331690 2022: pid 11427: ??-amgtar   : NORMAL : socket
ignored$
Tue Jun 14 23:47:14.111342121 2022: pid 11427: ??-amgtar   : NORMAL : File .*
shrunk by [0-9][0-9]* bytes, padding with zeros
Tue Jun 14 23:47:14.111397543 2022: pid 11427: ??-amgtar   : NORMAL : Cannot
add file .*: No such file or directory$
Tue Jun 14 23:47:14.111407572 2022: pid 11427: ??-amgtar   : NORMAL : Error
exit delayed from previous errors
Tue Jun 14 23:47:14.111417962 2022: pid 11427: ??-amgtar   : ERROR amgtar: error
Tue Jun 14 23:47:14.114473250 2022: pid 11427: ??-amgtar   : Spawning
"/bin/tar /bin/tar -x --no-check-device -f -" in pipeline
Tue Jun 14 23:47:14.124455374 2022: pid 11431: ??-amgtar   : Executing:
/bin/tar '--version'
Tue Jun 14 23:47:14.126144343 2022: pid 11427: ??-amgtar   : Spawning
"/usr/bin/tar /usr/bin/t

Re: amanda backup hanging consistently

2022-06-17 Thread Jose M Calhariz
Hi,


If you run tar by hand on the DLE does it finish?  After how much time?

Kind regards
Jose M Calhariz


On Wed, Jun 15, 2022 at 09:50:11AM -0600, Orion Poplawski wrote:
> On 6/15/22 09:38, Orion Poplawski wrote:
> > Recently the backup of a particular DLE has started hanging consistently.  I
> > don't see any recent changes on the server except possibly a change from
> > kernel 3.10.0-1160.62.1.el7 to 3.10.0-1160.66.1.el7 on Jun 2nd, but the
> > problem didn't start until June 6th.
> > 
> > The tar command is stuck trying to write out the name of the directory to 
> > amgtar:
> > 
> > root 11321 11315 33 Jun14 ?03:55:46 /usr/bin/tar --create
> > --verbose --verbose --block-number --file - --directory 
> > /export/backup/rufous
> > --no-check-device --listed-incremental
> > /var/lib/amanda/gnutar-lists/saga_export_backup_rufous_0.new
> > --ignore-failed-read --totals --exclude-from
> > /var/log/amanda/amgtar._export_backup_rufous.20220614211711.exclude .
> > root 11432 11427 42 Jun14 ?03:57:48 /usr/bin/tar --create
> > --verbose --verbose --block-number --file - --directory 
> > /export/backup/rufous
> > --no-check-device --listed-incremental
> > /var/lib/amanda/gnutar-lists/saga_export_backup_rufous_0.new
> > --ignore-failed-read --totals --exclude-from
> > /var/log/amanda/amgtar._export_backup_rufous.20220614234714.exclude .
> > 
> > # strace -f -s 500 -p 11321
> > strace: Process 11321 attached
> > write(2, " ./export/home/orion-admin/ansible-boulder/.git/objects/db/", 59
> > 
> > # strace -f -s 500 -p 11432
> > strace: Process 11432 attached
> > write(2, " ./export/home/orion-admin/ansible-boulder/.git/objects/db/", 59
> > 
> > amgtar is stuck writing output to amandad:
> > 
> > # strace -s 200 -p 11427
> > strace: Process 11427 attached
> > write(4, "e_pylibssh.libs/\ndrwxr-x--- orion-admin/orion-admin 0
> > 2022-05-23 17:31
> > ./export/home/orion-admin/.local/lib/python3.8/site-packages/bcrypt/\ndrwxr-x---
> > orion-admin/orion-admin 0 2022-05-23 17:3"..., 4096
> > 
> > 
> > amgtar debug has:
> > 
> > Tue Jun 14 23:47:14.104627682 2022: pid 11427: ??-amgtar   : pid 11427 ruid > > 0
> > euid 27120 ppid 11418 version 3.6.0: start at Tue Jun 14 23:47:14 2022
> > Tue Jun 14 23:47:14.104731978 2022: pid 11427: ??-amgtar   : version 3.6.0
> > Tue Jun 14 23:47:14.104957990 2022: pid 11427: ??-amgtar   : state_stream: 
> > 156
> > Tue Jun 14 23:47:14.110961812 2022: pid 11427: ??-amgtar   : pid 11427 ruid > > 0
> > euid 27120 ppid 11418 version 3.6.0: rename at Tue Jun 14 23:47:14 2022
> > Tue Jun 14 23:47:14.110991922 2022: pid 11427: ??-amgtar   : GNUTAR-PATH 
> > /bin/tar
> > Tue Jun 14 23:47:14.111004028 2022: pid 11427: ??-amgtar   : GNUTAR-LISTDIR
> > /var/lib/amanda/gnutar-lists
> > Tue Jun 14 23:47:14.111013955 2022: pid 11427: ??-amgtar   : 
> > ONE-FILE-SYSTEM no
> > Tue Jun 14 23:47:14.111023583 2022: pid 11427: ??-amgtar   : SPARSE yes
> > Tue Jun 14 23:47:14.111033039 2022: pid 11427: ??-amgtar   : NO-UNQUOTE no
> > Tue Jun 14 23:47:14.111042417 2022: pid 11427: ??-amgtar   : ATIME-PRESERVE 
> > no
> > Tue Jun 14 23:47:14.111051865 2022: pid 11427: ??-amgtar   : ACLS no
> > Tue Jun 14 23:47:14.111061481 2022: pid 11427: ??-amgtar   : SELINUX no
> > Tue Jun 14 23:47:14.21121 2022: pid 11427: ??-amgtar   : XATTRS no
> > Tue Jun 14 23:47:14.30900 2022: pid 11427: ??-amgtar   : CHECK-DEVICE no
> > Tue Jun 14 23:47:14.40474 2022: pid 11427: ??-amgtar   : SIZE ^ *Total
> > bytes written: [0-9][0-9]*
> > Tue Jun 14 23:47:14.50269 2022: pid 11427: ??-amgtar   : IGNORE :
> > Directory is new$
> > Tue Jun 14 23:47:14.59886 2022: pid 11427: ??-amgtar   : IGNORE :
> > Directory has been renamed
> > Tue Jun 14 23:47:14.69511 2022: pid 11427: ??-amgtar   : NORMAL ^could 
> > not
> > open conf file
> > Tue Jun 14 23:47:14.79212 2022: pid 11427: ??-amgtar   : NORMAL 
> > ^Elapsed time:
> > Tue Jun 14 23:47:14.111309287 2022: pid 11427: ??-amgtar   : NORMAL 
> > ^Throughput
> > Tue Jun 14 23:47:14.111321160 2022: pid 11427: ??-amgtar   : NORMAL :
> > directory is on a different filesystem; not dumped
> > Tue Jun 14 23:47:14.111331690 2022: pid 11427: ??-amgtar   : NORMAL : socket
> > ignored$
> > Tue Jun 14 23:47:14.111342121 2022: pid 11427: ??-amgtar   : NORMAL : File 
> > .*
> > shrunk by [0-9][0-9]* bytes, padding with zeros
> > Tue Jun 14 23:47:14.111397543 2022: pid 11427: ??-amgtar   : NORMAL : Cannot
> > add file .*: No such file or directory$
> > Tue Jun 14 23:47:14.111407572 2022: pid 11427: ??-amgtar   : NORMAL : Error
> > exit delayed from previous errors
> > Tue Jun 14 23:47:14.111417962 2022: pid 11427: ??-amgtar   : ERROR amgtar: 
> > error
> > Tue Jun 14 23:47:14.114473250 2022: pid 11427: ??-amgtar   : Spawning
> > "/bin/tar /bin/tar -x --no-check-device -f -" in pipeline
> > Tue Jun 14 23:47:14.124455374 2022: pid 11431: ??-amgtar   : Executing:
> > /bin/tar '--version'
> > Tue Jun 14 23:47:14.126144343 2022: pid 11427: ??-amgtar

Re: amanda backup hanging consistently

2022-06-15 Thread Orion Poplawski
On 6/15/22 09:38, Orion Poplawski wrote:
> Recently the backup of a particular DLE has started hanging consistently.  I
> don't see any recent changes on the server except possibly a change from
> kernel 3.10.0-1160.62.1.el7 to 3.10.0-1160.66.1.el7 on Jun 2nd, but the
> problem didn't start until June 6th.
> 
> The tar command is stuck trying to write out the name of the directory to 
> amgtar:
> 
> root 11321 11315 33 Jun14 ?03:55:46 /usr/bin/tar --create
> --verbose --verbose --block-number --file - --directory /export/backup/rufous
> --no-check-device --listed-incremental
> /var/lib/amanda/gnutar-lists/saga_export_backup_rufous_0.new
> --ignore-failed-read --totals --exclude-from
> /var/log/amanda/amgtar._export_backup_rufous.20220614211711.exclude .
> root 11432 11427 42 Jun14 ?03:57:48 /usr/bin/tar --create
> --verbose --verbose --block-number --file - --directory /export/backup/rufous
> --no-check-device --listed-incremental
> /var/lib/amanda/gnutar-lists/saga_export_backup_rufous_0.new
> --ignore-failed-read --totals --exclude-from
> /var/log/amanda/amgtar._export_backup_rufous.20220614234714.exclude .
> 
> # strace -f -s 500 -p 11321
> strace: Process 11321 attached
> write(2, " ./export/home/orion-admin/ansible-boulder/.git/objects/db/", 59
> 
> # strace -f -s 500 -p 11432
> strace: Process 11432 attached
> write(2, " ./export/home/orion-admin/ansible-boulder/.git/objects/db/", 59
> 
> amgtar is stuck writing output to amandad:
> 
> # strace -s 200 -p 11427
> strace: Process 11427 attached
> write(4, "e_pylibssh.libs/\ndrwxr-x--- orion-admin/orion-admin 0
> 2022-05-23 17:31
> ./export/home/orion-admin/.local/lib/python3.8/site-packages/bcrypt/\ndrwxr-x---
> orion-admin/orion-admin 0 2022-05-23 17:3"..., 4096
> 
> 
> amgtar debug has:
> 
> Tue Jun 14 23:47:14.104627682 2022: pid 11427: ??-amgtar   : pid 11427 ruid 0
> euid 27120 ppid 11418 version 3.6.0: start at Tue Jun 14 23:47:14 2022
> Tue Jun 14 23:47:14.104731978 2022: pid 11427: ??-amgtar   : version 3.6.0
> Tue Jun 14 23:47:14.104957990 2022: pid 11427: ??-amgtar   : state_stream: 156
> Tue Jun 14 23:47:14.110961812 2022: pid 11427: ??-amgtar   : pid 11427 ruid 0
> euid 27120 ppid 11418 version 3.6.0: rename at Tue Jun 14 23:47:14 2022
> Tue Jun 14 23:47:14.110991922 2022: pid 11427: ??-amgtar   : GNUTAR-PATH 
> /bin/tar
> Tue Jun 14 23:47:14.111004028 2022: pid 11427: ??-amgtar   : GNUTAR-LISTDIR
> /var/lib/amanda/gnutar-lists
> Tue Jun 14 23:47:14.111013955 2022: pid 11427: ??-amgtar   : ONE-FILE-SYSTEM 
> no
> Tue Jun 14 23:47:14.111023583 2022: pid 11427: ??-amgtar   : SPARSE yes
> Tue Jun 14 23:47:14.111033039 2022: pid 11427: ??-amgtar   : NO-UNQUOTE no
> Tue Jun 14 23:47:14.111042417 2022: pid 11427: ??-amgtar   : ATIME-PRESERVE no
> Tue Jun 14 23:47:14.111051865 2022: pid 11427: ??-amgtar   : ACLS no
> Tue Jun 14 23:47:14.111061481 2022: pid 11427: ??-amgtar   : SELINUX no
> Tue Jun 14 23:47:14.21121 2022: pid 11427: ??-amgtar   : XATTRS no
> Tue Jun 14 23:47:14.30900 2022: pid 11427: ??-amgtar   : CHECK-DEVICE no
> Tue Jun 14 23:47:14.40474 2022: pid 11427: ??-amgtar   : SIZE ^ *Total
> bytes written: [0-9][0-9]*
> Tue Jun 14 23:47:14.50269 2022: pid 11427: ??-amgtar   : IGNORE :
> Directory is new$
> Tue Jun 14 23:47:14.59886 2022: pid 11427: ??-amgtar   : IGNORE :
> Directory has been renamed
> Tue Jun 14 23:47:14.69511 2022: pid 11427: ??-amgtar   : NORMAL ^could not
> open conf file
> Tue Jun 14 23:47:14.79212 2022: pid 11427: ??-amgtar   : NORMAL ^Elapsed 
> time:
> Tue Jun 14 23:47:14.111309287 2022: pid 11427: ??-amgtar   : NORMAL 
> ^Throughput
> Tue Jun 14 23:47:14.111321160 2022: pid 11427: ??-amgtar   : NORMAL :
> directory is on a different filesystem; not dumped
> Tue Jun 14 23:47:14.111331690 2022: pid 11427: ??-amgtar   : NORMAL : socket
> ignored$
> Tue Jun 14 23:47:14.111342121 2022: pid 11427: ??-amgtar   : NORMAL : File .*
> shrunk by [0-9][0-9]* bytes, padding with zeros
> Tue Jun 14 23:47:14.111397543 2022: pid 11427: ??-amgtar   : NORMAL : Cannot
> add file .*: No such file or directory$
> Tue Jun 14 23:47:14.111407572 2022: pid 11427: ??-amgtar   : NORMAL : Error
> exit delayed from previous errors
> Tue Jun 14 23:47:14.111417962 2022: pid 11427: ??-amgtar   : ERROR amgtar: 
> error
> Tue Jun 14 23:47:14.114473250 2022: pid 11427: ??-amgtar   : Spawning
> "/bin/tar /bin/tar -x --no-check-device -f -" in pipeline
> Tue Jun 14 23:47:14.124455374 2022: pid 11431: ??-amgtar   : Executing:
> /bin/tar '--version'
> Tue Jun 14 23:47:14.126144343 2022: pid 11427: ??-amgtar   : Spawning
> "/usr/bin/tar /usr/bin/tar --create --verbose --verbose --block-number --file
> - --directory /export/backup/rufous --no-check-device --listed-incremental
> /var/lib/amanda/gnutar-lists/saga_export_backup_rufous_0.new
> --ignore-failed-read --totals --exclude-from
> /var/log/amanda/amgtar._export_backup_rufous.20220614234714.exclude ." in 
> pipeline
> 
> amandad debug has: