Re: amrecover usage with chg-robot

2022-05-30 Thread Stefan G. Weichinger

Am 27.05.22 um 17:30 schrieb Nathan Stratton Treadway:


Ah, so euxvo_crypt is run by the amidxtaped process rather than by the
amrecover process itself.

What does strace show amrecover is doing during this period?

And "ps -ef" shows that the openssl process is still alive (i.e. not
defunct).  What does "strace" show on that process.  If you manually
kill it, does the change of processes up through amidxtaped unwind and
amrecover resume normal processing?



Took me a while to get to do the tests.

Here an strace of amrecover at the end of the first tape:


poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=11, 
events=POLLIN}, {fd=13, events=POLLIN}, {fd=15, events=POLLIN}], 5, -1) 
= 2 ([{fd=4, revents=POLLIN}, {fd=5, revents=POLLIN}])

read(4, "\2\0\0\0\0\0\0\0", 16) = 8
write(4, "\1\0\0\0\0\0\0\0", 8) = 8
read(5, "\0\0@\0\0\7\241\36", 8)= 8
read(5, 
"\204\34\313\360z\247!\324\306h\272\364\370_c\353\336$\262\224\231 
:\1\247\273\330\316\25\235c\2"..., 16384) = 16384
write(8, 
"\204\34\313\360z\247!\324\306h\272\364\370_c\353\336$\262\224\231 
:\1\247\273\330\316\25\235c\2"..., 16384) = 4096

write(4, "\1\0\0\0\0\0\0\0", 8) = 8
write(4, "\1\0\0\0\0\0\0\0", 8) = 8
write(4, "\1\0\0\0\0\0\0\0", 8) = 8
poll([{fd=4, events=POLLIN}, {fd=8, events=POLLOUT}, {fd=11, 
events=POLLIN}, {fd=13, events=POLLIN}, {fd=15, events=POLLIN}], 5, -1) 
= 1 ([{fd=4, revents=POLLIN}])

read(4, "\4\0\0\0\0\0\0\0", 16) = 8
poll([{fd=4, events=POLLIN}, {fd=8, events=POLLOUT}, {fd=11, 
events=POLLIN}, {fd=13, events=POLLIN}, {fd=15, events=POLLIN}], 5, -1) 
= 1 ([{fd=15, revents=POLLIN}])

write(4, "\1\0\0\0\0\0\0\0", 8) = 8
read(15, "\ngzip: stdin: decompression OK, "..., 2046) = 57
write(2, "\r", 1)   = 1
write(2, "/bin/gzip: \n", 12)   = 12
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=151148, 
si_uid=0, si_status=2, si_utime=21750, si_stime=1658} ---

write(2, "/bin/gzip: gzip: stdin: decompre"..., 67) = 67
write(4, "\1\0\0\0\0\0\0\0", 8) = 8
poll([{fd=4, events=POLLIN}, {fd=8, events=POLLOUT}, {fd=11, 
events=POLLIN}, {fd=13, events=POLLIN}, {fd=15, events=POLLIN}], 5, -1) 
= 2 ([{fd=4, revents=POLLIN}, {fd=15, revents=POLLHUP}])

read(4, "\2\0\0\0\0\0\0\0", 16) = 8
write(4, "\1\0\0\0\0\0\0\0", 8) = 8
read(15, "", 1989)  = 0
close(15)   = 0
write(4, "\1\0\0\0\0\0\0\0", 8) = 8
write(4, "\1\0\0\0\0\0\0\0", 8) = 8
poll([{fd=4, events=POLLIN}, {fd=8, events=POLLOUT}, {fd=11, 
events=POLLIN}, {fd=13, events=POLLIN}], 4, -1) = 1 ([{fd=4, 
revents=POLLIN}])

read(4, "\3\0\0\0\0\0\0\0", 16) = 8
poll([{fd=4, events=POLLIN}, {fd=8, events=POLLOUT}, {fd=11, 
events=POLLIN}, {fd=13, events=POLLIN}], 4, -1) = 2 ([{fd=11, 
revents=POLLHUP}, {fd=13, revents=POLLHUP}])

write(4, "\1\0\0\0\0\0\0\0", 8) = 8
read(11, "", 1830)  = 0
close(11)   = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=151150, 
si_uid=0, si_status=0, si_utime=351, si_stime=884} ---

write(4, "\1\0\0\0\0\0\0\0", 8) = 8
write(4, "\1\0\0\0\0\0\0\0", 8) = 8
read(13, "", 2046)  = 0
close(13)   = 0
write(4, "\1\0\0\0\0\0\0\0", 8) = 8
write(4, "\1\0\0\0\0\0\0\0", 8) = 8
write(4, "\1\0\0\0\0\0\0\0", 8) = 8
poll([{fd=4, events=POLLIN}, {fd=8, events=POLLOUT}], 2, -1) = 1 
([{fd=4, revents=POLLIN}])




The PID 151150 was tar:

151130 pts/2Sl+0:02  094 119297 8976  0.0 amrecover abt 
-o auth=local -s localhost
 151133 pts/2S+ 0:00  029 19394  6552  0.0 
/usr/libexec/amanda/amandad -auth=local
 151134 pts/2S+ 0:00  036 20747  7864  0.0 
/usr/libexec/amanda/amindexd amandad local
 151135 pts/2Z+ 0:00  0 0 0 0  0.0 [amandad] 

 151138 pts/2R+ 0:00  029 19554  6504  0.0 
/usr/libexec/amanda/amandad -auth=local
 151139 pts/2Sl+0:01  0 3 233592 42820  0.2 
/usr/bin/perl /usr/libexec/amanda/amidxtaped amandad l
 151140 pts/2Z+ 0:00  0 0 0 0  0.0 [amandad] 

 151144 pts/4S+ 0:00  062  9685   940  0.0 tail -f 
amidxtaped.20220530124548.debug
 151146 pts/2S+ 0:00  0   812  5495  2984  0.0 /bin/bash 
/usr/sbin/exuvo_crypt -d

 151148 pts/2S+ 0:08  069  5382  1464  0.0 /bin/gzip -dc
 151150 pts/2S+ 0:00  0   385 10046  2572  0.0 tar 
--ignore-zeros --numeric-owner -xpGvf - ./etc
 151153 pts/2R+ 0:06  0   514  8301  4500  0.0 
/usr/bin/openssl enc -pbkdf2 -d -aes-256-ctr -salt -pas



-


strace openssl shows only:

# strace -p 151133
strace: Process 151133 attached
restart_syscall(<... resuming interrupted read ...>

Killing that logs in amidxtaped.xx.debug:


Mon May 30 12:55:26.730917575 2022: pid 151139: thd-0x563663659c00: 
amidxtaped: info

Re: amrecover usage with chg-robot

2022-05-27 Thread Nathan Stratton Treadway
On Fri, May 27, 2022 at 15:54:50 +0200, Stefan G. Weichinger wrote:
> 
> 
> forgot pstree:
> 
> -tmux: server-+-bash---amrecover-+-amandad-+-amandad
> 
>|  | `-amindexd
> 
>|  |-amandad-+-amandad
> 
>|  |
> `-amidxtaped-+-exuvo_crypt---openssl
> 
>|  | `-2*[{amidxtaped}]
> 
>|  |-gzip
> 
>|  |-tar
> 
>|  `-{amrecover}


Ah, so euxvo_crypt is run by the amidxtaped process rather than by the
amrecover process itself.

What does strace show amrecover is doing during this period?  

And "ps -ef" shows that the openssl process is still alive (i.e. not
defunct).  What does "strace" show on that process.  If you manually
kill it, does the change of processes up through amidxtaped unwind and
amrecover resume normal processing?

Nathan


Nathan Stratton Treadway  -  [email protected]  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239


Re: amrecover usage with chg-robot

2022-05-27 Thread Stefan G. Weichinger




forgot pstree:

-tmux: server-+-bash---amrecover-+-amandad-+-amandad

   |  | `-amindexd

   |  |-amandad-+-amandad

   |  | 
`-amidxtaped-+-exuvo_crypt---openssl


   |  | 
`-2*[{amidxtaped}]


   |  |-gzip

   |  |-tar

   |  `-{amrecover}


Re: amrecover usage with chg-robot

2022-05-27 Thread Stefan G. Weichinger

Am 27.05.22 um 14:28 schrieb Stefan G. Weichinger:


Am 27.05.22 um 14:05 schrieb Nathan Stratton Treadway:

On Fri, May 27, 2022 at 10:28:09 +0200, Stefan G. Weichinger wrote:

After that both tar and gzip.binary are shown as  in ps,
whatever that means.


Okay, that's a little progress in the investigation.

"" means that the process has exited, but the return code from
the process has not been read by the parent process yet.  So in this
case, whatever process spawned the tar and gzip subprocesses is not
"noticing" when the subprocesses finish... the question is why (and what
is it stuck doing instead of cleaning up)?

Are the "openssl enc" and/or encription-wrapper-script processes still
out there at this time (and what state are they in)?

You should be able to use pstree or "ps -ef" to determine which process
is the parent (PPID column) of the defunct subprocesses.


Yes, that parent process was still visible.

I will retry that later, currently I am updating stuff on that server etc


The parent is the amrecover process:


root 2154598 2096066 15 15:25 pts/200:01:26 amrecover abt -o 
auth=local -s localhost


amanda   2154601 2154598  0 15:25 pts/200:00:00 
/usr/libexec/amanda/amandad -auth=local


amanda   2154604 2154601  0 15:25 pts/200:00:00 
/usr/libexec/amanda/amindexd amandad local


amanda   2154605 2154601  0 15:25 pts/200:00:00 [amandad] 

amanda   2154632 2154598  6 15:25 pts/200:00:35 
/usr/libexec/amanda/amandad -auth=local


amanda   2154633 2154632  6 15:26 pts/200:00:32 /usr/bin/perl 
/usr/libexec/amanda/amidxtaped amandad local


amanda   2154634 2154632  0 15:26 pts/200:00:00 [amandad] 

amanda   2154701 2154633  0 15:26 pts/200:00:00 /bin/bash 
/usr/sbin/exuvo_crypt -d


root 2154703 2154598 49 15:26 pts/200:04:11 [gzip] 

root 2154705 2154598  2 15:26 pts/200:00:13 [tar] 

amanda   2154708 2154701 41 15:26 pts/200:03:31 /usr/bin/openssl enc 
-pbkdf2 -d -aes-256-ctr -salt -pass fd:3



This is while this is logged:

# tail -f amidxtaped.20220527152600.debug

Fri May 27 15:33:38.443013672 2022: pid 2154633: thd-0x556460b31400: 
amidxtaped: 
/usr/lib64/perl5/vendor_perl/5.34/Amanda/Restore.pm:1719:info:490 
12472320 kb


Fri May 27 15:33:53.454973643 2022: pid 2154633: thd-0x556460b31400: 
amidxtaped: 
/usr/lib64/perl5/vendor_perl/5.34/Amanda/Restore.pm:1719:info:490 
12472320 kb


Fri May 27 15:34:08.466959610 2022: pid 2154633: thd-0x556460b31400: 
amidxtaped: 
/usr/lib64/perl5/vendor_perl/5.34/Amanda/Restore.pm:1719:info:490 
12472320 kb


PID 2154633 -> /usr/bin/perl /usr/libexec/amanda/amidxtaped amandad local


Re: amrecover usage with chg-robot

2022-05-27 Thread Stefan G. Weichinger



Am 27.05.22 um 14:05 schrieb Nathan Stratton Treadway:

On Fri, May 27, 2022 at 10:28:09 +0200, Stefan G. Weichinger wrote:

After that both tar and gzip.binary are shown as  in ps,
whatever that means.


Okay, that's a little progress in the investigation.

"" means that the process has exited, but the return code from
the process has not been read by the parent process yet.  So in this
case, whatever process spawned the tar and gzip subprocesses is not
"noticing" when the subprocesses finish... the question is why (and what
is it stuck doing instead of cleaning up)?

Are the "openssl enc" and/or encription-wrapper-script processes still
out there at this time (and what state are they in)?

You should be able to use pstree or "ps -ef" to determine which process
is the parent (PPID column) of the defunct subprocesses.


Yes, that parent process was still visible.

I will retry that later, currently I am updating stuff on that server etc

to add complexity: this is a rather old server with gentoo linux on it, 
a bit neglected because it should have been replaced long ago)


Re: amrecover usage with chg-robot

2022-05-27 Thread Nathan Stratton Treadway
On Fri, May 27, 2022 at 10:28:09 +0200, Stefan G. Weichinger wrote:
> After that both tar and gzip.binary are shown as  in ps,
> whatever that means.

Okay, that's a little progress in the investigation.

"" means that the process has exited, but the return code from
the process has not been read by the parent process yet.  So in this
case, whatever process spawned the tar and gzip subprocesses is not
"noticing" when the subprocesses finish... the question is why (and what
is it stuck doing instead of cleaning up)?

Are the "openssl enc" and/or encription-wrapper-script processes still
out there at this time (and what state are they in)?

You should be able to use pstree or "ps -ef" to determine which process
is the parent (PPID column) of the defunct subprocesses.

Nathan



Nathan Stratton Treadway  -  [email protected]  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239


Re: amrecover usage with chg-robot

2022-05-27 Thread Stefan G. Weichinger



Am 26.05.22 um 12:05 schrieb Stefan G. Weichinger:

Am 26.05.22 um 12:01 schrieb Stefan G. Weichinger:

Am 26.05.22 um 04:10 schrieb Exuvo:

I think i had "/bin/gzip: gzip: stdin: decompression OK, trailing 
garbage ignored" before i used -q to zstd which suppresses warnings 
and interactivity.


Maybe I could edit to "gzip -d" somewhere in the code to also suppress 
this?


sorry, correction:

"gzip -d -q"


Trying this by:

* moved gzip to gzip.binary
* copied gunzip to gzip
* edited gzip to call "gzip.binary -q"

and rerunning my amrecover test with 2 tapes and device "robot"

First tape is extracted OK

That ugly "trailing garbage" message is gone now, as intended.

The strace on tar shows that it exits with exit code 0.

After that both tar and gzip.binary are shown as  in ps, 
whatever that means.


And amidxtaped.xx.debug shows the same

Fri May 27 10:26:40.042894830 2022: pid 737966: thd-0x563359eb2800: 
amidxtaped: 
/usr/lib64/perl5/vendor_perl/5.32/Amanda/Restore.pm:1719:info:490 
12472320 kb


as before. Maybe I have to wait ...


Re: amrecover usage with chg-robot

2022-05-26 Thread Stefan G. Weichinger

Am 26.05.22 um 12:01 schrieb Stefan G. Weichinger:

Am 26.05.22 um 04:10 schrieb Exuvo:

I think i had "/bin/gzip: gzip: stdin: decompression OK, trailing 
garbage ignored" before i used -q to zstd which suppresses warnings 
and interactivity.


Maybe I could edit to "gzip -d" somewhere in the code to also suppress 
this?


sorry, correction:

"gzip -d -q"


Re: amrecover usage with chg-robot

2022-05-26 Thread Stefan G. Weichinger

Am 26.05.22 um 04:10 schrieb Exuvo:

I think i had "/bin/gzip: gzip: stdin: decompression OK, trailing 
garbage ignored" before i used -q to zstd which suppresses warnings and 
interactivity.


Maybe I could edit to "gzip -d" somewhere in the code to also suppress this?




Re: amrecover usage with chg-robot

2022-05-26 Thread Stefan G. Weichinger

Am 26.05.22 um 04:10 schrieb Exuvo:

 From my notes when i was doing recovery testing:
  For some reason if i use encryption i get a single spurious error at 
the end of amcheckdump or amrecover if you recover the last file written:

   application stderr: /usr/bin/tar: Skipping to next header
   application stderr: /usr/bin/tar: Exiting with failure status due to 
previous errors
  It still recovers as it should so nothing to worry about and manual 
recovery from the same tape does not produce this warning but i have not 
yet figured out why it happens.


I did do a complete 8TB restore when i first set up my archive backups, 
which takes 7 tapes for me, changer worked, and that did give the above 
warning at the end but all files matched the source when i checked.


I think i had "/bin/gzip: gzip: stdin: decompression OK, trailing 
garbage ignored" before i used -q to zstd which suppresses warnings and 
interactivity.


You could try a manual restore:
position at block 1 with mt fsb
mbuffer -i $TAPE -s 1024k -b 2048 -L | /etc/amanda/encrypt -d | 
/etc/amanda/zstd-compression -d | tar -xf -
or mbuffer -i $TAPE -s 1024k -b 2048 -L | openssl enc -pbkdf2 -d 
-aes-256-ctr -salt -pass fd:3 3< pwfile | zstd -dqcf | tar -xf -
That never gave the warnings as i get with amrecover but only feasible 
for full restores.
You can try zstd -dfo /dev/null to see if zstd gives any warnings while 
reading.


Thanks a lot for that detailed reply. Although in my case the 
compression is with gzip, not with zstd. So I am unsure if your 
suggestion applies.


A short search tells me that zstd is able to decompress gzip archives?

Nice, but the existing backups on tape tell amrecover to use "gzip -d" 
or so, right?


I see people setting up a symlink .. hmm, wondering.

In my "define changer" section i have device-property "READ_BUFFER_SIZE" 
"1024k", which i had to add to read older backups when i was testing 
different block sizes.

That settings means it supports up to that blocksize for restores.


I don't have that setting yet.

Set it to 2048k now to match

blocksize 2 mbytes

readblocksize 2 mbytes

in the used tapetype (maybe all 3 settings could go into either the 
tapetype OR the changer device-property lines).


Re: amrecover usage with chg-robot

2022-05-26 Thread Stefan G. Weichinger

Am 26.05.22 um 03:48 schrieb Exuvo:
To handle changing device names use ex 
/dev/tape/by-id/scsi-HUE12340E22-nst and /dev/tape/by-id/scsi-DEC1234567


Nice hack, thanks. Applied.



Re: amrecover usage with chg-robot

2022-05-25 Thread Exuvo

From my notes when i was doing recovery testing:
 For some reason if i use encryption i get a single spurious error at the end 
of amcheckdump or amrecover if you recover the last file written:
  application stderr: /usr/bin/tar: Skipping to next header
  application stderr: /usr/bin/tar: Exiting with failure status due to previous 
errors
 It still recovers as it should so nothing to worry about and manual recovery 
from the same tape does not produce this warning but i have not yet figured out 
why it happens.

I did do a complete 8TB restore when i first set up my archive backups, which 
takes 7 tapes for me, changer worked, and that did give the above warning at 
the end but all files matched the source when i checked.

I think i had "/bin/gzip: gzip: stdin: decompression OK, trailing garbage 
ignored" before i used -q to zstd which suppresses warnings and interactivity.

You could try a manual restore:
position at block 1 with mt fsb
mbuffer -i $TAPE -s 1024k -b 2048 -L | /etc/amanda/encrypt -d | 
/etc/amanda/zstd-compression -d | tar -xf -
or mbuffer -i $TAPE -s 1024k -b 2048 -L | openssl enc -pbkdf2 -d -aes-256-ctr 
-salt -pass fd:3 3< pwfile | zstd -dqcf | tar -xf -
That never gave the warnings as i get with amrecover but only feasible for full 
restores.
You can try zstd -dfo /dev/null to see if zstd gives any warnings while reading.


In my "define changer" section i have device-property "READ_BUFFER_SIZE" 
"1024k", which i had to add to read older backups when i was testing different block sizes.
That settings means it supports up to that blocksize for restores.

Anton "exuvo" Olsson
   [email protected]

On 2022-05-25 17:09, Stefan G. Weichinger wrote:


Am 25.05.22 um 16:46 schrieb Stefan G. Weichinger:


The tested DLE is encrypted and compressed. Yes, redundant ... I can change 
that in the future.

The first part gets read perfectly, but it seems that the end of the tarball 
crashes things somehow.

I remember that error from mailing list threads .. I will research later when I 
am back at my office.


Testing with a DLE that is/was not encrypted. Plain tar, no compression.

amrecover with device "/dev/sg21" works great. I have to confirm the tape 
change after the first files have been extracted, then the changer loads the next tape 
and the next restore step happens. Great.

-

So to me it looks that my dumptype with both compression and encryption is the 
problem.

I use the script provided by Anton "exuvo" Olsson, he shared it in earlier 
threads here.

The current iteration on this server:

https://dpaste.org/2YrkJ

Maybe it hasn't yet been tested with amrecover from multiple tapes?

Or the combination with gzip is a problem.

I'll be happy to share any more debug logs or so.


Re: amrecover usage with chg-robot

2022-05-25 Thread Exuvo

To handle changing device names use ex /dev/tape/by-id/scsi-HUE12340E22-nst and 
/dev/tape/by-id/scsi-DEC1234567

Anton "exuvo" Olsson
   [email protected]

On 2022-05-25 14:50, Stefan G. Weichinger wrote:


Am 25.05.22 um 14:41 schrieb Nuno Dias:

My conf  (the relevant parts)

tpchanger "robot"

define changer robot {
 tpchanger "chg-robot:/dev/changer"
 changerfile "/etc/amanda//changer-state"
 property    "tape-device" "0=tape:/dev/nst0"
 property "eject-before-unload" "no"
 property "use-slots" "1-100"
 property "load-poll" "0s poll 3s until 120s"
}

amrecover_changer "/dev/changer"

and /dev/changer is a sym link to /dev/sg11

  So let me explain what is /dev/sg11 ... this is the device that
allows the manipulation and see the tapes and drives of the
robot, for example can be used with the command mtx

$ mtx -f /dev/sg11 status


Sure, thanks, I know what /dev/sg11 is ;-)  in my case /dev/sg21

(although that seems to change with reboots sometimes .. another topic)

The difference between your and my conf is only that symlink.

I directly have

tpchanger "chg-robot:/dev/sg21"

and

amrecover_changer "/dev/sg21"

-

Currently I have another amrecover running. It restored from tape1 .. and now I only see 
these lines in the current debug file "amidxtaped.20220525123652.debug":

Wed May 25 14:48:25.078884308 2022: pid 705002: thd-0x556f690aca00: amidxtaped: 
/usr/lib64/perl5/vendor_perl/5.32/Amanda/Restore.pm:1719:info:490 12472320 
kb
Wed May 25 14:48:40.090891256 2022: pid 705002: thd-0x556f690aca00: amidxtaped: 
/usr/lib64/perl5/vendor_perl/5.32/Amanda/Restore.pm:1719:info:490 12472320 
kb
Wed May 25 14:48:55.102880465 2022: pid 705002: thd-0x556f690aca00: amidxtaped: 
/usr/lib64/perl5/vendor_perl/5.32/Amanda/Restore.pm:1719:info:490 12472320 
kb

... for hours now.

Is that OK? Maybe tar still "scans" through that first tarball on tape ... ?


Re: amrecover usage with chg-robot

2022-05-25 Thread Nathan Stratton Treadway
On Wed, May 25, 2022 at 17:09:22 +0200, Stefan G. Weichinger wrote:
> 
> So to me it looks that my dumptype with both compression and
> encryption is the problem.
> 
> I use the script provided by Anton "exuvo" Olsson, he shared it in
> earlier threads here.
> 
> The current iteration on this server:
> 
> https://dpaste.org/2YrkJ
> 
> Maybe it hasn't yet been tested with amrecover from multiple tapes?
> 
> Or the combination with gzip is a problem.

I haven't used encryption with Amanda so I don't have anything
specific to suggest.

Off hand I don't see anything obviously incorrect with that script

(Well... in the encrypt-operation case it writes the contents of "$@" to
/tmp/encryptparams file but that file doesn't ever appear to be
referenced... but that parameters don't appear to be referenced in the
decrypt-operation case either, so I don't expect that aspect of the
script is related to your problem.)

My next step would be to investigate the status of the subprocesses
during the period where amrecover seems to be hung, i.e. using ps, lsof,
and strace.  

I'm guessing the "openssl enc -d" process doesn't exit for some reason;
can you identify what it's trying to do or waiting for?  If you see that
process out there but just sleeping (i.e. using no CPU, and strace shows
it's just stuck waiting in a "read" syscall or something)), what happens
if you manually kill the process (i.e. does amrecover the proceed to
its next step)?

Nathan


Nathan Stratton Treadway  -  [email protected]  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239


Re: amrecover usage with chg-robot

2022-05-25 Thread Stefan G. Weichinger



Am 25.05.22 um 16:46 schrieb Stefan G. Weichinger:

The tested DLE is encrypted and compressed. Yes, redundant ... I can 
change that in the future.


The first part gets read perfectly, but it seems that the end of the 
tarball crashes things somehow.


I remember that error from mailing list threads .. I will research later 
when I am back at my office.


Testing with a DLE that is/was not encrypted. Plain tar, no compression.

amrecover with device "/dev/sg21" works great. I have to confirm the 
tape change after the first files have been extracted, then the changer 
loads the next tape and the next restore step happens. Great.


-

So to me it looks that my dumptype with both compression and encryption 
is the problem.


I use the script provided by Anton "exuvo" Olsson, he shared it in 
earlier threads here.


The current iteration on this server:

https://dpaste.org/2YrkJ

Maybe it hasn't yet been tested with amrecover from multiple tapes?

Or the combination with gzip is a problem.

I'll be happy to share any more debug logs or so.


Re: amrecover usage with chg-robot

2022-05-25 Thread Stefan G. Weichinger

Am 25.05.22 um 16:12 schrieb Nathan Stratton Treadway:


When doing an extract tar does read on to the end of the tar file before
exiting, but "hours and hours" seems like a long time to wait for
that...

Is tar still running (e.g. what does "top" or "ps" show)?  If so, what
does strace on the tar process show?  Do any other amanda (sub)processes
exist on the system at this time?


I reran that now and have to add some more information:

That size value increases up to the mentioned 12472320 kb ... and until 
then tar is running.


Then the tar file seems to end, and amrecover says:

/bin/gzip: 
/bin/gzip: 
gzip: stdin: decompression OK, trailing garbage ignored


tar has stopped then,also no more gzip process visible.

The tested DLE is encrypted and compressed. Yes, redundant ... I can 
change that in the future.


The first part gets read perfectly, but it seems that the end of the 
tarball crashes things somehow.


I remember that error from mailing list threads .. I will research later 
when I am back at my office.


Re: amrecover usage with chg-robot

2022-05-25 Thread Nathan Stratton Treadway
On Wed, May 25, 2022 at 14:50:00 +0200, Stefan G. Weichinger wrote:
> Currently I have another amrecover running. It restored from tape1
> .. and now I only see these lines in the current debug file
> "amidxtaped.20220525123652.debug":
> 
> Wed May 25 14:48:25.078884308 2022: pid 705002: thd-0x556f690aca00:
> amidxtaped:
> /usr/lib64/perl5/vendor_perl/5.32/Amanda/Restore.pm:1719:info:490
> 12472320 kb
> Wed May 25 14:48:40.090891256 2022: pid 705002: thd-0x556f690aca00:
> amidxtaped:
> /usr/lib64/perl5/vendor_perl/5.32/Amanda/Restore.pm:1719:info:490
> 12472320 kb
> Wed May 25 14:48:55.102880465 2022: pid 705002: thd-0x556f690aca00:
> amidxtaped:
> /usr/lib64/perl5/vendor_perl/5.32/Amanda/Restore.pm:1719:info:490
> 12472320 kb
> 
> ... for hours now.

> Is that OK? Maybe tar still "scans" through that first tarball on tape ... ?

When doing an extract tar does read on to the end of the tar file before
exiting, but "hours and hours" seems like a long time to wait for
that...

Is tar still running (e.g. what does "top" or "ps" show)?  If so, what
does strace on the tar process show?  Do any other amanda (sub)processes
exist on the system at this time?


Nathan



Nathan Stratton Treadway  -  [email protected]  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239


Re: amrecover usage with chg-robot

2022-05-25 Thread Stefan G. Weichinger

Am 25.05.22 um 15:35 schrieb Nuno Dias:

  That messages seems ok, but usually the "kb" will increase with
the time, the same "kb" everytime seems a little strange,
unfortunately I can't help more than this! I think you need
someone that understand more of amanda.


Never mind, thanks for providing your config etc

I also expect that value to increase ... maybe someone comes up with 
some more details on all this.


For now I stop my amrecover-test-run. It should have succeeded hours ago.


Re: amrecover usage with chg-robot

2022-05-25 Thread Nuno Dias
 That messages seems ok, but usually the "kb" will increase with
the time, the same "kb" everytime seems a little strange,
unfortunately I can't help more than this! I think you need
someone that understand more of amanda.

cheers,
Nuno 

On Wed, 2022-05-25 at 14:50 +0200, Stefan G. Weichinger wrote:
> 
> Am 25.05.22 um 14:41 schrieb Nuno Dias:
> > My conf  (the relevant parts)
> > 
> > tpchanger "robot"
> > 
> > define changer robot {
> >  tpchanger "chg-robot:/dev/changer"
> >  changerfile "/etc/amanda//changer-state"
> >  property    "tape-device" "0=tape:/dev/nst0"
> >  property "eject-before-unload" "no"
> >  property "use-slots" "1-100"
> >  property "load-poll" "0s poll 3s until 120s"
> > }
> > 
> > amrecover_changer "/dev/changer"
> > 
> > and /dev/changer is a sym link to /dev/sg11
> > 
> >   So let me explain what is /dev/sg11 ... this is the device
> > that
> > allows the manipulation and see the tapes and drives of the
> > robot, for example can be used with the command mtx
> > 
> > $ mtx -f /dev/sg11 status
> 
> Sure, thanks, I know what /dev/sg11 is ;-)  in my case
> /dev/sg21
> 
> (although that seems to change with reboots sometimes ..
> another topic)
> 
> The difference between your and my conf is only that symlink.
> 
> I directly have
> 
> tpchanger "chg-robot:/dev/sg21"
> 
> and
> 
> amrecover_changer "/dev/sg21"
> 
> -
> 
> Currently I have another amrecover running. It restored from
> tape1 .. 
> and now I only see these lines in the current debug file 
> "amidxtaped.20220525123652.debug":
> 
> Wed May 25 14:48:25.078884308 2022: pid 705002: thd-
> 0x556f690aca00: 
> amidxtaped: 
> /usr/lib64/perl5/vendor_perl/5.32/Amanda/Restore.pm:1719:info:4
> 90 
> 12472320 kb
> Wed May 25 14:48:40.090891256 2022: pid 705002: thd-
> 0x556f690aca00: 
> amidxtaped: 
> /usr/lib64/perl5/vendor_perl/5.32/Amanda/Restore.pm:1719:info:4
> 90 
> 12472320 kb
> Wed May 25 14:48:55.102880465 2022: pid 705002: thd-
> 0x556f690aca00: 
> amidxtaped: 
> /usr/lib64/perl5/vendor_perl/5.32/Amanda/Restore.pm:1719:info:4
> 90 
> 12472320 kb
> 
> ... for hours now.
> 
> Is that OK? Maybe tar still "scans" through that first tarball
> on tape ... ?
> 

-- 
Nuno Dias 
LIP



Re: amrecover usage with chg-robot

2022-05-25 Thread Stefan G. Weichinger



Am 25.05.22 um 14:41 schrieb Nuno Dias:

My conf  (the relevant parts)

tpchanger "robot"

define changer robot {
 tpchanger "chg-robot:/dev/changer"
 changerfile "/etc/amanda//changer-state"
 property"tape-device" "0=tape:/dev/nst0"
 property "eject-before-unload" "no"
 property "use-slots" "1-100"
 property "load-poll" "0s poll 3s until 120s"
}

amrecover_changer "/dev/changer"

and /dev/changer is a sym link to /dev/sg11

  So let me explain what is /dev/sg11 ... this is the device that
allows the manipulation and see the tapes and drives of the
robot, for example can be used with the command mtx

$ mtx -f /dev/sg11 status


Sure, thanks, I know what /dev/sg11 is ;-)  in my case /dev/sg21

(although that seems to change with reboots sometimes .. another topic)

The difference between your and my conf is only that symlink.

I directly have

tpchanger "chg-robot:/dev/sg21"

and

amrecover_changer "/dev/sg21"

-

Currently I have another amrecover running. It restored from tape1 .. 
and now I only see these lines in the current debug file 
"amidxtaped.20220525123652.debug":


Wed May 25 14:48:25.078884308 2022: pid 705002: thd-0x556f690aca00: 
amidxtaped: 
/usr/lib64/perl5/vendor_perl/5.32/Amanda/Restore.pm:1719:info:490 
12472320 kb
Wed May 25 14:48:40.090891256 2022: pid 705002: thd-0x556f690aca00: 
amidxtaped: 
/usr/lib64/perl5/vendor_perl/5.32/Amanda/Restore.pm:1719:info:490 
12472320 kb
Wed May 25 14:48:55.102880465 2022: pid 705002: thd-0x556f690aca00: 
amidxtaped: 
/usr/lib64/perl5/vendor_perl/5.32/Amanda/Restore.pm:1719:info:490 
12472320 kb


... for hours now.

Is that OK? Maybe tar still "scans" through that first tarball on tape ... ?


Re: amrecover usage with chg-robot

2022-05-25 Thread Nuno Dias
My conf  (the relevant parts) 

tpchanger "robot"

define changer robot {
tpchanger "chg-robot:/dev/changer" 
changerfile "/etc/amanda//changer-state"
property"tape-device" "0=tape:/dev/nst0" 
property "eject-before-unload" "no"
property "use-slots" "1-100"
property "load-poll" "0s poll 3s until 120s"
}

amrecover_changer "/dev/changer"

and /dev/changer is a sym link to /dev/sg11 

 So let me explain what is /dev/sg11 ... this is the device that
allows the manipulation and see the tapes and drives of the
robot, for example can be used with the command mtx

$ mtx -f /dev/sg11 status

Cheers,
Nuno

On Wed, 2022-05-25 at 12:35 +0200, Stefan G. Weichinger wrote:
> Am 25.05.22 um 12:17 schrieb Nuno Dias:
> > 
> > I have "tapedev" in /etc/amanda/amanda-client.conf  but is
> > commented,  maybe was used a long time ago, but right now is
> > not
> > necessary with my configuration.
> > 
> >   And yes I do not choose any device or changer when I
> > execute
> > amrecover.
> 
> OK, great.
> 
> My amrecover still wants to use /dev/nst0, even after setting 
> "amrecover_changer"
> 
> I can edit it, no problem, but I try to configure it as good as
> possible 
> ... and write a decent howto.
> 
> Setting "robot" as device in amrecover doesn't work.
> 
> What do you have in "tpchanger" ?
> 
> pls share the relevant blocks
> 
> 
> 
> 
> 
> 
> 
> 

-- 
Nuno Dias 
LIP



Re: amrecover usage with chg-robot

2022-05-25 Thread Stefan G. Weichinger

Am 25.05.22 um 12:17 schrieb Nuno Dias:


I have "tapedev" in /etc/amanda/amanda-client.conf  but is
commented,  maybe was used a long time ago, but right now is not
necessary with my configuration.

  And yes I do not choose any device or changer when I execute
amrecover.


OK, great.

My amrecover still wants to use /dev/nst0, even after setting 
"amrecover_changer"


I can edit it, no problem, but I try to configure it as good as possible 
... and write a decent howto.


Setting "robot" as device in amrecover doesn't work.

What do you have in "tpchanger" ?

pls share the relevant blocks










Re: amrecover usage with chg-robot

2022-05-25 Thread Nuno Dias


I have "tapedev" in /etc/amanda/amanda-client.conf  but is
commented,  maybe was used a long time ago, but right now is not
necessary with my configuration.

 And yes I do not choose any device or changer when I execute
amrecover.

Cheers,
NUno

On Wed, 2022-05-25 at 11:58 +0200, Stefan G. Weichinger wrote:
> Am 25.05.22 um 11:50 schrieb Nuno Dias:
> >   Hi,
> > 
> >   I have a similar configuration and works as expected.
> >   The only thing different is
> > 
> > amrecover_changer "/dev/changer"
> > 
> > where /dev/changer is a sym link to /dev/sg11 (The Robot)
> > 
> >   When I recover something I do not define anything and the
> > changer and tapes devices are obtain in the CONF of
> > amanda.conf
> > 
> > I only execute this as root and the recover works
> >   
> > $ amrecover CONF -s amanda -t amanda
> 
> Thanks, Nuno.
> 
> Do you have a "tapedev" in /etc/amanda/amanda-client.conf ?
> 
> That one also determines how amrecover behaves.
> 
> So you don't choose any device within amrecover, right?
> 

-- 
Nuno Dias 
LIP



Re: amrecover usage with chg-robot

2022-05-25 Thread Nuno Dias
 Hi,

 I have a similar configuration and works as expected. 
 The only thing different is 

amrecover_changer "/dev/changer"

where /dev/changer is a sym link to /dev/sg11 (The Robot)

 When I recover something I do not define anything and the
changer and tapes devices are obtain in the CONF of amanda.conf

I only execute this as root and the recover works 
 
$ amrecover CONF -s amanda -t amanda 

Cheers,
Nuno

On Wed, 2022-05-25 at 09:07 +0200, Stefan G. Weichinger wrote:
> 
> 
> At a customer I decided to do some recovery testing.
> 
> We use the chg-robot changer there, with one tape drive.
> 
> See below:
> 
> * tapetype
> * changer
> * taperscan
> * storage
> * amrecover_changer
> 
> (I could also show the whole config, but don't want to flood
> the list ...)
> 
> Amdumps work fine.
> Recovery ... I have questions:
> 
> I decided to restore a DLE with a lev1 on one tape and the lev0
> on 
> another tape, to test the whole handling of 2 or more tapes.
> 
> That lead to amrecover reading the lev1 tarball from tape 1,
> and then 
> hours of waiting ... it never got to the point of requesting
> tape 2 or 
> changing it on its own.
> 
> I read the manpages etc, and I am unsure if to use "robot" (=
> changer 
> name) as tape device in amrecover, or "tape:/dev/nst0"
> directly.
> 
> tested both, no difference
> 
> Maybe the tape drive doesn't return some of those EOM/EOT
> messages?
> 
> It's a HP Ultrium 4-SCSI, Rev: B12H ... as far as I remember
> that worked 
> before.
> 
> Thanks for any hints here.
> 
> (ah, for reference: Amanda-3.5.1 on Debian 11.3 right now)
> 
> ---
> 
> here the configs:
> 
> 
> define tapetype LTO-4 { 
>  │
>  comment "Created by amtapetype; compression disabled; 
> 2017-10-31 sgw"    │
>  length 698510208 kbytes 
>   │
>  filemark 0 kbytes 
>   │
>  speed 36696 kps 
>   │
>  blocksize 2 mbytes 
>   │
>  readblocksize 2 mbytes 
>   │
> } 
>  │
>  
>   │
> define changer robot { 
>  │
>  tpchanger "chg-robot:/dev/sg21" # lsscsi -g 
>   │
>  property "tape-device" "0=tape:/dev/nst0" 
>   │
>  property "eject-before-unload" "no" 
>   │
>  property "use-slots" "1-24" 
>   │
>  changerfile "/etc/amanda/abt/chg-robot-dev-sg21" 
>    │
> } 
>  │
>  
>   │
> define taperscan lexi { 
>  │
>  comment "none" 
>   │
>  plugin "lexical" 
>   │
> } 
>  │
>  
>   │
> define storage abt { 
>  │
>  tapepool "abt" 
>   │
>  tapetype "LTO-4" 
>   │
>  tpchanger "robot" 
>   │
>  runtapes 1 
>   │
>  taperscan "lexi" 
>   │
>  
>   │
>  flush-threshold-dumped  300 
>   │
>  flush-threshold-scheduled 300 
>   │
>  taperflush 300 
>   │
>  autoflush  yes 
>   │
>  
>   │
>  #labelstr "^ABT-[0-9][0-9]*$" 
>   │
>  autolabel "ABT-%b" 
>   │
> } 
>  │
>  
>   │
> reserve 30 
>  │
>  
>   │
> amrecover_changer "robot"
> 

-- 
Nuno Dias 
LIP



amrecover usage with chg-robot

2022-05-25 Thread Stefan G. Weichinger




At a customer I decided to do some recovery testing.

We use the chg-robot changer there, with one tape drive.

See below:

* tapetype
* changer
* taperscan
* storage
* amrecover_changer

(I could also show the whole config, but don't want to flood the list ...)

Amdumps work fine.
Recovery ... I have questions:

I decided to restore a DLE with a lev1 on one tape and the lev0 on 
another tape, to test the whole handling of 2 or more tapes.


That lead to amrecover reading the lev1 tarball from tape 1, and then 
hours of waiting ... it never got to the point of requesting tape 2 or 
changing it on its own.


I read the manpages etc, and I am unsure if to use "robot" (= changer 
name) as tape device in amrecover, or "tape:/dev/nst0" directly.


tested both, no difference

Maybe the tape drive doesn't return some of those EOM/EOT messages?

It's a HP Ultrium 4-SCSI, Rev: B12H ... as far as I remember that worked 
before.


Thanks for any hints here.

(ah, for reference: Amanda-3.5.1 on Debian 11.3 right now)

---

here the configs:


define tapetype LTO-4 { 
│
comment "Created by amtapetype; compression disabled; 
2017-10-31 sgw"│
length 698510208 kbytes 
 │
filemark 0 kbytes 
 │
speed 36696 kps 
 │
blocksize 2 mbytes 
 │
readblocksize 2 mbytes 
 │
} 
│


 │
define changer robot { 
│
tpchanger "chg-robot:/dev/sg21" # lsscsi -g 
 │
property "tape-device" "0=tape:/dev/nst0" 
 │
property "eject-before-unload" "no" 
 │
property "use-slots" "1-24" 
 │
changerfile "/etc/amanda/abt/chg-robot-dev-sg21" 
  │
} 
│


 │
define taperscan lexi { 
│
comment "none" 
 │
plugin "lexical" 
 │
} 
│


 │
define storage abt { 
│
tapepool "abt" 
 │
tapetype "LTO-4" 
 │
tpchanger "robot" 
 │
runtapes 1 
 │
taperscan "lexi" 
 │


 │
flush-threshold-dumped  300 
 │
flush-threshold-scheduled 300 
 │
taperflush 300 
 │
autoflush  yes 
 │


 │
#labelstr "^ABT-[0-9][0-9]*$" 
 │
autolabel "ABT-%b" 
 │
} 
│


 │
reserve 30 
│


 │
amrecover_changer "robot"