Re: amrecover usage with chg-robot
Am 27.05.22 um 17:30 schrieb Nathan Stratton Treadway:
Ah, so euxvo_crypt is run by the amidxtaped process rather than by the
amrecover process itself.
What does strace show amrecover is doing during this period?
And "ps -ef" shows that the openssl process is still alive (i.e. not
defunct). What does "strace" show on that process. If you manually
kill it, does the change of processes up through amidxtaped unwind and
amrecover resume normal processing?
Took me a while to get to do the tests.
Here an strace of amrecover at the end of the first tape:
poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=11,
events=POLLIN}, {fd=13, events=POLLIN}, {fd=15, events=POLLIN}], 5, -1)
= 2 ([{fd=4, revents=POLLIN}, {fd=5, revents=POLLIN}])
read(4, "\2\0\0\0\0\0\0\0", 16) = 8
write(4, "\1\0\0\0\0\0\0\0", 8) = 8
read(5, "\0\0@\0\0\7\241\36", 8)= 8
read(5,
"\204\34\313\360z\247!\324\306h\272\364\370_c\353\336$\262\224\231
:\1\247\273\330\316\25\235c\2"..., 16384) = 16384
write(8,
"\204\34\313\360z\247!\324\306h\272\364\370_c\353\336$\262\224\231
:\1\247\273\330\316\25\235c\2"..., 16384) = 4096
write(4, "\1\0\0\0\0\0\0\0", 8) = 8
write(4, "\1\0\0\0\0\0\0\0", 8) = 8
write(4, "\1\0\0\0\0\0\0\0", 8) = 8
poll([{fd=4, events=POLLIN}, {fd=8, events=POLLOUT}, {fd=11,
events=POLLIN}, {fd=13, events=POLLIN}, {fd=15, events=POLLIN}], 5, -1)
= 1 ([{fd=4, revents=POLLIN}])
read(4, "\4\0\0\0\0\0\0\0", 16) = 8
poll([{fd=4, events=POLLIN}, {fd=8, events=POLLOUT}, {fd=11,
events=POLLIN}, {fd=13, events=POLLIN}, {fd=15, events=POLLIN}], 5, -1)
= 1 ([{fd=15, revents=POLLIN}])
write(4, "\1\0\0\0\0\0\0\0", 8) = 8
read(15, "\ngzip: stdin: decompression OK, "..., 2046) = 57
write(2, "\r", 1) = 1
write(2, "/bin/gzip: \n", 12) = 12
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=151148,
si_uid=0, si_status=2, si_utime=21750, si_stime=1658} ---
write(2, "/bin/gzip: gzip: stdin: decompre"..., 67) = 67
write(4, "\1\0\0\0\0\0\0\0", 8) = 8
poll([{fd=4, events=POLLIN}, {fd=8, events=POLLOUT}, {fd=11,
events=POLLIN}, {fd=13, events=POLLIN}, {fd=15, events=POLLIN}], 5, -1)
= 2 ([{fd=4, revents=POLLIN}, {fd=15, revents=POLLHUP}])
read(4, "\2\0\0\0\0\0\0\0", 16) = 8
write(4, "\1\0\0\0\0\0\0\0", 8) = 8
read(15, "", 1989) = 0
close(15) = 0
write(4, "\1\0\0\0\0\0\0\0", 8) = 8
write(4, "\1\0\0\0\0\0\0\0", 8) = 8
poll([{fd=4, events=POLLIN}, {fd=8, events=POLLOUT}, {fd=11,
events=POLLIN}, {fd=13, events=POLLIN}], 4, -1) = 1 ([{fd=4,
revents=POLLIN}])
read(4, "\3\0\0\0\0\0\0\0", 16) = 8
poll([{fd=4, events=POLLIN}, {fd=8, events=POLLOUT}, {fd=11,
events=POLLIN}, {fd=13, events=POLLIN}], 4, -1) = 2 ([{fd=11,
revents=POLLHUP}, {fd=13, revents=POLLHUP}])
write(4, "\1\0\0\0\0\0\0\0", 8) = 8
read(11, "", 1830) = 0
close(11) = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=151150,
si_uid=0, si_status=0, si_utime=351, si_stime=884} ---
write(4, "\1\0\0\0\0\0\0\0", 8) = 8
write(4, "\1\0\0\0\0\0\0\0", 8) = 8
read(13, "", 2046) = 0
close(13) = 0
write(4, "\1\0\0\0\0\0\0\0", 8) = 8
write(4, "\1\0\0\0\0\0\0\0", 8) = 8
write(4, "\1\0\0\0\0\0\0\0", 8) = 8
poll([{fd=4, events=POLLIN}, {fd=8, events=POLLOUT}], 2, -1) = 1
([{fd=4, revents=POLLIN}])
The PID 151150 was tar:
151130 pts/2Sl+0:02 094 119297 8976 0.0 amrecover abt
-o auth=local -s localhost
151133 pts/2S+ 0:00 029 19394 6552 0.0
/usr/libexec/amanda/amandad -auth=local
151134 pts/2S+ 0:00 036 20747 7864 0.0
/usr/libexec/amanda/amindexd amandad local
151135 pts/2Z+ 0:00 0 0 0 0 0.0 [amandad]
151138 pts/2R+ 0:00 029 19554 6504 0.0
/usr/libexec/amanda/amandad -auth=local
151139 pts/2Sl+0:01 0 3 233592 42820 0.2
/usr/bin/perl /usr/libexec/amanda/amidxtaped amandad l
151140 pts/2Z+ 0:00 0 0 0 0 0.0 [amandad]
151144 pts/4S+ 0:00 062 9685 940 0.0 tail -f
amidxtaped.20220530124548.debug
151146 pts/2S+ 0:00 0 812 5495 2984 0.0 /bin/bash
/usr/sbin/exuvo_crypt -d
151148 pts/2S+ 0:08 069 5382 1464 0.0 /bin/gzip -dc
151150 pts/2S+ 0:00 0 385 10046 2572 0.0 tar
--ignore-zeros --numeric-owner -xpGvf - ./etc
151153 pts/2R+ 0:06 0 514 8301 4500 0.0
/usr/bin/openssl enc -pbkdf2 -d -aes-256-ctr -salt -pas
-
strace openssl shows only:
# strace -p 151133
strace: Process 151133 attached
restart_syscall(<... resuming interrupted read ...>
Killing that logs in amidxtaped.xx.debug:
Mon May 30 12:55:26.730917575 2022: pid 151139: thd-0x563663659c00:
amidxtaped: info
Re: amrecover usage with chg-robot
On Fri, May 27, 2022 at 15:54:50 +0200, Stefan G. Weichinger wrote:
>
>
> forgot pstree:
>
> -tmux: server-+-bash---amrecover-+-amandad-+-amandad
>
>| | `-amindexd
>
>| |-amandad-+-amandad
>
>| |
> `-amidxtaped-+-exuvo_crypt---openssl
>
>| | `-2*[{amidxtaped}]
>
>| |-gzip
>
>| |-tar
>
>| `-{amrecover}
Ah, so euxvo_crypt is run by the amidxtaped process rather than by the
amrecover process itself.
What does strace show amrecover is doing during this period?
And "ps -ef" shows that the openssl process is still alive (i.e. not
defunct). What does "strace" show on that process. If you manually
kill it, does the change of processes up through amidxtaped unwind and
amrecover resume normal processing?
Nathan
Nathan Stratton Treadway - [email protected] - Mid-Atlantic region
Ray Ontko & Co. - Software consulting services - http://www.ontko.com/
GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239
Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
Re: amrecover usage with chg-robot
forgot pstree:
-tmux: server-+-bash---amrecover-+-amandad-+-amandad
| | `-amindexd
| |-amandad-+-amandad
| |
`-amidxtaped-+-exuvo_crypt---openssl
| |
`-2*[{amidxtaped}]
| |-gzip
| |-tar
| `-{amrecover}
Re: amrecover usage with chg-robot
Am 27.05.22 um 14:28 schrieb Stefan G. Weichinger: Am 27.05.22 um 14:05 schrieb Nathan Stratton Treadway: On Fri, May 27, 2022 at 10:28:09 +0200, Stefan G. Weichinger wrote: After that both tar and gzip.binary are shown as in ps, whatever that means. Okay, that's a little progress in the investigation. "" means that the process has exited, but the return code from the process has not been read by the parent process yet. So in this case, whatever process spawned the tar and gzip subprocesses is not "noticing" when the subprocesses finish... the question is why (and what is it stuck doing instead of cleaning up)? Are the "openssl enc" and/or encription-wrapper-script processes still out there at this time (and what state are they in)? You should be able to use pstree or "ps -ef" to determine which process is the parent (PPID column) of the defunct subprocesses. Yes, that parent process was still visible. I will retry that later, currently I am updating stuff on that server etc The parent is the amrecover process: root 2154598 2096066 15 15:25 pts/200:01:26 amrecover abt -o auth=local -s localhost amanda 2154601 2154598 0 15:25 pts/200:00:00 /usr/libexec/amanda/amandad -auth=local amanda 2154604 2154601 0 15:25 pts/200:00:00 /usr/libexec/amanda/amindexd amandad local amanda 2154605 2154601 0 15:25 pts/200:00:00 [amandad] amanda 2154632 2154598 6 15:25 pts/200:00:35 /usr/libexec/amanda/amandad -auth=local amanda 2154633 2154632 6 15:26 pts/200:00:32 /usr/bin/perl /usr/libexec/amanda/amidxtaped amandad local amanda 2154634 2154632 0 15:26 pts/200:00:00 [amandad] amanda 2154701 2154633 0 15:26 pts/200:00:00 /bin/bash /usr/sbin/exuvo_crypt -d root 2154703 2154598 49 15:26 pts/200:04:11 [gzip] root 2154705 2154598 2 15:26 pts/200:00:13 [tar] amanda 2154708 2154701 41 15:26 pts/200:03:31 /usr/bin/openssl enc -pbkdf2 -d -aes-256-ctr -salt -pass fd:3 This is while this is logged: # tail -f amidxtaped.20220527152600.debug Fri May 27 15:33:38.443013672 2022: pid 2154633: thd-0x556460b31400: amidxtaped: /usr/lib64/perl5/vendor_perl/5.34/Amanda/Restore.pm:1719:info:490 12472320 kb Fri May 27 15:33:53.454973643 2022: pid 2154633: thd-0x556460b31400: amidxtaped: /usr/lib64/perl5/vendor_perl/5.34/Amanda/Restore.pm:1719:info:490 12472320 kb Fri May 27 15:34:08.466959610 2022: pid 2154633: thd-0x556460b31400: amidxtaped: /usr/lib64/perl5/vendor_perl/5.34/Amanda/Restore.pm:1719:info:490 12472320 kb PID 2154633 -> /usr/bin/perl /usr/libexec/amanda/amidxtaped amandad local
Re: amrecover usage with chg-robot
Am 27.05.22 um 14:05 schrieb Nathan Stratton Treadway: On Fri, May 27, 2022 at 10:28:09 +0200, Stefan G. Weichinger wrote: After that both tar and gzip.binary are shown as in ps, whatever that means. Okay, that's a little progress in the investigation. "" means that the process has exited, but the return code from the process has not been read by the parent process yet. So in this case, whatever process spawned the tar and gzip subprocesses is not "noticing" when the subprocesses finish... the question is why (and what is it stuck doing instead of cleaning up)? Are the "openssl enc" and/or encription-wrapper-script processes still out there at this time (and what state are they in)? You should be able to use pstree or "ps -ef" to determine which process is the parent (PPID column) of the defunct subprocesses. Yes, that parent process was still visible. I will retry that later, currently I am updating stuff on that server etc to add complexity: this is a rather old server with gentoo linux on it, a bit neglected because it should have been replaced long ago)
Re: amrecover usage with chg-robot
On Fri, May 27, 2022 at 10:28:09 +0200, Stefan G. Weichinger wrote: > After that both tar and gzip.binary are shown as in ps, > whatever that means. Okay, that's a little progress in the investigation. "" means that the process has exited, but the return code from the process has not been read by the parent process yet. So in this case, whatever process spawned the tar and gzip subprocesses is not "noticing" when the subprocesses finish... the question is why (and what is it stuck doing instead of cleaning up)? Are the "openssl enc" and/or encription-wrapper-script processes still out there at this time (and what state are they in)? You should be able to use pstree or "ps -ef" to determine which process is the parent (PPID column) of the defunct subprocesses. Nathan Nathan Stratton Treadway - [email protected] - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
Re: amrecover usage with chg-robot
Am 26.05.22 um 12:05 schrieb Stefan G. Weichinger: Am 26.05.22 um 12:01 schrieb Stefan G. Weichinger: Am 26.05.22 um 04:10 schrieb Exuvo: I think i had "/bin/gzip: gzip: stdin: decompression OK, trailing garbage ignored" before i used -q to zstd which suppresses warnings and interactivity. Maybe I could edit to "gzip -d" somewhere in the code to also suppress this? sorry, correction: "gzip -d -q" Trying this by: * moved gzip to gzip.binary * copied gunzip to gzip * edited gzip to call "gzip.binary -q" and rerunning my amrecover test with 2 tapes and device "robot" First tape is extracted OK That ugly "trailing garbage" message is gone now, as intended. The strace on tar shows that it exits with exit code 0. After that both tar and gzip.binary are shown as in ps, whatever that means. And amidxtaped.xx.debug shows the same Fri May 27 10:26:40.042894830 2022: pid 737966: thd-0x563359eb2800: amidxtaped: /usr/lib64/perl5/vendor_perl/5.32/Amanda/Restore.pm:1719:info:490 12472320 kb as before. Maybe I have to wait ...
Re: amrecover usage with chg-robot
Am 26.05.22 um 12:01 schrieb Stefan G. Weichinger: Am 26.05.22 um 04:10 schrieb Exuvo: I think i had "/bin/gzip: gzip: stdin: decompression OK, trailing garbage ignored" before i used -q to zstd which suppresses warnings and interactivity. Maybe I could edit to "gzip -d" somewhere in the code to also suppress this? sorry, correction: "gzip -d -q"
Re: amrecover usage with chg-robot
Am 26.05.22 um 04:10 schrieb Exuvo: I think i had "/bin/gzip: gzip: stdin: decompression OK, trailing garbage ignored" before i used -q to zstd which suppresses warnings and interactivity. Maybe I could edit to "gzip -d" somewhere in the code to also suppress this?
Re: amrecover usage with chg-robot
Am 26.05.22 um 04:10 schrieb Exuvo: From my notes when i was doing recovery testing: For some reason if i use encryption i get a single spurious error at the end of amcheckdump or amrecover if you recover the last file written: application stderr: /usr/bin/tar: Skipping to next header application stderr: /usr/bin/tar: Exiting with failure status due to previous errors It still recovers as it should so nothing to worry about and manual recovery from the same tape does not produce this warning but i have not yet figured out why it happens. I did do a complete 8TB restore when i first set up my archive backups, which takes 7 tapes for me, changer worked, and that did give the above warning at the end but all files matched the source when i checked. I think i had "/bin/gzip: gzip: stdin: decompression OK, trailing garbage ignored" before i used -q to zstd which suppresses warnings and interactivity. You could try a manual restore: position at block 1 with mt fsb mbuffer -i $TAPE -s 1024k -b 2048 -L | /etc/amanda/encrypt -d | /etc/amanda/zstd-compression -d | tar -xf - or mbuffer -i $TAPE -s 1024k -b 2048 -L | openssl enc -pbkdf2 -d -aes-256-ctr -salt -pass fd:3 3< pwfile | zstd -dqcf | tar -xf - That never gave the warnings as i get with amrecover but only feasible for full restores. You can try zstd -dfo /dev/null to see if zstd gives any warnings while reading. Thanks a lot for that detailed reply. Although in my case the compression is with gzip, not with zstd. So I am unsure if your suggestion applies. A short search tells me that zstd is able to decompress gzip archives? Nice, but the existing backups on tape tell amrecover to use "gzip -d" or so, right? I see people setting up a symlink .. hmm, wondering. In my "define changer" section i have device-property "READ_BUFFER_SIZE" "1024k", which i had to add to read older backups when i was testing different block sizes. That settings means it supports up to that blocksize for restores. I don't have that setting yet. Set it to 2048k now to match blocksize 2 mbytes readblocksize 2 mbytes in the used tapetype (maybe all 3 settings could go into either the tapetype OR the changer device-property lines).
Re: amrecover usage with chg-robot
Am 26.05.22 um 03:48 schrieb Exuvo: To handle changing device names use ex /dev/tape/by-id/scsi-HUE12340E22-nst and /dev/tape/by-id/scsi-DEC1234567 Nice hack, thanks. Applied.
Re: amrecover usage with chg-robot
From my notes when i was doing recovery testing: For some reason if i use encryption i get a single spurious error at the end of amcheckdump or amrecover if you recover the last file written: application stderr: /usr/bin/tar: Skipping to next header application stderr: /usr/bin/tar: Exiting with failure status due to previous errors It still recovers as it should so nothing to worry about and manual recovery from the same tape does not produce this warning but i have not yet figured out why it happens. I did do a complete 8TB restore when i first set up my archive backups, which takes 7 tapes for me, changer worked, and that did give the above warning at the end but all files matched the source when i checked. I think i had "/bin/gzip: gzip: stdin: decompression OK, trailing garbage ignored" before i used -q to zstd which suppresses warnings and interactivity. You could try a manual restore: position at block 1 with mt fsb mbuffer -i $TAPE -s 1024k -b 2048 -L | /etc/amanda/encrypt -d | /etc/amanda/zstd-compression -d | tar -xf - or mbuffer -i $TAPE -s 1024k -b 2048 -L | openssl enc -pbkdf2 -d -aes-256-ctr -salt -pass fd:3 3< pwfile | zstd -dqcf | tar -xf - That never gave the warnings as i get with amrecover but only feasible for full restores. You can try zstd -dfo /dev/null to see if zstd gives any warnings while reading. In my "define changer" section i have device-property "READ_BUFFER_SIZE" "1024k", which i had to add to read older backups when i was testing different block sizes. That settings means it supports up to that blocksize for restores. Anton "exuvo" Olsson [email protected] On 2022-05-25 17:09, Stefan G. Weichinger wrote: Am 25.05.22 um 16:46 schrieb Stefan G. Weichinger: The tested DLE is encrypted and compressed. Yes, redundant ... I can change that in the future. The first part gets read perfectly, but it seems that the end of the tarball crashes things somehow. I remember that error from mailing list threads .. I will research later when I am back at my office. Testing with a DLE that is/was not encrypted. Plain tar, no compression. amrecover with device "/dev/sg21" works great. I have to confirm the tape change after the first files have been extracted, then the changer loads the next tape and the next restore step happens. Great. - So to me it looks that my dumptype with both compression and encryption is the problem. I use the script provided by Anton "exuvo" Olsson, he shared it in earlier threads here. The current iteration on this server: https://dpaste.org/2YrkJ Maybe it hasn't yet been tested with amrecover from multiple tapes? Or the combination with gzip is a problem. I'll be happy to share any more debug logs or so.
Re: amrecover usage with chg-robot
To handle changing device names use ex /dev/tape/by-id/scsi-HUE12340E22-nst and /dev/tape/by-id/scsi-DEC1234567 Anton "exuvo" Olsson [email protected] On 2022-05-25 14:50, Stefan G. Weichinger wrote: Am 25.05.22 um 14:41 schrieb Nuno Dias: My conf (the relevant parts) tpchanger "robot" define changer robot { tpchanger "chg-robot:/dev/changer" changerfile "/etc/amanda//changer-state" property "tape-device" "0=tape:/dev/nst0" property "eject-before-unload" "no" property "use-slots" "1-100" property "load-poll" "0s poll 3s until 120s" } amrecover_changer "/dev/changer" and /dev/changer is a sym link to /dev/sg11 So let me explain what is /dev/sg11 ... this is the device that allows the manipulation and see the tapes and drives of the robot, for example can be used with the command mtx $ mtx -f /dev/sg11 status Sure, thanks, I know what /dev/sg11 is ;-) in my case /dev/sg21 (although that seems to change with reboots sometimes .. another topic) The difference between your and my conf is only that symlink. I directly have tpchanger "chg-robot:/dev/sg21" and amrecover_changer "/dev/sg21" - Currently I have another amrecover running. It restored from tape1 .. and now I only see these lines in the current debug file "amidxtaped.20220525123652.debug": Wed May 25 14:48:25.078884308 2022: pid 705002: thd-0x556f690aca00: amidxtaped: /usr/lib64/perl5/vendor_perl/5.32/Amanda/Restore.pm:1719:info:490 12472320 kb Wed May 25 14:48:40.090891256 2022: pid 705002: thd-0x556f690aca00: amidxtaped: /usr/lib64/perl5/vendor_perl/5.32/Amanda/Restore.pm:1719:info:490 12472320 kb Wed May 25 14:48:55.102880465 2022: pid 705002: thd-0x556f690aca00: amidxtaped: /usr/lib64/perl5/vendor_perl/5.32/Amanda/Restore.pm:1719:info:490 12472320 kb ... for hours now. Is that OK? Maybe tar still "scans" through that first tarball on tape ... ?
Re: amrecover usage with chg-robot
On Wed, May 25, 2022 at 17:09:22 +0200, Stefan G. Weichinger wrote: > > So to me it looks that my dumptype with both compression and > encryption is the problem. > > I use the script provided by Anton "exuvo" Olsson, he shared it in > earlier threads here. > > The current iteration on this server: > > https://dpaste.org/2YrkJ > > Maybe it hasn't yet been tested with amrecover from multiple tapes? > > Or the combination with gzip is a problem. I haven't used encryption with Amanda so I don't have anything specific to suggest. Off hand I don't see anything obviously incorrect with that script (Well... in the encrypt-operation case it writes the contents of "$@" to /tmp/encryptparams file but that file doesn't ever appear to be referenced... but that parameters don't appear to be referenced in the decrypt-operation case either, so I don't expect that aspect of the script is related to your problem.) My next step would be to investigate the status of the subprocesses during the period where amrecover seems to be hung, i.e. using ps, lsof, and strace. I'm guessing the "openssl enc -d" process doesn't exit for some reason; can you identify what it's trying to do or waiting for? If you see that process out there but just sleeping (i.e. using no CPU, and strace shows it's just stuck waiting in a "read" syscall or something)), what happens if you manually kill the process (i.e. does amrecover the proceed to its next step)? Nathan Nathan Stratton Treadway - [email protected] - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
Re: amrecover usage with chg-robot
Am 25.05.22 um 16:46 schrieb Stefan G. Weichinger: The tested DLE is encrypted and compressed. Yes, redundant ... I can change that in the future. The first part gets read perfectly, but it seems that the end of the tarball crashes things somehow. I remember that error from mailing list threads .. I will research later when I am back at my office. Testing with a DLE that is/was not encrypted. Plain tar, no compression. amrecover with device "/dev/sg21" works great. I have to confirm the tape change after the first files have been extracted, then the changer loads the next tape and the next restore step happens. Great. - So to me it looks that my dumptype with both compression and encryption is the problem. I use the script provided by Anton "exuvo" Olsson, he shared it in earlier threads here. The current iteration on this server: https://dpaste.org/2YrkJ Maybe it hasn't yet been tested with amrecover from multiple tapes? Or the combination with gzip is a problem. I'll be happy to share any more debug logs or so.
Re: amrecover usage with chg-robot
Am 25.05.22 um 16:12 schrieb Nathan Stratton Treadway: When doing an extract tar does read on to the end of the tar file before exiting, but "hours and hours" seems like a long time to wait for that... Is tar still running (e.g. what does "top" or "ps" show)? If so, what does strace on the tar process show? Do any other amanda (sub)processes exist on the system at this time? I reran that now and have to add some more information: That size value increases up to the mentioned 12472320 kb ... and until then tar is running. Then the tar file seems to end, and amrecover says: /bin/gzip: /bin/gzip: gzip: stdin: decompression OK, trailing garbage ignored tar has stopped then,also no more gzip process visible. The tested DLE is encrypted and compressed. Yes, redundant ... I can change that in the future. The first part gets read perfectly, but it seems that the end of the tarball crashes things somehow. I remember that error from mailing list threads .. I will research later when I am back at my office.
Re: amrecover usage with chg-robot
On Wed, May 25, 2022 at 14:50:00 +0200, Stefan G. Weichinger wrote: > Currently I have another amrecover running. It restored from tape1 > .. and now I only see these lines in the current debug file > "amidxtaped.20220525123652.debug": > > Wed May 25 14:48:25.078884308 2022: pid 705002: thd-0x556f690aca00: > amidxtaped: > /usr/lib64/perl5/vendor_perl/5.32/Amanda/Restore.pm:1719:info:490 > 12472320 kb > Wed May 25 14:48:40.090891256 2022: pid 705002: thd-0x556f690aca00: > amidxtaped: > /usr/lib64/perl5/vendor_perl/5.32/Amanda/Restore.pm:1719:info:490 > 12472320 kb > Wed May 25 14:48:55.102880465 2022: pid 705002: thd-0x556f690aca00: > amidxtaped: > /usr/lib64/perl5/vendor_perl/5.32/Amanda/Restore.pm:1719:info:490 > 12472320 kb > > ... for hours now. > Is that OK? Maybe tar still "scans" through that first tarball on tape ... ? When doing an extract tar does read on to the end of the tar file before exiting, but "hours and hours" seems like a long time to wait for that... Is tar still running (e.g. what does "top" or "ps" show)? If so, what does strace on the tar process show? Do any other amanda (sub)processes exist on the system at this time? Nathan Nathan Stratton Treadway - [email protected] - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
Re: amrecover usage with chg-robot
Am 25.05.22 um 15:35 schrieb Nuno Dias: That messages seems ok, but usually the "kb" will increase with the time, the same "kb" everytime seems a little strange, unfortunately I can't help more than this! I think you need someone that understand more of amanda. Never mind, thanks for providing your config etc I also expect that value to increase ... maybe someone comes up with some more details on all this. For now I stop my amrecover-test-run. It should have succeeded hours ago.
Re: amrecover usage with chg-robot
That messages seems ok, but usually the "kb" will increase with
the time, the same "kb" everytime seems a little strange,
unfortunately I can't help more than this! I think you need
someone that understand more of amanda.
cheers,
Nuno
On Wed, 2022-05-25 at 14:50 +0200, Stefan G. Weichinger wrote:
>
> Am 25.05.22 um 14:41 schrieb Nuno Dias:
> > My conf (the relevant parts)
> >
> > tpchanger "robot"
> >
> > define changer robot {
> > tpchanger "chg-robot:/dev/changer"
> > changerfile "/etc/amanda//changer-state"
> > property "tape-device" "0=tape:/dev/nst0"
> > property "eject-before-unload" "no"
> > property "use-slots" "1-100"
> > property "load-poll" "0s poll 3s until 120s"
> > }
> >
> > amrecover_changer "/dev/changer"
> >
> > and /dev/changer is a sym link to /dev/sg11
> >
> > So let me explain what is /dev/sg11 ... this is the device
> > that
> > allows the manipulation and see the tapes and drives of the
> > robot, for example can be used with the command mtx
> >
> > $ mtx -f /dev/sg11 status
>
> Sure, thanks, I know what /dev/sg11 is ;-) in my case
> /dev/sg21
>
> (although that seems to change with reboots sometimes ..
> another topic)
>
> The difference between your and my conf is only that symlink.
>
> I directly have
>
> tpchanger "chg-robot:/dev/sg21"
>
> and
>
> amrecover_changer "/dev/sg21"
>
> -
>
> Currently I have another amrecover running. It restored from
> tape1 ..
> and now I only see these lines in the current debug file
> "amidxtaped.20220525123652.debug":
>
> Wed May 25 14:48:25.078884308 2022: pid 705002: thd-
> 0x556f690aca00:
> amidxtaped:
> /usr/lib64/perl5/vendor_perl/5.32/Amanda/Restore.pm:1719:info:4
> 90
> 12472320 kb
> Wed May 25 14:48:40.090891256 2022: pid 705002: thd-
> 0x556f690aca00:
> amidxtaped:
> /usr/lib64/perl5/vendor_perl/5.32/Amanda/Restore.pm:1719:info:4
> 90
> 12472320 kb
> Wed May 25 14:48:55.102880465 2022: pid 705002: thd-
> 0x556f690aca00:
> amidxtaped:
> /usr/lib64/perl5/vendor_perl/5.32/Amanda/Restore.pm:1719:info:4
> 90
> 12472320 kb
>
> ... for hours now.
>
> Is that OK? Maybe tar still "scans" through that first tarball
> on tape ... ?
>
--
Nuno Dias
LIP
Re: amrecover usage with chg-robot
Am 25.05.22 um 14:41 schrieb Nuno Dias:
My conf (the relevant parts)
tpchanger "robot"
define changer robot {
tpchanger "chg-robot:/dev/changer"
changerfile "/etc/amanda//changer-state"
property"tape-device" "0=tape:/dev/nst0"
property "eject-before-unload" "no"
property "use-slots" "1-100"
property "load-poll" "0s poll 3s until 120s"
}
amrecover_changer "/dev/changer"
and /dev/changer is a sym link to /dev/sg11
So let me explain what is /dev/sg11 ... this is the device that
allows the manipulation and see the tapes and drives of the
robot, for example can be used with the command mtx
$ mtx -f /dev/sg11 status
Sure, thanks, I know what /dev/sg11 is ;-) in my case /dev/sg21
(although that seems to change with reboots sometimes .. another topic)
The difference between your and my conf is only that symlink.
I directly have
tpchanger "chg-robot:/dev/sg21"
and
amrecover_changer "/dev/sg21"
-
Currently I have another amrecover running. It restored from tape1 ..
and now I only see these lines in the current debug file
"amidxtaped.20220525123652.debug":
Wed May 25 14:48:25.078884308 2022: pid 705002: thd-0x556f690aca00:
amidxtaped:
/usr/lib64/perl5/vendor_perl/5.32/Amanda/Restore.pm:1719:info:490
12472320 kb
Wed May 25 14:48:40.090891256 2022: pid 705002: thd-0x556f690aca00:
amidxtaped:
/usr/lib64/perl5/vendor_perl/5.32/Amanda/Restore.pm:1719:info:490
12472320 kb
Wed May 25 14:48:55.102880465 2022: pid 705002: thd-0x556f690aca00:
amidxtaped:
/usr/lib64/perl5/vendor_perl/5.32/Amanda/Restore.pm:1719:info:490
12472320 kb
... for hours now.
Is that OK? Maybe tar still "scans" through that first tarball on tape ... ?
Re: amrecover usage with chg-robot
My conf (the relevant parts)
tpchanger "robot"
define changer robot {
tpchanger "chg-robot:/dev/changer"
changerfile "/etc/amanda//changer-state"
property"tape-device" "0=tape:/dev/nst0"
property "eject-before-unload" "no"
property "use-slots" "1-100"
property "load-poll" "0s poll 3s until 120s"
}
amrecover_changer "/dev/changer"
and /dev/changer is a sym link to /dev/sg11
So let me explain what is /dev/sg11 ... this is the device that
allows the manipulation and see the tapes and drives of the
robot, for example can be used with the command mtx
$ mtx -f /dev/sg11 status
Cheers,
Nuno
On Wed, 2022-05-25 at 12:35 +0200, Stefan G. Weichinger wrote:
> Am 25.05.22 um 12:17 schrieb Nuno Dias:
> >
> > I have "tapedev" in /etc/amanda/amanda-client.conf but is
> > commented, maybe was used a long time ago, but right now is
> > not
> > necessary with my configuration.
> >
> > And yes I do not choose any device or changer when I
> > execute
> > amrecover.
>
> OK, great.
>
> My amrecover still wants to use /dev/nst0, even after setting
> "amrecover_changer"
>
> I can edit it, no problem, but I try to configure it as good as
> possible
> ... and write a decent howto.
>
> Setting "robot" as device in amrecover doesn't work.
>
> What do you have in "tpchanger" ?
>
> pls share the relevant blocks
>
>
>
>
>
>
>
>
--
Nuno Dias
LIP
Re: amrecover usage with chg-robot
Am 25.05.22 um 12:17 schrieb Nuno Dias: I have "tapedev" in /etc/amanda/amanda-client.conf but is commented, maybe was used a long time ago, but right now is not necessary with my configuration. And yes I do not choose any device or changer when I execute amrecover. OK, great. My amrecover still wants to use /dev/nst0, even after setting "amrecover_changer" I can edit it, no problem, but I try to configure it as good as possible ... and write a decent howto. Setting "robot" as device in amrecover doesn't work. What do you have in "tpchanger" ? pls share the relevant blocks
Re: amrecover usage with chg-robot
I have "tapedev" in /etc/amanda/amanda-client.conf but is commented, maybe was used a long time ago, but right now is not necessary with my configuration. And yes I do not choose any device or changer when I execute amrecover. Cheers, NUno On Wed, 2022-05-25 at 11:58 +0200, Stefan G. Weichinger wrote: > Am 25.05.22 um 11:50 schrieb Nuno Dias: > > Hi, > > > > I have a similar configuration and works as expected. > > The only thing different is > > > > amrecover_changer "/dev/changer" > > > > where /dev/changer is a sym link to /dev/sg11 (The Robot) > > > > When I recover something I do not define anything and the > > changer and tapes devices are obtain in the CONF of > > amanda.conf > > > > I only execute this as root and the recover works > > > > $ amrecover CONF -s amanda -t amanda > > Thanks, Nuno. > > Do you have a "tapedev" in /etc/amanda/amanda-client.conf ? > > That one also determines how amrecover behaves. > > So you don't choose any device within amrecover, right? > -- Nuno Dias LIP
Re: amrecover usage with chg-robot
Hi,
I have a similar configuration and works as expected.
The only thing different is
amrecover_changer "/dev/changer"
where /dev/changer is a sym link to /dev/sg11 (The Robot)
When I recover something I do not define anything and the
changer and tapes devices are obtain in the CONF of amanda.conf
I only execute this as root and the recover works
$ amrecover CONF -s amanda -t amanda
Cheers,
Nuno
On Wed, 2022-05-25 at 09:07 +0200, Stefan G. Weichinger wrote:
>
>
> At a customer I decided to do some recovery testing.
>
> We use the chg-robot changer there, with one tape drive.
>
> See below:
>
> * tapetype
> * changer
> * taperscan
> * storage
> * amrecover_changer
>
> (I could also show the whole config, but don't want to flood
> the list ...)
>
> Amdumps work fine.
> Recovery ... I have questions:
>
> I decided to restore a DLE with a lev1 on one tape and the lev0
> on
> another tape, to test the whole handling of 2 or more tapes.
>
> That lead to amrecover reading the lev1 tarball from tape 1,
> and then
> hours of waiting ... it never got to the point of requesting
> tape 2 or
> changing it on its own.
>
> I read the manpages etc, and I am unsure if to use "robot" (=
> changer
> name) as tape device in amrecover, or "tape:/dev/nst0"
> directly.
>
> tested both, no difference
>
> Maybe the tape drive doesn't return some of those EOM/EOT
> messages?
>
> It's a HP Ultrium 4-SCSI, Rev: B12H ... as far as I remember
> that worked
> before.
>
> Thanks for any hints here.
>
> (ah, for reference: Amanda-3.5.1 on Debian 11.3 right now)
>
> ---
>
> here the configs:
>
>
> define tapetype LTO-4 {
> │
> comment "Created by amtapetype; compression disabled;
> 2017-10-31 sgw" │
> length 698510208 kbytes
> │
> filemark 0 kbytes
> │
> speed 36696 kps
> │
> blocksize 2 mbytes
> │
> readblocksize 2 mbytes
> │
> }
> │
>
> │
> define changer robot {
> │
> tpchanger "chg-robot:/dev/sg21" # lsscsi -g
> │
> property "tape-device" "0=tape:/dev/nst0"
> │
> property "eject-before-unload" "no"
> │
> property "use-slots" "1-24"
> │
> changerfile "/etc/amanda/abt/chg-robot-dev-sg21"
> │
> }
> │
>
> │
> define taperscan lexi {
> │
> comment "none"
> │
> plugin "lexical"
> │
> }
> │
>
> │
> define storage abt {
> │
> tapepool "abt"
> │
> tapetype "LTO-4"
> │
> tpchanger "robot"
> │
> runtapes 1
> │
> taperscan "lexi"
> │
>
> │
> flush-threshold-dumped 300
> │
> flush-threshold-scheduled 300
> │
> taperflush 300
> │
> autoflush yes
> │
>
> │
> #labelstr "^ABT-[0-9][0-9]*$"
> │
> autolabel "ABT-%b"
> │
> }
> │
>
> │
> reserve 30
> │
>
> │
> amrecover_changer "robot"
>
--
Nuno Dias
LIP
amrecover usage with chg-robot
At a customer I decided to do some recovery testing.
We use the chg-robot changer there, with one tape drive.
See below:
* tapetype
* changer
* taperscan
* storage
* amrecover_changer
(I could also show the whole config, but don't want to flood the list ...)
Amdumps work fine.
Recovery ... I have questions:
I decided to restore a DLE with a lev1 on one tape and the lev0 on
another tape, to test the whole handling of 2 or more tapes.
That lead to amrecover reading the lev1 tarball from tape 1, and then
hours of waiting ... it never got to the point of requesting tape 2 or
changing it on its own.
I read the manpages etc, and I am unsure if to use "robot" (= changer
name) as tape device in amrecover, or "tape:/dev/nst0" directly.
tested both, no difference
Maybe the tape drive doesn't return some of those EOM/EOT messages?
It's a HP Ultrium 4-SCSI, Rev: B12H ... as far as I remember that worked
before.
Thanks for any hints here.
(ah, for reference: Amanda-3.5.1 on Debian 11.3 right now)
---
here the configs:
define tapetype LTO-4 {
│
comment "Created by amtapetype; compression disabled;
2017-10-31 sgw"│
length 698510208 kbytes
│
filemark 0 kbytes
│
speed 36696 kps
│
blocksize 2 mbytes
│
readblocksize 2 mbytes
│
}
│
│
define changer robot {
│
tpchanger "chg-robot:/dev/sg21" # lsscsi -g
│
property "tape-device" "0=tape:/dev/nst0"
│
property "eject-before-unload" "no"
│
property "use-slots" "1-24"
│
changerfile "/etc/amanda/abt/chg-robot-dev-sg21"
│
}
│
│
define taperscan lexi {
│
comment "none"
│
plugin "lexical"
│
}
│
│
define storage abt {
│
tapepool "abt"
│
tapetype "LTO-4"
│
tpchanger "robot"
│
runtapes 1
│
taperscan "lexi"
│
│
flush-threshold-dumped 300
│
flush-threshold-scheduled 300
│
taperflush 300
│
autoflush yes
│
│
#labelstr "^ABT-[0-9][0-9]*$"
│
autolabel "ABT-%b"
│
}
│
│
reserve 30
│
│
amrecover_changer "robot"
