Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
On Wed, Oct 14, 2020 at 13:20:38 -0400, Chris Hoogendyk wrote: > I've been doing all the debugging stuff one one server. On my other > server, I had simply set it to traditional and it's been working. > Just now I went and applied ScanInventory.pm.patch_20201013C and > changed the amanda.conf back to using oldest. An amcheck daily told > me there were no acceptable volumes found! So I switched back to Hmmm... Well, clearly the exact failure mode different between the two servers :(. (You did previously see the "terminated with signal 11" error message on the eclogite server, right?) Did the dump that ran yesterday/last night actually write to the geo-daily-065 tape successfully, or was there some sort of changer error at run time? I guess the next thing to try would be to install ScanInventory.pm.patch_20201013B to enable some debugging, then run "amcheck -s" and "amtape ... taper" again and post the results. Those tests should be done using the oldest taperscan, but you can leave the amanda.conf as-is and test with -otaperscan on the command lines if you prefer. Also, since I assume the statefile has changed since you last posted it, I guess you should include it again (i.e. the version that is out then at the moemnt you are running those tests). Nathan Nathan Stratton Treadway - [email protected] - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
From the other server:
amandabackup@eclogite:~/daily$ cat
/usr/local/var/amanda/chg-robot-dev-tape-by-id-scsi-1BDT-FlexStor-II-00MX64100589-LL0
$STATE = {
'drive_lru' => [
'0',
'1'
],
'last_status' => 1602694135,
'last_operation_delay' => 0,
'last_operation_time' => 1602694089,
'slots' => {
'3' => {
'loaded_in' => undef,
'label' => 'geo-daily-062',
'state' => 1,
'ie' => 0,
'f_type' => bless( {
'sign' => '+',
'value' => [
1
]
}, 'Math::BigInt' ),
'device_error' => undef,
'barcode' => '62L7',
'device_status' => bless( {
'value' => [
0
],
'sign' => '+'
}, 'Math::BigInt' )
},
'4' => {
'device_status' => bless( {
'sign' => '+',
'value' => [
0
]
}, 'Math::BigInt' ),
'barcode' => '63L7',
'device_error' => undef,
'f_type' => bless( {
'value' => [
1
],
'sign' => '+'
}, 'Math::BigInt' ),
'ie' => 0,
'state' => 1,
'loaded_in' => undef,
'label' => 'geo-daily-063'
},
'17' => {
'device_status' => bless( {
'value' => [
4
],
'sign' => '+'
}, 'Math::BigInt' ),
'barcode' => '76L7',
'device_error' => 'Tape device /dev/nst1
is not ready or is
empty',
'ie' => 0,
'f_type' => undef,
'state' => 1,
'loaded_in' => undef,
'label' => undef
},
'16' => {
'barcode' => '75L7',
'device_status' => bless( {
'sign' => '+',
'value' => [
4
]
}, 'Math::BigInt' ),
'device_error' => 'Tape device /dev/nst1
is not ready or is
empty',
'ie' => 0,
'f_type' => undef,
'state' => 1,
'loaded_in' => undef,
'label' => undef
},
'10' => {
'ie' => 0,
'f_type' => undef,
'state' => 1,
'loaded_in' => undef,
'label' => undef,
'barcode' => '69L7',
'device_status' => bless( {
'sign' => '+',
'value' => [
4
]
}, 'Math::BigInt' ),
'device_error' => 'Tape device /dev/nst1
is not ready or is
empty'
},
'13' => {
'device_error' => 'Tape device /dev/nst1
is not ready or is
empty',
'device_status' => bless( {
'value' => [
4
],
'sign' => '+'
}, 'Math::BigInt' ),
'barcode' => '72L7',
'loaded_in' => undef,
'label' => undef,
'f_type' => undef,
'ie' => 0,
'state' => 1
},
'7' => {
'state' => 1,
'ie' => 0,
'f_type' => undef,
'loaded_in' => undef,
'label' => undef,
'barcode' => '66L7',
'device_status' => bless( {
Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
On Wed, Oct 14, 2020 at 12:20:33 -0400, Chris Hoogendyk wrote: > Both lexical.pm and ScanInventory.pm restored to original. New fix > only patch applied to ScanInventory.pm. amanda.conf restored to use > oldest. > >amanda@marlin:~/daily$ amcheck daily > >Amanda Tape Server Host Check >- >NOTE: Holding disk '/amanda3': 139784192 KB disk space available, > using 34926592 KB >NOTE: Holding disk '/amanda4': 170082304 KB disk space available, > using 65224704 KB >NOTE: Holding disk '/amanda5': 240713728 KB disk space available, > using 135856128 KB > * Authorized Use Only * > >snapper >slot 25: volume 'Bio-Research-028' >Will write to volume 'Bio-Research-028' in slot 25. >NOTE: skipping tape-writable test >Server check took 50.059 seconds >Amanda Backup Client Hosts Check > >Client check: 4 hosts checked in 6.692 seconds. 0 problems found. >(brought to you by Amanda 3.5.1) > >amanda@marlin:~/daily$ [...]> > > Launched a flush on that. Then the following seems to set up a tape on the > second tape drive. > >amanda@marlin:~/daily$ amtape daily taper > >slot 31: volume 'Bio-Research-032' >Will write to volume 'Bio-Research-032' in slot 31. > >amanda@marlin:~/daily$ Okay, sounds like things are back to working "normally" on that server, right? So, do you still have a second server which is getting coredumps (at least with the oldest taperscan)? Based on the investigation so far, it seems like the crash is caused by tape-inventory records which have no label text along with some specific other data field values. If you post the /usr/local/var/amanda/chg-robot-dev-tape-by-id-scsi* changer state file that other server, we can double check that such entries exist over there, too. (Assume they do, then I guess the question will be whether you want to apply the same ScanInventory.pm patch there, or if you instead want to try clearing that/those bad inventory record(s) without changing the installed code on that box) Nathan Nathan Stratton Treadway - [email protected] - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
Both lexical.pm and ScanInventory.pm restored to original. New fix only patch applied to ScanInventory.pm. amanda.conf restored to use oldest. amanda@marlin:~/daily$ amcheck daily Amanda Tape Server Host Check - NOTE: Holding disk '/amanda3': 139784192 KB disk space available, using 34926592 KB NOTE: Holding disk '/amanda4': 170082304 KB disk space available, using 65224704 KB NOTE: Holding disk '/amanda5': 240713728 KB disk space available, using 135856128 KB * Authorized Use Only * snapper slot 25: volume 'Bio-Research-028' Will write to volume 'Bio-Research-028' in slot 25. NOTE: skipping tape-writable test Server check took 50.059 seconds Amanda Backup Client Hosts Check Client check: 4 hosts checked in 6.692 seconds. 0 problems found. (brought to you by Amanda 3.5.1) amanda@marlin:~/daily$ Contents of /tmp/amanda/server/daily/amcheck-device.20201014115111.debug Wed Oct 14 11:51:11.399537832 2020: pid 10048: thd-0x1bb4600: amcheck-device: pid 10048 ruid 555 euid 555 version 3.5.1: start at Wed Oct 14 11:51:11 2020 Wed Oct 14 11:51:11.399897845 2020: pid 10048: thd-0x1bb4600: amcheck-device: reading config file /usr/local/etc/amanda/daily/amanda.conf Wed Oct 14 11:51:11.429240795 2020: pid 10048: thd-0x1bb4600: amcheck-device: pid 10048 ruid 555 euid 555 version 3.5.1: rename at Wed Oct 14 11:51:11 2020 Wed Oct 14 11:51:11.457544084 2020: pid 10048: thd-0x1bb4600: amcheck-device: chg-robot: using statefile '/usr/local/var/amanda/chg-robot-dev-tape-by-id-scsi-1BDT-FlexStor-II-00MX64200626-LL0' Wed Oct 14 11:51:11.496418988 2020: pid 10048: thd-0x1bb4600: amcheck-device: invoking /usr/sbin/mtx -f /dev/tape/by-id/scsi-1BDT_FlexStor_II_00MX64200626_LL0 status Wed Oct 14 11:51:12.205067256 2020: pid 10048: thd-0x1bb4600: amcheck-device: mtx: Storage Changer /dev/tape/by-id/scsi-1BDT_FlexStor_II_00MX64200626_LL0:2 Drives, 48 Slots ( 0 Import/Export ) Wed Oct 14 11:51:12.205147868 2020: pid 10048: thd-0x1bb4600: amcheck-device: mtx: Data Transfer Element 0:Empty Wed Oct 14 11:51:12.205207448 2020: pid 10048: thd-0x1bb4600: amcheck-device: mtx: Data Transfer Element 1:Full (Storage Element 18 Loaded):VolumeTag = 06L7 Wed Oct 14 11:51:12.205264401 2020: pid 10048: thd-0x1bb4600: amcheck-device: mtx: Storage Element 1:Empty Wed Oct 14 11:51:12.205318585 2020: pid 10048: thd-0x1bb4600: amcheck-device: mtx: Storage Element 2:Empty Wed Oct 14 11:51:12.205360601 2020: pid 10048: thd-0x1bb4600: amcheck-device: mtx: Storage Element 3:Full :VolumeTag=15L7 Wed Oct 14 11:51:12.205414201 2020: pid 10048: thd-0x1bb4600: amcheck-device: mtx: Storage Element 4:Full :VolumeTag=16L7 Wed Oct 14 11:51:12.205460521 2020: pid 10048: thd-0x1bb4600: amcheck-device: mtx: Storage Element 5:Full :VolumeTag=17L7 Wed Oct 14 11:51:12.205505018 2020: pid 10048: thd-0x1bb4600: amcheck-device: mtx: Storage Element 6:Full :VolumeTag=14L7 Wed Oct 14 11:51:12.205543248 2020: pid 10048: thd-0x1bb4600: amcheck-device: mtx: Storage Element 7:Empty Wed Oct 14 11:51:12.205580955 2020: pid 10048: thd-0x1bb4600: amcheck-device: mtx: Storage Element 8:Empty Wed Oct 14 11:51:12.205620300 2020: pid 10048: thd-0x1bb4600: amcheck-device: mtx: Storage Element 9:Full :VolumeTag=21L7 Wed Oct 14 11:51:12.205665098 2020: pid 10048: thd-0x1bb4600: amcheck-device: mtx: Storage Element 10:Full :VolumeTag=18L7 Wed Oct 14 11:51:12.205705828 2020: pid 10048: thd-0x1bb4600: amcheck-device: mtx: Storage Element 11:Empty Wed Oct 14 11:51:12.205739917 2020: pid 10048: thd-0x1bb4600: amcheck-device: mtx: Storage Element 12:Empty Wed Oct 14 11:51:12.205777202 2020: pid 10048: thd-0x1bb4600: amcheck-device: mtx: Storage Element 13:Full :VolumeTag=01L7 Wed Oct 14 11:51:12.205817941 2020: pid 10048: thd-0x1bb4600: amcheck-device: mtx: Storage Element 14:Full :VolumeTag=13L7 Wed Oct 14 11:51:12.205859124 2020: pid 10048: thd-0x1bb4600: amcheck-device: mtx: Storage Element 15:Full :VolumeTag=03L7 Wed Oct 14 11:51:12.205899807 2020: pid 10048: thd-0x1bb4600: amcheck-device: mtx: Storage Element 16:Full :VolumeTag=04L7 Wed Oct 14 11:51:12.205945040 2020: pid 10048: thd-0x1bb4600: amcheck-device: mtx: Storage Element 17:Full :VolumeTag=05L7 Wed Oct 14 11:51:12.205985155 2020: pid 10048: thd-0x1bb4600: amcheck-device: mtx: Storage Element 18:Empty Wed Oct 14 11:51:12.206066315 2020: pid 10048: thd-0x1bb4600: amcheck-device: mtx: Storage Element 19:Full :VolumeTag=07L7 Wed Oct 14 11:51:12.206109137 2020: pid 10048: thd-0x1bb4600:
Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
On Tue, Oct 13, 2020 at 21:02:34 -0400, Chris Hoogendyk wrote: > Patched. > >amanda@marlin:~/daily$ amcheck -s -otaperscan=taper_lexical daily > >Amanda Tape Server Host Check >- >NOTE: Holding disk '/amanda3': 449998848 KB disk space available, > using 345141248 KB >NOTE: Holding disk '/amanda4': 3026923520 KB disk space available, > using 2922065920 KB >NOTE: Holding disk '/amanda5': 104857600 KB disk space available, > using 0 KB >slot 19: volume 'Bio-Research-007' >Will write to volume 'Bio-Research-007' in slot 19. >NOTE: skipping tape-writable test >Server check took 20.422 seconds >(brought to you by Amanda 3.5.1) Okay, great. As I said before I'm not confident this is a competely correct fix, but attached here is a patch file (against the original version of the file) containing just the "fix" line (i.e. no debugging statements). You should be able to swap your lexical.pm back to the original version and put ScanInventory.pm back to the original with just this one patch, and then go ahead and switch back to "oldest" taperscan again in your amanda.conf -- hopefully it will all "just work" again. I'm curious to see the log file from an "amtape daily taper" run with that setup in place (I assume that will run to successful completion, too...). Nathan Nathan Stratton Treadway - [email protected] - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239 --- ScanInventory.pm_orig_v3.5.12017-09-22 19:41:42.154305907 -0400 +++ ScanInventory.pm2020-10-13 22:43:25.148507391 -0400 @@ -723,6 +723,7 @@ return 0; } elsif ($dev_status == $DEVICE_STATUS_SUCCESS and $f_type == $Amanda::Header::F_TAPESTART) { + $label='' if !defined $label; if (!match_labelstr($self->{'labelstr'}, $autolabel, $label, $barcode, $meta, $self->{'chg'}->{'storage'}->{'storage_name'})) { if (!$autolabel->{'other_config'}) {
Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
Patched. amanda@marlin:~/daily$ amcheck -s -otaperscan=taper_lexical daily Amanda Tape Server Host Check - NOTE: Holding disk '/amanda3': 449998848 KB disk space available, using 345141248 KB NOTE: Holding disk '/amanda4': 3026923520 KB disk space available, using 2922065920 KB NOTE: Holding disk '/amanda5': 104857600 KB disk space available, using 0 KB slot 19: volume 'Bio-Research-007' Will write to volume 'Bio-Research-007' in slot 19. NOTE: skipping tape-writable test Server check took 20.422 seconds (brought to you by Amanda 3.5.1) amanda@marlin:~/daily$ Here is the whole /tmp/amanda/server/daily/amcheck-device.20201013205425.debug It's a little bit longer, going through the inventory more than once. Tue Oct 13 20:54:25.550354229 2020: pid 15972: thd-0x12d4800: amcheck-device: pid 15972 ruid 555 euid 555 version 3.5.1: start at Tue Oct 13 20:54:25 2020 Tue Oct 13 20:54:25.550685367 2020: pid 15972: thd-0x12d4800: amcheck-device: config_overrides: taperscan taper_lexical Tue Oct 13 20:54:25.550831476 2020: pid 15972: thd-0x12d4800: amcheck-device: reading config file /usr/local/etc/amanda/daily/amanda.conf Tue Oct 13 20:54:25.612658533 2020: pid 15972: thd-0x12d4800: amcheck-device: pid 15972 ruid 555 euid 555 version 3.5.1: rename at Tue Oct 13 20:54:25 2020 Tue Oct 13 20:54:25.643990167 2020: pid 15972: thd-0x12d4800: amcheck-device: chg-robot: using statefile '/usr/local/var/amanda/chg-robot-dev-tape-by-id-scsi-1BDT-FlexStor-II-00MX64200626-LL0' Tue Oct 13 20:54:25.662486365 2020: pid 15972: thd-0x12d4800: amcheck-device: invoking /usr/sbin/mtx -f /dev/tape/by-id/scsi-1BDT_FlexStor_II_00MX64200626_LL0 status Tue Oct 13 20:54:26.295441930 2020: pid 15972: thd-0x12d4800: amcheck-device: mtx: Storage Changer /dev/tape/by-id/scsi-1BDT_FlexStor_II_00MX64200626_LL0:2 Drives, 48 Slots ( 0 Import/Export ) Tue Oct 13 20:54:26.314120367 2020: pid 15972: thd-0x12d4800: amcheck-device: mtx: Data Transfer Element 0:Full (Storage Element 19 Loaded):VolumeTag = 07L7 Tue Oct 13 20:54:26.314231179 2020: pid 15972: thd-0x12d4800: amcheck-device: mtx: Data Transfer Element 1:Full (Storage Element 18 Loaded):VolumeTag = 06L7 Tue Oct 13 20:54:26.314288072 2020: pid 15972: thd-0x12d4800: amcheck-device: mtx: Storage Element 1:Empty Tue Oct 13 20:54:26.314346671 2020: pid 15972: thd-0x12d4800: amcheck-device: mtx: Storage Element 2:Empty Tue Oct 13 20:54:26.314397525 2020: pid 15972: thd-0x12d4800: amcheck-device: mtx: Storage Element 3:Full :VolumeTag=15L7 Tue Oct 13 20:54:26.314459720 2020: pid 15972: thd-0x12d4800: amcheck-device: mtx: Storage Element 4:Full :VolumeTag=16L7 Tue Oct 13 20:54:26.314512714 2020: pid 15972: thd-0x12d4800: amcheck-device: mtx: Storage Element 5:Full :VolumeTag=17L7 Tue Oct 13 20:54:26.314571527 2020: pid 15972: thd-0x12d4800: amcheck-device: mtx: Storage Element 6:Full :VolumeTag=14L7 Tue Oct 13 20:54:26.314619709 2020: pid 15972: thd-0x12d4800: amcheck-device: mtx: Storage Element 7:Empty Tue Oct 13 20:54:26.314662419 2020: pid 15972: thd-0x12d4800: amcheck-device: mtx: Storage Element 8:Empty Tue Oct 13 20:54:26.314713070 2020: pid 15972: thd-0x12d4800: amcheck-device: mtx: Storage Element 9:Full :VolumeTag=21L7 Tue Oct 13 20:54:26.314767855 2020: pid 15972: thd-0x12d4800: amcheck-device: mtx: Storage Element 10:Full :VolumeTag=18L7 Tue Oct 13 20:54:26.314815482 2020: pid 15972: thd-0x12d4800: amcheck-device: mtx: Storage Element 11:Empty Tue Oct 13 20:54:26.314861223 2020: pid 15972: thd-0x12d4800: amcheck-device: mtx: Storage Element 12:Empty Tue Oct 13 20:54:26.314907435 2020: pid 15972: thd-0x12d4800: amcheck-device: mtx: Storage Element 13:Full :VolumeTag=01L7 Tue Oct 13 20:54:26.314961246 2020: pid 15972: thd-0x12d4800: amcheck-device: mtx: Storage Element 14:Full :VolumeTag=13L7 Tue Oct 13 20:54:26.315011854 2020: pid 15972: thd-0x12d4800: amcheck-device: mtx: Storage Element 15:Full :VolumeTag=03L7 Tue Oct 13 20:54:26.315061469 2020: pid 15972: thd-0x12d4800: amcheck-device: mtx: Storage Element 16:Full :VolumeTag=04L7 Tue Oct 13 20:54:26.315112449 2020: pid 15972: thd-0x12d4800: amcheck-device: mtx: Storage Element 17:Full :VolumeTag=05L7 Tue Oct 13 20:54:26.315158231 2020: pid 15972: thd-0x12d4800: amcheck-device: mtx: Storage Element 18:Empty Tue Oct 13 20:54:26.315198333 2020: pid 15972: thd-0x12d4800: amcheck-device: mtx: Storage Element 19:Empty Tue Oct 13 20:54:26.315245593 2020: pid 15972: thd-0x12d4800: amcheck-device: mtx: Storage Element 20:Full :VolumeTag=08L7 Tue Oct 13 20:
Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
On Tue, Oct 13, 2020 at 15:08:43 -0400, Chris Hoogendyk wrote:
>
> End of /tmp/amanda/server/daily/amcheck-device.20201013150303.debug
[...]
>Tue Oct 13 15:03:04.326400287 2020: pid 10777: thd-0x2688800:
> amcheck-device: warning: Use of
>uninitialized value $label in concatenation (.) or string at
>/usr/local/share/perl/5.22.1/Amanda/ScanInventory.pm line 687.
>Tue Oct 13 15:03:04.326421243 2020: pid 10777: thd-0x2688800:
> amcheck-device:
>volume_is_labelable start: label: barcode: 29L7
>Tue Oct 13 15:03:04.326444683 2020: pid 10777: thd-0x2688800:
> amcheck-device:
>volume_is_labelable pre-matchlabel call
Okay, great, this would seem to confirm the theory that passing an
uninitialed $label value into the match_labelstr() function is what's
triggering the crash.
Here's a new patch to try. I am not sure that it's really the
long-term-correct fix, but with some luck it will at least prevent the
crash you are currrently seeing and let you switch back to the oldest
taperscan.
You can either apply this patch file against the *original* version of
ScanInventory.pm, or just manually edit the previously-patched version
of the file to add the
$label='' if !defined $label;
just below the
debug("volume_is_labelable pre-matchlabel call");
line that appears in there now.
Given that a shot (and send the log file lines as usual)...
Nathan
Nathan Stratton Treadway - [email protected] - Mid-Atlantic region
Ray Ontko & Co. - Software consulting services - http://www.ontko.com/
GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239
Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
--- ScanInventory.pm_orig_v3.5.12017-09-22 19:41:42.154305907 -0400
+++ ScanInventory.pm2020-10-13 15:47:36.242054712 -0400
@@ -684,6 +684,7 @@
my $chg = $self->{'chg'};
my $autolabel = $chg->{'autolabel'};
+debug("volume_is_labelable start: label: $label barcode: $barcode");
if (!defined $dev_status) {
return 0;
} elsif ($dev_status & $DEVICE_STATUS_VOLUME_UNLABELED and
@@ -723,8 +724,11 @@
return 0;
} elsif ($dev_status == $DEVICE_STATUS_SUCCESS and
$f_type == $Amanda::Header::F_TAPESTART) {
+debug("volume_is_labelable pre-matchlabel call");
+$label='' if !defined $label;
if (!match_labelstr($self->{'labelstr'}, $autolabel, $label,
$barcode, $meta,
$self->{'chg'}->{'storage'}->{'storage_name'})) {
+debug("volume_is_labelable post-matchlabel call");
if (!$autolabel->{'other_config'}) {
# $self->_user_msg(slot_result => 1,
# label=> $label,
@@ -734,7 +738,9 @@
return 0;
}
} else {
+debug("volume_is_labelable pre-lookup_tapelabel call");
my $vol_tle = $self->{'tapelist'}->lookup_tapelabel($label);
+debug("volume_is_labelable post-lookup_tapelabel call");
if (!$vol_tle) {
# $self->_user_msg(slot_result => 1,
# label => $label,
Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
Patched. amanda@marlin:~/daily$ amcheck -s -otaperscan=taper_lexical daily Amanda Tape Server Host Check - NOTE: Holding disk '/amanda3': 449998848 KB disk space available, using 345141248 KB NOTE: Holding disk '/amanda4': 3026923520 KB disk space available, using 2922065920 KB NOTE: Holding disk '/amanda5': 104857600 KB disk space available, using 0 KB ERROR: amcheck-device terminated with signal 11 Server check took 1.883 seconds (brought to you by Amanda 3.5.1) amanda@marlin:~/daily$ End of /tmp/amanda/server/daily/amcheck-device.20201013150303.debug Tue Oct 13 15:03:04.314561876 2020: pid 10777: thd-0x2688800: amcheck-device: mtx: Storage Element 33:Full :VolumeTag=36L7 Tue Oct 13 15:03:04.314617462 2020: pid 10777: thd-0x2688800: amcheck-device: mtx: Storage Element 34:Full :VolumeTag=37L7 Tue Oct 13 15:03:04.314665487 2020: pid 10777: thd-0x2688800: amcheck-device: mtx: Storage Element 35:Full :VolumeTag=38L7 Tue Oct 13 15:03:04.314711679 2020: pid 10777: thd-0x2688800: amcheck-device: mtx: Storage Element 36:Full :VolumeTag=39L7 Tue Oct 13 15:03:04.314757827 2020: pid 10777: thd-0x2688800: amcheck-device: mtx: Storage Element 37:Full :VolumeTag=40L7 Tue Oct 13 15:03:04.314803076 2020: pid 10777: thd-0x2688800: amcheck-device: mtx: Storage Element 38:Full :VolumeTag=41L7 Tue Oct 13 15:03:04.314848723 2020: pid 10777: thd-0x2688800: amcheck-device: mtx: Storage Element 39:Full :VolumeTag=42L7 Tue Oct 13 15:03:04.314890084 2020: pid 10777: thd-0x2688800: amcheck-device: mtx: Storage Element 40:Empty Tue Oct 13 15:03:04.314928126 2020: pid 10777: thd-0x2688800: amcheck-device: mtx: Storage Element 41:Empty Tue Oct 13 15:03:04.314965283 2020: pid 10777: thd-0x2688800: amcheck-device: mtx: Storage Element 42:Empty Tue Oct 13 15:03:04.315007303 2020: pid 10777: thd-0x2688800: amcheck-device: mtx: Storage Element 43:Full :VolumeTag=43L7 Tue Oct 13 15:03:04.315050910 2020: pid 10777: thd-0x2688800: amcheck-device: mtx: Storage Element 44:Empty Tue Oct 13 15:03:04.315092078 2020: pid 10777: thd-0x2688800: amcheck-device: mtx: Storage Element 45:Empty Tue Oct 13 15:03:04.315129499 2020: pid 10777: thd-0x2688800: amcheck-device: mtx: Storage Element 46:Empty Tue Oct 13 15:03:04.315171329 2020: pid 10777: thd-0x2688800: amcheck-device: mtx: Storage Element 47:Full :VolumeTag=CLN001CU Tue Oct 13 15:03:04.315218349 2020: pid 10777: thd-0x2688800: amcheck-device: mtx: Storage Element 48:Full :VolumeTag=CLN002CU Tue Oct 13 15:03:04.315378917 2020: pid 10777: thd-0x2688800: amcheck-device: NEO200x48: updating state Tue Oct 13 15:03:04.322889915 2020: pid 10777: thd-0x2688800: amcheck-device: slot: 0 label: Bio-Research-013ds: 0 Tue Oct 13 15:03:04.324824228 2020: pid 10777: thd-0x2688800: amcheck-device: slot: 1 label: Bio-Research-003ds: 0 Tue Oct 13 15:03:04.324975904 2020: pid 10777: thd-0x2688800: amcheck-device: slot: 2 label: Bio-Research-004ds: 0 Tue Oct 13 15:03:04.325194015 2020: pid 10777: thd-0x2688800: amcheck-device: slot: 3 label: Bio-Research-005ds: 0 Tue Oct 13 15:03:04.325323140 2020: pid 10777: thd-0x2688800: amcheck-device: slot: 4 label: Bio-Research-006ds: 0 Tue Oct 13 15:03:04.325481141 2020: pid 10777: thd-0x2688800: amcheck-device: slot: 5 label: Bio-Research-007ds: 0 Tue Oct 13 15:03:04.325672484 2020: pid 10777: thd-0x2688800: amcheck-device: slot: 6 label: Bio-Research-008ds: 0 Tue Oct 13 15:03:04.325764902 2020: pid 10777: thd-0x2688800: amcheck-device: slot: 7 label: Bio-Research-009ds: 0 Tue Oct 13 15:03:04.325915539 2020: pid 10777: thd-0x2688800: amcheck-device: warning: Use of uninitialized value in concatenation (.) or string at /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/lexical.pm line 102. Tue Oct 13 15:03:04.325943596 2020: pid 10777: thd-0x2688800: amcheck-device: slot: 8 label: Bio-Research-010ds: Tue Oct 13 15:03:04.326044285 2020: pid 10777: thd-0x2688800: amcheck-device: volume_is_labelable start: label: Bio-Research-010 barcode: 10L7 Tue Oct 13 15:03:04.326096109 2020: pid 10777: thd-0x2688800: amcheck-device: warning: Use of uninitialized value in concatenation (.) or string at /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/lexical.pm line 102. Tue Oct 13 15:03:04.326128079 2020: pid 10777: thd-0x2688800: amcheck-device: slot: 9 label: Bio-Research-011ds: Tue Oct 13 15:03:04.326151100 2020: pid 10777: thd-0x2688800: amcheck-device: volume_is_labelable start: label: Bio-Research-011 barcode: 11L7 Tue Oct 13 15:03:04.326172195 2020: pid 10777: thd-0x2688800: amcheck-device: slot: 10 label: Bio-Research-012
Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
On Mon, Oct 12, 2020 at 22:22:45 -0400, Chris Hoogendyk wrote: >Mon Oct 12 22:16:21.857347044 2020: pid 23996: thd-0x25c0800: > amcheck-device: slot: 9 label: >Bio-Research-011ds: >Mon Oct 12 22:16:21.857380544 2020: pid 23996: thd-0x25c0800: > amcheck-device: slot: 10 label: >Bio-Research-012ds: 0 >Mon Oct 12 22:16:21.857471810 2020: pid 23996: thd-0x25c0800: > amcheck-device: slot: 11 label: >Bio-Research-028ds: 0 >Mon Oct 12 22:16:21.857580226 2020: pid 23996: thd-0x25c0800: > amcheck-device: warning: Use of >uninitialized value in concatenation (.) or string at >/usr/local/share/perl/5.22.1/Amanda/Taper/Scan/lexical.pm line 102. >Mon Oct 12 22:16:21.857607369 2020: pid 23996: thd-0x25c0800: > amcheck-device: slot: 12 label: ds: 0 Progress! This shows that the crash happens during processing of slot 12. (Looking back through the output of "amtape inventory" you sent, it appears that this slot contains a tape with barcode 29L7.) An interesting thing to note is that the *label* variable for that slot is uninitialized -- perhaps that's what is causing the crash? To test that theory a bit, I've attached another patch to try. Unfortunately this one is in a file used by the oldest.pm algorithm, too, so you'll probably want to revert the file back to the original as soon as you've finished teating, to make sure that the patched version doesn't affect an actual amanda run. So, basically save a copy of the original /usr/local/share/perl/5.22.1/Amanda/ScanInventory.pm file, then apply the patch attached to this email in-place to that file, and run the amcheck -s -otaperscan=taper_lexical daily test again. (I don't expect this patch to prevent the crash, but hopefully the new log messages will narrow down exactly where it is crashing.) Nathan Nathan Stratton Treadway - [email protected] - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239 --- ScanInventory.pm_orig_v3.5.12017-09-22 19:41:42.154305907 -0400 +++ ScanInventory.pm2020-10-13 00:53:04.662721210 -0400 @@ -684,6 +684,7 @@ my $chg = $self->{'chg'}; my $autolabel = $chg->{'autolabel'}; +debug("volume_is_labelable start: label: $label barcode: $barcode"); if (!defined $dev_status) { return 0; } elsif ($dev_status & $DEVICE_STATUS_VOLUME_UNLABELED and @@ -723,8 +724,10 @@ return 0; } elsif ($dev_status == $DEVICE_STATUS_SUCCESS and $f_type == $Amanda::Header::F_TAPESTART) { +debug("volume_is_labelable pre-matchlabel call"); if (!match_labelstr($self->{'labelstr'}, $autolabel, $label, $barcode, $meta, $self->{'chg'}->{'storage'}->{'storage_name'})) { +debug("volume_is_labelable post-matchlabel call"); if (!$autolabel->{'other_config'}) { # $self->_user_msg(slot_result => 1, # label=> $label, @@ -734,7 +737,9 @@ return 0; } } else { +debug("volume_is_labelable pre-lookup_tapelabel call"); my $vol_tle = $self->{'tapelist'}->lookup_tapelabel($label); +debug("volume_is_labelable post-lookup_tapelabel call"); if (!$vol_tle) { # $self->_user_msg(slot_result => 1, # label => $label,
Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
amanda@marlin:~/daily$ amtape -otaperscan=taper_oldest daily inventory slot 14: label Bio-Research-013 barcode 13L7 [retention-tapes] slot 15: label Bio-Research-003 barcode 03L7 [retention-tapes] slot 16: label Bio-Research-004 barcode 04L7 (in drive 1) [retention-tapes] slot 17: label Bio-Research-005 barcode 05L7 reserved (in drive 0) (current) [retention-tapes] slot 18: label Bio-Research-006 barcode 06L7 [retention-no] slot 19: label Bio-Research-007 barcode 07L7 [retention-no] slot 20: label Bio-Research-008 barcode 08L7 [retention-no] slot 21: label Bio-Research-009 barcode 09L7 [retention-no] slot 22: label Bio-Research-010 barcode 10L7 [retention-no] slot 23: label Bio-Research-011 barcode 11L7 [retention-no] slot 24: label Bio-Research-012 barcode 12L7 [retention-no] slot 25: label Bio-Research-028 barcode 28L7 [retention-no] slot 26: unknown barcode 29L7 slot 27: unknown barcode 30L7 slot 28: label Bio-Research-029 barcode 31L7 [retention-no] slot 29: label Bio-Research-030 barcode 32L7 [retention-no] slot 30: label Bio-Research-031 barcode 33L7 [retention-no] slot 31: label Bio-Research-032 barcode 34L7 [retention-no] slot 32: label Bio-Research-033 barcode 35L7 [retention-no] slot 33: label Bio-Research-034 barcode 36L7 [retention-no] slot 34: label Bio-Research-035 barcode 37L7 [retention-no] slot 35: unknown barcode 38L7 slot 36: unknown barcode 39L7 amanda@marlin:~/daily$ contents of /tmp/amanda/server/daily/amtape.2020101714.debug Mon Oct 12 22:27:14.283528244 2020: pid 25817: thd-0x26d2000: amtape: pid 25817 ruid 555 euid 555 version 3.5.1: start at Mon Oct 12 22:27:14 2020 Mon Oct 12 22:27:14.283638808 2020: pid 25817: thd-0x26d2000: amtape: Arguments: -otaperscan=taper_oldest daily inventory Mon Oct 12 22:27:14.283985107 2020: pid 25817: thd-0x26d2000: amtape: config_overrides: taperscan taper_oldest Mon Oct 12 22:27:14.284166432 2020: pid 25817: thd-0x26d2000: amtape: reading config file /usr/local/etc/amanda/daily/amanda.conf Mon Oct 12 22:27:14.329169930 2020: pid 25817: thd-0x26d2000: amtape: pid 25817 ruid 555 euid 555 version 3.5.1: rename at Mon Oct 12 22:27:14 2020 Mon Oct 12 22:27:14.368005788 2020: pid 25817: thd-0x26d2000: amtape: chg-robot: using statefile '/usr/local/var/amanda/chg-robot-dev-tape-by-id-scsi-1BDT-FlexStor-II-00MX64200626-LL0' Mon Oct 12 22:27:14.370754856 2020: pid 25817: thd-0x26d2000: amtape: invoking /usr/sbin/mtx -f /dev/tape/by-id/scsi-1BDT_FlexStor_II_00MX64200626_LL0 status Mon Oct 12 22:27:15.015683640 2020: pid 25817: thd-0x26d2000: amtape: mtx: Storage Changer /dev/tape/by-id/scsi-1BDT_FlexStor_II_00MX64200626_LL0:2 Drives, 48 Slots ( 0 Import/Export ) Mon Oct 12 22:27:15.015801261 2020: pid 25817: thd-0x26d2000: amtape: mtx: Data Transfer Element 0:Full (Storage Element 17 Loaded):VolumeTag = 05L7 Mon Oct 12 22:27:15.015894617 2020: pid 25817: thd-0x26d2000: amtape: mtx: Data Transfer Element 1:Full (Storage Element 16 Loaded):VolumeTag = 04L7 Mon Oct 12 22:27:15.015950607 2020: pid 25817: thd-0x26d2000: amtape: mtx: Storage Element 1:Empty Mon Oct 12 22:27:15.016009626 2020: pid 25817: thd-0x26d2000: amtape: mtx: Storage Element 2:Empty Mon Oct 12 22:27:15.016061482 2020: pid 25817: thd-0x26d2000: amtape: mtx: Storage Element 3:Full :VolumeTag=15L7 Mon Oct 12 22:27:15.016120910 2020: pid 25817: thd-0x26d2000: amtape: mtx: Storage Element 4:Full :VolumeTag=16L7 Mon Oct 12 22:27:15.016175987 2020: pid 25817: thd-0x26d2000: amtape: mtx: Storage Element 5:Full :VolumeTag=17L7 Mon Oct 12 22:27:15.016229741 2020: pid 25817: thd-0x26d2000: amtape: mtx: Storage Element 6:Full :VolumeTag=14L7 Mon Oct 12 22:27:15.016275278 2020: pid 25817: thd-0x26d2000: amtape: mtx: Storage Element 7:Empty Mon Oct 12 22:27:15.016332502 2020: pid 25817: thd-0x26d2000: amtape: mtx: Storage Element 8:Empty Mon Oct 12 22:27:15.016384864 2020: pid 25817: thd-0x26d2000: amtape: mtx: Storage Element 9:Full :VolumeTag=21L7 Mon Oct 12 22:27:15.016436706 2020: pid 25817: thd-0x26d2000: amtape: mtx: Storage Element 10:Full :VolumeTag=18L7 Mon Oct 12 22:27:15.016483228 2020: pid 25817: thd-0x26d2000: amtape: mtx: Storage Element 11:Empty Mon Oct 12 22:27:15.016524715 2020: pid 25817: thd-0x26d2000: amtape: mtx: Storage Element 12:Empty Mon Oct 12 22:27:15.016570484 2020: pid 25817: thd-0x26d2000: amtape: mtx: Storage Element 13:Full :VolumeTag=01L7 Mon Oct 12 22:27:15.016620407 2020: pid 25817: thd-0x26d2000: amtape: mtx: Storag
Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
hmm. Fixed double $$. Then amanda@marlin:~/daily$ amcheck -s -otaperscan=taper_lexical daily Amanda Tape Server Host Check - NOTE: Holding disk '/amanda3': 3011121152 KB disk space available, using 2906263552 KB NOTE: Holding disk '/amanda4': 416890880 KB disk space available, using 312033280 KB NOTE: Holding disk '/amanda5': 104857600 KB disk space available, using 0 KB ERROR: amcheck-device terminated with signal 11 Server check took 7.302 seconds (brought to you by Amanda 3.5.1) amanda@marlin:~/daily$ In /tmp/amanda/server/daily/amcheck-device.20201012221620.debug: Mon Oct 12 22:16:21.846884734 2020: pid 23996: thd-0x25c0800: amcheck-device: mtx: Storage Element 38:Full :VolumeTag=41L7 Mon Oct 12 22:16:21.846931700 2020: pid 23996: thd-0x25c0800: amcheck-device: mtx: Storage Element 39:Full :VolumeTag=42L7 Mon Oct 12 22:16:21.846974541 2020: pid 23996: thd-0x25c0800: amcheck-device: mtx: Storage Element 40:Empty Mon Oct 12 22:16:21.847013670 2020: pid 23996: thd-0x25c0800: amcheck-device: mtx: Storage Element 41:Empty Mon Oct 12 22:16:21.847052040 2020: pid 23996: thd-0x25c0800: amcheck-device: mtx: Storage Element 42:Empty Mon Oct 12 22:16:21.847095434 2020: pid 23996: thd-0x25c0800: amcheck-device: mtx: Storage Element 43:Full :VolumeTag=43L7 Mon Oct 12 22:16:21.847139628 2020: pid 23996: thd-0x25c0800: amcheck-device: mtx: Storage Element 44:Empty Mon Oct 12 22:16:21.847178893 2020: pid 23996: thd-0x25c0800: amcheck-device: mtx: Storage Element 45:Empty Mon Oct 12 22:16:21.847217831 2020: pid 23996: thd-0x25c0800: amcheck-device: mtx: Storage Element 46:Empty Mon Oct 12 22:16:21.847261315 2020: pid 23996: thd-0x25c0800: amcheck-device: mtx: Storage Element 47:Full :VolumeTag=CLN001CU Mon Oct 12 22:16:21.847313575 2020: pid 23996: thd-0x25c0800: amcheck-device: mtx: Storage Element 48:Full :VolumeTag=CLN002CU Mon Oct 12 22:16:21.847462771 2020: pid 23996: thd-0x25c0800: amcheck-device: NEO200x48: updating state Mon Oct 12 22:16:21.854842315 2020: pid 23996: thd-0x25c0800: amcheck-device: slot: 0 label: Bio-Research-013ds: 0 Mon Oct 12 22:16:21.856260588 2020: pid 23996: thd-0x25c0800: amcheck-device: slot: 1 label: Bio-Research-003ds: 0 Mon Oct 12 22:16:21.856415474 2020: pid 23996: thd-0x25c0800: amcheck-device: slot: 2 label: Bio-Research-004ds: 0 Mon Oct 12 22:16:21.856638105 2020: pid 23996: thd-0x25c0800: amcheck-device: slot: 3 label: Bio-Research-005ds: 0 Mon Oct 12 22:16:21.856765629 2020: pid 23996: thd-0x25c0800: amcheck-device: slot: 4 label: Bio-Research-006ds: 0 Mon Oct 12 22:16:21.856956218 2020: pid 23996: thd-0x25c0800: amcheck-device: warning: Use of uninitialized value in concatenation (.) or string at /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/lexical.pm line 102. Mon Oct 12 22:16:21.856984955 2020: pid 23996: thd-0x25c0800: amcheck-device: slot: 5 label: Bio-Research-007ds: Mon Oct 12 22:16:21.857041299 2020: pid 23996: thd-0x25c0800: amcheck-device: slot: 6 label: Bio-Research-008ds: 0 Mon Oct 12 22:16:21.857135202 2020: pid 23996: thd-0x25c0800: amcheck-device: slot: 7 label: Bio-Research-009ds: 0 Mon Oct 12 22:16:21.857244733 2020: pid 23996: thd-0x25c0800: amcheck-device: warning: Use of uninitialized value in concatenation (.) or string at /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/lexical.pm line 102. Mon Oct 12 22:16:21.857270890 2020: pid 23996: thd-0x25c0800: amcheck-device: slot: 8 label: Bio-Research-010ds: Mon Oct 12 22:16:21.857321707 2020: pid 23996: thd-0x25c0800: amcheck-device: warning: Use of uninitialized value in concatenation (.) or string at /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/lexical.pm line 102. Mon Oct 12 22:16:21.857347044 2020: pid 23996: thd-0x25c0800: amcheck-device: slot: 9 label: Bio-Research-011ds: Mon Oct 12 22:16:21.857380544 2020: pid 23996: thd-0x25c0800: amcheck-device: slot: 10 label: Bio-Research-012ds: 0 Mon Oct 12 22:16:21.857471810 2020: pid 23996: thd-0x25c0800: amcheck-device: slot: 11 label: Bio-Research-028ds: 0 Mon Oct 12 22:16:21.857580226 2020: pid 23996: thd-0x25c0800: amcheck-device: warning: Use of uninitialized value in concatenation (.) or string at /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/lexical.pm line 102. Mon Oct 12 22:16:21.857607369 2020: pid 23996: thd-0x25c0800: amcheck-device: slot: 12 label: ds: 0 In the kern.log: Oct 12 22:16:21 marlin kernel: [2534284.886931] amcheck-device[23996]: segfault at 0 ip 7f5cd7ddf7c6 sp 7ffc211d91f8 error 4 in libc-2.23.so[7f5cd7d54000+1c] On 10/12/20 5:33 PM, Nathan Stratton Treadway wrote: On Mon, Oct 12, 2020 at 15:27:44 -0400, Chris Hoogendy
Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
On Mon, Oct 12, 2020 at 14:53:03 -0400, Chris Hoogendyk wrote: > There should be no difference in the tapes. I did them all by just > doing an `amtape daily slot nn` followed by an `amlabel daily > Bio-Research-nnn`. The first 20 or so were all done in sequence in > one session, and that would include the four you mention. I didn't > even retype the commands. I used the up arrow twice to pull up the > previous command, backspaced the number and typed a new number for > the slot or for the tape label as appropriate. Based on your description of when the problems started, I'm guessing the issue is not in how the tapes were originally labeled but some fluke of how they were used after that. > The new tapes that I put in with the new magazines were labeled in > the same way. Those are now out of the library, and the tapes that > had originally been in the library were returned. That is when the > problem occurred. So, what I'm wondering is if there is any pattern to which tape labels tie to the tapes used in the "new magazine" (and thus now no longer actually in the library) and which were in the "removed and later returned" category? Nathan Nathan Stratton Treadway - [email protected] - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
On Mon, Oct 12, 2020 at 15:27:44 -0400, Chris Hoogendyk wrote:
>amcheck-device: Not a SCALAR reference at
>/usr/local/share/perl/5.22.1/Amanda/Taper/Scan/lexical.pm line 102.
Ah, drat, this was a typo in my patch. Please edit line 102 of that
file and remove the doubled "$" character, i.e.
$$sl->{device_status}
should be
$sl->{device_status}
Then retry the amcheck test and see if gets any farther along.
Nathan
p.s. if there are any actual Perl programmers left on this list, feel
free to jump in and point us in the right direction here
Nathan Stratton Treadway - [email protected] - Mid-Atlantic region
Ray Ontko & Co. - Software consulting services - http://www.ontko.com/
GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239
Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
On Mon, Oct 12, 2020 at 18:19:21 +, Debra S Baddorf wrote: > Is it worth trying to just remove the ???state??? file (rename it to > .save for instance) > and letting amanda recreate it? Hmmm, that's an intersting idea... though it seems (from reading the man pages) that it might be possilbe to do this using the "amtape reset" command rather than deleting the state file directly, Chris, in the mean time, what to you get from the commands: amtape daily inventory amtape daily taper (For now we'll just let it default to oldest.pm, until we fix the patch for lexical.pm.) Both the output to the terminal session an the log files will probably be useful. Nathan Nathan Stratton Treadway - [email protected] - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
Patch applied. Then . . . amanda@marlin:~/daily$ amcheck -s -otaperscan=taper_lexical daily Amanda Tape Server Host Check - WARNING: holding disk '/amanda3': only 104849408 KB free, using nothing WARNING: Not enough free space specified in amanda.conf NOTE: Holding disk '/amanda4': 137469952 KB disk space available, using 32612352 KB NOTE: Holding disk '/amanda5': 104857600 KB disk space available, using 0 KB amcheck-device: Not a SCALAR reference at /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/lexical.pm line 102. Server check took 0.965 seconds (brought to you by Amanda 3.5.1) amanda@marlin:~/daily$ ls -lt /tmp/amanda/server/daily/ | head So, it didn't work (didn't load a tape and check the label), but it didn't crash and there were no entries in the kern.log. Following is the end of /tmp/amanda/server/daily/amcheck-device.20201012151659.debug Mon Oct 12 15:17:00.503496521 2020: pid 15320: thd-0x1df0800: amcheck-device: mtx: Storage Element 39:Full :VolumeTag=42L7 Mon Oct 12 15:17:00.503538735 2020: pid 15320: thd-0x1df0800: amcheck-device: mtx: Storage Element 40:Empty Mon Oct 12 15:17:00.503577116 2020: pid 15320: thd-0x1df0800: amcheck-device: mtx: Storage Element 41:Empty Mon Oct 12 15:17:00.503615219 2020: pid 15320: thd-0x1df0800: amcheck-device: mtx: Storage Element 42:Empty Mon Oct 12 15:17:00.503657949 2020: pid 15320: thd-0x1df0800: amcheck-device: mtx: Storage Element 43:Full :VolumeTag=43L7 Mon Oct 12 15:17:00.503701295 2020: pid 15320: thd-0x1df0800: amcheck-device: mtx: Storage Element 44:Empty Mon Oct 12 15:17:00.503739869 2020: pid 15320: thd-0x1df0800: amcheck-device: mtx: Storage Element 45:Empty Mon Oct 12 15:17:00.503777939 2020: pid 15320: thd-0x1df0800: amcheck-device: mtx: Storage Element 46:Empty Mon Oct 12 15:17:00.503820736 2020: pid 15320: thd-0x1df0800: amcheck-device: mtx: Storage Element 47:Full :VolumeTag=CLN001CU Mon Oct 12 15:17:00.503867625 2020: pid 15320: thd-0x1df0800: amcheck-device: mtx: Storage Element 48:Full :VolumeTag=CLN002CU Mon Oct 12 15:17:00.504012672 2020: pid 15320: thd-0x1df0800: amcheck-device: NEO200x48: updating state Mon Oct 12 15:17:00.508566057 2020: pid 15320: thd-0x1df0800: amcheck-device: critical (fatal): Not a SCALAR reference at /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/lexical.pm line 102. /usr/local/lib/amanda/libamanda-3.5.1.so(+0x398c7)[0x7fa8696158c7] /lib/x86_64-linux-gnu/libglib-2.0.so.0(g_logv+0x1f4)[0x7fa869314a44] /lib/x86_64-linux-gnu/libglib-2.0.so.0(g_log+0x8f)[0x7fa869314c6f] /usr/local/share/perl/5.22.1/auto/Amanda/MainLoop/libMainLoop.so(+0x45b7)[0x7fa8659085b7] /lib/x86_64-linux-gnu/libglib-2.0.so.0(+0x4aab3)[0x7fa86930eab3] /lib/x86_64-linux-gnu/libglib-2.0.so.0(g_main_context_dispatch+0x15a)[0x7fa86930e04a] /lib/x86_64-linux-gnu/libglib-2.0.so.0(+0x4a3f0)[0x7fa86930e3f0] /lib/x86_64-linux-gnu/libglib-2.0.so.0(g_main_context_iteration+0x2c)[0x7fa86930e49c] /usr/local/lib/amanda/libamanda-3.5.1.so(+0x3b291)[0x7fa869617291] /usr/local/share/perl/5.22.1/auto/Amanda/MainLoop/libMainLoop.so(_wrap_run_c+0x53)[0x7fa865908d93] /usr/bin/perl(Perl_pp_entersub+0x48a)[0x4bd5fa] /usr/bin/perl(Perl_runops_standard+0x16)[0x4b64a6] /usr/bin/perl(perl_run+0x2f9)[0x443bc9] /usr/bin/perl(main+0x12b)[0x41cb2b] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7fa869efc840] /usr/bin/perl(_start+0x29)[0x41cb69] On 10/11/20 7:04 PM, Nathan Stratton Treadway wrote: On Sun, Oct 11, 2020 at 14:48:13 -0400, Chris Hoogendyk wrote: In any case, I currently have both systems flushing to tape using the traditional taper scan. That may work for now, but it would be good to track this down. It's also puzzling that it just turned up R.e. tracking this down: my thought is to see if we can track down the problem by tweaking the code for lexical.pm -- that way, you can trigger test runs using "amcheck -otaperscan=taper_lexical" without running the risk of having a normal cron job attempt to run the code you are in the middle of modifying. I've attached a patch which, hopefully, will both fix the "uninitialized" warning messages that have been appearing in the logs and also print some debugging info as it loops through the tape inventory so we can see if it dies in the middle of that loop. So, when you are ready to investigate further, save a copy of the original /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/lexical.pm, then apply the patch to the "live" lexical.pm file, and then try running amcheck -s (with lexical scan) again. You may have to fix unblanced quotes or whatever typos in the patched lines, but hopefully you'll soon get a log file which lists the tapes in the inventory (but doesn't have the warning lines a
Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
There should be no difference in the tapes. I did them all by just doing an `amtape daily slot nn` followed by an `amlabel daily Bio-Research-nnn`. The first 20 or so were all done in sequence in one session, and that would include the four you mention. I didn't even retype the commands. I used the up arrow twice to pull up the previous command, backspaced the number and typed a new number for the slot or for the tape label as appropriate. The new tapes that I put in with the new magazines were labeled in the same way. Those are now out of the library, and the tapes that had originally been in the library were returned. That is when the problem occurred. We're planning on doing the magazine switch once every three weeks and swapping out to Iron Mountain. That's a strategy based on what is convenient given what a magazine will hold, the fact that I am retired and mostly working remotely part time, and there is a pandemic such that I want to avoid going on campus and encountering students. If things get back to normal, they may get around to hiring a replacement for me and then that person may come up with a different strategy. On 10/12/20 12:03 AM, Nathan Stratton Treadway wrote: On Sun, Oct 11, 2020 at 23:32:08 -0400, Chris Hoogendyk wrote: Text file. attached. Excellent, perfect. If you are able to run the patched version of lexical.pm that should give more explicit info, but meanwhile just looking through the statefile: one thing that jumps out at me is that four of the slot entries have Math::BigInt device_status fields, rather than simple integers: Bio-Research-004, Bio-Research-001, Bio-Research-013, Bio-Research-014 Do those four volumes ring a bell with you as being special in some way? (I wonder if the segfault might be related to the program trying to do some operation against a BigInt object when an integer is expected, or something) More generally, is there any pattern to the labels you used for your "normal" tapes v.s. the short-term ones you wrote and then sent to Iron Mountain? Nathan Nathan Stratton Treadway - [email protected] - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239 -- --- Chris Hoogendyk - O__ Systems Administrator, Retired c/ /'_ --- Biology & Geosciences Departments (*) \(*) -- 315 Morrill Science Center III ~~ - University of Massachusetts, Amherst --- Erdös 4
Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
Is it worth trying to just remove the “state” file (rename it to
.save for instance)
and letting amanda recreate it?
When my label reader glitches and reads the label differently than it read it
last time I used this tape,
I just remove my state file and it proceeds to recreate the whole file. If
the tapelist file includes the different data,
I edit that. Not the tape name, but the reading of the tape label.
However, this wouldn’t find the problem in the code.
Deb Baddorf
my config, fwiw:
amanda 3.3.8
Linux
HP Storever MSL4048 tape library
taperscan "taper_lexical"
define changer LTO5-Robot {
tpchanger "chg-robot:/dev/changer"
changerfile "/usr/local/etc/amanda/LTO5-Robot-state”
## <<— this file is deletable
property "tape-device" "0=/dev/nst0" # 1=/dev/nst1 is for archive
config
property "eject-before-unload" "no"
property "use-slots" "1-44"# 45-48 are for archive config
property "load-poll" "0 s poll 3 s until 120 s"
property "fast-search" "true" # has barcode reader
device-property "BLOCK_SIZE" "512k" # from man amanda-changers.7
}
> On Oct 11, 2020, at 11:03 PM, Nathan Stratton Treadway
> wrote:
>
> On Sun, Oct 11, 2020 at 23:32:08 -0400, Chris Hoogendyk wrote:
>> Text file.
>>
>> attached.
>
> Excellent, perfect.
>
> If you are able to run the patched version of lexical.pm that should
> give more explicit info, but meanwhile just looking through the
> statefile: one thing that jumps out at me is that four of the slot
> entries have Math::BigInt device_status fields, rather than simple
> integers:
>
> Bio-Research-004, Bio-Research-001, Bio-Research-013, Bio-Research-014
>
> Do those four volumes ring a bell with you as being special in some way?
>
> (I wonder if the segfault might be related to the program trying to do
> some operation against a BigInt object when an integer is expected, or
> something)
>
> More generally, is there any pattern to the labels you used for your
> "normal" tapes v.s. the short-term ones you wrote and then sent to Iron
> Mountain?
>
> Nathan
>
>
>
> Nathan Stratton Treadway - [email protected] - Mid-Atlantic region
> Ray Ontko & Co. - Software consulting services -
> https://urldefense.proofpoint.com/v2/url?u=http-3A__www.ontko.com_&d=DwIBAg&c=gRgGjJ3BkIsb5y6s49QqsA&r=HMrKaRiCv4jddln9fLPIOw&m=P_r2VB2yCnPdMxqVrVUzafCnl-Z_6EEAfi_C3oKO2RA&s=q_wkXgEzNjOiVFPx9EV6tXGkcE_dc-NjOkEiUFHh7DE&e=
>
> GPG Key:
> https://urldefense.proofpoint.com/v2/url?u=http-3A__www.ontko.com_-7Enathanst_gpg-5Fkey.txt&d=DwIBAg&c=gRgGjJ3BkIsb5y6s49QqsA&r=HMrKaRiCv4jddln9fLPIOw&m=P_r2VB2yCnPdMxqVrVUzafCnl-Z_6EEAfi_C3oKO2RA&s=y02sMuUPvmfmvo7vUKar5sB1x9yF6JRD7Obyu_G52a0&e=
> ID: 1023D/ECFB6239
> Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
On Sun, Oct 11, 2020 at 23:32:08 -0400, Chris Hoogendyk wrote: > Text file. > > attached. Excellent, perfect. If you are able to run the patched version of lexical.pm that should give more explicit info, but meanwhile just looking through the statefile: one thing that jumps out at me is that four of the slot entries have Math::BigInt device_status fields, rather than simple integers: Bio-Research-004, Bio-Research-001, Bio-Research-013, Bio-Research-014 Do those four volumes ring a bell with you as being special in some way? (I wonder if the segfault might be related to the program trying to do some operation against a BigInt object when an integer is expected, or something) More generally, is there any pattern to the labels you used for your "normal" tapes v.s. the short-term ones you wrote and then sent to Iron Mountain? Nathan Nathan Stratton Treadway - [email protected] - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
Thank you! I'll have to get to that tomorrow. On 10/11/20 7:04 PM, Nathan Stratton Treadway wrote: On Sun, Oct 11, 2020 at 14:48:13 -0400, Chris Hoogendyk wrote: In any case, I currently have both systems flushing to tape using the traditional taper scan. That may work for now, but it would be good to track this down. It's also puzzling that it just turned up R.e. tracking this down: my thought is to see if we can track down the problem by tweaking the code for lexical.pm -- that way, you can trigger test runs using "amcheck -otaperscan=taper_lexical" without running the risk of having a normal cron job attempt to run the code you are in the middle of modifying. I've attached a patch which, hopefully, will both fix the "uninitialized" warning messages that have been appearing in the logs and also print some debugging info as it loops through the tape inventory so we can see if it dies in the middle of that loop. So, when you are ready to investigate further, save a copy of the original /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/lexical.pm, then apply the patch to the "live" lexical.pm file, and then try running amcheck -s (with lexical scan) again. You may have to fix unblanced quotes or whatever typos in the patched lines, but hopefully you'll soon get a log file which lists the tapes in the inventory (but doesn't have the warning lines any more)... at which point, send me those log lines Nathan Nathan Stratton Treadway - [email protected] - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239 -- --- Chris Hoogendyk - O__ Systems Administrator, Retired c/ /'_ --- Biology & Geosciences Departments (*) \(*) -- 315 Morrill Science Center III ~~ - University of Massachusetts, Amherst --- Erdös 4
Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
This segment?
define changer NEO200x48 {
tpchanger
"chg-robot:/dev/tape/by-id/scsi-1BDT_FlexStor_II_00MX64200626_LL0" # your
changer
device file (T48)
property "tape-device" "0=tape:/dev/nst1" # your tape drive device
file
property append "tape-device" "1=tape:/dev/nst2" # your tape drive
device file
property "use-slots" "14-36" # UPDATE WHEN TAPES ADDED
<
}
amrecover-changer "NEO200x48" # amrecover will use the changer if you
restore
# from this device. It could be a string like 'changer' and amrecover
will use your
# changer if you set your tape to 'changer' with 'setdevice changer' or
via
# 'tapedev "changer"' in amanda-client.conf
define policy research {
retention-tapes 30
}
define storage research {
tpchanger "NEO200x48"
tapetype "LTO7"
policy "research"
eject-volume yes
labelstr "^Bio-Research-[0-9][0-9][0-9]*$"
tapepool "research"
dump-selection "research" ALL
}
On 10/11/20 3:51 PM, Nathan Stratton Treadway wrote:
On Sun, Oct 11, 2020 at 14:48:13 -0400, Chris Hoogendyk wrote:
I'm not sure what I should be looking for.
I don't see anything in amanda's home directory that seems likely,
nor in /tmp/amanda, and there is no /etc/amanda/. The tape library
has a web interface that I use to interact with it. Amanda is
configured to use mtx. I can also use mtx by hand to check on the
library's status, and I have a script amchanger that I wrote that
does that for me.
So, aside from amanda keeping an inventory, and the tapelist that it
has, I'm not sure where else anything would be.
My theory is that the driver for the changer keeps the inventory written
in a file somewhere specific to the changer, but since I don't have a
tape changer myself I am not familiar with that driver and don't know
where to direct you to look, off hand
Can you post the changer and tape-drive related parameters/sections from
your amanda.conf file? (I assume the config sections on your two
different servers are essentially the same, right?)
Nsthan
Nathan Stratton Treadway - [email protected] - Mid-Atlantic region
Ray Ontko & Co. - Software consulting services - http://www.ontko.com/
GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239
Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
--
---
Chris Hoogendyk
-
O__ Systems Administrator, Retired
c/ /'_ --- Biology & Geosciences Departments
(*) \(*) -- 315 Morrill Science Center III
~~ - University of Massachusetts, Amherst
---
Erdös 4
Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
On Sun, Oct 11, 2020 at 14:48:13 -0400, Chris Hoogendyk wrote: > In any case, I currently have both systems flushing to tape using > the traditional taper scan. That may work for now, but it would be > good to track this down. It's also puzzling that it just turned up R.e. tracking this down: my thought is to see if we can track down the problem by tweaking the code for lexical.pm -- that way, you can trigger test runs using "amcheck -otaperscan=taper_lexical" without running the risk of having a normal cron job attempt to run the code you are in the middle of modifying. I've attached a patch which, hopefully, will both fix the "uninitialized" warning messages that have been appearing in the logs and also print some debugging info as it loops through the tape inventory so we can see if it dies in the middle of that loop. So, when you are ready to investigate further, save a copy of the original /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/lexical.pm, then apply the patch to the "live" lexical.pm file, and then try running amcheck -s (with lexical scan) again. You may have to fix unblanced quotes or whatever typos in the patched lines, but hopefully you'll soon get a log file which lists the tapes in the inventory (but doesn't have the warning lines any more)... at which point, send me those log lines Nathan Nathan Stratton Treadway - [email protected] - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239 --- lexical.pm_orig_v3.5.1 2020-10-11 18:23:37.650643033 -0400 +++ lexical.pm 2020-10-11 18:47:47.097749555 -0400 @@ -97,6 +97,10 @@ for my $i (0..(scalar(@$inventory)-1)) { my $sl = $inventory->[$i]; + + # tracing segfault: +debug("slot: $i label: " . $sl->{'label'} . "ds: " . $$sl->{device_status}); + next if $seen->{$sl->{slot}}; if (!defined $sl->{'state'} || @@ -104,6 +108,7 @@ push @unknown, $sl } elsif ($sl->{'state'} == Amanda::Changer::SLOT_EMPTY) { } elsif (defined $sl->{'label'} && +defined $sl->{device_status} && $sl->{device_status} == $DEVICE_STATUS_SUCCESS) { if ($self->is_reusable_volume(label => $sl->{'label'})) { if ($last_label && $sl->{'label'} gt $last_label) {
Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
On Sun, Oct 11, 2020 at 15:51:34 -0400, Nathan Stratton Treadway wrote: > My theory is that the driver for the changer keeps the inventory written > in a file somewhere specific to the changer, but since I don't have a Okay, looking back through the log file you posted a couple days ago, it looks like the file in question is found at /usr/local/var/amanda/chg-robot-dev-tape-by-id-scsi-1BDT-FlexStor-II-00MX64200626-LL0 If I'm reading the driver correctly, that should be a text file (perl-formated data definitions)... so can you post the contents here? (Or, if it seems too large, send it to me off-list.) (It's atuaally called "statefile", but the robot changer seems to include the inventory information as part of the changer state. ) Nathan Nathan Stratton Treadway - [email protected] - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
On Sun, Oct 11, 2020 at 14:48:13 -0400, Chris Hoogendyk wrote: > I'm not sure what I should be looking for. > > I don't see anything in amanda's home directory that seems likely, > nor in /tmp/amanda, and there is no /etc/amanda/. The tape library > has a web interface that I use to interact with it. Amanda is > configured to use mtx. I can also use mtx by hand to check on the > library's status, and I have a script amchanger that I wrote that > does that for me. > > So, aside from amanda keeping an inventory, and the tapelist that it > has, I'm not sure where else anything would be. My theory is that the driver for the changer keeps the inventory written in a file somewhere specific to the changer, but since I don't have a tape changer myself I am not familiar with that driver and don't know where to direct you to look, off hand Can you post the changer and tape-drive related parameters/sections from your amanda.conf file? (I assume the config sections on your two different servers are essentially the same, right?) Nsthan Nathan Stratton Treadway - [email protected] - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
I'm not sure what I should be looking for. I don't see anything in amanda's home directory that seems likely, nor in /tmp/amanda, and there is no /etc/amanda/. The tape library has a web interface that I use to interact with it. Amanda is configured to use mtx. I can also use mtx by hand to check on the library's status, and I have a script amchanger that I wrote that does that for me. So, aside from amanda keeping an inventory, and the tapelist that it has, I'm not sure where else anything would be. In any case, I currently have both systems flushing to tape using the traditional taper scan. That may work for now, but it would be good to track this down. It's also puzzling that it just turned up as a problem AND on both Amanda servers. I had been using the same setup on both servers for quite some time. A few weeks ago I swapped in some new magazines and labeled all the new tapes with amlabel. Those ran through getting backups. Then this Friday night I swapped the new ones out and put the original magazines back in. That's when the trouble started. On 10/11/20 2:17 PM, Nathan Stratton Treadway wrote: On Sun, Oct 11, 2020 at 12:24:45 -0400, Chris Hoogendyk wrote: /tmp/amanda/server/daily/amcheck-device.20201011120931.debug (lexical) Sun Oct 11 12:09:31.912816794 2020: pid 22002: thd-0xe8e600: amcheck-device: NEO200x48: updating state Sun Oct 11 12:09:31.922463203 2020: pid 22002: thd-0xe8e600: amcheck-device: warning: Use of uninitialized value in numeric eq (==) at /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/lexical.pm line 106. Excellent, this points towards the "tape inventory" part of the code. Can you take a look around your system to see if you can find where the tape changer stores inventory information, internally? (If you don't find it immediately I can look in the source to try to figure out the path it would use, but hopefully its easy enough to figure outjust looking thorugh amanda-related directories on your system.) If we can find that file, it may be able to see some "wierd" data that could be causing a crash by looking at the file directly. Nathan Nathan Stratton Treadway - [email protected] - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239 -- --- Chris Hoogendyk - O__ Systems Administrator, Retired c/ /'_ --- Biology & Geosciences Departments (*) \(*) -- 315 Morrill Science Center III ~~ - University of Massachusetts, Amherst --- Erdös 4
Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
On Sun, Oct 11, 2020 at 12:24:45 -0400, Chris Hoogendyk wrote: > /tmp/amanda/server/daily/amcheck-device.20201011120931.debug (lexical) > Sun Oct 11 12:09:31.912816794 2020: pid 22002: thd-0xe8e600: amcheck-device: > NEO200x48: updating state > Sun Oct 11 12:09:31.922463203 2020: pid 22002: thd-0xe8e600: > amcheck-device: warning: Use of uninitialized value in numeric eq > (==) at /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/lexical.pm > line 106. Excellent, this points towards the "tape inventory" part of the code. Can you take a look around your system to see if you can find where the tape changer stores inventory information, internally? (If you don't find it immediately I can look in the source to try to figure out the path it would use, but hopefully its easy enough to figure outjust looking thorugh amanda-related directories on your system.) If we can find that file, it may be able to see some "wierd" data that could be causing a crash by looking at the file directly. Nathan Nathan Stratton Treadway - [email protected] - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
/tmp/amanda/server/daily/amcheck-device.20201011120931.debug (lexical) Sun Oct 11 12:09:31.912123219 2020: pid 22002: thd-0xe8e600: amcheck-device: mtx: Storage Element 35:Full :VolumeTag=38L7 Sun Oct 11 12:09:31.912170629 2020: pid 22002: thd-0xe8e600: amcheck-device: mtx: Storage Element 36:Full :VolumeTag=39L7 Sun Oct 11 12:09:31.912217715 2020: pid 22002: thd-0xe8e600: amcheck-device: mtx: Storage Element 37:Full :VolumeTag=40L7 Sun Oct 11 12:09:31.912264266 2020: pid 22002: thd-0xe8e600: amcheck-device: mtx: Storage Element 38:Full :VolumeTag=41L7 Sun Oct 11 12:09:31.912313021 2020: pid 22002: thd-0xe8e600: amcheck-device: mtx: Storage Element 39:Full :VolumeTag=42L7 Sun Oct 11 12:09:31.912354905 2020: pid 22002: thd-0xe8e600: amcheck-device: mtx: Storage Element 40:Empty Sun Oct 11 12:09:31.912392783 2020: pid 22002: thd-0xe8e600: amcheck-device: mtx: Storage Element 41:Empty Sun Oct 11 12:09:31.912430299 2020: pid 22002: thd-0xe8e600: amcheck-device: mtx: Storage Element 42:Empty Sun Oct 11 12:09:31.912472391 2020: pid 22002: thd-0xe8e600: amcheck-device: mtx: Storage Element 43:Full :VolumeTag=43L7 Sun Oct 11 12:09:31.912515070 2020: pid 22002: thd-0xe8e600: amcheck-device: mtx: Storage Element 44:Empty Sun Oct 11 12:09:31.912552958 2020: pid 22002: thd-0xe8e600: amcheck-device: mtx: Storage Element 45:Empty Sun Oct 11 12:09:31.912590453 2020: pid 22002: thd-0xe8e600: amcheck-device: mtx: Storage Element 46:Empty Sun Oct 11 12:09:31.912632567 2020: pid 22002: thd-0xe8e600: amcheck-device: mtx: Storage Element 47:Full :VolumeTag=CLN001CU Sun Oct 11 12:09:31.912678933 2020: pid 22002: thd-0xe8e600: amcheck-device: mtx: Storage Element 48:Full :VolumeTag=CLN002CU Sun Oct 11 12:09:31.912816794 2020: pid 22002: thd-0xe8e600: amcheck-device: NEO200x48: updating state Sun Oct 11 12:09:31.922463203 2020: pid 22002: thd-0xe8e600: amcheck-device: warning: Use of uninitialized value in numeric eq (==) at /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/lexical.pm line 106. Sun Oct 11 12:09:31.922577733 2020: pid 22002: thd-0xe8e600: amcheck-device: warning: Use of uninitialized value in numeric eq (==) at /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/lexical.pm line 106. Sun Oct 11 12:09:31.922630514 2020: pid 22002: thd-0xe8e600: amcheck-device: warning: Use of uninitialized value in numeric eq (==) at /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/lexical.pm line 106. /tmp/amanda/server/daily/amcheck-device.20201011121015.debug (traditional) Sun Oct 11 12:10:16.358496965 2020: pid 22053: thd-0x8d3600: amcheck-device: mtx: Storage Element 33:Full :VolumeTag=36L7 Sun Oct 11 12:10:16.358551835 2020: pid 22053: thd-0x8d3600: amcheck-device: mtx: Storage Element 34:Full :VolumeTag=37L7 Sun Oct 11 12:10:16.358600321 2020: pid 22053: thd-0x8d3600: amcheck-device: mtx: Storage Element 35:Full :VolumeTag=38L7 Sun Oct 11 12:10:16.358647199 2020: pid 22053: thd-0x8d3600: amcheck-device: mtx: Storage Element 36:Full :VolumeTag=39L7 Sun Oct 11 12:10:16.358693381 2020: pid 22053: thd-0x8d3600: amcheck-device: mtx: Storage Element 37:Full :VolumeTag=40L7 Sun Oct 11 12:10:16.358739526 2020: pid 22053: thd-0x8d3600: amcheck-device: mtx: Storage Element 38:Full :VolumeTag=41L7 Sun Oct 11 12:10:16.358785636 2020: pid 22053: thd-0x8d3600: amcheck-device: mtx: Storage Element 39:Full :VolumeTag=42L7 Sun Oct 11 12:10:16.358831765 2020: pid 22053: thd-0x8d3600: amcheck-device: mtx: Storage Element 40:Empty Sun Oct 11 12:10:16.358873441 2020: pid 22053: thd-0x8d3600: amcheck-device: mtx: Storage Element 41:Empty Sun Oct 11 12:10:16.358911735 2020: pid 22053: thd-0x8d3600: amcheck-device: mtx: Storage Element 42:Empty Sun Oct 11 12:10:16.358954276 2020: pid 22053: thd-0x8d3600: amcheck-device: mtx: Storage Element 43:Full :VolumeTag=43L7 Sun Oct 11 12:10:16.358999694 2020: pid 22053: thd-0x8d3600: amcheck-device: mtx: Storage Element 44:Empty Sun Oct 11 12:10:16.359037603 2020: pid 22053: thd-0x8d3600: amcheck-device: mtx: Storage Element 45:Empty Sun Oct 11 12:10:16.359074776 2020: pid 22053: thd-0x8d3600: amcheck-device: mtx: Storage Element 46:Empty Sun Oct 11 12:10:16.359116468 2020: pid 22053: thd-0x8d3600: amcheck-device: mtx: Storage Element 47:Full :VolumeTag=CLN001CU Sun Oct 11 12:10:16.359165851 2020: pid 22053: thd-0x8d3600: amcheck-device: mtx: Storage Element 48:Full :VolumeTag=CLN002CU Sun Oct 11 12:10:16.35930 2020: pid 22053: thd-0x8d3600: amcheck-device: NEO200x48: updating state Sun Oct 11 12:10:16.361076728 2020: pid 22053: thd-0x8d3600: amcheck-device: NEO200x48: loading label 'Bio-Research-002' Sun Oct 11 12:10:16.361442086 2020: pid 22053: thd-0x8d3600: amcheck-device: new Amanda::Changer::Error: type='failed', reason='notfound'
Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
This morning: amanda@marlin:~/daily$ amcheck -s -otaperscan=taper_lexical daily Amanda Tape Server Host Check - NOTE: Holding disk '/amanda3': 253538304 KB disk space available, using 148680704 KB WARNING: holding disk '/amanda4': only 104853504 KB free, using nothing WARNING: Not enough free space specified in amanda.conf NOTE: Holding disk '/amanda5': 613318656 KB disk space available, using 508461056 KB ERROR: amcheck-device terminated with signal 11 Server check took 1.505 seconds (brought to you by Amanda 3.5.1) amanda@marlin:~/daily$ amcheck -s -otaperscan=taper_traditional daily Amanda Tape Server Host Check - NOTE: Holding disk '/amanda3': 253538304 KB disk space available, using 148680704 KB WARNING: holding disk '/amanda4': only 104853504 KB free, using nothing WARNING: Not enough free space specified in amanda.conf NOTE: Holding disk '/amanda5': 613318656 KB disk space available, using 508461056 KB Searching for label 'Bio-Research-002':label 'Bio-Research-002' not recognized or not found slot 15: volume 'Bio-Research-003' Will write to volume 'Bio-Research-003' in slot 15. NOTE: skipping tape-writable test Server check took 19.306 seconds (brought to you by Amanda 3.5.1) amanda@marlin:~/daily$ I'll look at logs and post those if I see anything. On 10/11/20 12:58 AM, Nathan Stratton Treadway wrote: On Sat, Oct 10, 2020 at 23:50:17 -0400, Chris Hoogendyk wrote: Wow! amanda@marlin:~/daily$ amcheck -s -otaperscan=taper_traditional daily Amanda Tape Server Host Check - NOTE: Holding disk '/amanda3': 913514496 KB disk space available, using 808656896 KB NOTE: Holding disk '/amanda4': 158228480 KB disk space available, using 53370880 KB NOTE: Holding disk '/amanda5': 1636618240 KB disk space available, using 1531760640 KB Searching for label 'Bio-Research-002':label 'Bio-Research-002' not recognized or not found slot 13:slot 13 not in use-slots (14-36) slot 14: volume 'Bio-Research-013' is still active and cannot be overwritten slot 15: volume 'Bio-Research-003' Will write to volume 'Bio-Research-003' in slot 15. NOTE: skipping tape-writable test Server check took 175.512 seconds (brought to you by Amanda 3.5.1) amanda@marlin:~/daily$ That worked! Interestingly, doing an `amcheck -s daily` after that fails just as before. The amanda.conf uses taper_oldest. Okay, this lends support to the theory that the crash is actually happening in "scan" operation, rather than in some later part of the amcheck-driver/taper process. (Were there any error/warning messages written to the amcheck-device log file for that run?) So, maybe if I temporarily go to the different algorithm, it will work. Right now the backups are already running and dumping to Yeah, it might well work, and if so -- and if you don't care which tape(s) are used next -- then simply switching to taper_traditional would probably be the easiest approach to getting new dumps actually written to tape If the order the tapes are used does matter to you, I think it should probably be possible to fix the bug in taper_oldest (oldest.pm) to get it working (but I'm not really sure how much debugging effort it will involve...). If you are interested to attempt that, the next thing I would check is to see what happens with -otaperscan=taper_lexical (assuming that is also defined in your amanda.conf). The "lexical" and "oldest" algorithms both use the tape-drive inventory (while "traditional" does not), so that test will help narrow the problem down to just "oldest" or to the tape-inventory part of the code Nathan Nathan Stratton Treadway - [email protected] - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239 -- --- Chris Hoogendyk - O__ Systems Administrator, Retired c/ /'_ --- Biology & Geosciences Departments (*) \(*) -- 315 Morrill Science Center III ~~ - University of Massachusetts, Amherst --- Erdös 4
Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
On Sat, Oct 10, 2020 at 23:50:17 -0400, Chris Hoogendyk wrote: > Wow! > >amanda@marlin:~/daily$ amcheck -s -otaperscan=taper_traditional daily >Amanda Tape Server Host Check >- >NOTE: Holding disk '/amanda3': 913514496 KB disk space available, using > 808656896 KB >NOTE: Holding disk '/amanda4': 158228480 KB disk space available, using > 53370880 KB >NOTE: Holding disk '/amanda5': 1636618240 KB disk space available, using > 1531760640 KB >Searching for label 'Bio-Research-002':label 'Bio-Research-002' not > recognized or not found >slot 13:slot 13 not in use-slots (14-36) >slot 14: volume 'Bio-Research-013' is still active and cannot be > overwritten >slot 15: volume 'Bio-Research-003' >Will write to volume 'Bio-Research-003' in slot 15. >NOTE: skipping tape-writable test >Server check took 175.512 seconds >(brought to you by Amanda 3.5.1) >amanda@marlin:~/daily$ > > That worked! Interestingly, doing an `amcheck -s daily` after that > fails just as before. The amanda.conf uses taper_oldest. Okay, this lends support to the theory that the crash is actually happening in "scan" operation, rather than in some later part of the amcheck-driver/taper process. (Were there any error/warning messages written to the amcheck-device log file for that run?) > So, maybe if I temporarily go to the different algorithm, it will > work. Right now the backups are already running and dumping to Yeah, it might well work, and if so -- and if you don't care which tape(s) are used next -- then simply switching to taper_traditional would probably be the easiest approach to getting new dumps actually written to tape If the order the tapes are used does matter to you, I think it should probably be possible to fix the bug in taper_oldest (oldest.pm) to get it working (but I'm not really sure how much debugging effort it will involve...). If you are interested to attempt that, the next thing I would check is to see what happens with -otaperscan=taper_lexical (assuming that is also defined in your amanda.conf). The "lexical" and "oldest" algorithms both use the tape-drive inventory (while "traditional" does not), so that test will help narrow the problem down to just "oldest" or to the tape-inventory part of the code Nathan Nathan Stratton Treadway - [email protected] - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
Wow! amanda@marlin:~/daily$ amcheck -s -otaperscan=taper_traditional daily Amanda Tape Server Host Check - NOTE: Holding disk '/amanda3': 913514496 KB disk space available, using 808656896 KB NOTE: Holding disk '/amanda4': 158228480 KB disk space available, using 53370880 KB NOTE: Holding disk '/amanda5': 1636618240 KB disk space available, using 1531760640 KB Searching for label 'Bio-Research-002':label 'Bio-Research-002' not recognized or not found slot 13:slot 13 not in use-slots (14-36) slot 14: volume 'Bio-Research-013' is still active and cannot be overwritten slot 15: volume 'Bio-Research-003' Will write to volume 'Bio-Research-003' in slot 15. NOTE: skipping tape-writable test Server check took 175.512 seconds (brought to you by Amanda 3.5.1) amanda@marlin:~/daily$ That worked! Interestingly, doing an `amcheck -s daily` after that fails just as before. The amanda.conf uses taper_oldest. So, maybe if I temporarily go to the different algorithm, it will work. Right now the backups are already running and dumping to holding disk. That will likely terminate when out of space, and in the morning I can initiate a new run with the amanda.conf changed. On 10/10/20 8:09 PM, Nathan Stratton Treadway wrote: On Sat, Oct 10, 2020 at 19:20:25 -0400, Chris Hoogendyk wrote: The taper log from last night looks exactly like the amcheck log, showing the same inventory followed by the updating state and the same three repetitions of the warning at line 102. So you are saying that the taper log from last night goes up through the lines that look like Thu Oct 08 23:30:04.244060162 2020: pid 18920: thd-0x28b4800: taper: mtx: Storage Element 48:Full :VolumeTag=CLN002CU Thu Oct 08 23:30:04.244209614 2020: pid 18920: thd-0x28b4800: taper: NEO200x48: updating state , then switch to the warning lines from oldest.pm line 102, and then aborts suddenly? It's not getting you an actual solution, but I'm curious if changing to a different taperscan algorithm in the amanda.conf file (or using -otaperscan= , if you have any other ones defined already) allow "amcheck -s" to complete sucessfully (and in any case what the amcheck log file looks like with the another taperscan algorithm) Nathan Nathan Stratton Treadway - [email protected] - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239 -- --- Chris Hoogendyk - O__ Systems Administrator, Retired c/ /'_ --- Biology & Geosciences Departments (*) \(*) -- 315 Morrill Science Center III ~~ - University of Massachusetts, Amherst --- Erdös 4
Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
On Sat, Oct 10, 2020 at 19:20:25 -0400, Chris Hoogendyk wrote: > The taper log from last night looks exactly like the amcheck log, > showing the same inventory followed by the updating state and the > same three repetitions of the warning at line 102. So you are saying that the taper log from last night goes up through the lines that look like >Thu Oct 08 23:30:04.244060162 2020: pid 18920: thd-0x28b4800: taper: mtx: > Storage Element 48:Full :VolumeTag=CLN002CU >Thu Oct 08 23:30:04.244209614 2020: pid 18920: thd-0x28b4800: taper: > NEO200x48: updating state , then switch to the warning lines from oldest.pm line 102, and then aborts suddenly? It's not getting you an actual solution, but I'm curious if changing to a different taperscan algorithm in the amanda.conf file (or using -otaperscan= , if you have any other ones defined already) allow "amcheck -s" to complete sucessfully (and in any case what the amcheck log file looks like with the another taperscan algorithm) Nathan Nathan Stratton Treadway - [email protected] - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
The taper log from last night looks exactly like the amcheck log, showing the same inventory followed by the updating state and the same three repetitions of the warning at line 102. The taper log from the night before, when it was successful, ends with this: Thu Oct 08 23:30:04.243540450 2020: pid 18920: thd-0x28b4800: taper: mtx: Storage Element 36:Full :VolumeTag=55L7 Thu Oct 08 23:30:04.243588794 2020: pid 18920: thd-0x28b4800: taper: mtx: Storage Element 37:Full :VolumeTag=40L7 Thu Oct 08 23:30:04.243636849 2020: pid 18920: thd-0x28b4800: taper: mtx: Storage Element 38:Full :VolumeTag=41L7 Thu Oct 08 23:30:04.243684260 2020: pid 18920: thd-0x28b4800: taper: mtx: Storage Element 39:Full :VolumeTag=42L7 Thu Oct 08 23:30:04.243727126 2020: pid 18920: thd-0x28b4800: taper: mtx: Storage Element 40:Empty Thu Oct 08 23:30:04.243765952 2020: pid 18920: thd-0x28b4800: taper: mtx: Storage Element 41:Empty Thu Oct 08 23:30:04.243804416 2020: pid 18920: thd-0x28b4800: taper: mtx: Storage Element 42:Empty Thu Oct 08 23:30:04.243847604 2020: pid 18920: thd-0x28b4800: taper: mtx: Storage Element 43:Full :VolumeTag=43L7 Thu Oct 08 23:30:04.243891382 2020: pid 18920: thd-0x28b4800: taper: mtx: Storage Element 44:Empty Thu Oct 08 23:30:04.243931546 2020: pid 18920: thd-0x28b4800: taper: mtx: Storage Element 45:Empty Thu Oct 08 23:30:04.243969440 2020: pid 18920: thd-0x28b4800: taper: mtx: Storage Element 46:Empty Thu Oct 08 23:30:04.244012542 2020: pid 18920: thd-0x28b4800: taper: mtx: Storage Element 47:Full :VolumeTag=CLN001CU Thu Oct 08 23:30:04.244060162 2020: pid 18920: thd-0x28b4800: taper: mtx: Storage Element 48:Full :VolumeTag=CLN002CU Thu Oct 08 23:30:04.244209614 2020: pid 18920: thd-0x28b4800: taper: NEO200x48: updating state Thu Oct 08 23:30:04.253017380 2020: pid 18920: thd-0x28b4800: taper: NEO200x48: too early for another 'status' invocation Thu Oct 08 23:30:04.253371305 2020: pid 18920: thd-0x28b4800: taper: NEO200x48: loading slot '4' Thu Oct 08 23:30:04.253422930 2020: pid 18920: thd-0x28b4800: taper: NEO200x48: requested volume is already in drive 0 Thu Oct 08 23:30:04.253728335 2020: pid 18920: thd-0x28b4800: taper: NEO200x48: polling 'tape:/dev/nst1' to see if it's ready Thu Oct 08 23:30:04.256414884 2020: pid 18920: thd-0x28b4800: taper: Device is in variable block size Thu Oct 08 23:30:23.300747859 2020: pid 18920: thd-0x28b4800: taper: Slot 4 with label Bio-Research-058 is usable Thu Oct 08 23:30:23.304473715 2020: pid 18920: thd-0x28b4800: taper: Amanda::Taper::Scribe preparing to write, part size 0, using no cache (PEOM will be fatal) (splitter) (no LEOM) Thu Oct 08 23:30:23.304769712 2020: pid 18920: thd-0x28b4800: taper: Starting -> )> Thu Oct 08 23:30:23.304819253 2020: pid 18920: thd-0x28b4800: taper: Final linkage: -(MEM_RING)-> Thu Oct 08 23:30:23.305432752 2020: pid 18920: thd-0x28b4800: taper: header native_crc: 9458cc2f:496412180480 Thu Oct 08 23:30:23.305451271 2020: pid 18920: thd-0x28b4800: taper: header client_crc: 9458cc2f:496412180480 Thu Oct 08 23:30:23.305465429 2020: pid 18920: thd-0x28b4800: taper: header server_crc: 9458cc2f:496412180480 Thu Oct 08 23:30:23.305552797 2020: pid 18920: thd-0x28b4800: taper: start_recovery called Thu Oct 08 23:30:23.325029323 2020: pid 18920: thd-0x28b4800: taper: Building type TAPESTART header of 524288-524288 bytes with name='Bio-Research-058' disk='' dumplevel=0 and blocksize=524288 Thu Oct 08 23:30:31.666780210 2020: pid 18920: thd-0x28b4800: taper: invoking /usr/sbin/mtx -f /dev/tape/by-id/scsi-1BDT_FlexStor_II_00MX64200626_LL0 status Thu Oct 08 23:30:32.316054467 2020: pid 18920: thd-0x28b4800: taper: mtx: Storage Changer /dev/tape/by-id/scsi-1BDT_FlexStor_II_00MX64200626_LL0:2 Drives, 48 Slots ( 0 Import/Export ) Thu Oct 08 23:30:32.316179280 2020: pid 18920: thd-0x28b4800: taper: mtx: Data Transfer Element 0:Full (Storage Element 4 Loaded):VolumeTag = 58L7 Thu Oct 08 23:30:32.316269484 2020: pid 18920: thd-0x28b4800: taper: mtx: Data Transfer Element 1:Empty Thu Oct 08 23:30:32.316314827 2020: pid 18920: thd-0x28b4800: taper: mtx: Storage Element 1:Empty I got 6.8TB on tape the night the above log represents. A successful amcheck-device log from a couple of days ago looks just like that. There are no log files with a name that includes "change". On 10/10/20 6:20 PM, Nathan Stratton Treadway wrote: On Sat, Oct 10, 2020 at 17:11:16 -0400, Chris Hoogendyk wrote: Also 8 lines like this in the kern.log: Oct 9 21:05:58 eclogite kernel: [650814.343315] amcheck-device[5089]: segfault at 0 ip 7f94159617c6 sp 7fff61039da8 error 4 in libc-2.23.so[7f94158d6000+1c] Well, that does
Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
On Sat, Oct 10, 2020 at 17:11:16 -0400, Chris Hoogendyk wrote: > Also > > 8 lines like this in the kern.log: > >Oct 9 21:05:58 eclogite kernel: [650814.343315] amcheck-device[5089]: > segfault at 0 ip >7f94159617c6 sp 7fff61039da8 error 4 in > libc-2.23.so[7f94158d6000+1c] > Well, that doesn't really tell us more more about what is going wrong, other than the slight hint the problem is all the way down in the the libc library somehow. I don't know how much you are trying to investigate further at this point in your furlough schedule... but I still feel that comparing the log from a succesfull run to this aborted run has the best chance of generating a hint as to exactly what operation is underway at the point of failure. Also, the taper/changer logs from last night's run should give some hint as to what it was attempting, and perhaps those logs will be different enough from the amcheck-device logs that it'll give some new information (The last operation that appears to be happening in your quoted amcheck-device log lines is a scan through the tape-changer inventory. I don't have a physical tape-drive changer myself so I don't have any guesses as to what could be wrong, but based on the history of the situation you described it does seem plausible that the inventory database it's working from could contain some "unexpected" data of some sort...) Nathan Nathan Stratton Treadway - [email protected] - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
To follow up. There was no data written to tape. On one system, amstatus indicated that it was trying to locate a tape, but that never succeeded. Both systems backed up to holding disk. I have some space left on holding disk, but not enough to get beyond tonight, and tonight is likely to fall baqck to incrementals only. On 10/9/20 10:22 PM, Chris Hoogendyk wrote: I have two servers doing Amanda backups for two different departments. They are both Supermicro servers running Ubuntu 16.04 with fairly recent patches and Amanda 3.5.1. This evening I came in to swap magazines in the Overland NeoSeries tape libraries. We have just begun a program to send a set of magazines out to Iron Mountain. The libraries had new magazines with new LTO7 tapes that I had labeled with amlabel and then run through enough days of backups to write them all. Now I'm taking those out, putting back in the original magazines, and sending the new ones out to Iron Mountain. So, part of the point of saying all that is that I have done this magazine swap previously. Once. Now, this evening, on both servers, I'm getting this error (ERROR: amcheck-device terminated with signal 11). When I ran the amcheck before swapping magazines, it worked. I then did an `amtape daily slot 2` to get it pointed at the beginning, went to the web interface and unloaded that tape back into the magazine, swapped the magazines, and went back to amanda. `amtape daily slot whatever` works. It returns telling me the label on the tape, and the web interface shows it loaded. `amlabel daily somelabel` works. It looks at the tape, reads the label, tells me that it is already labeled and in use. `amcheck daily` fails. Any clues? I need to resolve this fairly quickly. The University is doing mandatory furloughs to deal with their budget shortfalls, and I'm scheduled for a one week furlough starting Sunday. I will update this later with information on whether the backups that are scheduled to start shortly end up working. I presume they will at least go to holding disk, but whether they will write to the tape is the question. Following is the contents of the /tmp/amanda/server/daily/amcheck-device.20201009215709.debug: Fri Oct 09 21:57:09.046919855 2020: pid 24239: thd-0xc2e600: amcheck-device: pid 24239 ruid 555 euid 555 version 3.5.1: start at Fri Oct 9 21:57:09 2020 Fri Oct 09 21:57:09.047288267 2020: pid 24239: thd-0xc2e600: amcheck-device: reading config file /usr/local/etc/amanda/daily/amanda.conf Fri Oct 09 21:57:09.076905220 2020: pid 24239: thd-0xc2e600: amcheck-device: pid 24239 ruid 555 euid 555 version 3.5.1: rename at Fri Oct 9 21:57:09 2020 Fri Oct 09 21:57:09.105544741 2020: pid 24239: thd-0xc2e600: amcheck-device: chg-robot: using statefile '/usr/local/var/amanda/chg-robot-dev-tape-by-id-scsi-1BDT-FlexStor-II-00MX64200626-LL0' Fri Oct 09 21:57:09.124493185 2020: pid 24239: thd-0xc2e600: amcheck-device: invoking /usr/sbin/mtx -f /dev/tape/by-id/scsi-1BDT_FlexStor_II_00MX64200626_LL0 status Fri Oct 09 21:57:09.793516825 2020: pid 24239: thd-0xc2e600: amcheck-device: mtx: Storage Changer /dev/tape/by-id/scsi-1BDT_FlexStor_II_00MX64200626_LL0:2 Drives, 48 Slots ( 0 Import/Export ) Fri Oct 09 21:57:09.793592258 2020: pid 24239: thd-0xc2e600: amcheck-device: mtx: Data Transfer Element 0:Empty Fri Oct 09 21:57:09.793645704 2020: pid 24239: thd-0xc2e600: amcheck-device: mtx: Data Transfer Element 1:Full (Storage Element 13 Loaded):VolumeTag = 01L7 Fri Oct 09 21:57:09.793696913 2020: pid 24239: thd-0xc2e600: amcheck-device: mtx: Storage Element 1:Empty Fri Oct 09 21:57:09.793740270 2020: pid 24239: thd-0xc2e600: amcheck-device: mtx: Storage Element 2:Empty Fri Oct 09 21:57:09.793779177 2020: pid 24239: thd-0xc2e600: amcheck-device: mtx: Storage Element 3:Full :VolumeTag=15L7 Fri Oct 09 21:57:09.793822925 2020: pid 24239: thd-0xc2e600: amcheck-device: mtx: Storage Element 4:Full :VolumeTag=16L7 Fri Oct 09 21:57:09.793860922 2020: pid 24239: thd-0xc2e600: amcheck-device: mtx: Storage Element 5:Full :VolumeTag=17L7 Fri Oct 09 21:57:09.793898857 2020: pid 24239: thd-0xc2e600: amcheck-device: mtx: Storage Element 6:Full :VolumeTag=14L7 Fri Oct 09 21:57:09.793932089 2020: pid 24239: thd-0xc2e600: amcheck-device: mtx: Storage Element 7:Empty Fri Oct 09 21:57:09.793961884 2020: pid 24239: thd-0xc2e600: amcheck-device: mtx: Storage Element 8:Empty Fri Oct 09 21:57:09.794026494 2020: pid 24239: thd-0xc2e600: amcheck-device: mtx: Storage Element 9:Full :VolumeTag=21L7 Fri Oct 09 21:57:09.794067771 2020: pid 24239: thd-0xc2e600: amcheck-device: mtx: Storage Element 10:Full :VolumeTag=18L7 Fri Oct 09 21:57:09.794102988 2020: pid 24239: thd-0xc2e600: amcheck-device: mtx: Storage Element 11:Empty Fri Oct 09 21:57:09.794132586 2020: pid 24239: thd-0xc2e600: amcheck-device: mtx: Storage Element 12:Empty Fri Oct 09 21:57:09.7
Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
Also 8 lines like this in the kern.log: Oct 9 21:05:58 eclogite kernel: [650814.343315] amcheck-device[5089]: segfault at 0 ip 7f94159617c6 sp 7fff61039da8 error 4 in libc-2.23.so[7f94158d6000+1c] On 10/10/20 12:29 AM, Nathan Stratton Treadway wrote: On Fri, Oct 09, 2020 at 22:22:47 -0400, Chris Hoogendyk wrote: both servers, I'm getting this error (ERROR: amcheck-device terminated with signal 11). When I ran the amcheck before swapping amcheck-device is a Perl program, so it's a little bit impressive to be triggering a SEGV of the process :( . Do you get any coredump-related kernel messages in your syslog file when the process crashes? Fri Oct 09 21:57:09.795469021 2020: pid 24239: thd-0xc2e600: amcheck-device: NEO200x48: updating state Fri Oct 09 21:57:09.802340532 2020: pid 24239: thd-0xc2e600: amcheck-device: warning: Use of uninitialized value in numeric eq (==) at /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/oldest.pm line 102. Fri Oct 09 21:57:09.802537511 2020: pid 24239: thd-0xc2e600: amcheck-device: warning: Use of uninitialized value in numeric eq (==) at /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/oldest.pm line 102. Fri Oct 09 21:57:09.802622523 2020: pid 24239: thd-0xc2e600: amcheck-device: warning: Use of uninitialized value in numeric eq (==) at /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/oldest.pm line 102. I am guessing this "uninitialized value" warning is not directly causing the crash, but those log message might possibly hint as to where in the program execution had reached just prior to the crash. Can you post the output of $ sed -n '99,$p;105q' /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/oldest.pm ? (In other words, what's line 102 in that file on your system, with a few lines of context?) Do you see those warning lines in the log files from a successfully amcheck run (i.e. from a few days ago)? What do those logs show after the warning lines (or the "updating state" line)? Nathan Nathan Stratton Treadway - [email protected] - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239 -- --- Chris Hoogendyk - O__ Systems Administrator, Retired c/ /'_ --- Biology & Geosciences Departments (*) \(*) -- 315 Morrill Science Center III ~~ - University of Massachusetts, Amherst --- Erdös 4
Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
Sorry for the delay in responding.
amanda@marlin:~/daily$ cat -n
/usr/local/share/perl/5.22.1/Amanda/Taper/Scan/oldest.pm | grep "^
*102" -C3
99 $sl->{'state'} == Amanda::Changer::SLOT_UNKNOWN) {
100 push @unknown, $sl;
101 } elsif ($sl->{'state'} == Amanda::Changer::SLOT_EMPTY) {
102 } elsif (defined $sl->{'label'} &&
103 $sl->{device_status} == $DEVICE_STATUS_SUCCESS) {
104 my $vol_tle =
$self->{'tapelist'}->lookup_tapelabel($sl->{'label'});
105 if ($self->is_reusable_volume(label => $sl->{'label'})) {
amanda@marlin:~/daily$
On 10/10/20 12:46 AM, Nathan Stratton Treadway wrote:
On Sat, Oct 10, 2020 at 00:29:04 -0400, Nathan Stratton Treadway wrote:
Can you post the output of
$ sed -n '99,$p;105q'
/usr/local/share/perl/5.22.1/Amanda/Taper/Scan/oldest.pm
?
(In other words, what's line 102 in that file on your system, with a
few lines of context?)
(Hmm, perhaps
$ cat -n /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/oldest.pm | grep "^
*102" -C3
would be better -- that way, the file line numbers are included in
output...)
Nathan
Nathan Stratton Treadway - [email protected] - Mid-Atlantic region
Ray Ontko & Co. - Software consulting services - http://www.ontko.com/
GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239
Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
--
---
Chris Hoogendyk
-
O__ Systems Administrator, Retired
c/ /'_ --- Biology & Geosciences Departments
(*) \(*) -- 315 Morrill Science Center III
~~ - University of Massachusetts, Amherst
---
Erdös 4
Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
On Sat, Oct 10, 2020 at 00:29:04 -0400, Nathan Stratton Treadway wrote: > Can you post the output of > > $ sed -n '99,$p;105q' > /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/oldest.pm > ? > > (In other words, what's line 102 in that file on your system, with a > few lines of context?) (Hmm, perhaps $ cat -n /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/oldest.pm | grep "^ *102" -C3 would be better -- that way, the file line numbers are included in output...) Nathan Nathan Stratton Treadway - [email protected] - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"
On Fri, Oct 09, 2020 at 22:22:47 -0400, Chris Hoogendyk wrote: > both servers, I'm getting this error (ERROR: amcheck-device > terminated with signal 11). When I ran the amcheck before swapping amcheck-device is a Perl program, so it's a little bit impressive to be triggering a SEGV of the process :( . Do you get any coredump-related kernel messages in your syslog file when the process crashes? > Fri Oct 09 21:57:09.795469021 2020: pid 24239: thd-0xc2e600: amcheck-device: > NEO200x48: updating state > Fri Oct 09 21:57:09.802340532 2020: pid 24239: thd-0xc2e600: > amcheck-device: warning: Use of uninitialized value in numeric eq > (==) at /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/oldest.pm > line 102. > Fri Oct 09 21:57:09.802537511 2020: pid 24239: thd-0xc2e600: > amcheck-device: warning: Use of uninitialized value in numeric eq > (==) at /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/oldest.pm > line 102. > Fri Oct 09 21:57:09.802622523 2020: pid 24239: thd-0xc2e600: > amcheck-device: warning: Use of uninitialized value in numeric eq > (==) at /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/oldest.pm > line 102. I am guessing this "uninitialized value" warning is not directly causing the crash, but those log message might possibly hint as to where in the program execution had reached just prior to the crash. Can you post the output of $ sed -n '99,$p;105q' /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/oldest.pm ? (In other words, what's line 102 in that file on your system, with a few lines of context?) Do you see those warning lines in the log files from a successfully amcheck run (i.e. from a few days ago)? What do those logs show after the warning lines (or the "updating state" line)? Nathan Nathan Stratton Treadway - [email protected] - Mid-Atlantic region Ray Ontko & Co. - Software consulting services - http://www.ontko.com/ GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239 Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239
