Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-15 Thread Nathan Stratton Treadway
On Wed, Oct 14, 2020 at 13:20:38 -0400, Chris Hoogendyk wrote:
> I've been doing all the debugging stuff one one server. On my other
> server, I had simply set it to traditional and it's been working.
> Just now I went and applied ScanInventory.pm.patch_20201013C and
> changed the amanda.conf back to using oldest. An amcheck daily told
> me there were no acceptable volumes found! So I switched back to


Hmmm...  Well, clearly the exact failure mode different between the two
servers :(.

(You did previously see the "terminated with signal 11" error message on
the eclogite server, right?)

Did the dump that ran yesterday/last night actually write to the
geo-daily-065 tape successfully, or was there some sort of changer error
at run time?


I guess the next thing to try would be to install
ScanInventory.pm.patch_20201013B to enable some debugging, then run
"amcheck -s" and "amtape ... taper" again and post the results.  Those
tests should be done using the oldest taperscan, but you can leave the
amanda.conf as-is and test with -otaperscan on the command lines if you
prefer.

Also, since I assume the statefile has changed since you last posted it,
I guess you should include it again (i.e. the version that is out then
at the moemnt you are running those tests).

Nathan 


Nathan Stratton Treadway  -  [email protected]  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239


Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-14 Thread Chris Hoogendyk

From the other server:

   amandabackup@eclogite:~/daily$ cat
   
/usr/local/var/amanda/chg-robot-dev-tape-by-id-scsi-1BDT-FlexStor-II-00MX64100589-LL0
   $STATE = {
   'drive_lru' => [
    '0',
    '1'
  ],
   'last_status' => 1602694135,
   'last_operation_delay' => 0,
   'last_operation_time' => 1602694089,
   'slots' => {
    '3' => {
 'loaded_in' => undef,
 'label' => 'geo-daily-062',
 'state' => 1,
 'ie' => 0,
 'f_type' => bless( {
   'sign' => '+',
   'value' => [
   1
   ]
    }, 'Math::BigInt' ),
 'device_error' => undef,
 'barcode' => '62L7',
 'device_status' => bless( {
   'value' => [
   0
   ],
   'sign' => '+'
   }, 'Math::BigInt' )
   },
    '4' => {
 'device_status' => bless( {
   'sign' => '+',
   'value' => [
   0
   ]
   }, 'Math::BigInt' ),
 'barcode' => '63L7',
 'device_error' => undef,
 'f_type' => bless( {
   'value' => [
   1
   ],
   'sign' => '+'
    }, 'Math::BigInt' ),
 'ie' => 0,
 'state' => 1,
 'loaded_in' => undef,
 'label' => 'geo-daily-063'
   },
    '17' => {
  'device_status' => bless( {
   'value' => [
   4
   ],
   'sign' => '+'
   }, 'Math::BigInt' ),
  'barcode' => '76L7',
  'device_error' => 'Tape device /dev/nst1 
is not ready or is
   empty',
  'ie' => 0,
  'f_type' => undef,
  'state' => 1,
  'loaded_in' => undef,
  'label' => undef
    },
    '16' => {
  'barcode' => '75L7',
  'device_status' => bless( {
   'sign' => '+',
   'value' => [
   4
   ]
   }, 'Math::BigInt' ),
  'device_error' => 'Tape device /dev/nst1 
is not ready or is
   empty',
  'ie' => 0,
  'f_type' => undef,
  'state' => 1,
  'loaded_in' => undef,
  'label' => undef
    },
    '10' => {
  'ie' => 0,
  'f_type' => undef,
  'state' => 1,
  'loaded_in' => undef,
  'label' => undef,
  'barcode' => '69L7',
  'device_status' => bless( {
   'sign' => '+',
   'value' => [
   4
   ]
   }, 'Math::BigInt' ),
  'device_error' => 'Tape device /dev/nst1 
is not ready or is
   empty'
    },
    '13' => {
  'device_error' => 'Tape device /dev/nst1 
is not ready or is
   empty',
  'device_status' => bless( {
   'value' => [
   4
   ],
   'sign' => '+'
   }, 'Math::BigInt' ),
  'barcode' => '72L7',
  'loaded_in' => undef,
  'label' => undef,
  'f_type' => undef,
  'ie' => 0,
  'state' => 1
    },
    '7' => {
 'state' => 1,
 'ie' => 0,
 'f_type' => undef,
 'loaded_in' => undef,
 'label' => undef,
 'barcode' => '66L7',
 'device_status' => bless( {

Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-14 Thread Nathan Stratton Treadway
On Wed, Oct 14, 2020 at 12:20:33 -0400, Chris Hoogendyk wrote:
> Both lexical.pm and ScanInventory.pm restored to original. New fix
> only patch applied to ScanInventory.pm. amanda.conf restored to use
> oldest.
> 
>amanda@marlin:~/daily$ amcheck daily
> 
>Amanda Tape Server Host Check
>-
>NOTE: Holding disk '/amanda3': 139784192 KB disk space available, 
> using 34926592 KB
>NOTE: Holding disk '/amanda4': 170082304 KB disk space available, 
> using 65224704 KB
>NOTE: Holding disk '/amanda5': 240713728 KB disk space available, 
> using 135856128 KB
>  * Authorized Use Only *
> 
>snapper
>slot 25: volume 'Bio-Research-028'
>Will write to volume 'Bio-Research-028' in slot 25.
>NOTE: skipping tape-writable test
>Server check took 50.059 seconds
>Amanda Backup Client Hosts Check
>
>Client check: 4 hosts checked in 6.692 seconds.  0 problems found.
>(brought to you by Amanda 3.5.1)
> 
>amanda@marlin:~/daily$
[...]> 
> 
> Launched a flush on that. Then the following seems to set up a tape on the 
> second tape drive.
> 
>amanda@marlin:~/daily$ amtape daily taper
> 
>slot 31: volume 'Bio-Research-032'
>Will write to volume 'Bio-Research-032' in slot 31.
> 
>amanda@marlin:~/daily$

Okay, sounds like things are back to working "normally" on that server,
right?

So, do you still have a second server which is getting coredumps (at
least with the oldest taperscan)?

Based on the investigation so far, it seems like the crash is caused by
tape-inventory records which have no label text along with some specific
other data field values.

If you post the /usr/local/var/amanda/chg-robot-dev-tape-by-id-scsi*
changer state file that other server, we can double check that such
entries exist over there, too.

(Assume they do, then I guess the question will be whether you want to
apply the same ScanInventory.pm patch there, or if you instead want to
try clearing that/those bad inventory record(s) without changing the
installed code on that box)

Nathan


Nathan Stratton Treadway  -  [email protected]  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239


Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-14 Thread Chris Hoogendyk
Both lexical.pm and ScanInventory.pm restored to original. New fix only patch applied to 
ScanInventory.pm. amanda.conf restored to use oldest.


   amanda@marlin:~/daily$ amcheck daily

   Amanda Tape Server Host Check
   -
   NOTE: Holding disk '/amanda3': 139784192 KB disk space available, using 
34926592 KB
   NOTE: Holding disk '/amanda4': 170082304 KB disk space available, using 
65224704 KB
   NOTE: Holding disk '/amanda5': 240713728 KB disk space available, using 
135856128 KB
 * Authorized Use Only *

   snapper
   slot 25: volume 'Bio-Research-028'
   Will write to volume 'Bio-Research-028' in slot 25.
   NOTE: skipping tape-writable test
   Server check took 50.059 seconds
   Amanda Backup Client Hosts Check
   
   Client check: 4 hosts checked in 6.692 seconds.  0 problems found.
   (brought to you by Amanda 3.5.1)

   amanda@marlin:~/daily$

Contents of /tmp/amanda/server/daily/amcheck-device.20201014115111.debug

   Wed Oct 14 11:51:11.399537832 2020: pid 10048: thd-0x1bb4600: 
amcheck-device: pid 10048 ruid 555
   euid 555 version 3.5.1: start at Wed Oct 14 11:51:11 2020
   Wed Oct 14 11:51:11.399897845 2020: pid 10048: thd-0x1bb4600: 
amcheck-device: reading config
   file /usr/local/etc/amanda/daily/amanda.conf
   Wed Oct 14 11:51:11.429240795 2020: pid 10048: thd-0x1bb4600: 
amcheck-device: pid 10048 ruid 555
   euid 555 version 3.5.1: rename at Wed Oct 14 11:51:11 2020
   Wed Oct 14 11:51:11.457544084 2020: pid 10048: thd-0x1bb4600: 
amcheck-device: chg-robot: using
   statefile 
'/usr/local/var/amanda/chg-robot-dev-tape-by-id-scsi-1BDT-FlexStor-II-00MX64200626-LL0'
   Wed Oct 14 11:51:11.496418988 2020: pid 10048: thd-0x1bb4600: 
amcheck-device: invoking
   /usr/sbin/mtx -f /dev/tape/by-id/scsi-1BDT_FlexStor_II_00MX64200626_LL0 
status
   Wed Oct 14 11:51:12.205067256 2020: pid 10048: thd-0x1bb4600: 
amcheck-device: mtx:   Storage
   Changer /dev/tape/by-id/scsi-1BDT_FlexStor_II_00MX64200626_LL0:2 Drives, 48 
Slots ( 0
   Import/Export )
   Wed Oct 14 11:51:12.205147868 2020: pid 10048: thd-0x1bb4600: 
amcheck-device: mtx: Data Transfer
   Element 0:Empty
   Wed Oct 14 11:51:12.205207448 2020: pid 10048: thd-0x1bb4600: 
amcheck-device: mtx: Data Transfer
   Element 1:Full (Storage Element 18 Loaded):VolumeTag = 06L7
   Wed Oct 14 11:51:12.205264401 2020: pid 10048: thd-0x1bb4600: 
amcheck-device: mtx:   Storage
   Element 1:Empty
   Wed Oct 14 11:51:12.205318585 2020: pid 10048: thd-0x1bb4600: 
amcheck-device: mtx:   Storage
   Element 2:Empty
   Wed Oct 14 11:51:12.205360601 2020: pid 10048: thd-0x1bb4600: 
amcheck-device: mtx:   Storage
   Element 3:Full :VolumeTag=15L7
   Wed Oct 14 11:51:12.205414201 2020: pid 10048: thd-0x1bb4600: 
amcheck-device: mtx:   Storage
   Element 4:Full :VolumeTag=16L7
   Wed Oct 14 11:51:12.205460521 2020: pid 10048: thd-0x1bb4600: 
amcheck-device: mtx:   Storage
   Element 5:Full :VolumeTag=17L7
   Wed Oct 14 11:51:12.205505018 2020: pid 10048: thd-0x1bb4600: 
amcheck-device: mtx:   Storage
   Element 6:Full :VolumeTag=14L7
   Wed Oct 14 11:51:12.205543248 2020: pid 10048: thd-0x1bb4600: 
amcheck-device: mtx:   Storage
   Element 7:Empty
   Wed Oct 14 11:51:12.205580955 2020: pid 10048: thd-0x1bb4600: 
amcheck-device: mtx:   Storage
   Element 8:Empty
   Wed Oct 14 11:51:12.205620300 2020: pid 10048: thd-0x1bb4600: 
amcheck-device: mtx:   Storage
   Element 9:Full :VolumeTag=21L7
   Wed Oct 14 11:51:12.205665098 2020: pid 10048: thd-0x1bb4600: 
amcheck-device: mtx:   Storage
   Element 10:Full :VolumeTag=18L7
   Wed Oct 14 11:51:12.205705828 2020: pid 10048: thd-0x1bb4600: 
amcheck-device: mtx:   Storage
   Element 11:Empty
   Wed Oct 14 11:51:12.205739917 2020: pid 10048: thd-0x1bb4600: 
amcheck-device: mtx:   Storage
   Element 12:Empty
   Wed Oct 14 11:51:12.205777202 2020: pid 10048: thd-0x1bb4600: 
amcheck-device: mtx:   Storage
   Element 13:Full :VolumeTag=01L7
   Wed Oct 14 11:51:12.205817941 2020: pid 10048: thd-0x1bb4600: 
amcheck-device: mtx:   Storage
   Element 14:Full :VolumeTag=13L7
   Wed Oct 14 11:51:12.205859124 2020: pid 10048: thd-0x1bb4600: 
amcheck-device: mtx:   Storage
   Element 15:Full :VolumeTag=03L7
   Wed Oct 14 11:51:12.205899807 2020: pid 10048: thd-0x1bb4600: 
amcheck-device: mtx:   Storage
   Element 16:Full :VolumeTag=04L7
   Wed Oct 14 11:51:12.205945040 2020: pid 10048: thd-0x1bb4600: 
amcheck-device: mtx:   Storage
   Element 17:Full :VolumeTag=05L7
   Wed Oct 14 11:51:12.205985155 2020: pid 10048: thd-0x1bb4600: 
amcheck-device: mtx:   Storage
   Element 18:Empty
   Wed Oct 14 11:51:12.206066315 2020: pid 10048: thd-0x1bb4600: 
amcheck-device: mtx:   Storage
   Element 19:Full :VolumeTag=07L7
   Wed Oct 14 11:51:12.206109137 2020: pid 10048: thd-0x1bb4600: 

Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-13 Thread Nathan Stratton Treadway
On Tue, Oct 13, 2020 at 21:02:34 -0400, Chris Hoogendyk wrote:
> Patched.
> 
>amanda@marlin:~/daily$ amcheck -s -otaperscan=taper_lexical daily
> 
>Amanda Tape Server Host Check
>-
>NOTE: Holding disk '/amanda3': 449998848 KB disk space available, 
> using 345141248 KB
>NOTE: Holding disk '/amanda4': 3026923520 KB disk space available, 
> using 2922065920 KB
>NOTE: Holding disk '/amanda5': 104857600 KB disk space available, 
> using 0 KB
>slot 19: volume 'Bio-Research-007'
>Will write to volume 'Bio-Research-007' in slot 19.
>NOTE: skipping tape-writable test
>Server check took 20.422 seconds
>(brought to you by Amanda 3.5.1)

Okay, great.

As I said before I'm not confident this is a competely correct fix, but
attached here is a patch file (against the original version of the file)
containing just the "fix" line (i.e. no debugging statements).

You should be able to swap your lexical.pm back to the original version
and put ScanInventory.pm back to the original with just this one patch,
and then go ahead and switch back to "oldest" taperscan again in your
amanda.conf -- hopefully it will all "just work" again.

I'm curious to see the log file from an "amtape daily taper" run with
that setup in place (I assume that will run to successful completion,
too...).



Nathan


Nathan Stratton Treadway  -  [email protected]  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239
--- ScanInventory.pm_orig_v3.5.12017-09-22 19:41:42.154305907 -0400
+++ ScanInventory.pm2020-10-13 22:43:25.148507391 -0400
@@ -723,6 +723,7 @@
return 0;
 } elsif ($dev_status == $DEVICE_STATUS_SUCCESS and
 $f_type == $Amanda::Header::F_TAPESTART) {
+   $label='' if !defined $label;
if (!match_labelstr($self->{'labelstr'}, $autolabel, $label,
$barcode, $meta, 
$self->{'chg'}->{'storage'}->{'storage_name'})) {
if (!$autolabel->{'other_config'}) {


Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-13 Thread Chris Hoogendyk

Patched.

   amanda@marlin:~/daily$ amcheck -s -otaperscan=taper_lexical daily

   Amanda Tape Server Host Check
   -
   NOTE: Holding disk '/amanda3': 449998848 KB disk space available, using 
345141248 KB
   NOTE: Holding disk '/amanda4': 3026923520 KB disk space available, using 
2922065920 KB
   NOTE: Holding disk '/amanda5': 104857600 KB disk space available, using 
0 KB
   slot 19: volume 'Bio-Research-007'
   Will write to volume 'Bio-Research-007' in slot 19.
   NOTE: skipping tape-writable test
   Server check took 20.422 seconds
   (brought to you by Amanda 3.5.1)

   amanda@marlin:~/daily$

Here is the whole /tmp/amanda/server/daily/amcheck-device.20201013205425.debug

It's a little bit longer, going through the inventory more than once.

   Tue Oct 13 20:54:25.550354229 2020: pid 15972: thd-0x12d4800: 
amcheck-device: pid 15972 ruid 555
   euid 555 version 3.5.1: start at Tue Oct 13 20:54:25 2020
   Tue Oct 13 20:54:25.550685367 2020: pid 15972: thd-0x12d4800: 
amcheck-device: config_overrides:
   taperscan taper_lexical
   Tue Oct 13 20:54:25.550831476 2020: pid 15972: thd-0x12d4800: 
amcheck-device: reading config
   file /usr/local/etc/amanda/daily/amanda.conf
   Tue Oct 13 20:54:25.612658533 2020: pid 15972: thd-0x12d4800: 
amcheck-device: pid 15972 ruid 555
   euid 555 version 3.5.1: rename at Tue Oct 13 20:54:25 2020
   Tue Oct 13 20:54:25.643990167 2020: pid 15972: thd-0x12d4800: 
amcheck-device: chg-robot: using
   statefile 
'/usr/local/var/amanda/chg-robot-dev-tape-by-id-scsi-1BDT-FlexStor-II-00MX64200626-LL0'
   Tue Oct 13 20:54:25.662486365 2020: pid 15972: thd-0x12d4800: 
amcheck-device: invoking
   /usr/sbin/mtx -f /dev/tape/by-id/scsi-1BDT_FlexStor_II_00MX64200626_LL0 
status
   Tue Oct 13 20:54:26.295441930 2020: pid 15972: thd-0x12d4800: 
amcheck-device: mtx:   Storage
   Changer /dev/tape/by-id/scsi-1BDT_FlexStor_II_00MX64200626_LL0:2 Drives, 48 
Slots ( 0
   Import/Export )
   Tue Oct 13 20:54:26.314120367 2020: pid 15972: thd-0x12d4800: 
amcheck-device: mtx: Data Transfer
   Element 0:Full (Storage Element 19 Loaded):VolumeTag = 07L7
   Tue Oct 13 20:54:26.314231179 2020: pid 15972: thd-0x12d4800: 
amcheck-device: mtx: Data Transfer
   Element 1:Full (Storage Element 18 Loaded):VolumeTag = 06L7
   Tue Oct 13 20:54:26.314288072 2020: pid 15972: thd-0x12d4800: 
amcheck-device: mtx:   Storage
   Element 1:Empty
   Tue Oct 13 20:54:26.314346671 2020: pid 15972: thd-0x12d4800: 
amcheck-device: mtx:   Storage
   Element 2:Empty
   Tue Oct 13 20:54:26.314397525 2020: pid 15972: thd-0x12d4800: 
amcheck-device: mtx:   Storage
   Element 3:Full :VolumeTag=15L7
   Tue Oct 13 20:54:26.314459720 2020: pid 15972: thd-0x12d4800: 
amcheck-device: mtx:   Storage
   Element 4:Full :VolumeTag=16L7
   Tue Oct 13 20:54:26.314512714 2020: pid 15972: thd-0x12d4800: 
amcheck-device: mtx:   Storage
   Element 5:Full :VolumeTag=17L7
   Tue Oct 13 20:54:26.314571527 2020: pid 15972: thd-0x12d4800: 
amcheck-device: mtx:   Storage
   Element 6:Full :VolumeTag=14L7
   Tue Oct 13 20:54:26.314619709 2020: pid 15972: thd-0x12d4800: 
amcheck-device: mtx:   Storage
   Element 7:Empty
   Tue Oct 13 20:54:26.314662419 2020: pid 15972: thd-0x12d4800: 
amcheck-device: mtx:   Storage
   Element 8:Empty
   Tue Oct 13 20:54:26.314713070 2020: pid 15972: thd-0x12d4800: 
amcheck-device: mtx:   Storage
   Element 9:Full :VolumeTag=21L7
   Tue Oct 13 20:54:26.314767855 2020: pid 15972: thd-0x12d4800: 
amcheck-device: mtx:   Storage
   Element 10:Full :VolumeTag=18L7
   Tue Oct 13 20:54:26.314815482 2020: pid 15972: thd-0x12d4800: 
amcheck-device: mtx:   Storage
   Element 11:Empty
   Tue Oct 13 20:54:26.314861223 2020: pid 15972: thd-0x12d4800: 
amcheck-device: mtx:   Storage
   Element 12:Empty
   Tue Oct 13 20:54:26.314907435 2020: pid 15972: thd-0x12d4800: 
amcheck-device: mtx:   Storage
   Element 13:Full :VolumeTag=01L7
   Tue Oct 13 20:54:26.314961246 2020: pid 15972: thd-0x12d4800: 
amcheck-device: mtx:   Storage
   Element 14:Full :VolumeTag=13L7
   Tue Oct 13 20:54:26.315011854 2020: pid 15972: thd-0x12d4800: 
amcheck-device: mtx:   Storage
   Element 15:Full :VolumeTag=03L7
   Tue Oct 13 20:54:26.315061469 2020: pid 15972: thd-0x12d4800: 
amcheck-device: mtx:   Storage
   Element 16:Full :VolumeTag=04L7
   Tue Oct 13 20:54:26.315112449 2020: pid 15972: thd-0x12d4800: 
amcheck-device: mtx:   Storage
   Element 17:Full :VolumeTag=05L7
   Tue Oct 13 20:54:26.315158231 2020: pid 15972: thd-0x12d4800: 
amcheck-device: mtx:   Storage
   Element 18:Empty
   Tue Oct 13 20:54:26.315198333 2020: pid 15972: thd-0x12d4800: 
amcheck-device: mtx:   Storage
   Element 19:Empty
   Tue Oct 13 20:54:26.315245593 2020: pid 15972: thd-0x12d4800: 
amcheck-device: mtx:   Storage
   Element 20:Full :VolumeTag=08L7
   Tue Oct 13 20:

Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-13 Thread Nathan Stratton Treadway
On Tue, Oct 13, 2020 at 15:08:43 -0400, Chris Hoogendyk wrote:
> 
> End of /tmp/amanda/server/daily/amcheck-device.20201013150303.debug
[...]
>Tue Oct 13 15:03:04.326400287 2020: pid 10777: thd-0x2688800: 
> amcheck-device: warning: Use of
>uninitialized value $label in concatenation (.) or string at
>/usr/local/share/perl/5.22.1/Amanda/ScanInventory.pm line 687.
>Tue Oct 13 15:03:04.326421243 2020: pid 10777: thd-0x2688800: 
> amcheck-device:
>volume_is_labelable start: label:  barcode: 29L7
>Tue Oct 13 15:03:04.326444683 2020: pid 10777: thd-0x2688800: 
> amcheck-device:
>volume_is_labelable pre-matchlabel call

Okay, great, this would seem to confirm the theory that passing an
uninitialed $label value into the match_labelstr() function is what's
triggering the crash.

Here's a new patch to try.  I am not sure that it's really the
long-term-correct fix, but with some luck it will at least prevent the
crash you are currrently seeing and let you switch back to the oldest
taperscan.

You can either apply this patch file against the *original* version of
ScanInventory.pm, or just manually edit the previously-patched version
of the file to add the 
   $label='' if !defined $label;
just below the 
   debug("volume_is_labelable pre-matchlabel call");
line that appears in there now.

Given that a shot (and send the log file lines as usual)...


Nathan



Nathan Stratton Treadway  -  [email protected]  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239
--- ScanInventory.pm_orig_v3.5.12017-09-22 19:41:42.154305907 -0400
+++ ScanInventory.pm2020-10-13 15:47:36.242054712 -0400
@@ -684,6 +684,7 @@
 my $chg = $self->{'chg'};
 my $autolabel = $chg->{'autolabel'};
 
+debug("volume_is_labelable start: label: $label barcode: $barcode");   
 if (!defined $dev_status) {
return 0;
 } elsif ($dev_status & $DEVICE_STATUS_VOLUME_UNLABELED and
@@ -723,8 +724,11 @@
return 0;
 } elsif ($dev_status == $DEVICE_STATUS_SUCCESS and
 $f_type == $Amanda::Header::F_TAPESTART) {
+debug("volume_is_labelable pre-matchlabel call");   
+$label='' if !defined $label;
if (!match_labelstr($self->{'labelstr'}, $autolabel, $label,
$barcode, $meta, 
$self->{'chg'}->{'storage'}->{'storage_name'})) {
+debug("volume_is_labelable post-matchlabel call");   
if (!$autolabel->{'other_config'}) {
 #  $self->_user_msg(slot_result  => 1,
 #   label=> $label,
@@ -734,7 +738,9 @@
return 0;
}
} else {
+debug("volume_is_labelable pre-lookup_tapelabel  call");   
my $vol_tle = $self->{'tapelist'}->lookup_tapelabel($label);
+debug("volume_is_labelable post-lookup_tapelabel  call");   
if (!$vol_tle) {
 #  $self->_user_msg(slot_result => 1,
 #   label   => $label,


Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-13 Thread Chris Hoogendyk

Patched.

   amanda@marlin:~/daily$ amcheck -s -otaperscan=taper_lexical daily

   Amanda Tape Server Host Check
   -
   NOTE: Holding disk '/amanda3': 449998848 KB disk space available, using 
345141248 KB
   NOTE: Holding disk '/amanda4': 3026923520 KB disk space available, using 
2922065920 KB
   NOTE: Holding disk '/amanda5': 104857600 KB disk space available, using 
0 KB
   ERROR: amcheck-device terminated with signal 11
   Server check took 1.883 seconds
   (brought to you by Amanda 3.5.1)

   amanda@marlin:~/daily$


End of /tmp/amanda/server/daily/amcheck-device.20201013150303.debug

   Tue Oct 13 15:03:04.314561876 2020: pid 10777: thd-0x2688800: 
amcheck-device: mtx:   Storage
   Element 33:Full :VolumeTag=36L7
   Tue Oct 13 15:03:04.314617462 2020: pid 10777: thd-0x2688800: 
amcheck-device: mtx:   Storage
   Element 34:Full :VolumeTag=37L7
   Tue Oct 13 15:03:04.314665487 2020: pid 10777: thd-0x2688800: 
amcheck-device: mtx:   Storage
   Element 35:Full :VolumeTag=38L7
   Tue Oct 13 15:03:04.314711679 2020: pid 10777: thd-0x2688800: 
amcheck-device: mtx:   Storage
   Element 36:Full :VolumeTag=39L7
   Tue Oct 13 15:03:04.314757827 2020: pid 10777: thd-0x2688800: 
amcheck-device: mtx:   Storage
   Element 37:Full :VolumeTag=40L7
   Tue Oct 13 15:03:04.314803076 2020: pid 10777: thd-0x2688800: 
amcheck-device: mtx:   Storage
   Element 38:Full :VolumeTag=41L7
   Tue Oct 13 15:03:04.314848723 2020: pid 10777: thd-0x2688800: 
amcheck-device: mtx:   Storage
   Element 39:Full :VolumeTag=42L7
   Tue Oct 13 15:03:04.314890084 2020: pid 10777: thd-0x2688800: 
amcheck-device: mtx:   Storage
   Element 40:Empty
   Tue Oct 13 15:03:04.314928126 2020: pid 10777: thd-0x2688800: 
amcheck-device: mtx:   Storage
   Element 41:Empty
   Tue Oct 13 15:03:04.314965283 2020: pid 10777: thd-0x2688800: 
amcheck-device: mtx:   Storage
   Element 42:Empty
   Tue Oct 13 15:03:04.315007303 2020: pid 10777: thd-0x2688800: 
amcheck-device: mtx:   Storage
   Element 43:Full :VolumeTag=43L7
   Tue Oct 13 15:03:04.315050910 2020: pid 10777: thd-0x2688800: 
amcheck-device: mtx:   Storage
   Element 44:Empty
   Tue Oct 13 15:03:04.315092078 2020: pid 10777: thd-0x2688800: 
amcheck-device: mtx:   Storage
   Element 45:Empty
   Tue Oct 13 15:03:04.315129499 2020: pid 10777: thd-0x2688800: 
amcheck-device: mtx:   Storage
   Element 46:Empty
   Tue Oct 13 15:03:04.315171329 2020: pid 10777: thd-0x2688800: 
amcheck-device: mtx:   Storage
   Element 47:Full :VolumeTag=CLN001CU
   Tue Oct 13 15:03:04.315218349 2020: pid 10777: thd-0x2688800: 
amcheck-device: mtx:   Storage
   Element 48:Full :VolumeTag=CLN002CU
   Tue Oct 13 15:03:04.315378917 2020: pid 10777: thd-0x2688800: 
amcheck-device: NEO200x48:
   updating state
   Tue Oct 13 15:03:04.322889915 2020: pid 10777: thd-0x2688800: 
amcheck-device: slot: 0  label:
   Bio-Research-013ds: 0
   Tue Oct 13 15:03:04.324824228 2020: pid 10777: thd-0x2688800: 
amcheck-device: slot: 1  label:
   Bio-Research-003ds: 0
   Tue Oct 13 15:03:04.324975904 2020: pid 10777: thd-0x2688800: 
amcheck-device: slot: 2  label:
   Bio-Research-004ds: 0
   Tue Oct 13 15:03:04.325194015 2020: pid 10777: thd-0x2688800: 
amcheck-device: slot: 3  label:
   Bio-Research-005ds: 0
   Tue Oct 13 15:03:04.325323140 2020: pid 10777: thd-0x2688800: 
amcheck-device: slot: 4  label:
   Bio-Research-006ds: 0
   Tue Oct 13 15:03:04.325481141 2020: pid 10777: thd-0x2688800: 
amcheck-device: slot: 5  label:
   Bio-Research-007ds: 0
   Tue Oct 13 15:03:04.325672484 2020: pid 10777: thd-0x2688800: 
amcheck-device: slot: 6  label:
   Bio-Research-008ds: 0
   Tue Oct 13 15:03:04.325764902 2020: pid 10777: thd-0x2688800: 
amcheck-device: slot: 7  label:
   Bio-Research-009ds: 0
   Tue Oct 13 15:03:04.325915539 2020: pid 10777: thd-0x2688800: 
amcheck-device: warning: Use of
   uninitialized value in concatenation (.) or string at
   /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/lexical.pm line 102.
   Tue Oct 13 15:03:04.325943596 2020: pid 10777: thd-0x2688800: 
amcheck-device: slot: 8  label:
   Bio-Research-010ds:
   Tue Oct 13 15:03:04.326044285 2020: pid 10777: thd-0x2688800: amcheck-device:
   volume_is_labelable start: label: Bio-Research-010 barcode: 10L7
   Tue Oct 13 15:03:04.326096109 2020: pid 10777: thd-0x2688800: 
amcheck-device: warning: Use of
   uninitialized value in concatenation (.) or string at
   /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/lexical.pm line 102.
   Tue Oct 13 15:03:04.326128079 2020: pid 10777: thd-0x2688800: 
amcheck-device: slot: 9  label:
   Bio-Research-011ds:
   Tue Oct 13 15:03:04.326151100 2020: pid 10777: thd-0x2688800: amcheck-device:
   volume_is_labelable start: label: Bio-Research-011 barcode: 11L7
   Tue Oct 13 15:03:04.326172195 2020: pid 10777: thd-0x2688800: 
amcheck-device: slot: 10  label:
   Bio-Research-012

Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-12 Thread Nathan Stratton Treadway
On Mon, Oct 12, 2020 at 22:22:45 -0400, Chris Hoogendyk wrote:
>Mon Oct 12 22:16:21.857347044 2020: pid 23996: thd-0x25c0800: 
> amcheck-device: slot: 9  label:
>Bio-Research-011ds:
>Mon Oct 12 22:16:21.857380544 2020: pid 23996: thd-0x25c0800: 
> amcheck-device: slot: 10  label:
>Bio-Research-012ds: 0
>Mon Oct 12 22:16:21.857471810 2020: pid 23996: thd-0x25c0800: 
> amcheck-device: slot: 11  label:
>Bio-Research-028ds: 0
>Mon Oct 12 22:16:21.857580226 2020: pid 23996: thd-0x25c0800: 
> amcheck-device: warning: Use of
>uninitialized value in concatenation (.) or string at
>/usr/local/share/perl/5.22.1/Amanda/Taper/Scan/lexical.pm line 102.
>Mon Oct 12 22:16:21.857607369 2020: pid 23996: thd-0x25c0800: 
> amcheck-device: slot: 12  label: ds: 0

Progress!

This shows that the crash happens during processing of slot 12.

(Looking back through the output of "amtape inventory" you sent, it
appears that this slot contains a tape with barcode 29L7.)


An interesting thing to note is that the *label* variable for that slot is
uninitialized -- perhaps that's what is causing the crash?

To test that theory a bit, I've attached another patch to try. 
Unfortunately this one is in a file used by the oldest.pm algorithm,
too, so you'll probably want to revert the file back to the original as
soon as you've finished teating, to make sure that the patched version
doesn't affect an actual amanda run.

So, basically save a copy of the original
   /usr/local/share/perl/5.22.1/Amanda/ScanInventory.pm 
file, then apply the patch attached to this email in-place to that file,
and run the 
  amcheck -s -otaperscan=taper_lexical daily
test again.  (I don't expect this patch to prevent the crash, but
hopefully the new log messages will narrow down exactly where it is crashing.)

Nathan



Nathan Stratton Treadway  -  [email protected]  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239
--- ScanInventory.pm_orig_v3.5.12017-09-22 19:41:42.154305907 -0400
+++ ScanInventory.pm2020-10-13 00:53:04.662721210 -0400
@@ -684,6 +684,7 @@
 my $chg = $self->{'chg'};
 my $autolabel = $chg->{'autolabel'};
 
+debug("volume_is_labelable start: label: $label barcode: $barcode");   
 if (!defined $dev_status) {
return 0;
 } elsif ($dev_status & $DEVICE_STATUS_VOLUME_UNLABELED and
@@ -723,8 +724,10 @@
return 0;
 } elsif ($dev_status == $DEVICE_STATUS_SUCCESS and
 $f_type == $Amanda::Header::F_TAPESTART) {
+debug("volume_is_labelable pre-matchlabel call");   
if (!match_labelstr($self->{'labelstr'}, $autolabel, $label,
$barcode, $meta, 
$self->{'chg'}->{'storage'}->{'storage_name'})) {
+debug("volume_is_labelable post-matchlabel call");   
if (!$autolabel->{'other_config'}) {
 #  $self->_user_msg(slot_result  => 1,
 #   label=> $label,
@@ -734,7 +737,9 @@
return 0;
}
} else {
+debug("volume_is_labelable pre-lookup_tapelabel  call");   
my $vol_tle = $self->{'tapelist'}->lookup_tapelabel($label);
+debug("volume_is_labelable post-lookup_tapelabel  call");   
if (!$vol_tle) {
 #  $self->_user_msg(slot_result => 1,
 #   label   => $label,


Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-12 Thread Chris Hoogendyk

   amanda@marlin:~/daily$ amtape -otaperscan=taper_oldest daily inventory

   slot 14: label Bio-Research-013 barcode 13L7 [retention-tapes]
   slot 15: label Bio-Research-003 barcode 03L7 [retention-tapes]
   slot 16: label Bio-Research-004 barcode 04L7 (in drive 1) 
[retention-tapes]
   slot 17: label Bio-Research-005 barcode 05L7 reserved (in drive 0) 
(current)
   [retention-tapes]
   slot 18: label Bio-Research-006 barcode 06L7 [retention-no]
   slot 19: label Bio-Research-007 barcode 07L7 [retention-no]
   slot 20: label Bio-Research-008 barcode 08L7 [retention-no]
   slot 21: label Bio-Research-009 barcode 09L7 [retention-no]
   slot 22: label Bio-Research-010 barcode 10L7 [retention-no]
   slot 23: label Bio-Research-011 barcode 11L7 [retention-no]
   slot 24: label Bio-Research-012 barcode 12L7 [retention-no]
   slot 25: label Bio-Research-028 barcode 28L7 [retention-no]
   slot 26: unknown barcode 29L7
   slot 27: unknown barcode 30L7
   slot 28: label Bio-Research-029 barcode 31L7 [retention-no]
   slot 29: label Bio-Research-030 barcode 32L7 [retention-no]
   slot 30: label Bio-Research-031 barcode 33L7 [retention-no]
   slot 31: label Bio-Research-032 barcode 34L7 [retention-no]
   slot 32: label Bio-Research-033 barcode 35L7 [retention-no]
   slot 33: label Bio-Research-034 barcode 36L7 [retention-no]
   slot 34: label Bio-Research-035 barcode 37L7 [retention-no]
   slot 35: unknown barcode 38L7
   slot 36: unknown barcode 39L7

   amanda@marlin:~/daily$

contents of /tmp/amanda/server/daily/amtape.2020101714.debug

   Mon Oct 12 22:27:14.283528244 2020: pid 25817: thd-0x26d2000: amtape: pid 
25817 ruid 555 euid
   555 version 3.5.1: start at Mon Oct 12 22:27:14 2020
   Mon Oct 12 22:27:14.283638808 2020: pid 25817: thd-0x26d2000: amtape: 
Arguments:
   -otaperscan=taper_oldest daily inventory
   Mon Oct 12 22:27:14.283985107 2020: pid 25817: thd-0x26d2000: amtape: 
config_overrides:
   taperscan taper_oldest
   Mon Oct 12 22:27:14.284166432 2020: pid 25817: thd-0x26d2000: amtape: 
reading config file
   /usr/local/etc/amanda/daily/amanda.conf
   Mon Oct 12 22:27:14.329169930 2020: pid 25817: thd-0x26d2000: amtape: pid 
25817 ruid 555 euid
   555 version 3.5.1: rename at Mon Oct 12 22:27:14 2020
   Mon Oct 12 22:27:14.368005788 2020: pid 25817: thd-0x26d2000: amtape: 
chg-robot: using statefile
   
'/usr/local/var/amanda/chg-robot-dev-tape-by-id-scsi-1BDT-FlexStor-II-00MX64200626-LL0'
   Mon Oct 12 22:27:14.370754856 2020: pid 25817: thd-0x26d2000: amtape: 
invoking /usr/sbin/mtx -f
   /dev/tape/by-id/scsi-1BDT_FlexStor_II_00MX64200626_LL0 status
   Mon Oct 12 22:27:15.015683640 2020: pid 25817: thd-0x26d2000: amtape: mtx:   
Storage Changer
   /dev/tape/by-id/scsi-1BDT_FlexStor_II_00MX64200626_LL0:2 Drives, 48 Slots ( 
0 Import/Export )
   Mon Oct 12 22:27:15.015801261 2020: pid 25817: thd-0x26d2000: amtape: mtx: 
Data Transfer Element
   0:Full (Storage Element 17 Loaded):VolumeTag = 05L7
   Mon Oct 12 22:27:15.015894617 2020: pid 25817: thd-0x26d2000: amtape: mtx: 
Data Transfer Element
   1:Full (Storage Element 16 Loaded):VolumeTag = 04L7
   Mon Oct 12 22:27:15.015950607 2020: pid 25817: thd-0x26d2000: amtape: mtx:   
    Storage Element
   1:Empty
   Mon Oct 12 22:27:15.016009626 2020: pid 25817: thd-0x26d2000: amtape: mtx:   
    Storage Element
   2:Empty
   Mon Oct 12 22:27:15.016061482 2020: pid 25817: thd-0x26d2000: amtape: mtx:   
    Storage Element
   3:Full :VolumeTag=15L7
   Mon Oct 12 22:27:15.016120910 2020: pid 25817: thd-0x26d2000: amtape: mtx:   
    Storage Element
   4:Full :VolumeTag=16L7
   Mon Oct 12 22:27:15.016175987 2020: pid 25817: thd-0x26d2000: amtape: mtx:   
    Storage Element
   5:Full :VolumeTag=17L7
   Mon Oct 12 22:27:15.016229741 2020: pid 25817: thd-0x26d2000: amtape: mtx:   
    Storage Element
   6:Full :VolumeTag=14L7
   Mon Oct 12 22:27:15.016275278 2020: pid 25817: thd-0x26d2000: amtape: mtx:   
    Storage Element
   7:Empty
   Mon Oct 12 22:27:15.016332502 2020: pid 25817: thd-0x26d2000: amtape: mtx:   
    Storage Element
   8:Empty
   Mon Oct 12 22:27:15.016384864 2020: pid 25817: thd-0x26d2000: amtape: mtx:   
    Storage Element
   9:Full :VolumeTag=21L7
   Mon Oct 12 22:27:15.016436706 2020: pid 25817: thd-0x26d2000: amtape: mtx:   
    Storage Element
   10:Full :VolumeTag=18L7
   Mon Oct 12 22:27:15.016483228 2020: pid 25817: thd-0x26d2000: amtape: mtx:   
    Storage Element
   11:Empty
   Mon Oct 12 22:27:15.016524715 2020: pid 25817: thd-0x26d2000: amtape: mtx:   
    Storage Element
   12:Empty
   Mon Oct 12 22:27:15.016570484 2020: pid 25817: thd-0x26d2000: amtape: mtx:   
    Storage Element
   13:Full :VolumeTag=01L7
   Mon Oct 12 22:27:15.016620407 2020: pid 25817: thd-0x26d2000: amtape: mtx:   
    Storag

Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-12 Thread Chris Hoogendyk

hmm. Fixed double $$. Then

   amanda@marlin:~/daily$ amcheck -s -otaperscan=taper_lexical daily

   Amanda Tape Server Host Check
   -
   NOTE: Holding disk '/amanda3': 3011121152 KB disk space available, using 
2906263552 KB
   NOTE: Holding disk '/amanda4': 416890880 KB disk space available, using 
312033280 KB
   NOTE: Holding disk '/amanda5': 104857600 KB disk space available, using 
0 KB
   ERROR: amcheck-device terminated with signal 11
   Server check took 7.302 seconds
   (brought to you by Amanda 3.5.1)

   amanda@marlin:~/daily$

In /tmp/amanda/server/daily/amcheck-device.20201012221620.debug:

   Mon Oct 12 22:16:21.846884734 2020: pid 23996: thd-0x25c0800: 
amcheck-device: mtx:   Storage
   Element 38:Full :VolumeTag=41L7
   Mon Oct 12 22:16:21.846931700 2020: pid 23996: thd-0x25c0800: 
amcheck-device: mtx:   Storage
   Element 39:Full :VolumeTag=42L7
   Mon Oct 12 22:16:21.846974541 2020: pid 23996: thd-0x25c0800: 
amcheck-device: mtx:   Storage
   Element 40:Empty
   Mon Oct 12 22:16:21.847013670 2020: pid 23996: thd-0x25c0800: 
amcheck-device: mtx:   Storage
   Element 41:Empty
   Mon Oct 12 22:16:21.847052040 2020: pid 23996: thd-0x25c0800: 
amcheck-device: mtx:   Storage
   Element 42:Empty
   Mon Oct 12 22:16:21.847095434 2020: pid 23996: thd-0x25c0800: 
amcheck-device: mtx:   Storage
   Element 43:Full :VolumeTag=43L7
   Mon Oct 12 22:16:21.847139628 2020: pid 23996: thd-0x25c0800: 
amcheck-device: mtx:   Storage
   Element 44:Empty
   Mon Oct 12 22:16:21.847178893 2020: pid 23996: thd-0x25c0800: 
amcheck-device: mtx:   Storage
   Element 45:Empty
   Mon Oct 12 22:16:21.847217831 2020: pid 23996: thd-0x25c0800: 
amcheck-device: mtx:   Storage
   Element 46:Empty
   Mon Oct 12 22:16:21.847261315 2020: pid 23996: thd-0x25c0800: 
amcheck-device: mtx:   Storage
   Element 47:Full :VolumeTag=CLN001CU
   Mon Oct 12 22:16:21.847313575 2020: pid 23996: thd-0x25c0800: 
amcheck-device: mtx:   Storage
   Element 48:Full :VolumeTag=CLN002CU
   Mon Oct 12 22:16:21.847462771 2020: pid 23996: thd-0x25c0800: 
amcheck-device: NEO200x48:
   updating state
   Mon Oct 12 22:16:21.854842315 2020: pid 23996: thd-0x25c0800: 
amcheck-device: slot: 0  label:
   Bio-Research-013ds: 0
   Mon Oct 12 22:16:21.856260588 2020: pid 23996: thd-0x25c0800: 
amcheck-device: slot: 1  label:
   Bio-Research-003ds: 0
   Mon Oct 12 22:16:21.856415474 2020: pid 23996: thd-0x25c0800: 
amcheck-device: slot: 2  label:
   Bio-Research-004ds: 0
   Mon Oct 12 22:16:21.856638105 2020: pid 23996: thd-0x25c0800: 
amcheck-device: slot: 3  label:
   Bio-Research-005ds: 0
   Mon Oct 12 22:16:21.856765629 2020: pid 23996: thd-0x25c0800: 
amcheck-device: slot: 4  label:
   Bio-Research-006ds: 0
   Mon Oct 12 22:16:21.856956218 2020: pid 23996: thd-0x25c0800: 
amcheck-device: warning: Use of
   uninitialized value in concatenation (.) or string at
   /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/lexical.pm line 102.
   Mon Oct 12 22:16:21.856984955 2020: pid 23996: thd-0x25c0800: 
amcheck-device: slot: 5  label:
   Bio-Research-007ds:
   Mon Oct 12 22:16:21.857041299 2020: pid 23996: thd-0x25c0800: 
amcheck-device: slot: 6  label:
   Bio-Research-008ds: 0
   Mon Oct 12 22:16:21.857135202 2020: pid 23996: thd-0x25c0800: 
amcheck-device: slot: 7  label:
   Bio-Research-009ds: 0
   Mon Oct 12 22:16:21.857244733 2020: pid 23996: thd-0x25c0800: 
amcheck-device: warning: Use of
   uninitialized value in concatenation (.) or string at
   /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/lexical.pm line 102.
   Mon Oct 12 22:16:21.857270890 2020: pid 23996: thd-0x25c0800: 
amcheck-device: slot: 8  label:
   Bio-Research-010ds:
   Mon Oct 12 22:16:21.857321707 2020: pid 23996: thd-0x25c0800: 
amcheck-device: warning: Use of
   uninitialized value in concatenation (.) or string at
   /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/lexical.pm line 102.
   Mon Oct 12 22:16:21.857347044 2020: pid 23996: thd-0x25c0800: 
amcheck-device: slot: 9  label:
   Bio-Research-011ds:
   Mon Oct 12 22:16:21.857380544 2020: pid 23996: thd-0x25c0800: 
amcheck-device: slot: 10  label:
   Bio-Research-012ds: 0
   Mon Oct 12 22:16:21.857471810 2020: pid 23996: thd-0x25c0800: 
amcheck-device: slot: 11  label:
   Bio-Research-028ds: 0
   Mon Oct 12 22:16:21.857580226 2020: pid 23996: thd-0x25c0800: 
amcheck-device: warning: Use of
   uninitialized value in concatenation (.) or string at
   /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/lexical.pm line 102.
   Mon Oct 12 22:16:21.857607369 2020: pid 23996: thd-0x25c0800: 
amcheck-device: slot: 12  label: ds: 0

In the kern.log:

   Oct 12 22:16:21 marlin kernel: [2534284.886931] amcheck-device[23996]: 
segfault at 0 ip
   7f5cd7ddf7c6 sp 7ffc211d91f8 error 4 in 
libc-2.23.so[7f5cd7d54000+1c]


On 10/12/20 5:33 PM, Nathan Stratton Treadway wrote:

On Mon, Oct 12, 2020 at 15:27:44 -0400, Chris Hoogendy

Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-12 Thread Nathan Stratton Treadway
On Mon, Oct 12, 2020 at 14:53:03 -0400, Chris Hoogendyk wrote:
> There should be no difference in the tapes. I did them all by just
> doing an `amtape daily slot nn` followed by an `amlabel daily
> Bio-Research-nnn`. The first 20 or so were all done in sequence in
> one session, and that would include the four you mention. I didn't
> even retype the commands. I used the up arrow twice to pull up the
> previous command, backspaced the number and typed a new number for
> the slot or for the tape label as appropriate.

Based on your description of when the problems started, I'm guessing the
issue is not in how the tapes were originally labeled but some fluke of
how they were used after that.

> The new tapes that I put in with the new magazines were labeled in
> the same way. Those are now out of the library, and the tapes that
> had originally been in the library were returned. That is when the
> problem occurred.

So, what I'm wondering is if there is any pattern to which tape labels
tie to the tapes used in the "new magazine" (and thus now no longer
actually in the library) and which were in the "removed and later
returned" category?

Nathan


Nathan Stratton Treadway  -  [email protected]  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239


Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-12 Thread Nathan Stratton Treadway
On Mon, Oct 12, 2020 at 15:27:44 -0400, Chris Hoogendyk wrote:
>amcheck-device: Not a SCALAR reference at
>/usr/local/share/perl/5.22.1/Amanda/Taper/Scan/lexical.pm line 102.

Ah, drat, this was a typo in my patch.  Please edit line 102 of that
file and remove the doubled "$" character, i.e. 
   $$sl->{device_status}
should be
   $sl->{device_status}

Then retry the amcheck test and see if gets any farther along.

Nathan



p.s. if there are any actual Perl programmers left on this list, feel
free to jump in and point us in the right direction here


Nathan Stratton Treadway  -  [email protected]  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239


Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-12 Thread Nathan Stratton Treadway
On Mon, Oct 12, 2020 at 18:19:21 +, Debra S Baddorf wrote:
> Is it worth trying to just remove the ???state???  file  (rename it to  
> .save  for instance)
> and letting amanda recreate it?

Hmmm, that's an intersting idea... though it seems (from reading the man
pages) that it might be possilbe to do this using the "amtape reset"
command rather than deleting the state file directly,


Chris, in the mean time, what to you get from the commands:

  amtape daily inventory

  amtape daily taper 

(For now we'll just let it default to oldest.pm, until we fix the patch
for lexical.pm.)

Both the output to the terminal session an the log files will probably
be useful.


Nathan



Nathan Stratton Treadway  -  [email protected]  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239


Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-12 Thread Chris Hoogendyk

Patch applied. Then  . .  .

   amanda@marlin:~/daily$ amcheck -s -otaperscan=taper_lexical daily

   Amanda Tape Server Host Check
   -
   WARNING: holding disk '/amanda3': only 104849408 KB free, using nothing
   WARNING: Not enough free space specified in amanda.conf
   NOTE: Holding disk '/amanda4': 137469952 KB disk space available, using 
32612352 KB
   NOTE: Holding disk '/amanda5': 104857600 KB disk space available, using 
0 KB
   amcheck-device: Not a SCALAR reference at
   /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/lexical.pm line 102.

   Server check took 0.965 seconds
   (brought to you by Amanda 3.5.1)

   amanda@marlin:~/daily$ ls -lt /tmp/amanda/server/daily/ | head

So, it didn't work (didn't load a tape and check the label), but it didn't crash and there were no 
entries in the kern.log.


Following is the end of 
/tmp/amanda/server/daily/amcheck-device.20201012151659.debug

   Mon Oct 12 15:17:00.503496521 2020: pid 15320: thd-0x1df0800: 
amcheck-device: mtx:   Storage
   Element 39:Full :VolumeTag=42L7
   Mon Oct 12 15:17:00.503538735 2020: pid 15320: thd-0x1df0800: 
amcheck-device: mtx:   Storage
   Element 40:Empty
   Mon Oct 12 15:17:00.503577116 2020: pid 15320: thd-0x1df0800: 
amcheck-device: mtx:   Storage
   Element 41:Empty
   Mon Oct 12 15:17:00.503615219 2020: pid 15320: thd-0x1df0800: 
amcheck-device: mtx:   Storage
   Element 42:Empty
   Mon Oct 12 15:17:00.503657949 2020: pid 15320: thd-0x1df0800: 
amcheck-device: mtx:   Storage
   Element 43:Full :VolumeTag=43L7
   Mon Oct 12 15:17:00.503701295 2020: pid 15320: thd-0x1df0800: 
amcheck-device: mtx:   Storage
   Element 44:Empty
   Mon Oct 12 15:17:00.503739869 2020: pid 15320: thd-0x1df0800: 
amcheck-device: mtx:   Storage
   Element 45:Empty
   Mon Oct 12 15:17:00.503777939 2020: pid 15320: thd-0x1df0800: 
amcheck-device: mtx:   Storage
   Element 46:Empty
   Mon Oct 12 15:17:00.503820736 2020: pid 15320: thd-0x1df0800: 
amcheck-device: mtx:   Storage
   Element 47:Full :VolumeTag=CLN001CU
   Mon Oct 12 15:17:00.503867625 2020: pid 15320: thd-0x1df0800: 
amcheck-device: mtx:   Storage
   Element 48:Full :VolumeTag=CLN002CU
   Mon Oct 12 15:17:00.504012672 2020: pid 15320: thd-0x1df0800: 
amcheck-device: NEO200x48:
   updating state
   Mon Oct 12 15:17:00.508566057 2020: pid 15320: thd-0x1df0800: 
amcheck-device: critical (fatal):
   Not a SCALAR reference at 
/usr/local/share/perl/5.22.1/Amanda/Taper/Scan/lexical.pm line 102.

   /usr/local/lib/amanda/libamanda-3.5.1.so(+0x398c7)[0x7fa8696158c7]
   /lib/x86_64-linux-gnu/libglib-2.0.so.0(g_logv+0x1f4)[0x7fa869314a44]
   /lib/x86_64-linux-gnu/libglib-2.0.so.0(g_log+0x8f)[0x7fa869314c6f]
   
/usr/local/share/perl/5.22.1/auto/Amanda/MainLoop/libMainLoop.so(+0x45b7)[0x7fa8659085b7]
   /lib/x86_64-linux-gnu/libglib-2.0.so.0(+0x4aab3)[0x7fa86930eab3]
   
/lib/x86_64-linux-gnu/libglib-2.0.so.0(g_main_context_dispatch+0x15a)[0x7fa86930e04a]
   /lib/x86_64-linux-gnu/libglib-2.0.so.0(+0x4a3f0)[0x7fa86930e3f0]
   
/lib/x86_64-linux-gnu/libglib-2.0.so.0(g_main_context_iteration+0x2c)[0x7fa86930e49c]
   /usr/local/lib/amanda/libamanda-3.5.1.so(+0x3b291)[0x7fa869617291]
   
/usr/local/share/perl/5.22.1/auto/Amanda/MainLoop/libMainLoop.so(_wrap_run_c+0x53)[0x7fa865908d93]
   /usr/bin/perl(Perl_pp_entersub+0x48a)[0x4bd5fa]
   /usr/bin/perl(Perl_runops_standard+0x16)[0x4b64a6]
   /usr/bin/perl(perl_run+0x2f9)[0x443bc9]
   /usr/bin/perl(main+0x12b)[0x41cb2b]
   /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7fa869efc840]
   /usr/bin/perl(_start+0x29)[0x41cb69]


On 10/11/20 7:04 PM, Nathan Stratton Treadway wrote:

On Sun, Oct 11, 2020 at 14:48:13 -0400, Chris Hoogendyk wrote:

In any case, I currently have both systems flushing to tape using
the traditional taper scan. That may work for now, but it would be
good to track this down. It's also puzzling that it just turned up

R.e. tracking this down: my thought is to see if we can track down the
problem by tweaking the code for lexical.pm -- that way, you can trigger
test runs using "amcheck -otaperscan=taper_lexical" without running the
risk of having a normal cron job attempt to run the code you are in the
middle of modifying.

I've attached a patch which, hopefully, will both fix the
"uninitialized" warning messages that have been appearing in the logs
and also print some debugging info as it loops through the tape inventory
so we can see if it dies in the middle of that loop.

So, when you are ready to investigate further, save a copy of the
original /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/lexical.pm, then
apply the patch to the "live" lexical.pm file, and then try running
amcheck -s (with lexical scan) again.

You may have to fix unblanced quotes or whatever typos in the patched
lines, but hopefully you'll soon get a log file which lists the tapes in
the inventory (but doesn't have the warning lines a

Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-12 Thread Chris Hoogendyk
There should be no difference in the tapes. I did them all by just doing an `amtape daily slot nn` 
followed by an `amlabel daily Bio-Research-nnn`. The first 20 or so were all done in sequence in one 
session, and that would include the four you mention. I didn't even retype the commands. I used the 
up arrow twice to pull up the previous command, backspaced the number and typed a new number for the 
slot or for the tape label as appropriate.


The new tapes that I put in with the new magazines were labeled in the same way. Those are now out 
of the library, and the tapes that had originally been in the library were returned. That is when 
the problem occurred.


We're planning on doing the magazine switch once every three weeks and swapping out to Iron 
Mountain. That's a strategy based on what is convenient given what a magazine will hold, the fact 
that I am retired and mostly working remotely part time, and there is a pandemic such that I want to 
avoid going on campus and encountering students. If things get back to normal, they may get around 
to hiring a replacement for me and then that person may come up with a different strategy.



On 10/12/20 12:03 AM, Nathan Stratton Treadway wrote:

On Sun, Oct 11, 2020 at 23:32:08 -0400, Chris Hoogendyk wrote:

Text file.

attached.

Excellent, perfect.

If you are able to run the patched version of lexical.pm that should
give more explicit info, but meanwhile just looking through the
statefile: one thing that jumps out at me is that four of the slot
entries have Math::BigInt device_status fields, rather than simple
integers:

   Bio-Research-004, Bio-Research-001, Bio-Research-013, Bio-Research-014

Do those four volumes ring a bell with you as being special in some way?

(I wonder if the segfault might be related to the program trying to do
some operation against a  BigInt object when an integer is expected, or
something)

More generally, is there any pattern to the labels you used for your
"normal" tapes v.s. the short-term ones you wrote and then sent to Iron
Mountain?

Nathan



Nathan Stratton Treadway  -  [email protected]  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
  GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
  Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239


--
---

Chris Hoogendyk

-
   O__   Systems Administrator, Retired
  c/ /'_ --- Biology & Geosciences Departments
 (*) \(*) -- 315 Morrill Science Center III
~~ - University of Massachusetts, Amherst



---

Erdös 4



Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-12 Thread Debra S Baddorf
Is it worth trying to just remove the “state”  file  (rename it to  
.save  for instance)
and letting amanda recreate it?

When my label reader glitches and reads the label differently than it read it 
last time I used this tape,
I just remove my state file and it proceeds to recreate the whole file.   If 
the tapelist file includes the different data,
I edit that.   Not the tape name, but the reading of the tape label.

However,  this wouldn’t find the problem in the code.


Deb Baddorf

my config, fwiw:
amanda 3.3.8
Linux
HP Storever MSL4048  tape library
taperscan "taper_lexical" 
define changer LTO5-Robot  {
tpchanger "chg-robot:/dev/changer"
changerfile "/usr/local/etc/amanda/LTO5-Robot-state”
## <<— this file is deletable
property "tape-device" "0=/dev/nst0"   # 1=/dev/nst1  is for archive 
config
property "eject-before-unload"  "no"
property "use-slots"  "1-44"# 45-48 are for archive config
property "load-poll" "0 s poll 3 s until 120 s"
property "fast-search"  "true" # has barcode reader
device-property "BLOCK_SIZE"  "512k"   # from man amanda-changers.7
}



> On Oct 11, 2020, at 11:03 PM, Nathan Stratton Treadway  
> wrote:
> 
> On Sun, Oct 11, 2020 at 23:32:08 -0400, Chris Hoogendyk wrote:
>> Text file.
>> 
>> attached.
> 
> Excellent, perfect.
> 
> If you are able to run the patched version of lexical.pm that should
> give more explicit info, but meanwhile just looking through the
> statefile: one thing that jumps out at me is that four of the slot
> entries have Math::BigInt device_status fields, rather than simple
> integers:
> 
>  Bio-Research-004, Bio-Research-001, Bio-Research-013, Bio-Research-014
> 
> Do those four volumes ring a bell with you as being special in some way?
> 
> (I wonder if the segfault might be related to the program trying to do
> some operation against a  BigInt object when an integer is expected, or
> something)
> 
> More generally, is there any pattern to the labels you used for your
> "normal" tapes v.s. the short-term ones you wrote and then sent to Iron
> Mountain?
> 
>   Nathan
> 
> 
> 
> Nathan Stratton Treadway  -  [email protected]  -  Mid-Atlantic region
> Ray Ontko & Co.  -  Software consulting services  -   
> https://urldefense.proofpoint.com/v2/url?u=http-3A__www.ontko.com_&d=DwIBAg&c=gRgGjJ3BkIsb5y6s49QqsA&r=HMrKaRiCv4jddln9fLPIOw&m=P_r2VB2yCnPdMxqVrVUzafCnl-Z_6EEAfi_C3oKO2RA&s=q_wkXgEzNjOiVFPx9EV6tXGkcE_dc-NjOkEiUFHh7DE&e=
>  
> GPG Key: 
> https://urldefense.proofpoint.com/v2/url?u=http-3A__www.ontko.com_-7Enathanst_gpg-5Fkey.txt&d=DwIBAg&c=gRgGjJ3BkIsb5y6s49QqsA&r=HMrKaRiCv4jddln9fLPIOw&m=P_r2VB2yCnPdMxqVrVUzafCnl-Z_6EEAfi_C3oKO2RA&s=y02sMuUPvmfmvo7vUKar5sB1x9yF6JRD7Obyu_G52a0&e=
> ID: 1023D/ECFB6239
> Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239




Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-11 Thread Nathan Stratton Treadway
On Sun, Oct 11, 2020 at 23:32:08 -0400, Chris Hoogendyk wrote:
> Text file.
> 
> attached.

Excellent, perfect.

If you are able to run the patched version of lexical.pm that should
give more explicit info, but meanwhile just looking through the
statefile: one thing that jumps out at me is that four of the slot
entries have Math::BigInt device_status fields, rather than simple
integers:

  Bio-Research-004, Bio-Research-001, Bio-Research-013, Bio-Research-014

Do those four volumes ring a bell with you as being special in some way?

(I wonder if the segfault might be related to the program trying to do
some operation against a  BigInt object when an integer is expected, or
something)

More generally, is there any pattern to the labels you used for your
"normal" tapes v.s. the short-term ones you wrote and then sent to Iron
Mountain?

Nathan



Nathan Stratton Treadway  -  [email protected]  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239


Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-11 Thread Chris Hoogendyk

Thank you!

I'll have to get to that tomorrow.


On 10/11/20 7:04 PM, Nathan Stratton Treadway wrote:

On Sun, Oct 11, 2020 at 14:48:13 -0400, Chris Hoogendyk wrote:

In any case, I currently have both systems flushing to tape using
the traditional taper scan. That may work for now, but it would be
good to track this down. It's also puzzling that it just turned up

R.e. tracking this down: my thought is to see if we can track down the
problem by tweaking the code for lexical.pm -- that way, you can trigger
test runs using "amcheck -otaperscan=taper_lexical" without running the
risk of having a normal cron job attempt to run the code you are in the
middle of modifying.

I've attached a patch which, hopefully, will both fix the
"uninitialized" warning messages that have been appearing in the logs
and also print some debugging info as it loops through the tape inventory
so we can see if it dies in the middle of that loop.

So, when you are ready to investigate further, save a copy of the
original /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/lexical.pm, then
apply the patch to the "live" lexical.pm file, and then try running
amcheck -s (with lexical scan) again.

You may have to fix unblanced quotes or whatever typos in the patched
lines, but hopefully you'll soon get a log file which lists the tapes in
the inventory (but doesn't have the warning lines any more)... at which
point, send me those log lines

Nathan


Nathan Stratton Treadway  -  [email protected]  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
  GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
  Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239


--
---

Chris Hoogendyk

-
   O__   Systems Administrator, Retired
  c/ /'_ --- Biology & Geosciences Departments
 (*) \(*) -- 315 Morrill Science Center III
~~ - University of Massachusetts, Amherst



---

Erdös 4



Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-11 Thread Chris Hoogendyk

This segment?

   define changer NEO200x48 {
   tpchanger 
"chg-robot:/dev/tape/by-id/scsi-1BDT_FlexStor_II_00MX64200626_LL0" # your 
changer
   device file (T48)
   property "tape-device" "0=tape:/dev/nst1"    # your tape drive device 
file
   property append "tape-device" "1=tape:/dev/nst2" # your tape drive 
device file
   property "use-slots" "14-36"  # UPDATE WHEN TAPES ADDED 
<
   }

   amrecover-changer "NEO200x48"   # amrecover will use the changer if you 
restore
    # from this device. It could be a string like 'changer' and amrecover 
will use your
    # changer if you set your tape to 'changer' with 'setdevice changer' or 
via
    # 'tapedev "changer"' in amanda-client.conf

   define policy research {

   retention-tapes 30

   }

   define storage research {

   tpchanger "NEO200x48"

   tapetype "LTO7"

   policy "research"

   eject-volume yes

   labelstr "^Bio-Research-[0-9][0-9][0-9]*$"

   tapepool "research"

   dump-selection "research" ALL

   }


On 10/11/20 3:51 PM, Nathan Stratton Treadway wrote:

On Sun, Oct 11, 2020 at 14:48:13 -0400, Chris Hoogendyk wrote:

I'm not sure what I should be looking for.

I don't see anything in amanda's home directory that seems likely,
nor in /tmp/amanda, and there is no /etc/amanda/. The tape library
has a web interface that I use to interact with it. Amanda is
configured to use mtx. I can also use mtx by hand to check on the
library's status, and I have a script amchanger that I wrote that
does that for me.

So, aside from amanda keeping an inventory, and the tapelist that it
has, I'm not sure where else anything would be.

My theory is that the driver for the changer keeps the inventory written
in a file somewhere specific to the changer, but since I don't have a
tape changer myself I am not familiar with that driver and don't know
where to direct you to look, off hand

Can you post the changer and tape-drive related parameters/sections from
your amanda.conf file?  (I assume the config sections on your two
different servers are essentially the same, right?)

Nsthan


Nathan Stratton Treadway  -  [email protected]  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
  GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
  Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239


--
---

Chris Hoogendyk

-
   O__   Systems Administrator, Retired
  c/ /'_ --- Biology & Geosciences Departments
 (*) \(*) -- 315 Morrill Science Center III
~~ - University of Massachusetts, Amherst



---

Erdös 4



Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-11 Thread Nathan Stratton Treadway
On Sun, Oct 11, 2020 at 14:48:13 -0400, Chris Hoogendyk wrote:
> In any case, I currently have both systems flushing to tape using
> the traditional taper scan. That may work for now, but it would be
> good to track this down. It's also puzzling that it just turned up

R.e. tracking this down: my thought is to see if we can track down the
problem by tweaking the code for lexical.pm -- that way, you can trigger
test runs using "amcheck -otaperscan=taper_lexical" without running the
risk of having a normal cron job attempt to run the code you are in the
middle of modifying.

I've attached a patch which, hopefully, will both fix the
"uninitialized" warning messages that have been appearing in the logs
and also print some debugging info as it loops through the tape inventory
so we can see if it dies in the middle of that loop.

So, when you are ready to investigate further, save a copy of the
original /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/lexical.pm, then
apply the patch to the "live" lexical.pm file, and then try running
amcheck -s (with lexical scan) again.

You may have to fix unblanced quotes or whatever typos in the patched
lines, but hopefully you'll soon get a log file which lists the tapes in
the inventory (but doesn't have the warning lines any more)... at which
point, send me those log lines

Nathan


Nathan Stratton Treadway  -  [email protected]  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239
--- lexical.pm_orig_v3.5.1  2020-10-11 18:23:37.650643033 -0400
+++ lexical.pm  2020-10-11 18:47:47.097749555 -0400
@@ -97,6 +97,10 @@
 
 for my $i (0..(scalar(@$inventory)-1)) {
my $sl = $inventory->[$i];
+
+   # tracing segfault:
+debug("slot: $i  label: " .  $sl->{'label'} . "ds: " . 
$$sl->{device_status});
+
next if $seen->{$sl->{slot}};
 
if (!defined $sl->{'state'} ||
@@ -104,6 +108,7 @@
push @unknown, $sl
} elsif ($sl->{'state'} == Amanda::Changer::SLOT_EMPTY) {
} elsif (defined $sl->{'label'} &&
+defined $sl->{device_status} &&
 $sl->{device_status} == $DEVICE_STATUS_SUCCESS) {
if ($self->is_reusable_volume(label => $sl->{'label'})) {
if ($last_label && $sl->{'label'} gt $last_label) {


Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-11 Thread Nathan Stratton Treadway
On Sun, Oct 11, 2020 at 15:51:34 -0400, Nathan Stratton Treadway wrote:
> My theory is that the driver for the changer keeps the inventory written
> in a file somewhere specific to the changer, but since I don't have a

Okay, looking back through the log file you posted a couple days ago,
it looks like the file in question is found at 

/usr/local/var/amanda/chg-robot-dev-tape-by-id-scsi-1BDT-FlexStor-II-00MX64200626-LL0


If I'm reading the driver correctly, that should be a text file
(perl-formated data definitions)... so can you post the contents here? 
(Or, if it seems too large, send it to me off-list.)

(It's atuaally called "statefile", but the robot changer seems to
include the inventory information as part of the changer state. )

Nathan


Nathan Stratton Treadway  -  [email protected]  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239


Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-11 Thread Nathan Stratton Treadway
On Sun, Oct 11, 2020 at 14:48:13 -0400, Chris Hoogendyk wrote:
> I'm not sure what I should be looking for.
> 
> I don't see anything in amanda's home directory that seems likely,
> nor in /tmp/amanda, and there is no /etc/amanda/. The tape library
> has a web interface that I use to interact with it. Amanda is
> configured to use mtx. I can also use mtx by hand to check on the
> library's status, and I have a script amchanger that I wrote that
> does that for me.
> 
> So, aside from amanda keeping an inventory, and the tapelist that it
> has, I'm not sure where else anything would be.

My theory is that the driver for the changer keeps the inventory written
in a file somewhere specific to the changer, but since I don't have a
tape changer myself I am not familiar with that driver and don't know
where to direct you to look, off hand

Can you post the changer and tape-drive related parameters/sections from
your amanda.conf file?  (I assume the config sections on your two
different servers are essentially the same, right?)

Nsthan


Nathan Stratton Treadway  -  [email protected]  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239


Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-11 Thread Chris Hoogendyk

I'm not sure what I should be looking for.

I don't see anything in amanda's home directory that seems likely, nor in /tmp/amanda, and there is 
no /etc/amanda/. The tape library has a web interface that I use to interact with it. Amanda is 
configured to use mtx. I can also use mtx by hand to check on the library's status, and I have a 
script amchanger that I wrote that does that for me.


So, aside from amanda keeping an inventory, and the tapelist that it has, I'm not sure where else 
anything would be.


In any case, I currently have both systems flushing to tape using the traditional taper scan. That 
may work for now, but it would be good to track this down. It's also puzzling that it just turned up 
as a problem AND on both Amanda servers. I had been using the same setup on both servers for quite 
some time. A few weeks ago I swapped in some new magazines and labeled all the new tapes with 
amlabel. Those ran through getting backups. Then this Friday night I swapped the new ones out and 
put the original magazines back in. That's when the trouble started.



On 10/11/20 2:17 PM, Nathan Stratton Treadway wrote:

On Sun, Oct 11, 2020 at 12:24:45 -0400, Chris Hoogendyk wrote:

/tmp/amanda/server/daily/amcheck-device.20201011120931.debug (lexical)
Sun Oct 11 12:09:31.912816794 2020: pid 22002: thd-0xe8e600: amcheck-device: 
NEO200x48: updating state
Sun Oct 11 12:09:31.922463203 2020: pid 22002: thd-0xe8e600:
amcheck-device: warning: Use of uninitialized value in numeric eq
(==) at /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/lexical.pm
line 106.

Excellent, this points towards the "tape inventory" part of the code.

Can you take a look around your system to see if you can find where the
tape changer stores inventory information, internally?  (If you don't
find it immediately I can look in the source to try to figure out the
path it would use, but hopefully its easy enough to figure outjust
looking thorugh amanda-related directories on your system.)  If we can
find that file, it may be able to see some "wierd" data that could be
causing a crash by looking at the file directly.


Nathan


Nathan Stratton Treadway  -  [email protected]  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
  GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
  Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239


--
---

Chris Hoogendyk

-
   O__   Systems Administrator, Retired
  c/ /'_ --- Biology & Geosciences Departments
 (*) \(*) -- 315 Morrill Science Center III
~~ - University of Massachusetts, Amherst



---

Erdös 4



Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-11 Thread Nathan Stratton Treadway
On Sun, Oct 11, 2020 at 12:24:45 -0400, Chris Hoogendyk wrote:
> /tmp/amanda/server/daily/amcheck-device.20201011120931.debug (lexical)

> Sun Oct 11 12:09:31.912816794 2020: pid 22002: thd-0xe8e600: amcheck-device: 
> NEO200x48: updating state
> Sun Oct 11 12:09:31.922463203 2020: pid 22002: thd-0xe8e600:
> amcheck-device: warning: Use of uninitialized value in numeric eq
> (==) at /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/lexical.pm
> line 106.

Excellent, this points towards the "tape inventory" part of the code.

Can you take a look around your system to see if you can find where the
tape changer stores inventory information, internally?  (If you don't
find it immediately I can look in the source to try to figure out the
path it would use, but hopefully its easy enough to figure outjust
looking thorugh amanda-related directories on your system.)  If we can
find that file, it may be able to see some "wierd" data that could be
causing a crash by looking at the file directly.


Nathan


Nathan Stratton Treadway  -  [email protected]  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239


Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-11 Thread Chris Hoogendyk

/tmp/amanda/server/daily/amcheck-device.20201011120931.debug (lexical)

Sun Oct 11 12:09:31.912123219 2020: pid 22002: thd-0xe8e600: amcheck-device: mtx:   Storage 
Element 35:Full :VolumeTag=38L7
Sun Oct 11 12:09:31.912170629 2020: pid 22002: thd-0xe8e600: amcheck-device: mtx:   Storage 
Element 36:Full :VolumeTag=39L7
Sun Oct 11 12:09:31.912217715 2020: pid 22002: thd-0xe8e600: amcheck-device: mtx:   Storage 
Element 37:Full :VolumeTag=40L7
Sun Oct 11 12:09:31.912264266 2020: pid 22002: thd-0xe8e600: amcheck-device: mtx:   Storage 
Element 38:Full :VolumeTag=41L7
Sun Oct 11 12:09:31.912313021 2020: pid 22002: thd-0xe8e600: amcheck-device: mtx:   Storage 
Element 39:Full :VolumeTag=42L7
Sun Oct 11 12:09:31.912354905 2020: pid 22002: thd-0xe8e600: amcheck-device: mtx:   Storage 
Element 40:Empty
Sun Oct 11 12:09:31.912392783 2020: pid 22002: thd-0xe8e600: amcheck-device: mtx:   Storage 
Element 41:Empty
Sun Oct 11 12:09:31.912430299 2020: pid 22002: thd-0xe8e600: amcheck-device: mtx:   Storage 
Element 42:Empty
Sun Oct 11 12:09:31.912472391 2020: pid 22002: thd-0xe8e600: amcheck-device: mtx:   Storage 
Element 43:Full :VolumeTag=43L7
Sun Oct 11 12:09:31.912515070 2020: pid 22002: thd-0xe8e600: amcheck-device: mtx:   Storage 
Element 44:Empty
Sun Oct 11 12:09:31.912552958 2020: pid 22002: thd-0xe8e600: amcheck-device: mtx:   Storage 
Element 45:Empty
Sun Oct 11 12:09:31.912590453 2020: pid 22002: thd-0xe8e600: amcheck-device: mtx:   Storage 
Element 46:Empty
Sun Oct 11 12:09:31.912632567 2020: pid 22002: thd-0xe8e600: amcheck-device: mtx:   Storage 
Element 47:Full :VolumeTag=CLN001CU
Sun Oct 11 12:09:31.912678933 2020: pid 22002: thd-0xe8e600: amcheck-device: mtx:   Storage 
Element 48:Full :VolumeTag=CLN002CU

Sun Oct 11 12:09:31.912816794 2020: pid 22002: thd-0xe8e600: amcheck-device: 
NEO200x48: updating state
Sun Oct 11 12:09:31.922463203 2020: pid 22002: thd-0xe8e600: amcheck-device: warning: Use of 
uninitialized value in numeric eq (==) at /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/lexical.pm 
line 106.
Sun Oct 11 12:09:31.922577733 2020: pid 22002: thd-0xe8e600: amcheck-device: warning: Use of 
uninitialized value in numeric eq (==) at /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/lexical.pm 
line 106.
Sun Oct 11 12:09:31.922630514 2020: pid 22002: thd-0xe8e600: amcheck-device: warning: Use of 
uninitialized value in numeric eq (==) at /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/lexical.pm 
line 106.



/tmp/amanda/server/daily/amcheck-device.20201011121015.debug (traditional)

Sun Oct 11 12:10:16.358496965 2020: pid 22053: thd-0x8d3600: amcheck-device: mtx:   Storage 
Element 33:Full :VolumeTag=36L7
Sun Oct 11 12:10:16.358551835 2020: pid 22053: thd-0x8d3600: amcheck-device: mtx:   Storage 
Element 34:Full :VolumeTag=37L7
Sun Oct 11 12:10:16.358600321 2020: pid 22053: thd-0x8d3600: amcheck-device: mtx:   Storage 
Element 35:Full :VolumeTag=38L7
Sun Oct 11 12:10:16.358647199 2020: pid 22053: thd-0x8d3600: amcheck-device: mtx:   Storage 
Element 36:Full :VolumeTag=39L7
Sun Oct 11 12:10:16.358693381 2020: pid 22053: thd-0x8d3600: amcheck-device: mtx:   Storage 
Element 37:Full :VolumeTag=40L7
Sun Oct 11 12:10:16.358739526 2020: pid 22053: thd-0x8d3600: amcheck-device: mtx:   Storage 
Element 38:Full :VolumeTag=41L7
Sun Oct 11 12:10:16.358785636 2020: pid 22053: thd-0x8d3600: amcheck-device: mtx:   Storage 
Element 39:Full :VolumeTag=42L7
Sun Oct 11 12:10:16.358831765 2020: pid 22053: thd-0x8d3600: amcheck-device: mtx:   Storage 
Element 40:Empty
Sun Oct 11 12:10:16.358873441 2020: pid 22053: thd-0x8d3600: amcheck-device: mtx:   Storage 
Element 41:Empty
Sun Oct 11 12:10:16.358911735 2020: pid 22053: thd-0x8d3600: amcheck-device: mtx:   Storage 
Element 42:Empty
Sun Oct 11 12:10:16.358954276 2020: pid 22053: thd-0x8d3600: amcheck-device: mtx:   Storage 
Element 43:Full :VolumeTag=43L7
Sun Oct 11 12:10:16.358999694 2020: pid 22053: thd-0x8d3600: amcheck-device: mtx:   Storage 
Element 44:Empty
Sun Oct 11 12:10:16.359037603 2020: pid 22053: thd-0x8d3600: amcheck-device: mtx:   Storage 
Element 45:Empty
Sun Oct 11 12:10:16.359074776 2020: pid 22053: thd-0x8d3600: amcheck-device: mtx:   Storage 
Element 46:Empty
Sun Oct 11 12:10:16.359116468 2020: pid 22053: thd-0x8d3600: amcheck-device: mtx:   Storage 
Element 47:Full :VolumeTag=CLN001CU
Sun Oct 11 12:10:16.359165851 2020: pid 22053: thd-0x8d3600: amcheck-device: mtx:   Storage 
Element 48:Full :VolumeTag=CLN002CU

Sun Oct 11 12:10:16.35930 2020: pid 22053: thd-0x8d3600: amcheck-device: 
NEO200x48: updating state
Sun Oct 11 12:10:16.361076728 2020: pid 22053: thd-0x8d3600: amcheck-device: NEO200x48: loading 
label 'Bio-Research-002'
Sun Oct 11 12:10:16.361442086 2020: pid 22053: thd-0x8d3600: amcheck-device: new 
Amanda::Changer::Error: type='failed', reason='notfound'

Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-11 Thread Chris Hoogendyk

This morning:

   amanda@marlin:~/daily$ amcheck -s -otaperscan=taper_lexical daily

   Amanda Tape Server Host Check
   -
   NOTE: Holding disk '/amanda3': 253538304 KB disk space available, using 
148680704 KB
   WARNING: holding disk '/amanda4': only 104853504 KB free, using nothing
   WARNING: Not enough free space specified in amanda.conf
   NOTE: Holding disk '/amanda5': 613318656 KB disk space available, using 
508461056 KB
   ERROR: amcheck-device terminated with signal 11
   Server check took 1.505 seconds
   (brought to you by Amanda 3.5.1)

   amanda@marlin:~/daily$ amcheck -s -otaperscan=taper_traditional daily

   Amanda Tape Server Host Check
   -
   NOTE: Holding disk '/amanda3': 253538304 KB disk space available, using 
148680704 KB
   WARNING: holding disk '/amanda4': only 104853504 KB free, using nothing
   WARNING: Not enough free space specified in amanda.conf
   NOTE: Holding disk '/amanda5': 613318656 KB disk space available, using 
508461056 KB
   Searching for label 'Bio-Research-002':label 'Bio-Research-002' not 
recognized or not found
   slot 15: volume 'Bio-Research-003'
   Will write to volume 'Bio-Research-003' in slot 15.
   NOTE: skipping tape-writable test
   Server check took 19.306 seconds
   (brought to you by Amanda 3.5.1)

   amanda@marlin:~/daily$

I'll look at logs and post those if I see anything.


On 10/11/20 12:58 AM, Nathan Stratton Treadway wrote:

On Sat, Oct 10, 2020 at 23:50:17 -0400, Chris Hoogendyk wrote:

Wow!

amanda@marlin:~/daily$ amcheck -s -otaperscan=taper_traditional daily
Amanda Tape Server Host Check
-
NOTE: Holding disk '/amanda3': 913514496 KB disk space available, using 
808656896 KB
NOTE: Holding disk '/amanda4': 158228480 KB disk space available, using 
53370880 KB
NOTE: Holding disk '/amanda5': 1636618240 KB disk space available, using 
1531760640 KB
Searching for label 'Bio-Research-002':label 'Bio-Research-002' not 
recognized or not found
slot 13:slot 13 not in use-slots (14-36)
slot 14: volume 'Bio-Research-013' is still active and cannot be overwritten
slot 15: volume 'Bio-Research-003'
Will write to volume 'Bio-Research-003' in slot 15.
NOTE: skipping tape-writable test
Server check took 175.512 seconds
(brought to you by Amanda 3.5.1)
amanda@marlin:~/daily$

That worked! Interestingly, doing an `amcheck -s daily` after that
fails just as before. The amanda.conf uses taper_oldest.

Okay, this lends support to the theory that the crash is actually
happening in "scan" operation, rather than in some later part of the
amcheck-driver/taper process.
  
(Were there any error/warning messages written to the amcheck-device log

file for that run?)


So, maybe if I temporarily go to the different algorithm, it will
work. Right now the backups are already running and dumping to

Yeah, it might well work, and if so -- and if you don't care which
tape(s) are used next -- then simply switching to taper_traditional
would probably be the easiest approach to getting new dumps actually
written to tape


If the order the tapes are used does matter to you, I think it should
probably be possible to fix the bug in taper_oldest (oldest.pm) to get
it working (but I'm not really sure how much debugging effort it will
involve...).

If you are interested to attempt that, the next thing I would check is
to see what happens with -otaperscan=taper_lexical (assuming that is also
defined in your amanda.conf).  The "lexical" and "oldest" algorithms
both use the tape-drive inventory (while "traditional" does not), so
that test will help narrow the problem down to just "oldest" or to the
tape-inventory part of the code



Nathan




Nathan Stratton Treadway  -  [email protected]  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
  GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
  Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239


--
---

Chris Hoogendyk

-
   O__   Systems Administrator, Retired
  c/ /'_ --- Biology & Geosciences Departments
 (*) \(*) -- 315 Morrill Science Center III
~~ - University of Massachusetts, Amherst



---

Erdös 4



Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-10 Thread Nathan Stratton Treadway
On Sat, Oct 10, 2020 at 23:50:17 -0400, Chris Hoogendyk wrote:
> Wow!
> 
>amanda@marlin:~/daily$ amcheck -s -otaperscan=taper_traditional daily
>Amanda Tape Server Host Check
>-
>NOTE: Holding disk '/amanda3': 913514496 KB disk space available, using 
> 808656896 KB
>NOTE: Holding disk '/amanda4': 158228480 KB disk space available, using 
> 53370880 KB
>NOTE: Holding disk '/amanda5': 1636618240 KB disk space available, using 
> 1531760640 KB
>Searching for label 'Bio-Research-002':label 'Bio-Research-002' not 
> recognized or not found
>slot 13:slot 13 not in use-slots (14-36)
>slot 14: volume 'Bio-Research-013' is still active and cannot be 
> overwritten
>slot 15: volume 'Bio-Research-003'
>Will write to volume 'Bio-Research-003' in slot 15.
>NOTE: skipping tape-writable test
>Server check took 175.512 seconds
>(brought to you by Amanda 3.5.1)
>amanda@marlin:~/daily$
> 
> That worked! Interestingly, doing an `amcheck -s daily` after that
> fails just as before. The amanda.conf uses taper_oldest.

Okay, this lends support to the theory that the crash is actually
happening in "scan" operation, rather than in some later part of the
amcheck-driver/taper process.
 
(Were there any error/warning messages written to the amcheck-device log
file for that run?)

> So, maybe if I temporarily go to the different algorithm, it will
> work. Right now the backups are already running and dumping to

Yeah, it might well work, and if so -- and if you don't care which
tape(s) are used next -- then simply switching to taper_traditional
would probably be the easiest approach to getting new dumps actually
written to tape


If the order the tapes are used does matter to you, I think it should
probably be possible to fix the bug in taper_oldest (oldest.pm) to get
it working (but I'm not really sure how much debugging effort it will
involve...).

If you are interested to attempt that, the next thing I would check is
to see what happens with -otaperscan=taper_lexical (assuming that is also
defined in your amanda.conf).  The "lexical" and "oldest" algorithms
both use the tape-drive inventory (while "traditional" does not), so
that test will help narrow the problem down to just "oldest" or to the
tape-inventory part of the code  



Nathan




Nathan Stratton Treadway  -  [email protected]  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239


Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-10 Thread Chris Hoogendyk

Wow!

   amanda@marlin:~/daily$ amcheck -s -otaperscan=taper_traditional daily
   Amanda Tape Server Host Check
   -
   NOTE: Holding disk '/amanda3': 913514496 KB disk space available, using 
808656896 KB
   NOTE: Holding disk '/amanda4': 158228480 KB disk space available, using 
53370880 KB
   NOTE: Holding disk '/amanda5': 1636618240 KB disk space available, using 
1531760640 KB
   Searching for label 'Bio-Research-002':label 'Bio-Research-002' not 
recognized or not found
   slot 13:slot 13 not in use-slots (14-36)
   slot 14: volume 'Bio-Research-013' is still active and cannot be overwritten
   slot 15: volume 'Bio-Research-003'
   Will write to volume 'Bio-Research-003' in slot 15.
   NOTE: skipping tape-writable test
   Server check took 175.512 seconds
   (brought to you by Amanda 3.5.1)
   amanda@marlin:~/daily$

That worked! Interestingly, doing an `amcheck -s daily` after that fails just as before. The 
amanda.conf uses taper_oldest.


So, maybe if I temporarily go to the different algorithm, it will work. Right now the backups are 
already running and dumping to holding disk. That will likely terminate when out of space, and in 
the morning I can initiate a new run with the amanda.conf changed.



On 10/10/20 8:09 PM, Nathan Stratton Treadway wrote:

On Sat, Oct 10, 2020 at 19:20:25 -0400, Chris Hoogendyk wrote:

The taper log from last night looks exactly like the amcheck log,
showing the same inventory followed by the updating state and the
same three repetitions of the warning at line 102.

So you are saying that the taper log from last night goes up through the
lines that look like


Thu Oct 08 23:30:04.244060162 2020: pid 18920: thd-0x28b4800: taper: mtx:   
    Storage Element 48:Full :VolumeTag=CLN002CU
Thu Oct 08 23:30:04.244209614 2020: pid 18920: thd-0x28b4800: taper: 
NEO200x48: updating state

, then switch to the warning lines from oldest.pm line 102, and then
aborts suddenly?

It's not getting you an actual solution, but I'm curious if changing to
a different taperscan algorithm in the amanda.conf file (or using
-otaperscan= , if you have any other ones defined already) allow
"amcheck -s" to complete sucessfully (and in any case what the amcheck
log file looks like with the another taperscan algorithm)

Nathan


Nathan Stratton Treadway  -  [email protected]  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
  GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
  Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239


--
---

Chris Hoogendyk

-
   O__   Systems Administrator, Retired
  c/ /'_ --- Biology & Geosciences Departments
 (*) \(*) -- 315 Morrill Science Center III
~~ - University of Massachusetts, Amherst



---

Erdös 4



Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-10 Thread Nathan Stratton Treadway
On Sat, Oct 10, 2020 at 19:20:25 -0400, Chris Hoogendyk wrote:
> The taper log from last night looks exactly like the amcheck log,
> showing the same inventory followed by the updating state and the
> same three repetitions of the warning at line 102.

So you are saying that the taper log from last night goes up through the
lines that look like

>Thu Oct 08 23:30:04.244060162 2020: pid 18920: thd-0x28b4800: taper: mtx:  
>  Storage Element 48:Full :VolumeTag=CLN002CU
>Thu Oct 08 23:30:04.244209614 2020: pid 18920: thd-0x28b4800: taper: 
> NEO200x48: updating state

, then switch to the warning lines from oldest.pm line 102, and then
aborts suddenly?

It's not getting you an actual solution, but I'm curious if changing to
a different taperscan algorithm in the amanda.conf file (or using
-otaperscan= , if you have any other ones defined already) allow
"amcheck -s" to complete sucessfully (and in any case what the amcheck
log file looks like with the another taperscan algorithm)

Nathan


Nathan Stratton Treadway  -  [email protected]  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239


Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-10 Thread Chris Hoogendyk
The taper log from last night looks exactly like the amcheck log, showing the same inventory 
followed by the updating state and the same three repetitions of the warning at line 102.


The taper log from the night before, when it was successful, ends with this:

   Thu Oct 08 23:30:04.243540450 2020: pid 18920: thd-0x28b4800: taper: mtx:
   Storage Element
   36:Full :VolumeTag=55L7
   Thu Oct 08 23:30:04.243588794 2020: pid 18920: thd-0x28b4800: taper: mtx:
   Storage Element
   37:Full :VolumeTag=40L7
   Thu Oct 08 23:30:04.243636849 2020: pid 18920: thd-0x28b4800: taper: mtx:
   Storage Element
   38:Full :VolumeTag=41L7
   Thu Oct 08 23:30:04.243684260 2020: pid 18920: thd-0x28b4800: taper: mtx:
   Storage Element
   39:Full :VolumeTag=42L7
   Thu Oct 08 23:30:04.243727126 2020: pid 18920: thd-0x28b4800: taper: mtx:
   Storage Element
   40:Empty
   Thu Oct 08 23:30:04.243765952 2020: pid 18920: thd-0x28b4800: taper: mtx:
   Storage Element
   41:Empty
   Thu Oct 08 23:30:04.243804416 2020: pid 18920: thd-0x28b4800: taper: mtx:
   Storage Element
   42:Empty
   Thu Oct 08 23:30:04.243847604 2020: pid 18920: thd-0x28b4800: taper: mtx:
   Storage Element
   43:Full :VolumeTag=43L7
   Thu Oct 08 23:30:04.243891382 2020: pid 18920: thd-0x28b4800: taper: mtx:
   Storage Element
   44:Empty
   Thu Oct 08 23:30:04.243931546 2020: pid 18920: thd-0x28b4800: taper: mtx:
   Storage Element
   45:Empty
   Thu Oct 08 23:30:04.243969440 2020: pid 18920: thd-0x28b4800: taper: mtx:
   Storage Element
   46:Empty
   Thu Oct 08 23:30:04.244012542 2020: pid 18920: thd-0x28b4800: taper: mtx:
   Storage Element
   47:Full :VolumeTag=CLN001CU
   Thu Oct 08 23:30:04.244060162 2020: pid 18920: thd-0x28b4800: taper: mtx:
   Storage Element
   48:Full :VolumeTag=CLN002CU
   Thu Oct 08 23:30:04.244209614 2020: pid 18920: thd-0x28b4800: taper: 
NEO200x48: updating state
   Thu Oct 08 23:30:04.253017380 2020: pid 18920: thd-0x28b4800: taper: 
NEO200x48: too early for
   another 'status' invocation
   Thu Oct 08 23:30:04.253371305 2020: pid 18920: thd-0x28b4800: taper: 
NEO200x48: loading slot '4'
   Thu Oct 08 23:30:04.253422930 2020: pid 18920: thd-0x28b4800: taper: 
NEO200x48: requested volume
   is already in drive 0
   Thu Oct 08 23:30:04.253728335 2020: pid 18920: thd-0x28b4800: taper: 
NEO200x48: polling
   'tape:/dev/nst1' to see if it's ready
   Thu Oct 08 23:30:04.256414884 2020: pid 18920: thd-0x28b4800: taper: Device 
is in variable block
   size
   Thu Oct 08 23:30:23.300747859 2020: pid 18920: thd-0x28b4800: taper: Slot 4 
with label
   Bio-Research-058 is usable
   Thu Oct 08 23:30:23.304473715 2020: pid 18920: thd-0x28b4800: taper: 
Amanda::Taper::Scribe
   preparing to write, part size 0, using no cache (PEOM will be fatal) 
(splitter)  (no LEOM)
   Thu Oct 08 23:30:23.304769712 2020: pid 18920: thd-0x28b4800: taper: Starting 
 -> )>
   Thu Oct 08 23:30:23.304819253 2020: pid 18920: thd-0x28b4800: taper: Final 
linkage:
-(MEM_RING)-> 
   Thu Oct 08 23:30:23.305432752 2020: pid 18920: thd-0x28b4800: taper: header 
native_crc:
   9458cc2f:496412180480
   Thu Oct 08 23:30:23.305451271 2020: pid 18920: thd-0x28b4800: taper: header 
client_crc:
   9458cc2f:496412180480
   Thu Oct 08 23:30:23.305465429 2020: pid 18920: thd-0x28b4800: taper: header 
server_crc:
   9458cc2f:496412180480
   Thu Oct 08 23:30:23.305552797 2020: pid 18920: thd-0x28b4800: taper: 
start_recovery called
   Thu Oct 08 23:30:23.325029323 2020: pid 18920: thd-0x28b4800: taper: 
Building type TAPESTART
   header of 524288-524288 bytes with name='Bio-Research-058' disk='' 
dumplevel=0 and blocksize=524288
   Thu Oct 08 23:30:31.666780210 2020: pid 18920: thd-0x28b4800: taper: 
invoking /usr/sbin/mtx -f
   /dev/tape/by-id/scsi-1BDT_FlexStor_II_00MX64200626_LL0 status
   Thu Oct 08 23:30:32.316054467 2020: pid 18920: thd-0x28b4800: taper: mtx:   
Storage Changer
   /dev/tape/by-id/scsi-1BDT_FlexStor_II_00MX64200626_LL0:2 Drives, 48 Slots ( 
0 Import/Export )
   Thu Oct 08 23:30:32.316179280 2020: pid 18920: thd-0x28b4800: taper: mtx: 
Data Transfer Element
   0:Full (Storage Element 4 Loaded):VolumeTag = 58L7
   Thu Oct 08 23:30:32.316269484 2020: pid 18920: thd-0x28b4800: taper: mtx: 
Data Transfer Element
   1:Empty
   Thu Oct 08 23:30:32.316314827 2020: pid 18920: thd-0x28b4800: taper: mtx:
   Storage Element
   1:Empty

I got 6.8TB on tape the night the above log represents.

A successful amcheck-device log from a couple of days ago looks just like that.

There are no log files with a name that includes "change".


On 10/10/20 6:20 PM, Nathan Stratton Treadway wrote:

On Sat, Oct 10, 2020 at 17:11:16 -0400, Chris Hoogendyk wrote:

Also

8 lines like this in the kern.log:

Oct  9 21:05:58 eclogite kernel: [650814.343315] amcheck-device[5089]: 
segfault at 0 ip
7f94159617c6 sp 7fff61039da8 error 4 in 
libc-2.23.so[7f94158d6000+1c]


Well, that does

Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-10 Thread Nathan Stratton Treadway
On Sat, Oct 10, 2020 at 17:11:16 -0400, Chris Hoogendyk wrote:
> Also
> 
> 8 lines like this in the kern.log:
> 
>Oct  9 21:05:58 eclogite kernel: [650814.343315] amcheck-device[5089]: 
> segfault at 0 ip
>7f94159617c6 sp 7fff61039da8 error 4 in 
> libc-2.23.so[7f94158d6000+1c]
> 

Well, that doesn't really tell us more more about what is going wrong,
other than the slight hint the problem is all the way down in the the
libc library somehow.

I don't know how much you are trying to investigate further at this
point in your furlough schedule... but I still feel that comparing the
log from a succesfull run to this aborted run has the best chance of
generating a hint as to exactly what operation is underway at the point
of failure.


Also, the taper/changer logs from last night's run should give some hint
as to what it was attempting, and perhaps those logs will be different
enough from the amcheck-device logs that it'll give some new
information

(The last operation that appears to be happening in your quoted
amcheck-device log lines is a scan through the tape-changer inventory. 
I don't have a physical tape-drive changer myself so I don't have any
guesses as to what could be wrong, but based on the history of the
situation you described it does seem plausible that the inventory
database it's working from could contain some "unexpected" data of some
sort...)

Nathan



Nathan Stratton Treadway  -  [email protected]  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239


Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-10 Thread Chris Hoogendyk
To follow up. There was no data written to tape. On one system, amstatus indicated that it was 
trying to locate a tape, but that never succeeded.


Both systems backed up to holding disk. I have some space left on holding disk, but not enough to 
get beyond tonight, and tonight is likely to fall baqck to incrementals only.



On 10/9/20 10:22 PM, Chris Hoogendyk wrote:
I have two servers doing Amanda backups for two different departments. They are both Supermicro 
servers running Ubuntu 16.04 with fairly recent patches and Amanda 3.5.1.


This evening I came in to swap magazines in the Overland NeoSeries tape libraries. We have just 
begun a program to send a set of magazines out to Iron Mountain. The libraries had new magazines 
with new LTO7 tapes that I had labeled with amlabel and then run through enough days of backups to 
write them all. Now I'm taking those out, putting back in the original magazines, and sending the 
new ones out to Iron Mountain. So, part of the point of saying all that is that I have done this 
magazine swap previously. Once. Now, this evening, on both servers, I'm getting this error (ERROR: 
amcheck-device terminated with signal 11). When I ran the amcheck before swapping magazines, it 
worked. I then did an `amtape daily slot 2` to get it pointed at the beginning, went to the web 
interface and unloaded that tape back into the magazine, swapped the magazines, and went back to 
amanda.


`amtape daily slot whatever` works. It returns telling me the label on the tape, and the web 
interface shows it loaded.


`amlabel daily somelabel` works. It looks at the tape, reads the label, tells me that it is 
already labeled and in use.


`amcheck daily` fails.

Any clues? I need to resolve this fairly quickly. The University is doing mandatory furloughs to 
deal with their budget shortfalls, and I'm scheduled for a one week furlough starting Sunday. I 
will update this later with information on whether the backups that are scheduled to start shortly 
end up working. I presume they will at least go to holding disk, but whether they will write to 
the tape is the question.



Following is the contents of the 
/tmp/amanda/server/daily/amcheck-device.20201009215709.debug:

Fri Oct 09 21:57:09.046919855 2020: pid 24239: thd-0xc2e600: amcheck-device: pid 24239 ruid 555 
euid 555 version 3.5.1: start at Fri Oct  9 21:57:09 2020
Fri Oct 09 21:57:09.047288267 2020: pid 24239: thd-0xc2e600: amcheck-device: reading config file 
/usr/local/etc/amanda/daily/amanda.conf
Fri Oct 09 21:57:09.076905220 2020: pid 24239: thd-0xc2e600: amcheck-device: pid 24239 ruid 555 
euid 555 version 3.5.1: rename at Fri Oct  9 21:57:09 2020
Fri Oct 09 21:57:09.105544741 2020: pid 24239: thd-0xc2e600: amcheck-device: chg-robot: using 
statefile '/usr/local/var/amanda/chg-robot-dev-tape-by-id-scsi-1BDT-FlexStor-II-00MX64200626-LL0'
Fri Oct 09 21:57:09.124493185 2020: pid 24239: thd-0xc2e600: amcheck-device: invoking 
/usr/sbin/mtx -f /dev/tape/by-id/scsi-1BDT_FlexStor_II_00MX64200626_LL0 status
Fri Oct 09 21:57:09.793516825 2020: pid 24239: thd-0xc2e600: amcheck-device: mtx:   Storage 
Changer /dev/tape/by-id/scsi-1BDT_FlexStor_II_00MX64200626_LL0:2 Drives, 48 Slots ( 0 Import/Export )
Fri Oct 09 21:57:09.793592258 2020: pid 24239: thd-0xc2e600: amcheck-device: mtx: Data Transfer 
Element 0:Empty
Fri Oct 09 21:57:09.793645704 2020: pid 24239: thd-0xc2e600: amcheck-device: mtx: Data Transfer 
Element 1:Full (Storage Element 13 Loaded):VolumeTag = 01L7
Fri Oct 09 21:57:09.793696913 2020: pid 24239: thd-0xc2e600: amcheck-device: mtx:   Storage 
Element 1:Empty
Fri Oct 09 21:57:09.793740270 2020: pid 24239: thd-0xc2e600: amcheck-device: mtx:   Storage 
Element 2:Empty
Fri Oct 09 21:57:09.793779177 2020: pid 24239: thd-0xc2e600: amcheck-device: mtx:   Storage 
Element 3:Full :VolumeTag=15L7
Fri Oct 09 21:57:09.793822925 2020: pid 24239: thd-0xc2e600: amcheck-device: mtx:   Storage 
Element 4:Full :VolumeTag=16L7
Fri Oct 09 21:57:09.793860922 2020: pid 24239: thd-0xc2e600: amcheck-device: mtx:   Storage 
Element 5:Full :VolumeTag=17L7
Fri Oct 09 21:57:09.793898857 2020: pid 24239: thd-0xc2e600: amcheck-device: mtx:   Storage 
Element 6:Full :VolumeTag=14L7
Fri Oct 09 21:57:09.793932089 2020: pid 24239: thd-0xc2e600: amcheck-device: mtx:   Storage 
Element 7:Empty
Fri Oct 09 21:57:09.793961884 2020: pid 24239: thd-0xc2e600: amcheck-device: mtx:   Storage 
Element 8:Empty
Fri Oct 09 21:57:09.794026494 2020: pid 24239: thd-0xc2e600: amcheck-device: mtx:   Storage 
Element 9:Full :VolumeTag=21L7
Fri Oct 09 21:57:09.794067771 2020: pid 24239: thd-0xc2e600: amcheck-device: mtx:   Storage 
Element 10:Full :VolumeTag=18L7
Fri Oct 09 21:57:09.794102988 2020: pid 24239: thd-0xc2e600: amcheck-device: mtx:   Storage 
Element 11:Empty
Fri Oct 09 21:57:09.794132586 2020: pid 24239: thd-0xc2e600: amcheck-device: mtx:   Storage 
Element 12:Empty
Fri Oct 09 21:57:09.7

Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-10 Thread Chris Hoogendyk

Also

8 lines like this in the kern.log:

   Oct  9 21:05:58 eclogite kernel: [650814.343315] amcheck-device[5089]: 
segfault at 0 ip
   7f94159617c6 sp 7fff61039da8 error 4 in 
libc-2.23.so[7f94158d6000+1c]


On 10/10/20 12:29 AM, Nathan Stratton Treadway wrote:

On Fri, Oct 09, 2020 at 22:22:47 -0400, Chris Hoogendyk wrote:

both servers, I'm getting this error (ERROR: amcheck-device
terminated with signal 11). When I ran the amcheck before swapping

amcheck-device is a Perl program, so it's a little bit impressive to be
triggering a SEGV of the process :( .

Do you get any coredump-related kernel messages in your syslog file when
the process crashes?



Fri Oct 09 21:57:09.795469021 2020: pid 24239: thd-0xc2e600: amcheck-device: 
NEO200x48: updating state
Fri Oct 09 21:57:09.802340532 2020: pid 24239: thd-0xc2e600:
amcheck-device: warning: Use of uninitialized value in numeric eq
(==) at /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/oldest.pm
line 102.
Fri Oct 09 21:57:09.802537511 2020: pid 24239: thd-0xc2e600:
amcheck-device: warning: Use of uninitialized value in numeric eq
(==) at /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/oldest.pm
line 102.
Fri Oct 09 21:57:09.802622523 2020: pid 24239: thd-0xc2e600:
amcheck-device: warning: Use of uninitialized value in numeric eq
(==) at /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/oldest.pm
line 102.

I am guessing this "uninitialized value" warning is not directly causing
the crash, but those log message might possibly hint as to where in the
program execution had reached just prior to the crash.

Can you post the output of

   $ sed -n '99,$p;105q' 
/usr/local/share/perl/5.22.1/Amanda/Taper/Scan/oldest.pm
?

(In other words, what's line 102 in that file on your system, with a
few lines of context?)


Do you see those warning lines in the log files from a successfully
amcheck run (i.e. from a few days ago)?  What do those logs show after
the warning lines (or the "updating state" line)?


Nathan


Nathan Stratton Treadway  -  [email protected]  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
  GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
  Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239


--
---

Chris Hoogendyk

-
   O__   Systems Administrator, Retired
  c/ /'_ --- Biology & Geosciences Departments
 (*) \(*) -- 315 Morrill Science Center III
~~ - University of Massachusetts, Amherst



---

Erdös 4



Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-10 Thread Chris Hoogendyk

Sorry for the delay in responding.

   amanda@marlin:~/daily$ cat -n 
/usr/local/share/perl/5.22.1/Amanda/Taper/Scan/oldest.pm | grep "^
   *102" -C3
    99            $sl->{'state'} == Amanda::Changer::SLOT_UNKNOWN) {
   100            push @unknown, $sl;
   101        } elsif ($sl->{'state'} == Amanda::Changer::SLOT_EMPTY) {
   102        } elsif (defined $sl->{'label'} &&
   103             $sl->{device_status} == $DEVICE_STATUS_SUCCESS) {
   104            my $vol_tle = 
$self->{'tapelist'}->lookup_tapelabel($sl->{'label'});
   105            if ($self->is_reusable_volume(label => $sl->{'label'})) {
   amanda@marlin:~/daily$


On 10/10/20 12:46 AM, Nathan Stratton Treadway wrote:

On Sat, Oct 10, 2020 at 00:29:04 -0400, Nathan Stratton Treadway wrote:

Can you post the output of

   $ sed -n '99,$p;105q' 
/usr/local/share/perl/5.22.1/Amanda/Taper/Scan/oldest.pm
?

(In other words, what's line 102 in that file on your system, with a
few lines of context?)

(Hmm, perhaps

   $ cat -n /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/oldest.pm | grep "^ 
*102" -C3

would be better -- that way, the file line numbers are included in
output...)



Nathan


Nathan Stratton Treadway  -  [email protected]  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
  GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
  Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239


--
---

Chris Hoogendyk

-
   O__   Systems Administrator, Retired
  c/ /'_ --- Biology & Geosciences Departments
 (*) \(*) -- 315 Morrill Science Center III
~~ - University of Massachusetts, Amherst



---

Erdös 4



Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-09 Thread Nathan Stratton Treadway
On Sat, Oct 10, 2020 at 00:29:04 -0400, Nathan Stratton Treadway wrote:
> Can you post the output of 
> 
>   $ sed -n '99,$p;105q' 
> /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/oldest.pm
> ?
> 
> (In other words, what's line 102 in that file on your system, with a
> few lines of context?)

(Hmm, perhaps 

  $ cat -n /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/oldest.pm | grep "^ 
*102" -C3

would be better -- that way, the file line numbers are included in
output...)



Nathan


Nathan Stratton Treadway  -  [email protected]  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239


Re: Amanda 3.5.1 - "ERROR: amcheck-device terminated with signal 11"

2020-10-09 Thread Nathan Stratton Treadway
On Fri, Oct 09, 2020 at 22:22:47 -0400, Chris Hoogendyk wrote:
> both servers, I'm getting this error (ERROR: amcheck-device
> terminated with signal 11). When I ran the amcheck before swapping

amcheck-device is a Perl program, so it's a little bit impressive to be
triggering a SEGV of the process :( .

Do you get any coredump-related kernel messages in your syslog file when
the process crashes?


> Fri Oct 09 21:57:09.795469021 2020: pid 24239: thd-0xc2e600: amcheck-device: 
> NEO200x48: updating state
> Fri Oct 09 21:57:09.802340532 2020: pid 24239: thd-0xc2e600:
> amcheck-device: warning: Use of uninitialized value in numeric eq
> (==) at /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/oldest.pm
> line 102.
> Fri Oct 09 21:57:09.802537511 2020: pid 24239: thd-0xc2e600:
> amcheck-device: warning: Use of uninitialized value in numeric eq
> (==) at /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/oldest.pm
> line 102.
> Fri Oct 09 21:57:09.802622523 2020: pid 24239: thd-0xc2e600:
> amcheck-device: warning: Use of uninitialized value in numeric eq
> (==) at /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/oldest.pm
> line 102.

I am guessing this "uninitialized value" warning is not directly causing
the crash, but those log message might possibly hint as to where in the
program execution had reached just prior to the crash.

Can you post the output of 

  $ sed -n '99,$p;105q' /usr/local/share/perl/5.22.1/Amanda/Taper/Scan/oldest.pm
?

(In other words, what's line 102 in that file on your system, with a
few lines of context?)


Do you see those warning lines in the log files from a successfully
amcheck run (i.e. from a few days ago)?  What do those logs show after
the warning lines (or the "updating state" line)?


Nathan


Nathan Stratton Treadway  -  [email protected]  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239