Re: Transient data-path errors when running amcheck
Steven,

Let me know if the patch works or not.

Jean-Louis

On 01/06/2014 01:13 PM, Jean-Louis Martineau wrote:
> Try this patch, chg-robot retry the mtx command if it failed with 'No Sense'.
>
> Jean-Louis
>
> On 01/06/2014 01:04 PM, Steven Backus wrote:
>>> The complete output of mtx is not logged, run it on the command line to get the complete output:
>>> /usr/sbin/mtx -f /dev/changer nobarcode status
>>
>> Storage Changer /dev/sg6:1 Drives, 7 Slots ( 0 Import/Export )
>> Data Transfer Element 0:Full (Storage Element 1 Loaded)
>> Storage Element 1:Empty
>> Storage Element 2:Full
>> Storage Element 3:Full
>> Storage Element 4:Full
>> Storage Element 5:Full
>> Storage Element 6:Full
>> Storage Element 7:Full
>>
>>> 'it always works the 2nd time', Did it always works the 3rd and 4th time?
>>
>> Yes, after the first fail, it always works. Then several days later I try switching tapes and it fails again the first time and always works again after that.
>>
>>> When is the first time?
>>
>> The first time is Monday morning after I've done a dump to the holding disk. I do 6 amdumps during the week:
>>
>> M  degraded
>> T  flush to tape
>> W  degraded
>> Th degraded
>> F  flush to tape
>> S  degraded
>>
>> Steve
Re: Transient data-path errors when running amcheck
Patches 1 and 2 are already committed. I committed the third patch.

Thanks for reporting the bug and testing the patch.

Jean-Louis

On 01/10/2014 11:43 AM, Steven Backus wrote:
> Jean-Louis writes:
>> Let me know if the patch works or not.
>
> There were 3 patches:
>
> 1. Transient data-path errors when running amcheck
> 2. amtape-shows-slot-empty-but-mtx-doesn-t
> 3. chg-robot retry the mtx command if it failed with 'No Sense'
>
> --
>
> All the patches worked great, I'm all fixed up now. Thanks Jean-Louis.
>
> Steve
Re: Transient data-path errors when running amcheck
Jean-Louis writes:
> Let me know if the patch works or not.

There were 3 patches:

1. Transient data-path errors when running amcheck
2. amtape-shows-slot-empty-but-mtx-doesn-t
3. chg-robot retry the mtx command if it failed with 'No Sense'

--

All the patches worked great, I'm all fixed up now. Thanks Jean-Louis.

Steve
--
Steven J. Backus                Computer Systems Manager
University of Utah              E-Mail: steven.bac...@utah.edu
Genetic Epidemiology            Alternate: bac...@math.utah.edu
391 Chipeta Way -- Suite D      Office: 801.587.9308
Salt Lake City, UT 84108-1266   http://www.math.utah.edu/~backus
Re: Transient data-path errors when running amcheck
> Amanda can't fix bug in the scsi hardware/firmware.

It's possible there's a scsi bug but since it always works the 2nd time hopefully more of a timing issue.

> Post the amtape debug file when it fail.

amtape.20140106091626.debug:

Mon Jan 6 09:16:26 2014: thd-0xe1ff960: amtape: pid 12109 ruid 0 euid 0 version 3.3.5: start at Mon Jan 6 09:16:26 2014
Mon Jan 6 09:16:26 2014: thd-0xe1ff960: amtape: Arguments: gen slot 1
Mon Jan 6 09:16:26 2014: thd-0xe1ff960: amtape: pid 12109 ruid 0 euid 0 version 3.3.5: rename at Mon Jan 6 09:16:26 2014
Mon Jan 6 09:16:26 2014: thd-0xe1ff960: amtape: chg-robot: using statefile '/local/etc/amanda/gen/changer-state'
Mon Jan 6 09:16:26 2014: thd-0xe1ff960: amtape: invoking /usr/sbin/mtx -f /dev/changer nobarcode status
Mon Jan 6 09:16:26 2014: thd-0xe1ff960: amtape: new Amanda::Changer::Error: type='fatal', message='error from mtx: SCSI error; Sense Key=No Sense'
Mon Jan 6 09:16:26 2014: thd-0xe1ff960: amtape: pid 12109 finish time Mon Jan 6 09:16:26 2014

Steve
Re: Transient data-path errors when running amcheck
Steven,

The complete output of mtx is not logged, run it on the command line to get the complete output:

/usr/sbin/mtx -f /dev/changer nobarcode status

'it always works the 2nd time', Did it always works the 3rd and 4th time?
When is the first time?

Jean-Louis

On 01/06/2014 12:27 PM, Steven Backus wrote:
>> Amanda can't fix bug in the scsi hardware/firmware.
>
> It's possible there's a scsi bug but since it always works the 2nd time hopefully more of a timing issue.
>
>> Post the amtape debug file when it fail.
>
> amtape.20140106091626.debug:
>
> Mon Jan 6 09:16:26 2014: thd-0xe1ff960: amtape: pid 12109 ruid 0 euid 0 version 3.3.5: start at Mon Jan 6 09:16:26 2014
> Mon Jan 6 09:16:26 2014: thd-0xe1ff960: amtape: Arguments: gen slot 1
> Mon Jan 6 09:16:26 2014: thd-0xe1ff960: amtape: pid 12109 ruid 0 euid 0 version 3.3.5: rename at Mon Jan 6 09:16:26 2014
> Mon Jan 6 09:16:26 2014: thd-0xe1ff960: amtape: chg-robot: using statefile '/local/etc/amanda/gen/changer-state'
> Mon Jan 6 09:16:26 2014: thd-0xe1ff960: amtape: invoking /usr/sbin/mtx -f /dev/changer nobarcode status
> Mon Jan 6 09:16:26 2014: thd-0xe1ff960: amtape: new Amanda::Changer::Error: type='fatal', message='error from mtx: SCSI error; Sense Key=No Sense'
> Mon Jan 6 09:16:26 2014: thd-0xe1ff960: amtape: pid 12109 finish time Mon Jan 6 09:16:26 2014
>
> Steve
Re: Transient data-path errors when running amcheck
> The complete output of mtx is not logged, run it on the command line to get the complete output:
> /usr/sbin/mtx -f /dev/changer nobarcode status

Storage Changer /dev/sg6:1 Drives, 7 Slots ( 0 Import/Export )
Data Transfer Element 0:Full (Storage Element 1 Loaded)
Storage Element 1:Empty
Storage Element 2:Full
Storage Element 3:Full
Storage Element 4:Full
Storage Element 5:Full
Storage Element 6:Full
Storage Element 7:Full

> 'it always works the 2nd time', Did it always works the 3rd and 4th time?

Yes, after the first fail, it always works. Then several days later I try switching tapes and it fails again the first time and always works again after that.

> When is the first time?

The first time is Monday morning after I've done a dump to the holding disk. I do 6 amdumps during the week:

M  degraded
T  flush to tape
W  degraded
Th degraded
F  flush to tape
S  degraded

Steve
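For anyone who wants to compare what mtx reports with what amtape believes, the status listing above is easy to turn into a small table. The following is only a rough sketch in Python: the function name `parse_mtx_status` and the regular expressions are illustrative assumptions based on the output quoted in this message, not anything shipped with Amanda or mtx, and other changers may print extra fields (barcodes, import/export slots) that it ignores.

```python
import re

# Patterns for the two kinds of element lines seen in the quoted output.
DRIVE_RE = re.compile(
    r"Data Transfer Element (\d+):(Full|Empty)"
    r"(?: \(Storage Element (\d+) Loaded\))?"
)
SLOT_RE = re.compile(r"\s*Storage Element (\d+):(Full|Empty)")

# Sample text copied from the mtx output quoted in this thread.
SAMPLE = """\
Storage Changer /dev/sg6:1 Drives, 7 Slots ( 0 Import/Export )
Data Transfer Element 0:Full (Storage Element 1 Loaded)
Storage Element 1:Empty
Storage Element 2:Full
Storage Element 3:Full
Storage Element 4:Full
Storage Element 5:Full
Storage Element 6:Full
Storage Element 7:Full
"""


def parse_mtx_status(text):
    """Return (drives, slots) parsed from `mtx status` text.

    drives maps drive number -> slot the loaded tape came from, or None
    when the drive is empty or mtx does not report the origin slot.
    slots maps slot number -> True when the slot is Full.
    """
    drives, slots = {}, {}
    for line in text.splitlines():
        m = DRIVE_RE.search(line)
        if m:
            origin = int(m.group(3)) if m.group(3) else None
            drives[int(m.group(1))] = origin if m.group(2) == "Full" else None
            continue
        m = SLOT_RE.match(line)
        if m:
            slots[int(m.group(1))] = m.group(2) == "Full"
    return drives, slots


if __name__ == "__main__":
    drives, slots = parse_mtx_status(SAMPLE)
    print("drives:", drives)
    print("empty slots:", [n for n, full in sorted(slots.items()) if not full])
```

With the sample above, slot 1 shows as empty only because its tape is sitting in drive 0, which is the kind of detail worth checking before concluding that a slot really has no tape.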
Re: Transient data-path errors when running amcheck
Try this patch, chg-robot retry the mtx command if it failed with 'No Sense'.

Jean-Louis

On 01/06/2014 01:04 PM, Steven Backus wrote:
>> The complete output of mtx is not logged, run it on the command line to get the complete output:
>> /usr/sbin/mtx -f /dev/changer nobarcode status
>
> Storage Changer /dev/sg6:1 Drives, 7 Slots ( 0 Import/Export )
> Data Transfer Element 0:Full (Storage Element 1 Loaded)
> Storage Element 1:Empty
> Storage Element 2:Full
> Storage Element 3:Full
> Storage Element 4:Full
> Storage Element 5:Full
> Storage Element 6:Full
> Storage Element 7:Full
>
>> 'it always works the 2nd time', Did it always works the 3rd and 4th time?
>
> Yes, after the first fail, it always works. Then several days later I try switching tapes and it fails again the first time and always works again after that.
>
>> When is the first time?
>
> The first time is Monday morning after I've done a dump to the holding disk. I do 6 amdumps during the week:
>
> M  degraded
> T  flush to tape
> W  degraded
> Th degraded
> F  flush to tape
> S  degraded
>
> Steve

diff --git a/perl/Amanda/Changer/robot.pm b/perl/Amanda/Changer/robot.pm
index 9dbf8e1..0cafec0 100644
--- a/perl/Amanda/Changer/robot.pm
+++ b/perl/Amanda/Changer/robot.pm
@@ -2266,12 +2266,15 @@ sub status {
             my ($exitstatus, $output) = @_;
             if ($exitstatus != 0) {
                 my $err = $output;
+                for my $line (split '\n', $output) {
+                    debug("mtx: $line");
+                }
                 # if it's a regular SCSI error, just show the sense key
                 my ($sensekey) = ($err =~ /mtx: Request Sense: Sense Key=(.*)\n/);
                 $err = "SCSI error; Sense Key=$sensekey" if $sensekey;
                 $counter--;
-                if ($sensekey eq "Not Ready" and $counter > 0) {
-                    debug($output);
+                if (($sensekey eq "Not Ready" and $counter > 0) ||
+                    ($sensekey eq "No Sense" and $counter > 0)) {
                     return Amanda::MainLoop::call_after(1000, $run_mtx);
                 }
                 return $status_cb->("error from mtx: " . $err, {});
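The idea behind the patch is to treat 'No Sense' the same way 'Not Ready' was already treated: wait a second and run mtx again, up to a limited number of tries. A rough stand-alone sketch of that control flow in Python may make it easier to follow. It is not the Amanda code: `run_with_retry`, the `run_mtx` callable and the default of 3 attempts are all assumptions made for illustration (the starting value of `$counter` is not shown in the patch).

```python
import re
import time

# Sense keys treated as transient. "Not Ready" was retried before the patch;
# "No Sense" is the case the patch adds.
TRANSIENT_SENSE_KEYS = {"Not Ready", "No Sense"}

# Same pattern the patch uses to pull the sense key out of the mtx output.
SENSE_RE = re.compile(r"mtx: Request Sense: Sense Key=(.*)\n")


def run_with_retry(run_mtx, attempts=3, delay=1.0, sleep=time.sleep):
    """Call run_mtx() until it succeeds, fails for good, or attempts run out.

    run_mtx is a hypothetical callable returning (exit_status, output).
    Returns (ok, message): the mtx output on success, an error string otherwise.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for remaining in range(attempts - 1, -1, -1):
        exit_status, output = run_mtx()
        if exit_status == 0:
            return True, output
        match = SENSE_RE.search(output)
        sense_key = match.group(1) if match else None
        if sense_key in TRANSIENT_SENSE_KEYS and remaining > 0:
            sleep(delay)  # the patch schedules the rerun 1000 ms later
            continue
        # Non-transient error, or no retries left: report it like the patch does.
        err = f"SCSI error; Sense Key={sense_key}" if sense_key else output
        return False, "error from mtx: " + err


if __name__ == "__main__":
    replies = iter([
        (1, "mtx: Request Sense: Sense Key=No Sense\n"),
        (0, "Storage Changer /dev/sg6:1 Drives, 7 Slots ( 0 Import/Export )\n"),
    ])
    print(run_with_retry(lambda: next(replies), sleep=lambda s: None))
```

The `sleep` parameter is injectable only so the retry path can be exercised without actually waiting.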
Re: Transient data-path errors when running amcheck
Huh! I had a couple of instances of "it's empty / no it isn't" and deleted my changerfile. This fixed it, but perhaps I should also try your patch. I was sharing the drive and the changerfile between 2 configurations (a daily and an archive) and thought I might have caused my problems by doing that.

I'll look into the patch in a few weeks when I'm back in the office.

Deb Baddorf

On Jan 3, 2014, at 2:47 PM, Jean-Louis Martineau martin...@zmanda.com wrote:
> Steven,
>
> It is always better to report a bug instead of going to an older release.
>
> Try the patch I posted on
> https://forums.zmanda.com/showthread.php?5179-amtape-shows-slot-empty-but-mtx-doesn-t
>
> Jean-Louis
>
> On 01/03/2014 03:29 PM, Steven Backus wrote:
>> As directed, I tried changing from chg-zd-mtx to chg-robot. Here's my changer definition:
>>
>> define changer gen-drive-0 {
>>     tpchanger chg-robot:/dev/changer
>>     changerfile /local/etc/amanda/gen/changer-state
>>     property tape-device 0=tape:/dev/gentape
>>     property eject-before-unload no
>>     property use-slots 1-7
>>     property fast-search no
>>     property ignore-barcodes yes
>>     property eject-delay 10
>>     property unload-delay 10
>>     property load-poll 0 s poll 5 s until 120 s
>> }
>>
>> now after the last amdump, current slot was set to 4. The next time I ran amcheck, for some reason it decided to use slot 6 for the next tape instead of #5 and amtape reports:
>>
>> amtape gen current
>> slot 6: time 20130823170001 label gen072
>>
>> so I try to switch it to 5 with amtape:
>>
>> amtape gen slot 5
>> ERROR: Slot: 5: slot 5 is empty
>>
>> But that's not the case, as mtx status shows:
>>
>> Storage Changer /dev/changer:1 Drives, 7 Slots ( 0 Import/Export )
>> Data Transfer Element 0:Full (Storage Element 6 Loaded)
>> Storage Element 1:Full
>> Storage Element 2:Full
>> Storage Element 3:Full
>> Storage Element 4:Full
>> Storage Element 5:Full
>> Storage Element 6:Empty
>> Storage Element 7:Full
>>
>> So I do an mtx unload:
>>
>> Unloading drive 0 into Storage Element 6...done
>>
>> but amtape still refuses to co-operate:
>>
>> amtape gen slot 5
>> ERROR: Slot: 5: slot 5 is empty
>>
>> So I do:
>>
>> mtx load 5
>> Loading media from Storage Element 5 into drive 0...
>>
>> Finally amtape agrees with me:
>>
>> amtape gen slot 5
>> slot 5: time 20130819170001 label gen071
>> changed to slot 5
>>
>> ---
>>
>> Reasons like this are why I went back to chg-zd-mtx. Although it still messes up, at least it doesn't skip tapes randomly.
>>
>> Steve
Re: Transient data-path errors when running amcheck
On 12/30/2013 10:46 AM, Steven Backus wrote:
> The problem I'm having with both chg-zd-mtx and chg-robot is the drive doesn't become ready in time. The first amcheck always fails but the 2nd one succeeds. I've tried various delay settings to both without success. It's an Ultrium robot, any ideas on how I can avoid this problem?

There are a lot of properties that can be set with chg-robot:

EJECT-BEFORE-UNLOAD
EJECT-DELAY
LOAD-POLL
UNLOAD-DELAY

See the amanda-changers man page. Increase all the delays to insanely large values, and provide more information if it still fails.

Jean-Louis
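As a way to reason about what a LOAD-POLL value such as "0 s poll 5 s until 120 s" asks for, here is a rough Python sketch of a "delay, then poll until timeout" loop. It reflects only one reading of that setting (wait an initial delay, then check the drive every poll interval until a deadline); the function `wait_until_ready` and its parameters are hypothetical and are not Amanda's implementation, so the amanda-changers man page remains the reference for the exact semantics.

```python
import time


def wait_until_ready(is_ready, initial_delay=0.0, poll_interval=5.0,
                     timeout=120.0, sleep=time.sleep, clock=time.monotonic):
    """Wait initial_delay, then poll is_ready() every poll_interval seconds.

    is_ready is a hypothetical callable that returns True once the drive
    reports ready. Returns True as soon as that happens, or False when the
    next poll would fall past `timeout` seconds after the start.
    """
    start = clock()
    if initial_delay > 0:
        sleep(initial_delay)
    while True:
        if is_ready():
            return True
        if clock() - start + poll_interval > timeout:
            return False
        sleep(poll_interval)


if __name__ == "__main__":
    # Simulated drive that becomes ready on the third check; the fake
    # clock and sleep keep the demo from actually waiting.
    now = [0.0]
    checks = iter([False, False, True])
    ready = wait_until_ready(
        lambda: next(checks),
        sleep=lambda s: now.__setitem__(0, now[0] + s),
        clock=lambda: now[0],
    )
    print(ready, "after", now[0], "simulated seconds")
```

Seen this way, raising the "until" value only extends how long the changer is willing to wait; it costs nothing when the drive becomes ready early, which is why very large values are a reasonable first experiment.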
Transient data-path errors when running amcheck
Since upgrading to 3.3.5 I occasionally get errors of this nature:

ERROR: whimsy.med.utah.edu sdc1: data-path is AMANDA but device do not support it

Upon re-running amcheck the errors go away. When they do appear, there are errors of this type generated for each entry in my disklist. Any ideas?

Thanks,
Steve