Re: Transient data-path errors when running amcheck

2014-01-10 Thread Jean-Louis Martineau

Steven,

Let me know if the patch works or not.

Jean-Louis







Re: Transient data-path errors when running amcheck

2014-01-10 Thread Jean-Louis Martineau

Patches 1 & 2 are already committed.
I committed the third patch.

Thanks for reporting the bug and testing the patch.

Jean-Louis





Re: Transient data-path errors when running amcheck

2014-01-10 Thread Steven Backus
Jean-Louis writes:
> Let me know if the patch works or not.

There were 3 patches:

1. Transient data-path errors when running amcheck
2. amtape-shows-slot-empty-but-mtx-doesn-t
3. chg-robot retry the mtx command if it failed with 'No Sense'
--

All the patches worked great, I'm all fixed up now.  Thanks
Jean-Louis.

Steve
-- 
Steven J. Backus                Computer Systems Manager
University of Utah              E-Mail:  steven.bac...@utah.edu
Genetic Epidemiology            Alternate:  bac...@math.utah.edu
391 Chipeta Way -- Suite D      Office:  801.587.9308
Salt Lake City, UT 84108-1266   http://www.math.utah.edu/~backus


Re: Transient data-path errors when running amcheck

2014-01-06 Thread Steven Backus
> Amanda can't fix a bug in the SCSI hardware/firmware.

It's possible there's a SCSI bug, but since it always works the
2nd time, hopefully it's more of a timing issue.

> Post the amtape debug file when it fails.

amtape.20140106091626.debug:

Mon Jan  6 09:16:26 2014: thd-0xe1ff960: amtape: pid 12109 ruid 0 euid 0 version 3.3.5: start at Mon Jan  6 09:16:26 2014
Mon Jan  6 09:16:26 2014: thd-0xe1ff960: amtape: Arguments: gen slot 1
Mon Jan  6 09:16:26 2014: thd-0xe1ff960: amtape: pid 12109 ruid 0 euid 0 version 3.3.5: rename at Mon Jan  6 09:16:26 2014
Mon Jan  6 09:16:26 2014: thd-0xe1ff960: amtape: chg-robot: using statefile '/local/etc/amanda/gen/changer-state'
Mon Jan  6 09:16:26 2014: thd-0xe1ff960: amtape: invoking /usr/sbin/mtx -f /dev/changer nobarcode status
Mon Jan  6 09:16:26 2014: thd-0xe1ff960: amtape: new Amanda::Changer::Error: type='fatal', message='error from mtx: SCSI error; Sense Key=No Sense'
Mon Jan  6 09:16:26 2014: thd-0xe1ff960: amtape: pid 12109 finish time Mon Jan  6 09:16:26 2014

Steve
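
When reading a debug file like the one above, it can help to pull out only the changer errors. The following is a minimal sketch, assuming each log entry sits on a single line in the real file (the wrapping seen in email is from the mail client). The function name changer_errors is hypothetical and not part of Amanda; the debug file path is passed in by the caller.

```shell
#!/bin/sh
# Sketch only: changer_errors is a hypothetical helper, not an Amanda tool.
# It prints "TYPE: MESSAGE" for every Amanda::Changer::Error entry found
# in the debug files given as arguments (or on stdin if none are given).
changer_errors() {
    sed -n "s/.*new Amanda::Changer::Error: type='\([^']*\)', message='\(.*\)' *\$/\1: \2/p" "$@"
}
```

For the log above this would print one line: the error type "fatal" followed by the mtx error message.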


Re: Transient data-path errors when running amcheck

2014-01-06 Thread Jean-Louis Martineau

Steven,

The complete output of mtx is not logged; run it on the command line to
get the complete output:

   /usr/sbin/mtx -f /dev/changer nobarcode status

'it always works the 2nd time': does it always work the 3rd and 4th
time? When is the first time?


 Jean-Louis





Re: Transient data-path errors when running amcheck

2014-01-06 Thread Steven Backus
> The complete output of mtx is not logged; run it on the command line to
> get the complete output:
> /usr/sbin/mtx -f /dev/changer nobarcode status

  Storage Changer /dev/sg6:1 Drives, 7 Slots ( 0 Import/Export )
Data Transfer Element 0:Full (Storage Element 1 Loaded)
  Storage Element 1:Empty
  Storage Element 2:Full 
  Storage Element 3:Full 
  Storage Element 4:Full 
  Storage Element 5:Full 
  Storage Element 6:Full 
  Storage Element 7:Full 


> 'it always works the 2nd time': does it always work the 3rd and 4th
> time?

Yes, after the first fail, it always works.  Then several days
later I try switching tapes and it fails again the first time and
always works again after that.

> When is the first time?

The first time is Monday morning after I've done a dump to the
holding disk.  I do 6 amdumps during the week:

M degraded
T flush to tape
W degraded
Th degraded
F flush to tape
S degraded

Steve


Re: Transient data-path errors when running amcheck

2014-01-06 Thread Jean-Louis Martineau
Try this patch: chg-robot retries the mtx command if it fails with
'No Sense'.


Jean-Louis



diff --git a/perl/Amanda/Changer/robot.pm b/perl/Amanda/Changer/robot.pm
index 9dbf8e1..0cafec0 100644
--- a/perl/Amanda/Changer/robot.pm
+++ b/perl/Amanda/Changer/robot.pm
@@ -2266,12 +2266,15 @@ sub status {
 	my ($exitstatus, $output) = @_;
 	if ($exitstatus != 0) {
 		my $err = $output;
+		for my $line (split '\n', $output) {
+		    debug("mtx: $line");
+		}
 		# if it's a regular SCSI error, just show the sense key
 		my ($sensekey) = ($err =~ /mtx: Request Sense: Sense Key=(.*)\n/);
 		$err = "SCSI error; Sense Key=$sensekey" if $sensekey;
 		$counter--;
-		if ($sensekey eq "Not Ready" and $counter > 0) {
-		    debug($output);
+		if (($sensekey eq "Not Ready" and $counter > 0) ||
+		    ($sensekey eq "No Sense" and $counter > 0)) {
 		    return Amanda::MainLoop::call_after(1000, $run_mtx);
 		}
 		return $status_cb->("error from mtx: " . $err, {});
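
The retry idea in this patch can also be tried by hand from a shell, outside Amanda. The following is a rough sketch, not the chg-robot implementation: it reruns a command when its output contains one of the two sense keys the patch treats as transient ("Not Ready" and "No Sense") and gives up on any other failure. The function name retry_on_sense and its arguments are hypothetical.

```shell
#!/bin/sh
# Sketch only: retry_on_sense is a hypothetical helper, not part of
# Amanda or mtx.
# usage: retry_on_sense MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
retry_on_sense() {
    max_tries=$1
    delay=$2
    shift 2
    try=1
    while :; do
        # Capture stdout and stderr together; the sense key may be on either.
        if output=$("$@" 2>&1); then
            printf '%s\n' "$output"
            return 0
        fi
        case $output in
            *"Sense Key=Not Ready"* | *"Sense Key=No Sense"*)
                # Transient condition: retry unless the budget is used up.
                if [ "$try" -ge "$max_tries" ]; then
                    break
                fi
                try=$((try + 1))
                sleep "$delay"
                ;;
            *)
                # Any other failure is reported immediately.
                break
                ;;
        esac
    done
    printf '%s\n' "$output" >&2
    return 1
}
```

A call such as `retry_on_sense 3 1 /usr/sbin/mtx -f /dev/changer nobarcode status` would mirror the one-second wait used by the patch; the retry count of 3 is an arbitrary choice for illustration.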


Re: Transient data-path errors when running amcheck

2014-01-03 Thread Debra S Baddorf
Huh!  I had a couple of instances of "it's empty" / "no it isn't" and deleted
my changerfile.  This fixed it, but perhaps I should also try your patch.  I was
sharing the drive and the changerfile between 2 configurations (a daily and an
archive) and thought I might have caused my problems by doing that.  I'll look
into the patch in a few weeks when I'm back in the office.

Deb Baddorf

 On Jan 3, 2014, at 2:47 PM, Jean-Louis Martineau martin...@zmanda.com 
 wrote:
 
 Steven,
 
 It is always better to report a bug instead of going to an older release.
 
 Try the patch I posted on 
 https://forums.zmanda.com/showthread.php?5179-amtape-shows-slot-empty-but-mtx-doesn-t
 
 Jean-Louis
 
 On 01/03/2014 03:29 PM, Steven Backus wrote:
 As directed, I tried changing from chg-zd-mtx to chg-robot.  Here's
 my changer definition:
 
 define changer gen-drive-0 {
   tpchanger "chg-robot:/dev/changer"
   changerfile "/local/etc/amanda/gen/changer-state"
   property "tape-device" "0=tape:/dev/gentape"
   property "eject-before-unload" "no"
   property "use-slots" "1-7"
   property "fast-search" "no"
   property "ignore-barcodes" "yes"
   property "eject-delay" "10"
   property "unload-delay" "10"
   property "load-poll" "0 s poll 5 s until 120 s"
 }
 
 now after the last amdump, current slot was set to 4.  The next
 time I ran amcheck, for some reason it decided to use slot 6 for
 the next tape instead of #5 and amtape reports:
 
 amtape gen current
 
 slot   6: time 20130823170001 label gen072
 
 so I try to switch it to 5 with amtape:
 
 amtape gen slot 5
 
 ERROR: Slot: 5: slot 5 is empty
 
 But that's not the case, as mtx status shows:
 
   Storage Changer /dev/changer:1 Drives, 7 Slots ( 0 Import/Export )
 Data Transfer Element 0:Full (Storage Element 6 Loaded)
   Storage Element 1:Full
   Storage Element 2:Full
   Storage Element 3:Full
   Storage Element 4:Full
   Storage Element 5:Full
   Storage Element 6:Empty
   Storage Element 7:Full
 
 So I do an mtx unload:
 
 Unloading drive 0 into Storage Element 6...done
 
 but amtape still refuses to co-operate:
 
 amtape gen slot 5
 
 ERROR: Slot: 5: slot 5 is empty
 
 So I do:
 
 mtx load 5
 Loading media from Storage Element 5 into drive 0...
 
 Finally amtape agrees with me:
 
 amtape gen slot 5
 slot   5: time 20130819170001 label gen071
 changed to slot 5
 ---
   Reasons like this are why I went back to chg-zd-mtx.  Although
 it still messes up, at least it doesn't skip tapes randomly.
 
 Steve
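
The manual comparison described above (what amtape believes versus what mtx reports) can be scripted. This is a minimal sketch that reads `mtx status` text on stdin and reports the state of one slot or the slot currently loaded in the drive. The function names slot_state and loaded_slot are hypothetical, and the parsing assumes the plain output format shown in this thread (one drive, no import/export elements).

```shell
#!/bin/sh
# Sketch only: both helpers are hypothetical and parse `mtx status`
# text from stdin in the format shown in this thread.

# slot_state SLOT: print "Full" or "Empty" for SLOT; fail if not listed.
slot_state() {
    awk -v slot="$1" '
        # Match lines such as "  Storage Element 5:Full".
        $1 == "Storage" && $2 == "Element" {
            split($3, parts, ":")
            if (parts[1] == slot) { print parts[2]; found = 1; exit }
        }
        END { if (!found) exit 1 }
    '
}

# loaded_slot: print the slot number mtx reports as loaded in the drive.
loaded_slot() {
    sed -n 's/^ *Data Transfer Element [0-9]*:Full (Storage Element \([0-9]*\) Loaded).*/\1/p'
}
```

For example, `mtx -f /dev/changer status | slot_state 5` could be checked right after amtape claims that slot 5 is empty.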
 



Re: Transient data-path errors when running amcheck

2013-12-30 Thread Jean-Louis Martineau

On 12/30/2013 10:46 AM, Steven Backus wrote:

> The problem I'm having with both chg-zd-mtx and chg-robot is the
> drive doesn't become ready in time.  The first amcheck always
> fails but the 2nd one succeeds.  I've tried various delay settings
> to both without success.  It's an Ultrium robot, any ideas on how I
> can avoid this problem?


There are a lot of properties that can be set with chg-robot:
   EJECT-BEFORE-UNLOAD
   EJECT-DELAY
   LOAD-POLL
   UNLOAD-DELAY

See the amanda-changers man page.

Increase all the delays to insanely large values, and provide more
information if it still fails.


Jean-Louis
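
As an illustration of the advice above, a changer definition with deliberately oversized delays might look like the fragment below. The device paths and changer name are the ones Steven posted elsewhere in this thread; the delay values (60 and 600 seconds) are arbitrary assumptions chosen only to be "insanely large" for testing, not recommended settings.

```
define changer gen-drive-0 {
  tpchanger "chg-robot:/dev/changer"
  changerfile "/local/etc/amanda/gen/changer-state"
  property "tape-device" "0=tape:/dev/gentape"
  # Assumed test values, far larger than normally needed:
  property "eject-delay" "60"
  property "unload-delay" "60"
  property "load-poll" "0 s poll 5 s until 600 s"
}
```

If the first amcheck still fails with values this large, the cause is probably not a simple timing problem, which is useful information to report back.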


Transient data-path errors when running amcheck

2013-12-27 Thread Steven Backus
Since upgrading to 3.3.5 I occasionally get errors of this nature:

ERROR: whimsy.med.utah.edu sdc1: data-path is AMANDA but device do not support it

Upon re-running amcheck, the errors go away.  When they do appear,
there is an error of this type for each entry in my disklist.
Any ideas?

Thanks,
  Steve