Re: [RFC] implementing tape statistics single file vs multi-file in sysfs

2015-02-11 Thread Bryn M. Reeves
On Wed, Feb 11, 2015 at 06:30:27AM +0800, Greg KH wrote:
> On Tue, Feb 10, 2015 at 02:27:20PM +0000, Bryn M. Reeves wrote:
> > On Sat, Feb 07, 2015 at 12:07:43PM +0800, Greg KH wrote:
> > 
> > $ cat /sys/fs/selinux/avc/cache_stats 
> > lookups hits misses allocations reclaims frees
> > 18938916 18921707 17209 17209 17328 22215
> > 38164283 38146514 17769 17769 16800 19049
> > 18078108 18056991 21117 21117 21344 19305
> > 15168204 15150079 18125 18125 17776 13149
> > 0 0 0 0 0 0
> > 0 0 0 0 0 0
> > 0 0 0 0 0 0
> > 0 0 0 0 0 0
> > 
> > $ cat /sys/fs/selinux/avc/hash_stats
> > entries: 506
> > buckets used: 290/512
> > longest chain: 5
> 
> Ugh, those look like they should be debugfs interfaces.  Thanks, I'll
> add them to my list of things to nag people about...

Actually, looking properly, these are outside sysfs:

$ mount | grep selinux
selinuxfs on /sys/fs/selinux type selinuxfs (rw,relatime)

So everything below the mount point is in its own filesystem type
(selinuxfs), not sysfs.
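
The same check can be scripted. The following is a minimal sketch, assuming
mount data in the /proc/mounts layout; the function name and the sample
mount table are illustrative and not taken from any existing tool.

```python
def fstype_for_path(path, mounts_text):
    """Return the filesystem type of the mount that contains `path`.

    `mounts_text` uses the /proc/mounts layout:
    device mountpoint fstype options dump pass
    """
    best_point, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fstype = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        # The path belongs to this mount if it is the mount point itself
        # or lies below it; the longest matching mount point wins.
        if path == mount_point or path.startswith(prefix):
            if len(mount_point) >= len(best_point):
                best_point, best_type = mount_point, fstype
    return best_type


# Hypothetical mount table mirroring the mount output in this message.
SAMPLE_MOUNTS = """\
sysfs /sys sysfs rw,relatime 0 0
selinuxfs /sys/fs/selinux selinuxfs rw,relatime 0 0
"""

print(fstype_for_path("/sys/fs/selinux/avc/cache_stats", SAMPLE_MOUNTS))
print(fstype_for_path("/sys/block/sda/stat", SAMPLE_MOUNTS))
```

On a live system the second argument would be the contents of /proc/mounts.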

Regards,
Bryn.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] implementing tape statistics single file vs multi-file in sysfs

2015-02-10 Thread Bryn M. Reeves
On Sat, Feb 07, 2015 at 12:07:43PM +0800, Greg KH wrote:
> On Fri, Feb 06, 2015 at 03:41:58PM +0000, Bryn M. Reeves wrote:
> > I can't speak for Shane but wouldn't spend too much time looking at the
> > current v2 patch: it's the result of a pretty ugly compromise suggested
> > on linux-scsi.
> 
> Fair enough, but please feel free to cc: me on the patch that you do
> feel is correct to get a sysfs-related review.

Will do; I'm back from travels this week & will have some time to look at
this.
 
> > Likewise for disk stats: although fluff like maj:min/name etc. has been
> > shuffled a few times the basic fields have remained unchanged for a very
> > long time and sysfs already removes the need to include an identity
> > field.
> 
> We already handle i/o stats just fine, why create a special sysfs
> interface for just a tape device interface?  What makes them so special?

But the iostats use exactly the sort of array file we're talking about:

$ cat /sys/block/sda/stat 
  12764420869  4320505  2305697   15404530056  3834036  9065092
0   931842 11371357

And we can't simply extend these to tapes, as tapes are not block devices.
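
To illustrate how little userspace code an array file needs, here is a
minimal sketch of a parser for the block stat format. It assumes the
eleven-field layout documented for /sys/block/<dev>/stat; the sample line
uses made-up values, not the output quoted above.

```python
from collections import namedtuple

# Field order of /sys/block/<dev>/stat (the first eleven fields).
BlockStat = namedtuple("BlockStat", [
    "read_ios", "read_merges", "read_sectors", "read_ticks",
    "write_ios", "write_merges", "write_sectors", "write_ticks",
    "in_flight", "io_ticks", "time_in_queue",
])


def parse_block_stat(text):
    """Parse one whitespace-separated stat line into named counters."""
    values = [int(v) for v in text.split()]
    wanted = len(BlockStat._fields)
    if len(values) < wanted:
        raise ValueError("expected at least %d fields, got %d"
                         % (wanted, len(values)))
    # Ignore any extra trailing fields so that appended counters do not
    # break the parser.
    return BlockStat(*values[:wanted])


# Hypothetical sample values, for illustration only.
sample = "  1000 20 30000 400 500 60 70000 800 0 900 1200\n"
stat = parse_block_stat(sample)
print(stat.read_ios, stat.write_sectors, stat.in_flight)
```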
 
> > I understand the fact that you can't change them; I just don't think it's
> > a big problem in this specific case (and much less than some of the
> > more imaginative sysfs content - 2d int arrays with column headers
> > anyone?).
> 
> What sysfs file is a 2d int array?  I'll be glad to fix it.

$ cat /sys/fs/selinux/avc/cache_stats 
lookups hits misses allocations reclaims frees
18938916 18921707 17209 17209 17328 22215
38164283 38146514 17769 17769 16800 19049
18078108 18056991 21117 21117 21344 19305
15168204 15150079 18125 18125 17776 13149
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0

$ cat /sys/fs/selinux/avc/hash_stats
entries: 506
buckets used: 290/512
longest chain: 5
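
For comparison, a minimal sketch of what userspace has to do to consume
these two formats: one parser for the table with column headers and another
for the "name: value" file. The function names are illustrative; the sample
data is copied from the output above (first two data rows only).

```python
def parse_stat_table(text):
    """Parse a header line plus rows of integers into a list of dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    rows = []
    for ln in lines[1:]:
        values = [int(v) for v in ln.split()]
        if len(values) != len(header):
            raise ValueError("row does not match header: %r" % ln)
        rows.append(dict(zip(header, values)))
    return rows


def parse_name_value(text):
    """Parse 'name: value' lines into a dict of strings."""
    result = {}
    for ln in text.splitlines():
        if ":" in ln:
            name, _, value = ln.partition(":")
            result[name.strip()] = value.strip()
    return result


CACHE_STATS = """\
lookups hits misses allocations reclaims frees
18938916 18921707 17209 17209 17328 22215
38164283 38146514 17769 17769 16800 19049
"""

HASH_STATS = """\
entries: 506
buckets used: 290/512
longest chain: 5
"""

rows = parse_stat_table(CACHE_STATS)
total_lookups = sum(row["lookups"] for row in rows)
hash_stats = parse_name_value(HASH_STATS)
print(total_lookups, hash_stats["buckets used"])
```

Note that the second file needs a further, format-specific step to split
"290/512" into two numbers, which is the kind of parsing burden the
one-value-per-file rule is meant to avoid.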

> If you want to measure tens of thousands of tape devices then you
> shouldn't be using sysfs in the first place as it is not designed for
> "speed" at all.  Use the existing i/o rate interfaces instead, don't try
> to cram something into sysfs that doesn't belong there.

So far as I'm aware there is no other way to obtain performance data
for the SCSI tape subsystem (without resorting to ftrace/systemtap).

Regards,
Bryn.



Re: [RFC] implementing tape statistics single file vs multi-file in sysfs

2015-02-06 Thread Bryn M. Reeves
On Fri, Feb 06, 2015 at 04:59:16AM -0800, Greg KH wrote:
> On Fri, Feb 06, 2015 at 12:20:53AM +, Seymour, Shane M wrote:
> > The current patch that implements tape statistics is here:
> > 
> > http://marc.info/?l=linux-scsi&m=142112067313723&w=2
> 
> Aside from the "do we want to do this all in a single file" issue that I
> will say more on below, this patch has issues.  Please don't use a
> kobject for _ANYTHING_ in sysfs that has a struct device as a parent.
> If you do that, it can't be seen by userspace tools very well, if at
> all.

I can't speak for Shane but wouldn't spend too much time looking at the
current v2 patch: it's the result of a pretty ugly compromise suggested
on linux-scsi.

This thread was really to try to settle the discussion on the structure
of the stats files.

> > Recently there was another discussion here about one file vs a 
> > collection of files for tape statistics:
> > 
> > http://marc.info/?l=linux-scsi&m=142316255501550&w=2
> > 
> > The result was that I should ask here what method I should use. I
> > would like to get feedback in relation to tape statistics and one file
> > vs multi-file in sysfs. I'm happy to keep the existing code or change
> > to a single file approach.
> 
> One of the primary reasons we created sysfs and the "one value per file"
> rule is that multi-value files just do not work well.  Yes, you get an
> atomic snapshot, and you save some open/read/close syscall roundtrips,
> but you do so at the expense of forcing userspace to "know" what the
> format of the file is.  And once you create it, you can NEVER CHANGE IT
> AGAIN.

I am not convinced this is a concern for tape statistics: they are pretty
much a solved problem. The commercial *nixes have had this for decades.

Likewise for disk stats: although fluff like maj:min/name etc. has been
shuffled a few times the basic fields have remained unchanged for a very
long time and sysfs already removes the need to include an identity
field.
 
> Yes, that's right, if you come up with some new statistic in the future,
> or realize that one of the ones you have now is wrong, you can't change
> it, you have to make a whole new file, otherwise you could break
> userspace tools.

I understand the fact that you can't change them; I just don't think it's
a big problem in this specific case (and much less than some of the
more imaginative sysfs content - 2d int arrays with column headers
anyone?).

> And yes, open/read/close does take a few extra cycles, but you
> can't really measure it for a virtual filesystem like this on any modern
> system.

I'll try to get some numbers when I get back home next week - Shane is
talking about use cases involving tens of thousands of tape devices. I
am not certain that the overhead would be unmeasurable in that case: the
additional context switching & TLB flushes alone seem like they would
add up.

> Hope this helps explain why we have the sysfs rule, and why you should
> continue to follow it as well.
>
> Yes, it's not always followed, but that's usually because people forgot
> why we had this rule, and no one noticed or pointed it out to me that it
> was wrong.

Perhaps sysfs.txt should be updated to make the position more clear? The
current wording seems rather more liberal than this thread would
suggest. Maybe something like the patch below?

This would help people who are trying to dtrt by reading the documentation.

Regards,
Bryn.


  From 3081aad4cc4d19b68f39499dbeb3837f0642f70e Mon Sep 17 00:00:00 2001
  From: "Bryn M. Reeves" 
  Date: Fri, 6 Feb 2015 15:19:39 +
  Subject: [PATCH] docs/sysfs: Specify array valued attribute review
   requirements
  
  Although the linux-api position that one-value-per-file is a strong rule
  is very clear in mailing list discussions the sysfs.txt documentation
  suggests a rather more liberal stance:
  
  "... it is socially acceptable to express an array of values of the same
  type."
  
  Fix the documentation to make it clear that such uses should be
  discussed on linux-api first.

Signed-off-by: Bryn M. Reeves 
---
 Documentation/filesystems/sysfs.txt | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/Documentation/filesystems/sysfs.txt b/Documentation/filesystems/sysfs.txt
index b35a64b..494fa78 100644
--- a/Documentation/filesystems/sysfs.txt
+++ b/Documentation/filesystems/sysfs.txt
@@ -57,8 +57,15 @@ attributes.
 
 Attributes should be ASCII text files, preferably with only one value
 per file. It is noted that it may not be efficient to contain only one
-value per file, so it is socially acceptable to express an array of
-values of the same type. 
+value per file, so it may be socially a

Re: [RFC] implementing tape statistics single file vs multi-file in sysfs

2015-02-06 Thread Bryn M. Reeves
On Fri, Feb 06, 2015 at 12:20:53AM +, Seymour, Shane M wrote:
> There has been some ongoing discussion about the best way to implement
> tape statistics. The original method suggested a long time ago used a
> single file in sysfs similar to block statistics in sysfs. That led to
> an impasse about the code on the linux-scsi mailing list.

I would have a strong preference for a single file containing an array
of integer counters. This is in keeping with other statistics attributes
in sysfs and follows the principle of least surprise: it's essentially
the same general format as /proc/diskstats and sysfs disk and partition
stats (dm statistics also follow this convention via the @print_stats
message).

This simplifies the userspace code that reads and parses the counters and
avoids additional sample jitter when reading stats for very large numbers
of devices; with one file per counter, each device would require at least
eleven open()/read()/close() cycles.

For a small number of devices this shouldn't matter too much but
eventually the additional syscall overhead could become significant (I
think you mentioned users with ~20k devices?).
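
As a rough worked example of the scale involved, the sketch below counts
syscalls per sampling pass. It assumes exactly three syscalls per file
(open, read, close) and uses the figures mentioned in this message (eleven
counters, about 20k devices); real tools may issue more.

```python
SYSCALLS_PER_FILE = 3  # open(), read(), close()


def syscalls_per_sample(devices, files_per_device):
    """Syscalls needed to read every stats file once."""
    return devices * files_per_device * SYSCALLS_PER_FILE


devices = 20000
multi_file = syscalls_per_sample(devices, 11)   # one file per counter
single_file = syscalls_per_sample(devices, 1)   # one array file per device
print(multi_file, single_file, multi_file // single_file)
```

Under these assumptions one pass costs 660000 syscalls with one file per
counter against 60000 with a single array file per device.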

The sync file mechanism in the v2 patch that addresses this problem is
kinda cute but also significantly more complex than a plain old array
and, as you pointed out, adds hundreds of lines to the patch.

Sticking to arrays also allows existing tools like sysstat to be easily
adapted to the new data source.

> The sysfs documentation says that files should contain one item per file 
> (with some small exceptions):
> 
> > "Attributes should be ASCII text files, preferably with only one value
> > per file. It is noted that it may not be efficient to contain only one
> > value per file, so it is socially acceptable to express an array of
> > values of the same type."

Right: I think there's good precedent for the array file style when
dealing with counter sets.

Regards,
Bryn.



Re: [PATCH] st: implement sysfs based tape statistics v2

2015-02-05 Thread Bryn M. Reeves
On Thu, Feb 05, 2015 at 10:55:50AM -0800, James Bottomley wrote:
> OK, the sysfs bikeshedders hang out on linux-api
> 
> https://www.kernel.org/doc/man-pages/linux-api-ml.html
> 
> If you can convince them, we'll do the single file approach.

Will do - I've got a couple of stats projects on the go at the moment so
this ties in well with that.

I'll sync up with Shane and see if he's interested in running the int array
version via the linux-api folks.

Regards,
Bryn.



Re: [PATCH] st: implement sysfs based tape statistics v2

2015-02-05 Thread Bryn M. Reeves
On Thu, Feb 05, 2015 at 07:46:32PM +0200, "Kai Mäkisara (Kolumbus)" wrote:
> > On 5.2.2015, at 19.40, Laurence Oberman  wrote:
> > From: "Kai Mäkisara (Kolumbus)" 
> > I still think that the tape statistics should be exported like the 
> > statistics of “real” block devices, i.e., one sysfs file exporting on a 
> > single line the statistics that temporally belong together. James rejected 
> > this approach. I am leaving the decision about this code to him. I will 
> > neither ack nor nak this code.
> > 
> > I missed the earlier conversations with James, I will go search for them.
> > Do you mean add them so they are similar to the /proc/diskstats

http://comments.gmane.org/gmane.linux.scsi/80497

On Fri, Feb 22 2013 James Bottomley wrote:
 I'm afraid we can't do it the way you're proposing.  files in sysfs must
 conform to the one value per file rule (so we avoid the ABI nastiness
 that plagues /proc).  You can create a stat directory with a bunch of
 files, but not a single file that gives all values.

Documentation/filesystems/sysfs.txt does not agree:

  "Attributes should be ASCII text files, preferably with only one value
  per file. It is noted that it may not be efficient to contain only one
  value per file, so it is socially acceptable to express an array of
  values of the same type."

There's also ample precedent for this: sysfs disk and partition stats,
SELinux cache and hash table stats (which have a pretty yucky 2d int
array with column headers and a name: val format respectively).

There's also a bunch of multivariate name=value format stats files in
the cgroups sysfs tree.

> Not exactly. I mean the data exported in sysfs, for example:
> 
> > cat /sys/block/sda/sda1/stat
>   159740 9006  594150664461   12472455907 12772208  3598677   
>  0   299875  3663235

I'd prefer to consume tape stats in this format too; it follows the
principle of least surprise since it's shared with every other IO stats
source (including device-mapper statistics) and it simplifies handling
the counters in user space.

Regards,
Bryn.



Re: dummy scsi read or scsi command periodically for external USB Hard Disk

2014-07-07 Thread Bryn M. Reeves
On Tue, Jul 08, 2014 at 12:15:54AM +0800, loody wrote:
> so sg_read will not hammer on the page cache like dd without "iflag=direct"
> 
> thanks for your kind help,

The sg_read program (and other programs in sg3_utils) sends a command directly
to the device using an SG_IO ioctl. This bypasses all the caching layers in
the kernel and always results in IO to the device (if it succeeds).

Regards,
Bryn.



Re: dummy scsi read or scsi command periodically for external USB Hard Disk

2014-07-07 Thread Bryn M. Reeves
On Mon, Jul 07, 2014 at 11:39:05PM +0800, loody wrote:
> hi David:
> 
> 2014-07-07 23:06 GMT+08:00 David Laight :
> > From: Lars Melin
> > ...
> >> sgread is not included in BusyBox but you should have "touch".
> >> Create a dummy file on the disk and let cron touch it every 4 minutes.
> >
> > You don't need 'touch' a shell redirect eg ": >file" will do open(..., 
> > O_CREAT|O_TRUNC).
> > However that still might not force an actual disc access.
> >
> > In any case you really only want to do a read, doing a write will kill the 
> > NAND memory.
> 
> actually I have searched the scsi/usb layer for possible dummy read,
> even read sector 0 is fine, but in vain.
> 
> I found the read
> a. determined by VFS -> block layer,
> b. Block layer put it in queue
> c. call scsi pre-queue function to usb layer.
> 
> That mean if I try to read sector from usb devices, I have to create a
> queue and follow above b) and c) rule.
> is there any already kernel API I can use?
> 
> sincerely appreciate all yours help,

If you don't want to put sg_read into your image you could just use dd;
busybox includes an implementation that should be good enough.

Just make sure you use the right flags to get O_DIRECT access or you'll
just end up hammering on the page cache. Iirc that's "iflag=direct" (check
the busybox docs to make sure it's the same).
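
For anyone scripting this instead of using dd, here is a minimal sketch of
a page-cache-bypassing read. It assumes Linux (os.O_DIRECT), a read size
that is a multiple of the device's logical block size, and a device path
supplied by the caller; the function name is illustrative.

```python
import mmap
import os


def read_first_block(path, size=4096, direct=True):
    """Read up to `size` bytes from the start of `path`.

    With direct=True the file is opened with O_DIRECT so the read is
    not satisfied from the page cache.
    """
    flags = os.O_RDONLY
    if direct:
        flags |= os.O_DIRECT  # Linux-specific
    fd = os.open(path, flags)
    try:
        # O_DIRECT needs an aligned buffer; an anonymous mmap is
        # page-aligned, which satisfies that requirement.
        buf = mmap.mmap(-1, size)
        try:
            nread = os.readv(fd, [buf])
            return buf[:nread]
        finally:
            buf.close()
    finally:
        os.close(fd)
```

A caller would pass the block device node for the disk, for example
read_first_block(device_path), from a periodic job. Not every filesystem
accepts O_DIRECT, so the direct=False path exists mainly for testing
against ordinary files.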

Regards,
Bryn.



Re: dummy scsi read or scsi command periodically for external USB Hard Disk

2014-07-07 Thread Bryn M. Reeves
On Sun, Jul 06, 2014 at 01:18:03AM +0800, loody wrote:
> hi all:
> we met a USB Hard Disk that will go to suspend if host stop
> sending scsi command over 5mins.
> To save the IO, kernel will keep the file in page cache as much as
> he can and under this circumstances, the scsi command may disappear
> for a while longer enough to cause the device suspend.
> 
> is there any kernel config or module parameter can do the dummy
> read or scsi command periodically?

No but you could set up a simple cron job to call an sg3_utils command.

E.g. issue an sg_read for one sector to the device every 4m:

  */4 * * * * sg_read count=1 if=/dev/

You'll probably want to disable mail notification for the job or have
it dropped or it'll get a bit noisy running that frequently.

Regards,
Bryn.



Re: help decoding aacraid errors (3.10.40 kernel)

2014-06-27 Thread Bryn M. Reeves
On Fri, Jun 27, 2014 at 12:59:18PM +0200, Arkadiusz Miskiewicz wrote:
> Thanks for links. I wonder why kernel doesn't decode these to be actually 
> readable without a need for asking on ml - was decoding considered?

Normally it does; I was a bit surprised to see numbers printed with such
a recent kernel.

Sense key decoding to text has been around almost forever (the 'snstext'
table of sense strings pre-dates git, i.e. 2.6.12ish).

Is it possible your kernel was built without CONFIG_SCSI_CONSTANTS?

Regards,
Bryn.



Re: help decoding aacraid errors (3.10.40 kernel)

2014-06-27 Thread Bryn M. Reeves
On Fri, Jun 27, 2014 at 10:55:08AM +0200, Arkadiusz Miskiewicz wrote:
> [3757350.671860] Result: hostbyte=0x00 driverbyte=0x08
> [3757350.671862] sd 0:0:2:0: [sdc]  
> [3757350.671863] Sense Key : 0x4 [current] 

http://www.t10.org/lists/2sensekey.htm

0x4 is "hardware error".

> [3757350.671866] sd 0:0:2:0: [sdc]  
> [3757350.671868] ASC=0x44 ASCQ=0x0

http://www.t10.org/lists/asc-num.htm

0x44/0x00 is "internal target failure".
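
The lookup done by hand above can be sketched as a small table-driven
decoder. The sense key names follow the standard SCSI sense key list; the
ASC/ASCQ table is deliberately tiny and holds only the code from this
message, so a real decoder would need the full T10 list.

```python
SENSE_KEYS = {
    0x0: "No Sense",
    0x1: "Recovered Error",
    0x2: "Not Ready",
    0x3: "Medium Error",
    0x4: "Hardware Error",
    0x5: "Illegal Request",
    0x6: "Unit Attention",
    0x7: "Data Protect",
    0x8: "Blank Check",
    0x9: "Vendor Specific",
    0xA: "Copy Aborted",
    0xB: "Aborted Command",
    0xD: "Volume Overflow",
    0xE: "Miscompare",
}

# Only the additional sense code seen in this thread.
ASC_ASCQ = {
    (0x44, 0x00): "Internal target failure",
}


def decode_sense(sense_key, asc, ascq):
    """Turn numeric sense data into a readable string."""
    key_text = SENSE_KEYS.get(sense_key, "Unknown sense key 0x%x" % sense_key)
    asc_text = ASC_ASCQ.get((asc, ascq),
                            "ASC=0x%02x ASCQ=0x%02x" % (asc, ascq))
    return "%s: %s" % (key_text, asc_text)


print(decode_sense(0x4, 0x44, 0x0))
```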

Regards,
Bryn.



Re: External USB3 disk fails with "Invalid field in cdb"

2014-06-27 Thread Bryn M. Reeves
On Thu, Jun 26, 2014 at 08:55:19PM +0200, Michael Büsch wrote:
> Jun 26 20:47:14 wiggum kernel: [156019.870310] sd 22:0:0:0: [sdb] 976773168 
> 512-byte logical blocks: (500 GB/465 GiB)
> Jun 26 20:47:14 wiggum kernel: [156019.870653] sd 22:0:0:0: [sdb] Write 
> Protect is off
> Jun 26 20:47:14 wiggum kernel: [156019.870659] sd 22:0:0:0: [sdb] Mode Sense: 
> 47 00 10 08

The disk says it supports FUA:

> Jun 26 20:47:14 wiggum kernel: [156019.870956] sd 22:0:0:0: [sdb] Write 
> cache: enabled, read cache: enabled, supports DPO and FUA
> Jun 26 20:47:14 wiggum kernel: [156019.924517]  sdb: sdb1
> Jun 26 20:47:14 wiggum kernel: [156019.928649] sd 22:0:0:0: [sdb] Attached 
> SCSI disk
> Jun 26 20:47:27 wiggum kernel: [156032.936896] JBD2: Clearing recovery 
> information on journal
> Jun 26 20:47:27 wiggum kernel: [156032.938218] sd 22:0:0:0: [sdb] Invalid 
> command failure
> Jun 26 20:47:27 wiggum kernel: [156032.938222] sd 22:0:0:0: [sdb]
> Jun 26 20:47:27 wiggum kernel: [156032.938225] Result: hostbyte=DID_OK 
> driverbyte=DRIVER_SENSE
> Jun 26 20:47:27 wiggum kernel: [156032.938228] sd 22:0:0:0: [sdb]

The disk doesn't like the command we sent it:

> Jun 26 20:47:27 wiggum kernel: [156032.938230] Sense Key : Illegal Request 
> [current]
> Jun 26 20:47:27 wiggum kernel: [156032.938234] sd 22:0:0:0: [sdb]
> Jun 26 20:47:27 wiggum kernel: [156032.938237] Add. Sense: Invalid field in 
> cdb

Looks like a WRITE(10) with the FUA bit set:

> Jun 26 20:47:27 wiggum kernel: [156032.938239] sd 22:0:0:0: [sdb] CDB:
> Jun 26 20:47:27 wiggum kernel: [156032.938241] Write(10): 2a 08 1d 04 00 3f 
> 00 00 08 00
>
> Does somebody have a hint to debug this?

I'd guess the device lies about supporting FUA. There seems to be
another report on the Debian lists of the same problem with a similar
JMicron enclosure:

  https://lists.debian.org/debian-user/2014/05/msg02066.html
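
For reference, the CDB quoted above can be decoded mechanically. This is a
minimal sketch assuming the standard WRITE(10) layout (opcode 0x2a, DPO and
FUA in bits 4 and 3 of byte 1, a big-endian LBA in bytes 2-5 and the
transfer length in bytes 7-8); the function name is illustrative.

```python
def decode_write10(cdb_hex):
    """Decode a WRITE(10) CDB given as a string of hex bytes."""
    cdb = bytes.fromhex(cdb_hex)  # whitespace between bytes is ignored
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a WRITE(10) CDB: %r" % cdb_hex)
    return {
        "dpo": bool(cdb[1] & 0x10),
        "fua": bool(cdb[1] & 0x08),
        "lba": int.from_bytes(cdb[2:6], "big"),
        "blocks": int.from_bytes(cdb[7:9], "big"),
    }


# The CDB from the log above.
fields = decode_write10("2a 08 1d 04 00 3f 00 00 08 00")
print(fields)
```

Byte 1 is 0x08, so FUA is set and DPO is clear; the command writes 8 blocks
at LBA 486801471, which is within the 976773168 blocks the disk reports.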

Regards,
Bryn.



Re: debug_flag added to st tape driver

2014-06-10 Thread Bryn M. Reeves
On Tue, Jun 10, 2014 at 04:57:06PM -0400, Laurence Oberman wrote:
> I am tired of building modules to enable SCSI tape driver debug so I
> am hoping this patch is acceptable.
> Tested using kernel 3.14.6
> 
> Usage example:
> modprobe st debug_flag=1

Missing Signed-off-by :-)
 
> +module_param_named(debug_flag, debug_flag, int, 0);

It's probably not worth making this a sysfs knob, as most distros
still compile st as a module. But why not just set debugging
directly from the module parameter?

> +MODULE_PARM_DESC(debug_flag, "Enable DEBUG, same as setting DEBUG 1
> in source");
> +

The description is a bit misleading as a bunch of stuff gets
compiled out when DEBUG is unset at compile time.

Maybe "same as setting debugging=1" instead?



Re: scsi_debug driver puzzle

2014-03-31 Thread Bryn M. Reeves
On 03/31/2014 06:32 PM, Laurence Oberman wrote:
>   File Line
> 0 scsi_debug.c 3551 int scsi_debug_queuecommand_lck(struct scsi_cmnd
> *SCpnt, done_funct_t done)
> 1 scsi_debug.c 3900 static DEF_SCSI_QCMD(scsi_debug_queuecommand)
> 2 scsi_debug.c 3912 .queuecommand =  scsi_debug_queuecommand,

That's a magical scsi_host.h macro, added as part of the effort to push
host_lock down. The macro creates a function named as its argument which
takes and drops the shost->host_lock around a call to the real
queuecommand function (the one with the _lck suffix):

spin_lock_irqsave(shost->host_lock, irq_flags);
scsi_cmd_get_serial(shost, cmd);
rc = func_name##_lck (cmd,cmd->scsi_done);
spin_unlock_irqrestore(shost->host_lock, irq_flags);

http://fpaste.org/90345/88875139/
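
As an analogy only (this is not kernel code), the same pattern can be
sketched in Python: a helper that generates a wrapper which holds a lock
around a call to the "_lck" function and drops the suffix from the name.
All names here are hypothetical stand-ins.

```python
import threading

host_lock = threading.Lock()  # stand-in for shost->host_lock


def def_locked_qcmd(locked_func):
    """Return a wrapper that holds host_lock around locked_func."""
    def wrapper(cmd):
        with host_lock:
            return locked_func(cmd)
    name = locked_func.__name__
    # Mirror the macro's naming: foo_lck is wrapped by foo.
    wrapper.__name__ = name[:-4] if name.endswith("_lck") else name
    return wrapper


def demo_queuecommand_lck(cmd):
    # Runs with host_lock held, like the real *_lck function.
    return (cmd, host_lock.locked())


demo_queuecommand = def_locked_qcmd(demo_queuecommand_lck)
print(demo_queuecommand.__name__, demo_queuecommand("READ_10"))
```

This explains the search results quoted above: only the _lck function is
written out in the source, while the function assigned to .queuecommand is
generated by the macro.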

Cheers,
Bryn.


Re: [PATCH] scsi: Allow error handling timeout to be specified

2013-05-10 Thread Bryn M. Reeves

On 05/10/2013 03:24 PM, Hannes Reinecke wrote:

> However, this time is only defined _on the initiator_.
> The specification does _NOT_ have any fixed timeout values for _any_
> command. As such it could in theory (and does, if you happen to run
> against certain arrays under certain conditions) take several
> minutes to return a completion.


That's my understanding too - in a multipath configuration we're 
waiting only for our own fast_io_fail_tmo (if set), which is essentially 
an arbitrary, administrator-controlled interval. You can tune it between 
extremes of rapid fault identification vs. paths twitching at every 
transient glitch.



> Yes, that was the idea.
> Which I'll get down to eventually; if only customers wouldn't have
> all these obnoxious issues no-one has ever seen...


The class I've been looking at is really very easy to reproduce and 
we've seen it at least a half dozen times at different sites with 
different FC switches (so it's certainly not that unusual).


To recreate it artificially you just need a target, a host, and a switch 
that can block RSCN propagation on a per-port basis. I've been using 
brocades with the rscnsupr portcfg attribute.


It's important that you block a port on the switch<->target side 
otherwise the host will see a link event which short-circuits everything.


E.g. if you have one port of an array attached to port 1 on a brocade 
the following two commands will set up this scenario:


portcfg rscnsupr 1 --enable
portdisable 1

Regards,
Bryn.



Re: [PATCH] scsi: Allow error handling timeout to be specified

2013-05-10 Thread Bryn M. Reeves

On 05/10/2013 01:43 PM, Ewan Milne wrote:

> On Thu, 2013-05-09 at 23:11 -0400, Martin K. Petersen wrote:
> > Introduce eh_timeout which can be used for error handling purposes. This
> > was previously hardcoded to 10 seconds in the SCSI error handling
> > code. However, for some fast-fail scenarios it is necessary to be able
> > to tune this as it can take several iterations (bus device, target, bus,
> > controller) before we give up.
> >
> > Signed-off-by: Martin K. Petersen 
>
> Thanks for posting this.  It will be very helpful to have this
> capability, particularly when alternate paths to the device exist.


Ack - this is definitely a step forward but until we have better eh 
behaviour for FC the benefits are pretty limited. This is especially the 
case with large LU counts and certain LLDDs since some impose much 
longer timeouts (e.g. lpfc's 60s TMF timeout).


With 5 LUs presented and a single dd driving IO on lpfc I see a time to 
fail an IO of 10-11m when inducing a fabric fault that blackholes all 
traffic to a particular target port on my test setup.


Looking at where the time is being spent in this example there's around 
200s of TUR waits (3m20) and >500s waiting on TMF timeouts (foreach 
device, BDR, foreach target, etc.):


http://paste.fedoraproject.org/11473/81911241/

Environments with 100s of devices can easily spend an hour or more 
waiting for the eh to do its thing.


Regards,
Bryn.





Re: [PATCH 2/2] dm mpath: attach scsi_dh during table resume

2013-04-25 Thread Bryn M. Reeves
On 04/25/2013 04:37 PM, Mike Snitzer wrote:
> clariion_match does more than check the vendor and product; if tpgs is
> set (ALUA mode) it returns false.
>
> So yes, while there is room for improvement in clariion_match the
> current code should work just fine with reasoning between emc and alua.

Duh, sorry you're right. Completely missed the test for tpgs in emc.




Re: [PATCH 2/2] dm mpath: attach scsi_dh during table resume

2013-04-25 Thread Bryn M. Reeves

On 04/25/2013 03:50 PM, Mikulas Patocka wrote:

> On Thu, 25 Apr 2013, Mike Snitzer wrote:
> > The handler that is automatically attached _should_ be the correct
> > handler.  We now have the .match() hook for scsi_dh and it has made for
> > reliable scsi_dh attachment of the correct handler.
>
> The EMC devices work with both ALUA and EMC handlers - so there is no one
> "correct" handler, the correct handler is the one that the user specified
> in multipath configuration.


I think it's more absolute than that; if a Clariion array is in failover 
mode 4 (ALUA) then it's incorrect to use scsi_dh_emc and vice-versa.


The user can configure this in multipath.conf but it does not make it 
correct. The correct handler is the one that matches the configured 
failover mode of the array.


The ALUA handler checks scsi_device_tpgs() in its match function, but since 
the scsi_dh_emc match function only looks at the vendor/product it's 
impossible for it to make the correct decision.


The array can tell us what mode it's running in - teaching scsi_dh_emc 
to do this would seem to be an improvement.


Regards,
Bryn.



Re: [PATCH] scsi_transport_fc: Make 'port_state' writeable

2013-03-15 Thread Bryn M. Reeves

On 03/15/2013 12:46 PM, Bart Van Assche wrote:

> The SCSI EH keeps trying until all outstanding requests have been
> finished. Does lpfc_host_reset_handler() invoke scsi_done() for


I don't think so (it ends up calling lpfc_sli_cancel_iocbs() via 
lpfc_hba_down_post() after shutting down the mailbox) but I've not seen 
the EH escalate all the way to host reset in most of my testing - 
usually some time after reaching the bus reset the remaining IOs time out 
and the error bubbles up to device-mapper (all the cases I'm looking at 
are devices managed by a dm-multipath target).


The problem is that getting to this stage can take a very long time - 
much longer than most clusters' node eviction timers, for example - which 
is the source of much of the complaint about this behaviour.



> outstanding requests? If not, how about modifying
> lpfc_host_reset_handler() such that it finishes all outstanding requests
> if the remote port is not reachable?


I'm not sure how safe that is in this situation - James mentioned in the 
I_T nexus reset thread concerns about frames that could be delayed etc. 
in the fabric if the host unilaterally abandons IOs (not sure of the 
details for lpfc at this level).


Regards,
Bryn.



Re: [PATCH] scsi_transport_fc: Make 'port_state' writeable

2013-03-15 Thread Bryn M. Reeves

On 03/15/2013 12:24 PM, Bart Van Assche wrote:

> On 03/15/13 12:55, Hannes Reinecke wrote:
> > And the LLDD is forced into error recovery which'll take _ages_ as each
> > and every command sent during error recovery will time out.
>
> Hello Hannes,
>
> I'm analyzing a related but not identical issue with SRP. It would help
> if you could tell with which LLDD you ran into this issue and with which
> values of fast_io_fail_tmo and dev_loss_tmo.


Most of the cases I've seen have involved lpfc (although I don't think 
it's in any way exclusive to that LLDD). Even with very low 
fast_io_fail_tmo/dev_loss_tmo values (<5/10) the eh is busy for 10m or 
longer before IO fails and multipath is able to react to the problem.
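
For anyone wanting to report the values Bart asks about, here is a minimal
sketch that collects them from sysfs. It assumes the FC transport class
exposes fast_io_fail_tmo and dev_loss_tmo under
/sys/class/fc_remote_ports/rport-*; the function name and the sysfs_root
parameter are illustrative (the latter exists only to allow testing
against a fake tree).

```python
import glob
import os


def read_rport_timeouts(sysfs_root="/sys"):
    """Map each FC remote port to its fast_io_fail_tmo and dev_loss_tmo."""
    result = {}
    pattern = os.path.join(sysfs_root, "class", "fc_remote_ports", "rport-*")
    for rport_dir in sorted(glob.glob(pattern)):
        values = {}
        for attr in ("fast_io_fail_tmo", "dev_loss_tmo"):
            try:
                with open(os.path.join(rport_dir, attr)) as f:
                    # Kept as text: the attribute may hold a non-numeric
                    # value such as "off".
                    values[attr] = f.read().strip()
            except OSError:
                values[attr] = None
        result[os.path.basename(rport_dir)] = values
    return result


if __name__ == "__main__":
    for rport, tmo in read_rport_timeouts().items():
        print(rport, tmo)
```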


Regards,
Bryn.



Re: [PATCH] scsi_transport_fc: Make 'port_state' writeable

2013-03-15 Thread Bryn M. Reeves

On 03/15/2013 11:55 AM, Hannes Reinecke wrote:

> Rationale for this patch is a weird test case with brocade switches;
> there you can actually disable a _target_ port. So the port isn't
> reachable anymore but no RSCN is sent.


I think it's more than a pure test-case; using the rscnsupr feature on 
the Brocades is just a handy way to trigger it.


I've seen numerous cases in the last few years of fabric failures that 
had this characteristic - either because of hardware failures in the 
switches or due to bugs that caused FC traffic to be blackholed without 
an RSCN or other indication (beyond commands timing out).


This (and the I_T nexus reset you proposed in December which is very 
effective at short-circuiting the eh escalation in the same situation) 
both make the kernel more robust in the face of this kind of problem.


Regards,
Bryn.




Re: How to online remove an error scsi disk from the system?

2013-02-01 Thread Bryn M. Reeves

On 02/01/2013 11:13 AM, Tao Ma wrote:

> > You don't mention the versions of the kernel and driver you're using -
> > if the system is in production I would suggest contacting whoever
> > normally provides support for the kernel and distribution that you are
> > running.
>
> We use CentOS6.2 and the kernel version is 2.6.32-220.23.1.


This is ancient, even by CentOS or RHEL standards. There are thousands 
of patches in more recent kernels (either at kernel.org or in the 
updates in CentOS repositories).


Nobody on linux-kernel or the other lists you copied is going to want to 
investigate problems on such an old kernel - you'll need to either 
reproduce with something current or seek assistance from the CentOS 
community (who will probably tell you to update your kernel first anyway).


Regards,
Bryn.




Re: How to online remove an error scsi disk from the system?

2013-02-01 Thread Bryn M. Reeves

On 02/01/2013 09:59 AM, Tao Ma wrote:

> yes, but the result is the same. It will do some IO first which will
> cause this command to hang.


You seem to have a problem with either the device/adapter or in the 
driver. The backtrace you posted shows that jbd2 (ext4) is still waiting 
on IO that's been submitted to an mpt2sas or mpt3sas adapter (I only 
know that because I recognise their log messages - you should try to 
include relevant details like this when seeking assistance).


The adapter/driver hasn't completed the IO and it looks like the SCSI 
layer is trying to abort it. Depending on the state of the driver and 
hardware your only option might be to reboot (or physically hot remove 
the device if your hardware allows it).


You don't mention the versions of the kernel and driver you're using - 
if the system is in production I would suggest contacting whoever 
normally provides support for the kernel and distribution that you are 
running.


Regards,
Bryn.



Re: How to online remove an error scsi disk from the system?

2013-02-01 Thread Bryn M. Reeves

On 02/01/2013 07:54 AM, Bart Van Assche wrote:

>  * proc_scsi_write - handle writes to /proc/scsi/scsi
>  * @file: not used
>  * @buf: buffer to write
>  * @length: length of buf, at most PAGE_SIZE
>  * @ppos: not used
>  *
>  * Description: this provides a legacy mechanism to add or remove
>  * devices by Host, Channel, ID, and Lun.  To use,
>  * "echo 'scsi add-single-device 0 1 2 3' > /proc/scsi/scsi" or
>  * "echo 'scsi remove-single-device 0 1 2 3' > /proc/scsi/scsi" with
>  * "0 1 2 3" replaced by the Host, Channel, Id, and Lun.


The proc interface is deprecated; this can all be done via sysfs today, 
e.g.:


echo 1 > /sys/block/sdc/device/delete

Is equivalent to issuing scsi remove-single-device to proc.
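
The same sysfs write can be scripted. Below is a minimal sketch, assuming
the /sys/block/<disk>/device/delete attribute shown above; the function
name and the sysfs_root parameter are illustrative (the parameter exists
only so the helper can be exercised against a fake tree).

```python
import os


def delete_scsi_disk(name, sysfs_root="/sys"):
    """Ask the kernel to remove a SCSI disk, e.g. name="sdc".

    Equivalent to: echo 1 > /sys/block/<name>/device/delete
    Returns the path that was written.
    """
    if not name or "/" in name or name in (".", ".."):
        raise ValueError("invalid disk name: %r" % name)
    path = os.path.join(sysfs_root, "block", name, "device", "delete")
    with open(path, "w") as f:
        f.write("1\n")
    return path
```

Calling delete_scsi_disk("sdc") needs root privileges and removes the
device immediately, so nothing is invoked at import time here.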

Regards,
Bryn.

