Re: [PATCH v2 1/8] qla2xxx: kill sessions/log out initiator on RSCN and port down events

2015-07-27 Thread Hannes Reinecke
On 07/27/2015 08:09 PM, Roland Dreier wrote:
 On Mon, Jul 27, 2015 at 1:09 AM, Hannes Reinecke h...@suse.de wrote:
 Hmm? What happened to the original FIXME?
 Is it not required anymore?
 
 Not sure I follow the question.  The original FIXME was /* FIXME:
 Re-enable Global event handling.. */ and this patch indeed re-enables
 global event handling.  I don't think it's a separate patch, it's all
 part of the same work.
 
 Unless I'm misunderstanding...
 
No, not really.
If the patch indeed enables global event handling everything's fine.
Wasn't just not quite obvious from the description nor the patch itself.

So:

Reviewed-by: Hannes Reinecke h...@suse.com

Cheers,

Hannes
-- 
Dr. Hannes ReineckezSeries  Storage
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [for 4.1 PATCH resend] libsas: fix sysfs group not found warnings at port teardown time

2015-07-27 Thread James Bottomley
On Wed, 2015-07-22 at 14:08 -0700, Dan Williams wrote:
 On Wed, Jul 22, 2015 at 11:28 AM, James Bottomley
 james.bottom...@hansenpartnership.com wrote:
 
   }
 
   void sas_device_set_phy(struct domain_device *dev, struct sas_port *port)
  diff --git a/drivers/scsi/libsas/sas_port.c 
  b/drivers/scsi/libsas/sas_port.c
  index d3c5297c6c89..9a25ae3a52a4 100644
  --- a/drivers/scsi/libsas/sas_port.c
  +++ b/drivers/scsi/libsas/sas_port.c
  @@ -219,7 +219,6 @@ void sas_deform_port(struct asd_sas_phy *phy, int gone)
 
if (port-num_phys == 1) {
sas_unregister_domain_devices(port, gone);
  - sas_port_delete(port-port);
port-port = NULL;
} else {
sas_port_delete_phy(port-port, phy-phy);
 
 
  This should become
 
  if (port-num_phys == 1)
  sas_unregister_domain_device(port, gone);
 
  sas_port_delete_phy(port-port, phy-phy);
 
  So we end up with a port scheduled for destruction with no phys rather
  than making the last phy association hang around until the DISCE
  workqueue runs.
 
 Sounds ok in theory.

It's not really a choice.  The specific problem you've introduced with
this patch is failure to cope with link flutter: a deform and form event
queued sequentially.  In the new scheme you're trying to introduce, the
destruct event gets queued from the deform but behind the form and the
link flutter results in a dead link.  I thought just forcing a zero phy
port would fix this, but it won't, either the destruct has to run in the
context of the deform event or the form has to be queued later than the
destruct.  I think coupled with the changes above, there needs to be

if (port-port) {
/* dying port, requeue form event */
resend the PORTE_BYTES_DMAED event
return
}

inside the unmatched port loop in sas_port_form() if nothing is found as
well to close this.

   I don't have a libsas environment handy, I worked with Praveen to
 validate the version as submitted if you want to re-work it.

A couple of days ago, this was so urgent as to have to go outside the
usual patch process ... now it's not important enough for you to bother
working on it; which is it?

James


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [for 4.1 PATCH resend] libsas: fix sysfs group not found warnings at port teardown time

2015-07-27 Thread Dan Williams
On Mon, Jul 27, 2015 at 8:17 AM, James Bottomley
james.bottom...@hansenpartnership.com wrote:
 On Wed, 2015-07-22 at 14:08 -0700, Dan Williams wrote:
 On Wed, Jul 22, 2015 at 11:28 AM, James Bottomley
 james.bottom...@hansenpartnership.com wrote:
 
   }
 
   void sas_device_set_phy(struct domain_device *dev, struct sas_port *port)
  diff --git a/drivers/scsi/libsas/sas_port.c 
  b/drivers/scsi/libsas/sas_port.c
  index d3c5297c6c89..9a25ae3a52a4 100644
  --- a/drivers/scsi/libsas/sas_port.c
  +++ b/drivers/scsi/libsas/sas_port.c
  @@ -219,7 +219,6 @@ void sas_deform_port(struct asd_sas_phy *phy, int 
  gone)
 
if (port-num_phys == 1) {
sas_unregister_domain_devices(port, gone);
  - sas_port_delete(port-port);
port-port = NULL;
} else {
sas_port_delete_phy(port-port, phy-phy);
 
 
  This should become
 
  if (port-num_phys == 1)
  sas_unregister_domain_device(port, gone);
 
  sas_port_delete_phy(port-port, phy-phy);
 
  So we end up with a port scheduled for destruction with no phys rather
  than making the last phy association hang around until the DISCE
  workqueue runs.

 Sounds ok in theory.

 It's not really a choice.  The specific problem you've introduced with
 this patch is failure to cope with link flutter: a deform and form event
 queued sequentially.  In the new scheme you're trying to introduce, the
 destruct event gets queued from the deform but behind the form and the
 link flutter results in a dead link.  I thought just forcing a zero phy
 port would fix this, but it won't, either the destruct has to run in the
 context of the deform event or the form has to be queued later than the
 destruct.  I think coupled with the changes above, there needs to be

 if (port-port) {
 /* dying port, requeue form event */
 resend the PORTE_BYTES_DMAED event
 return
 }

 inside the unmatched port loop in sas_port_form() if nothing is found as
 well to close this.

I think it's too late.  Once the lldd has triggered libsas to start
tear down I seem to recall the lldd has the expectation that a new
PORTE_BYTES_DMAED triggers the creation of a new port instance for
that phy.  Once the flutter reaches libsas the race is already lost
and the port needs to be torn down, but I would need to take a closer
look.

   I don't have a libsas environment handy, I worked with Praveen to
 validate the version as submitted if you want to re-work it.

 A couple of days ago, this was so urgent as to have to go outside the
 usual patch process ... now it's not important enough for you to bother
 working on it; which is it?

Neither, it was a reviewed patch that was idling in the process.  I'm
still of the opinion that pinging Andrew in a case like this *is* the
expected process, unless there's a place I can check that a patch is
still in the application queue?
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] scsi: fix hang in scsi error handling

2015-07-27 Thread Kevin Groeneveld
 -Original Message-
 From: Hannes Reinecke [mailto:h...@suse.de]
 Sent: July-27-15 6:39 AM
 On 07/16/2015 08:55 PM, Kevin Groeneveld wrote:
  -Original Message-
  From: Hannes Reinecke [mailto:h...@suse.de]
  Sent: July-16-15 7:11 AM
  When the hang occurs shost-host_busy == 2 and shost-host_failed ==
  1 in the scsi_eh_wakeup function. However this function only wakes
  the error handler if host_busy == host_failed.
 
  Which just means that one command is still outstanding, and we need
  to wait for it to complete.
  But see below...
 
  So the root cause of the hang is maybe that the second command never
  completes? Maybe host_failed being non zero is blocking something in
  the port multiplier code?
 
  Hmm.
  I am really not sure about this.
 
  I wasn't sure either, that is one reason why I posted the patch.
 
  'host_busy' indicates the number of outstanding commands, and
  'host_failed' is the number of commands which have failed (on the
  ground that failed commands are considered outstanding, too).
 
  So the first hunk would change the behaviour from 'start SCSI EH once
  all commands are completed or failed' to 'start SCSI EH for _any_
  command if scsi_eh_wakeup is called'
  (note that shost_failed might be '0'...).
  Which doesn't sound right.
 
  So could the patch create any problems by starting the EH any time
  scsi_eh_wakeup is called? Or is it is just inefficient?
 
 SCSI EH _relies_ on the fact that no other commands are outstanding on that
 SCSI host, hence the contents of eh_entry list won't change.
 Your patch breaks this assumption, causing some I/O to be lost.
 
  I guess this needs further debugging to get to the bottom of it.
 
  Any suggestions on things I could try?
 
  The fact that the problem goes away when I only enable one CPU core
  makes me think there is a race happening somewhere.
 
 Not sure here. You're effectively creating an endless loop with your patch,
 assuming that each ioctl will be However, you are effectively creating an
 endless loop with you testcase, assuming that 'ioctl' finishes all I/O before
 returning.
 Which _actually_ is not a requirement; the I/O itself needs to be finished by
 the time the ioctl returns (obviously), but the _structures_ associated with
 the ioctl might linger on a bit longer (delayed freeing and whatnot).
 Yet this is a bit far-fetched, and definitely needs some more analysis.
 
 For debugging I would suggest looking at the lifetime of each scsi command,
 figuring out if by the time the ioctl returns the scsi command is indeed freed
 up.

Thanks for the further feedback on this.

I haven't had a lot of time to debug this further. Last week I did tried 
enabling SCSI logging as you suggested in your previous post. I tried many 
different combinations of setting /proc/sys/dev/scsi/logging_level to enable 
different types and levels of debugging.  However everything I tried either 
resulted in not being able to trigger the problem or nothing useful in the log.

I was thinking of looking into the SCSI trace functionality to see if that 
would give more useful results.

One thing I did notice which may be a small clue is the following values each 
time after the hang:
/sys/class/scsi_device/0:0:0:0/device/device_busy = 1 (CD-ROM)
/sys/class/scsi_device/0:1:0:0/device/device_busy = 0 (HDD)

Before the hang the HDD busy value varies from 0 to 31. After the hang the HDD 
busy value is always 0.

 Also you might want to play around with the 'usleep' a bit; my assumption is
 that at one point for a large enough wait the problem goes away.
 (And, incidentally, we might actually getting more than one pending
 commands if the sleep is small enough; but this is just conjecture :-)

I tried a 10 second usleep.  On the first attempt the third ioctl never 
returned.  After a reboot and a second attempt the 10th ioctl never returned.

I also tried getting rid of the usleep entirely. If I avoid HDD access at the 
same time I can get about 100 ioctl calls per second and /sys/.../device_busy 
never seems to go above 1.  As soon as I access the HDD all SCSI access hangs.


Kevin
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [for 4.1 PATCH resend] libsas: fix sysfs group not found warnings at port teardown time

2015-07-27 Thread James Bottomley
On Mon, 2015-07-27 at 08:48 -0700, Dan Williams wrote:
 On Mon, Jul 27, 2015 at 8:17 AM, James Bottomley
 james.bottom...@hansenpartnership.com wrote:
  On Wed, 2015-07-22 at 14:08 -0700, Dan Williams wrote:
  On Wed, Jul 22, 2015 at 11:28 AM, James Bottomley
  james.bottom...@hansenpartnership.com wrote:
  
}
  
void sas_device_set_phy(struct domain_device *dev, struct sas_port 
   *port)
   diff --git a/drivers/scsi/libsas/sas_port.c 
   b/drivers/scsi/libsas/sas_port.c
   index d3c5297c6c89..9a25ae3a52a4 100644
   --- a/drivers/scsi/libsas/sas_port.c
   +++ b/drivers/scsi/libsas/sas_port.c
   @@ -219,7 +219,6 @@ void sas_deform_port(struct asd_sas_phy *phy, int 
   gone)
  
 if (port-num_phys == 1) {
 sas_unregister_domain_devices(port, gone);
   - sas_port_delete(port-port);
 port-port = NULL;
 } else {
 sas_port_delete_phy(port-port, phy-phy);
  
  
   This should become
  
   if (port-num_phys == 1)
   sas_unregister_domain_device(port, gone);
  
   sas_port_delete_phy(port-port, phy-phy);
  
   So we end up with a port scheduled for destruction with no phys rather
   than making the last phy association hang around until the DISCE
   workqueue runs.
 
  Sounds ok in theory.
 
  It's not really a choice.  The specific problem you've introduced with
  this patch is failure to cope with link flutter: a deform and form event
  queued sequentially.  In the new scheme you're trying to introduce, the
  destruct event gets queued from the deform but behind the form and the
  link flutter results in a dead link.  I thought just forcing a zero phy
  port would fix this, but it won't, either the destruct has to run in the
  context of the deform event or the form has to be queued later than the
  destruct.  I think coupled with the changes above, there needs to be
 
  if (port-port) {
  /* dying port, requeue form event */
  resend the PORTE_BYTES_DMAED event
  return
  }
 
  inside the unmatched port loop in sas_port_form() if nothing is found as
  well to close this.
 
 I think it's too late.  Once the lldd has triggered libsas to start
 tear down I seem to recall the lldd has the expectation that a new
 PORTE_BYTES_DMAED triggers the creation of a new port instance for
 that phy.  Once the flutter reaches libsas the race is already lost
 and the port needs to be torn down, but I would need to take a closer
 look.

I don't understand your reasoning.  The expectation is that
PORTE_BYTES_DMAED leads to port formation.  The proposal detects that
this event precedes DISCE_DESTRUCT for the port and requeues the event,
now after DISC_DESTRUCT, so it gets acted on.  Where is the problem?

James


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [for 4.1 PATCH resend] libsas: fix sysfs group not found warnings at port teardown time

2015-07-27 Thread Dan Williams
On Mon, Jul 27, 2015 at 10:11 AM, James Bottomley
james.bottom...@hansenpartnership.com wrote:
 On Mon, 2015-07-27 at 08:48 -0700, Dan Williams wrote:
 On Mon, Jul 27, 2015 at 8:17 AM, James Bottomley
 james.bottom...@hansenpartnership.com wrote:
  On Wed, 2015-07-22 at 14:08 -0700, Dan Williams wrote:
  On Wed, Jul 22, 2015 at 11:28 AM, James Bottomley
  james.bottom...@hansenpartnership.com wrote:
  
}
  
void sas_device_set_phy(struct domain_device *dev, struct sas_port 
   *port)
   diff --git a/drivers/scsi/libsas/sas_port.c 
   b/drivers/scsi/libsas/sas_port.c
   index d3c5297c6c89..9a25ae3a52a4 100644
   --- a/drivers/scsi/libsas/sas_port.c
   +++ b/drivers/scsi/libsas/sas_port.c
   @@ -219,7 +219,6 @@ void sas_deform_port(struct asd_sas_phy *phy, int 
   gone)
  
 if (port-num_phys == 1) {
 sas_unregister_domain_devices(port, gone);
   - sas_port_delete(port-port);
 port-port = NULL;
 } else {
 sas_port_delete_phy(port-port, phy-phy);
  
  
   This should become
  
   if (port-num_phys == 1)
   sas_unregister_domain_device(port, gone);
  
   sas_port_delete_phy(port-port, phy-phy);
  
   So we end up with a port scheduled for destruction with no phys rather
   than making the last phy association hang around until the DISCE
   workqueue runs.
 
  Sounds ok in theory.
 
  It's not really a choice.  The specific problem you've introduced with
  this patch is failure to cope with link flutter: a deform and form event
  queued sequentially.  In the new scheme you're trying to introduce, the
  destruct event gets queued from the deform but behind the form and the
  link flutter results in a dead link.  I thought just forcing a zero phy
  port would fix this, but it won't, either the destruct has to run in the
  context of the deform event or the form has to be queued later than the
  destruct.  I think coupled with the changes above, there needs to be
 
  if (port-port) {
  /* dying port, requeue form event */
  resend the PORTE_BYTES_DMAED event
  return
  }
 
  inside the unmatched port loop in sas_port_form() if nothing is found as
  well to close this.

 I think it's too late.  Once the lldd has triggered libsas to start
 tear down I seem to recall the lldd has the expectation that a new
 PORTE_BYTES_DMAED triggers the creation of a new port instance for
 that phy.  Once the flutter reaches libsas the race is already lost
 and the port needs to be torn down, but I would need to take a closer
 look.

 I don't understand your reasoning.  The expectation is that
 PORTE_BYTES_DMAED leads to port formation.  The proposal detects that
 this event precedes DISCE_DESTRUCT for the port and requeues the event,
 now after DISC_DESTRUCT, so it gets acted on.  Where is the problem?


Ah, I read flutter and mistakenly thought debounce.  You're
addressing the case where we're fully committed to the port going
down, but a port up event is injected between the port down and
destruct events.  Yes, I agree re-queuing needs to happen, however
don't we now have queue re-order problem with respect to a new port
down event?  It seems events need to be held off in-order while the
subordinate DISCE events are processed.
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/8] qla2xxx: kill sessions/log out initiator on RSCN and port down events

2015-07-27 Thread Roland Dreier
On Mon, Jul 27, 2015 at 1:09 AM, Hannes Reinecke h...@suse.de wrote:
 Hmm? What happened to the original FIXME?
 Is it not required anymore?

Not sure I follow the question.  The original FIXME was /* FIXME:
Re-enable Global event handling.. */ and this patch indeed re-enables
global event handling.  I don't think it's a separate patch, it's all
part of the same work.

Unless I'm misunderstanding...
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [for 4.1 PATCH resend] libsas: fix sysfs group not found warnings at port teardown time

2015-07-27 Thread James Bottomley
On Mon, 2015-07-27 at 08:48 -0700, Dan Williams wrote:
 On Mon, Jul 27, 2015 at 8:17 AM, James Bottomley
 james.bottom...@hansenpartnership.com wrote:
  On Wed, 2015-07-22 at 14:08 -0700, Dan Williams wrote:
I don't have a libsas environment handy, I worked with Praveen to
  validate the version as submitted if you want to re-work it.
 
  A couple of days ago, this was so urgent as to have to go outside the
  usual patch process ... now it's not important enough for you to bother
  working on it; which is it?
 
 Neither, it was a reviewed patch that was idling in the process.  I'm
 still of the opinion that pinging Andrew in a case like this *is* the
 expected process, unless there's a place I can check that a patch is
 still in the application queue?

I didn't ask you to justify your process, I asked you how important you
thought the patch was mainly because of the conflicting signals you've
sent.  I get that you think I should treat all your patches as important
whether you do or not, but the world doesn't quite work like that: patch
application is a process of triage.  Patches, like this, which have
timing related issues potentially leading to races get looked at by me
as the last reviewer.  The speed of review depends on several factors,
but one of which is what type of user visible issue is this causing.
The user visible effects of this are a nasty warning message and nothing
more, I believe?  A useful indicator in this triage is how important the
submitter thinks the patch is, which was originally why I asked.

James


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi: fix memory leak with scsi-mq

2015-07-27 Thread Ewan Milne
On Thu, 2015-07-16 at 11:40 -0400, Tony Battersby wrote:
 Fix a memory leak with scsi-mq triggered by commands with large data
 transfer length.
 
 __sg_alloc_table() sets both table-nents and table-orig_nents to the
 same value.  When the scatterlist is DMA-mapped, table-nents is
 overwritten with the (possibly smaller) size of the DMA-mapped
 scatterlist, while table-orig_nents retains the original size of the
 allocated scatterlist.  scsi_free_sgtable() should therefore check
 orig_nents instead of nents, and all code that initializes sdb-table
 without calling __sg_alloc_table() should set both nents and orig_nents.
 
 Fixes: d285203cf647 (scsi: add support for a blk-mq based I/O path.)
 Cc: sta...@vger.kernel.org # 3.17+
 Signed-off-by: Tony Battersby to...@cybernetics.com
 ---
 
 For immediate inclusion.
 
 diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
 index 106884a..cfadcce 100644
 --- a/drivers/scsi/scsi_error.c
 +++ b/drivers/scsi/scsi_error.c
 @@ -944,7 +944,7 @@ void scsi_eh_prep_cmnd(struct scsi_cmnd *scmd, struct 
 scsi_eh_save *ses,
   scmd-sdb.length);
   scmd-sdb.table.sgl = ses-sense_sgl;
   scmd-sc_data_direction = DMA_FROM_DEVICE;
 - scmd-sdb.table.nents = 1;
 + scmd-sdb.table.nents = scmd-sdb.table.orig_nents = 1;
   scmd-cmnd[0] = REQUEST_SENSE;
   scmd-cmnd[4] = scmd-sdb.length;
   scmd-cmd_len = COMMAND_SIZE(scmd-cmnd[0]);
 diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
 index b1a2631..448ebda 100644
 --- a/drivers/scsi/scsi_lib.c
 +++ b/drivers/scsi/scsi_lib.c
 @@ -583,7 +583,7 @@ static struct scatterlist *scsi_sg_alloc(unsigned int 
 nents, gfp_t gfp_mask)
  
  static void scsi_free_sgtable(struct scsi_data_buffer *sdb, bool mq)
  {
 - if (mq  sdb-table.nents = SCSI_MAX_SG_SEGMENTS)
 + if (mq  sdb-table.orig_nents = SCSI_MAX_SG_SEGMENTS)
   return;
   __sg_free_table(sdb-table, SCSI_MAX_SG_SEGMENTS, mq, scsi_sg_free);
  }
 @@ -597,8 +597,8 @@ static int scsi_alloc_sgtable(struct scsi_data_buffer 
 *sdb, int nents, bool mq)
  
   if (mq) {
   if (nents = SCSI_MAX_SG_SEGMENTS) {
 - sdb-table.nents = nents;
 - sg_init_table(sdb-table.sgl, sdb-table.nents);
 + sdb-table.nents = sdb-table.orig_nents = nents;
 + sg_init_table(sdb-table.sgl, nents);
   return 0;
   }
   first_chunk = sdb-table.sgl;
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-scsi in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

Looks good.  James, can we get this in please?

Reviewed-by: Ewan D. Milne emi...@redhat.com


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] scsi: rescan device if an invalid capacity had been reported

2015-07-27 Thread Lee Duncan
On 07/08/2015 12:41 AM, Hannes Reinecke wrote:
 Device paths in ALUA state 'standby' do not necessarily support
 the READ_CAPACITY command. This patch adds a new flag 'invalid_capacity'
 to the scsi device, and rescans the device if an ALUA state
 change occurred.
 
 Signed-off-by: Hannes Reinecke h...@suse.de
 ---
  drivers/scsi/scsi_lib.c|  4 
  drivers/scsi/sd.c  | 19 ++-
  include/scsi/scsi_device.h |  1 +
  3 files changed, 19 insertions(+), 5 deletions(-)
 
 diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
 index c005e42..d4245f0 100644
 --- a/drivers/scsi/scsi_lib.c
 +++ b/drivers/scsi/scsi_lib.c
 @@ -2701,6 +2701,7 @@ static void scsi_evt_emit(struct scsi_device *sdev, 
 struct scsi_event *evt)
   envp[idx++] = SDEV_UA=INQUIRY_DATA_HAS_CHANGED;
   break;
   case SDEV_EVT_CAPACITY_CHANGE_REPORTED:
 + scsi_rescan_device(sdev-sdev_gendev);
   envp[idx++] = SDEV_UA=CAPACITY_DATA_HAS_CHANGED;
   break;
   case SDEV_EVT_SOFT_THRESHOLD_REACHED_REPORTED:
 @@ -2713,6 +2714,9 @@ static void scsi_evt_emit(struct scsi_device *sdev, 
 struct scsi_event *evt)
   envp[idx++] = SDEV_UA=REPORTED_LUNS_DATA_HAS_CHANGED;
   break;
   case SDEV_EVT_ALUA_STATE_CHANGE_REPORTED:
 + if (sdev-invalid_capacity)
 + scsi_rescan_device(sdev-sdev_gendev);
 +
   envp[idx++] = SDEV_UA=ASYMMETRIC_ACCESS_STATE_CHANGED;
   break;
   default:
 diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
 index 03fdfa9..7c0bdaa 100644
 --- a/drivers/scsi/sd.c
 +++ b/drivers/scsi/sd.c
 @@ -1983,14 +1983,18 @@ static int read_capacity_16(struct scsi_disk *sdkp, 
 struct scsi_device *sdp,
   continue;
   if (sense_valid 
   sshdr.sense_key == NOT_READY 
 - sshdr.asc == 0x04  sshdr.ascq == 0x0a)
 + sshdr.asc == 0x04  sshdr.ascq == 0x0a) {
   /* Target port in transition */
 + sdp-invalid_capacity = 1;
   return 0;
 + }
   if (sense_valid 
   sshdr.sense_key == NOT_READY 
 - sshdr.asc == 0x04  sshdr.ascq == 0x0b)
 + sshdr.asc == 0x04  sshdr.ascq == 0x0b) {
   /* Target port in standy state */
 + sdp-invalid_capacity = 1;
   return 0;
 + }
   }
   retries--;
  
 @@ -2075,14 +2079,18 @@ static int read_capacity_10(struct scsi_disk *sdkp, 
 struct scsi_device *sdp,
   continue;
   if (sense_valid 
   sshdr.sense_key == NOT_READY 
 - sshdr.asc == 0x04  sshdr.ascq == 0x0a)
 + sshdr.asc == 0x04  sshdr.ascq == 0x0a) {
   /* Target port in transition */
 + sdp-invalid_capacity = 1;
   return 0;
 + }
   if (sense_valid 
   sshdr.sense_key == NOT_READY 
 - sshdr.asc == 0x04  sshdr.ascq == 0x0b)
 + sshdr.asc == 0x04  sshdr.ascq == 0x0b) {
   /* Target port in standy state */
 + sdp-invalid_capacity = 1;
   return 0;
 + }
   }
   retries--;
  
 @@ -2199,7 +2207,8 @@ got_data:
 assuming 512.\n);
   if (!sdkp-physical_block_size)
   sdkp-physical_block_size = sector_size;
 - }
 + } else
 + sdp-invalid_capacity = 0;
  
   if (sector_size != 512 
   sector_size != 1024 
 diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
 index 50c2a36..99bde5a 100644
 --- a/include/scsi/scsi_device.h
 +++ b/include/scsi/scsi_device.h
 @@ -175,6 +175,7 @@ struct scsi_device {
   unsigned no_dif:1;  /* T10 PI (DIF) should be disabled */
   unsigned broken_fua:1;  /* Don't set FUA bit */
   unsigned lun_in_cdb:1;  /* Store LUN bits in CDB[1] */
 + unsigned invalid_capacity:1;/* READ_CAPACITY not supported */

Not sure I like the comment for the new field. How about something like
current capacity unknown?

  
   atomic_t disk_events_disable_depth; /* disable depth for disk events */
  
 

-- 
Lee Duncan
SUSE Labs
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [for 4.1 PATCH resend] libsas: fix sysfs group not found warnings at port teardown time

2015-07-27 Thread James Bottomley
On Mon, 2015-07-27 at 10:55 -0700, Dan Williams wrote:
 On Mon, Jul 27, 2015 at 10:11 AM, James Bottomley
 james.bottom...@hansenpartnership.com wrote:
  On Mon, 2015-07-27 at 08:48 -0700, Dan Williams wrote:
  On Mon, Jul 27, 2015 at 8:17 AM, James Bottomley
  james.bottom...@hansenpartnership.com wrote:
   On Wed, 2015-07-22 at 14:08 -0700, Dan Williams wrote:
   On Wed, Jul 22, 2015 at 11:28 AM, James Bottomley
   james.bottom...@hansenpartnership.com wrote:
   
 }
   
 void sas_device_set_phy(struct domain_device *dev, struct sas_port 
*port)
diff --git a/drivers/scsi/libsas/sas_port.c 
b/drivers/scsi/libsas/sas_port.c
index d3c5297c6c89..9a25ae3a52a4 100644
--- a/drivers/scsi/libsas/sas_port.c
+++ b/drivers/scsi/libsas/sas_port.c
@@ -219,7 +219,6 @@ void sas_deform_port(struct asd_sas_phy *phy, 
int gone)
   
  if (port-num_phys == 1) {
  sas_unregister_domain_devices(port, gone);
- sas_port_delete(port-port);
  port-port = NULL;
  } else {
  sas_port_delete_phy(port-port, phy-phy);
   
   
This should become
   
if (port-num_phys == 1)
sas_unregister_domain_device(port, gone);
   
sas_port_delete_phy(port-port, phy-phy);
   
So we end up with a port scheduled for destruction with no phys rather
than making the last phy association hang around until the DISCE
workqueue runs.
  
   Sounds ok in theory.
  
   It's not really a choice.  The specific problem you've introduced with
   this patch is failure to cope with link flutter: a deform and form event
   queued sequentially.  In the new scheme you're trying to introduce, the
   destruct event gets queued from the deform but behind the form and the
   link flutter results in a dead link.  I thought just forcing a zero phy
   port would fix this, but it won't, either the destruct has to run in the
   context of the deform event or the form has to be queued later than the
   destruct.  I think coupled with the changes above, there needs to be
  
   if (port-port) {
   /* dying port, requeue form event */
   resend the PORTE_BYTES_DMAED event
   return
   }
  
   inside the unmatched port loop in sas_port_form() if nothing is found as
   well to close this.
 
  I think it's too late.  Once the lldd has triggered libsas to start
  tear down I seem to recall the lldd has the expectation that a new
  PORTE_BYTES_DMAED triggers the creation of a new port instance for
  that phy.  Once the flutter reaches libsas the race is already lost
  and the port needs to be torn down, but I would need to take a closer
  look.
 
  I don't understand your reasoning.  The expectation is that
  PORTE_BYTES_DMAED leads to port formation.  The proposal detects that
  this event precedes DISCE_DESTRUCT for the port and requeues the event,
  now after DISC_DESTRUCT, so it gets acted on.  Where is the problem?
 
 
 Ah, I read flutter and mistakenly thought debounce.  You're
 addressing the case where we're fully committed to the port going
 down, but a port up event is injected between the port down and
 destruct events.  Yes, I agree re-queuing needs to happen, however
 don't we now have queue re-order problem with respect to a new port
 down event?  It seems events need to be held off in-order while the
 subordinate DISCE events are processed.

No, that seems to be the intent of the prior code.  The reason port
visibility goes immediately (along with all associated phys), is that
the port is ready for reuse as soon as sas_deform_port() returns.
Destroying the subtree is left to DISCE_DESTRUCT, but since the subtree
is not visible, a new port can form even as the elements associated with
the old port are being destroyed.  the DISCE_DISCOVER that populates it
will queue behind the destruct, so everything just works today (modulo
the new warning).

It would be ideal if we could detect in the destruct queue that the
visibility is already gone and make use of it, but we can't.  The patch
that causes this warning induces a mismatch between the state of the
kernfs tree and the kobjects ... we still have to call device_del which
leads to kobject_del (which triggers the warning) to bring this state
back into alignment.  So the disconnected subtree we originally used to
make all this work is what's suddenly been rendered invalid.

James


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [for 4.1 PATCH resend] libsas: fix sysfs group not found warnings at port teardown time

2015-07-27 Thread Dan Williams
On Mon, Jul 27, 2015 at 11:07 AM, James Bottomley
james.bottom...@hansenpartnership.com wrote:
 On Mon, 2015-07-27 at 08:48 -0700, Dan Williams wrote:
 On Mon, Jul 27, 2015 at 8:17 AM, James Bottomley
 james.bottom...@hansenpartnership.com wrote:
  On Wed, 2015-07-22 at 14:08 -0700, Dan Williams wrote:
I don't have a libsas environment handy, I worked with Praveen to
  validate the version as submitted if you want to re-work it.
 
  A couple of days ago, this was so urgent as to have to go outside the
  usual patch process ... now it's not important enough for you to bother
  working on it; which is it?

 Neither, it was a reviewed patch that was idling in the process.  I'm
 still of the opinion that pinging Andrew in a case like this *is* the
 expected process, unless there's a place I can check that a patch is
 still in the application queue?

 I didn't ask you to justify your process, I asked you how important you
 thought the patch was mainly because of the conflicting signals you've
 sent.  I get that you think I should treat all your patches as important
 whether you do or not, but the world doesn't quite work like that: patch
 application is a process of triage.  Patches, like this, which have
 timing related issues potentially leading to races get looked at by me
 as the last reviewer.  The speed of review depends on several factors,
 but one of which is what type of user visible issue is this causing.
 The user visible effects of this are a nasty warning message and nothing
 more, I believe?  A useful indicator in this triage is how important the
 submitter thinks the patch is, which was originally why I asked.


That would be a question to Praveen.  It wasn't clear to me whether
this sysfs backtrace was a simply a warning or eventually fatal to the
box.
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [for 4.1 PATCH resend] libsas: fix sysfs group not found warnings at port teardown time

2015-07-27 Thread Praveen Murali
 -Original Message-
 From: Dan Williams [mailto:dan.j.willi...@intel.com]
 Sent: Monday, July 27, 2015 11:25 AM
 To: James Bottomley
 Cc: Christoph Hellwig; Praveen Murali; linux-scsi; sta...@vger.kernel.org
 Subject: Re: [for 4.1 PATCH resend] libsas: fix sysfs group not found 
 warnings
 at port teardown time
 
  I didn't ask you to justify your process, I asked you how important you
  thought the patch was mainly because of the conflicting signals you've
  sent.  I get that you think I should treat all your patches as important
  whether you do or not, but the world doesn't quite work like that: patch
  application is a process of triage.  Patches, like this, which have
  timing related issues potentially leading to races get looked at by me
  as the last reviewer.  The speed of review depends on several factors,
  but one of which is what type of user visible issue is this causing.
  The user visible effects of this are a nasty warning message and nothing
  more, I believe?  A useful indicator in this triage is how important the
  submitter thinks the patch is, which was originally why I asked.
 
 
 That would be a question to Praveen.  It wasn't clear to me whether
 this sysfs backtrace was a simply a warning or eventually fatal to the
 box.
As far as I remember, the issue was mostly with the sysfs backtraces. I don’t
remember it causing a fatal error; but that could very well be because I did not
run it long enough.

N�r��yb�X��ǧv�^�)޺{.n�+{{ay�ʇڙ�,j��f���h���z��w���
���j:+v���w�j�mzZ+�ݢj��!�i

Re: [for 4.1 PATCH resend] libsas: fix sysfs group not found warnings at port teardown time

2015-07-27 Thread Dan Williams
On Mon, Jul 27, 2015 at 11:38 AM, James Bottomley
james.bottom...@hansenpartnership.com wrote:
 No, that seems to be the intent of the prior code.  The reason port
 visibility goes immediately (along with all associated phys), is that
 the port is ready for reuse as soon as sas_deform_port() returns.
 Destroying the subtree is left to DISCE_DESTRUCT, but since the subtree
 is not visible, a new port can form even as the elements associated with
 the old port are being destroyed.  the DISCE_DISCOVER that populates it
 will queue behind the destruct, so everything just works today (modulo
 the new warning).

 It would be ideal if we could detect in the destruct queue that the
 visibility is already gone and make use of it, but we can't.  The patch
 that causes this warning induces a mismatch between the state of the
 kernfs tree and the kobjects ... we still have to call device_del which
 leads to kobject_del (which triggers the warning) to bring this state
 back into alignment.  So the disconnected subtree we originally used to
 make all this work is what's suddenly been rendered invalid.


Looking at the original report this seemed new for the 3.16 kernel.
However, looking closer, that warning in sysfs_remove_group() has been
present for a long time before that.  I don't recall seeing it back
when we were doing focused hotplug testing with the isci driver for
the 3.5 / 3.6 kernels, so I believe it is newly uncovered since then.
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] qla2xxx: Return the fabric command state for non-task management requests

2015-07-27 Thread Quinn Tran
Christoph,

Currently, the command state within TCM (se_cmd.t_state) only track
command states from the point of new to the Back End and from Back End up
to QLA driver.  From the debug perspective, that¹s only half the picture.
The other half comes from qla2xxx¹s private command state provided by this
patch.

Having another trace point that happens 5~10 seconds ago might be
difficult to back track.  For task management debugging, dumping the
current states (se_cmd + qla_tgt_cmd) of each command when the TMR is
received provides some benefit.

If you feel uncomfortable with the driver defined binary, one
alternative would be to translate QLA¹s private command state into
se_cmd¹s new field.  This new file would be modify by the fabric layer
only.  This would limit any regression with existing se_cmd field
modification.


Regards,
Quinn Tran




On 7/24/15, 11:29 PM, target-devel-ow...@vger.kernel.org on behalf of
Christoph Hellwig target-devel-ow...@vger.kernel.org on behalf of
h...@infradead.org wrote:

Which debug printk do you care about?  I'd much prefer having a trace
point inside the driver which could even pretty print it instead of the
hack where a driver defined binary value is printed by the core.
--
To unsubscribe from this list: send the line unsubscribe target-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

attachment: winmail.dat

Re: [PATCH] scsi: fix hang in scsi error handling

2015-07-27 Thread Hannes Reinecke
On 07/16/2015 08:55 PM, Kevin Groeneveld wrote:
 -Original Message-
 From: Hannes Reinecke [mailto:h...@suse.de]
 Sent: July-16-15 7:11 AM
 When the hang occurs shost-host_busy == 2 and shost-host_failed == 1
 in the scsi_eh_wakeup function. However this function only wakes the
 error handler if host_busy == host_failed.

 Which just means that one command is still outstanding, and we need to wait
 for it to complete.
 But see below...
 
 So the root cause of the hang is maybe that the second command never
 completes? Maybe host_failed being non zero is blocking something in the
 port multiplier code?
 
 Hmm.
 I am really not sure about this.
 
 I wasn't sure either, that is one reason why I posted the patch.
 
 'host_busy' indicates the number of outstanding commands, and
 'host_failed' is the number of commands which have failed (on the ground
 that failed commands are considered outstanding, too).

 So the first hunk would change the behaviour from 'start SCSI EH once all
 commands are completed or failed' to 'start SCSI EH for _any_ command if
 scsi_eh_wakeup is called'
 (note that shost_failed might be '0'...).
 Which doesn't sound right.
 
 So could the patch create any problems by starting the EH any time
 scsi_eh_wakeup is called? Or is it is just inefficient?
 
SCSI EH _relies_ on the fact that no other commands are outstanding
on that SCSI host, hence the contents of eh_entry list won't change.
Your patch breaks this assumption, causing some I/O to be lost.

 I guess this needs further debugging to get to the bottom of it.
 
 Any suggestions on things I could try?
 
 The fact that the problem goes away when I only enable one CPU core makes
 me think there is a race happening somewhere.
 
Not sure here. You're effectively creating an endless loop with your
patch, assuming that each ioctl will be
However, you are effectively creating an endless loop with you
testcase, assuming that 'ioctl' finishes all I/O before returning.
Which _actually_ is not a requirement; the I/O itself needs to be
finished by the time the ioctl returns (obviously), but the
_structures_ associated with the ioctl might linger on a bit longer
(delayed freeing and whatnot).
Yet this is a bit far-fetched, and definitely needs some more analysis.

For debugging I would suggest looking at the lifetime of each scsi
command, figuring out if by the time the ioctl returns the scsi
command is indeed freed up.
Also you might want to play around with the 'usleep' a bit; my
assumption is that at one point for a large enough wait the problem
goes away.
(And, incidentally, we might actually getting more than one pending
commands if the sleep is small enough; but this is just conjecture :-)

Cheers,

Hannes
-- 
Dr. Hannes ReineckezSeries  Storage
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/8] qla2xxx: kill sessions/log out initiator on RSCN and port down events

2015-07-27 Thread Hannes Reinecke
On 07/14/2015 10:00 PM, Himanshu Madhani wrote:
 From: Roland Dreier rol...@purestorage.com
 
 To fix some issues talking to ESX, this patch modifies the qla2xxx driver
 so that it never logs into remote ports.  This has the side effect of
 getting rid of the rports entirely, which means we never log out of
 initiators and never tear down sessions when an initiator goes away.
 
 This is mostly OK, except that we can run into trouble if we have
 initiator A assigned FC address X:Y:Z by the fabric talking to us, and
 then initiator A goes away.  Some time (could be a long time) later,
 initiator B comes along and also gets FC address X:Y:Z (which is
 available again, because initiator A is gone).  If initiator B starts
 talking to us, then we'll still have the session for initiator A, and
 since we look up incoming IO based on the FC address X:Y:Z, initiator B
 will end up using ACLs for initiator A.
 
 Fix this by:
 
  1. Handling RSCN events somewhat differently; instead of completely
 skipping the processing of fcports, we look through the list, and if
 an fcport disappears, we tell the target code the tear down the
 session and tell the HBA FW to release the N_Port handle.
 
  2. Handling port down events by flushing all of our sessions.  The
 firmware was already releasing the N_Port handle but we want the
 target code to drop all the sessions too.
 
 Cc: sta...@vger.kernel.org
 Signed-off-by: Roland Dreier rol...@purestorage.com
 Signed-off-by: Alexei Potashnik ale...@purestorage.com
 Acked-by: Quinn Tran quinn.t...@qlogic.com
 Signed-off-by: Himanshu Madhani himanshu.madh...@qlogic.com
 ---
  drivers/scsi/qla2xxx/qla_dbg.c|2 +-
  drivers/scsi/qla2xxx/qla_init.c   |  137 
 ++---
  drivers/scsi/qla2xxx/qla_target.c |9 ++-
  3 files changed, 117 insertions(+), 31 deletions(-)
 
 diff --git a/drivers/scsi/qla2xxx/qla_dbg.c b/drivers/scsi/qla2xxx/qla_dbg.c
 index 0e6ee3c..e9ae6b9 100644
 --- a/drivers/scsi/qla2xxx/qla_dbg.c
 +++ b/drivers/scsi/qla2xxx/qla_dbg.c
 @@ -68,7 +68,7 @@
   * |  || 0xd101-0xd1fe   
 |
   * |  || 0xd214-0xd2fe   
 |
   * | Target Mode   |   0xe079   ||
 - * | Target Mode Management|   0xf072   | 0xf002 |
 + * | Target Mode Management|   0xf080   | 0xf002 |
   * |  || 0xf046-0xf049  |
   * | Target Mode Task Management  |0x1000b  ||
   * --
 diff --git a/drivers/scsi/qla2xxx/qla_init.c b/drivers/scsi/qla2xxx/qla_init.c
 index 766fdfc..4895a4a 100644
 --- a/drivers/scsi/qla2xxx/qla_init.c
 +++ b/drivers/scsi/qla2xxx/qla_init.c
 @@ -3462,20 +3462,43 @@ qla2x00_configure_fabric(scsi_qla_host_t *vha)
   if ((fcport-flags  FCF_FABRIC_DEVICE) == 0)
   continue;
  
 - if (fcport-scan_state == QLA_FCPORT_SCAN 
 - atomic_read(fcport-state) == FCS_ONLINE) {
 - qla2x00_mark_device_lost(vha, fcport,
 - ql2xplogiabsentdevice, 0);
 - if (fcport-loop_id != FC_NO_LOOP_ID 
 - (fcport-flags  FCF_FCP2_DEVICE) == 0 
 - fcport-port_type != FCT_INITIATOR 
 - fcport-port_type != FCT_BROADCAST) {
 - ha-isp_ops-fabric_logout(vha,
 - fcport-loop_id,
 - fcport-d_id.b.domain,
 - fcport-d_id.b.area,
 - fcport-d_id.b.al_pa);
 - qla2x00_clear_loop_id(fcport);
 + if (fcport-scan_state == QLA_FCPORT_SCAN) {
 + if (qla_ini_mode_enabled(base_vha) 
 + atomic_read(fcport-state) == FCS_ONLINE) {
 + qla2x00_mark_device_lost(vha, fcport,
 + ql2xplogiabsentdevice, 0);
 + if (fcport-loop_id != FC_NO_LOOP_ID 
 + (fcport-flags  FCF_FCP2_DEVICE) 
 == 0 
 + fcport-port_type != FCT_INITIATOR 
 
 + fcport-port_type != FCT_BROADCAST) 
 {
 + ha-isp_ops-fabric_logout(vha,
 + fcport-loop_id,
 + fcport-d_id.b.domain,
 + fcport-d_id.b.area,
 +

Re: [PATCH v2 5/8] qla2xxx: added sess generations to detect RSCN update races

2015-07-27 Thread Hannes Reinecke
On 07/14/2015 10:00 PM, Himanshu Madhani wrote:
 From: Alexei Potashnik ale...@purestorage.com
 
 RSCN processing in qla2xxx driver can run in parallel with ELS/IO
 processing. As such the decision to remove disappeared fc port's
 session could be stale, because a new login sequence has occurred
 since and created a brand new session.
 
 Previous mechanism of dealing with this by delaying deletion request
 was prone to erroneous deletions if the event that was supposed to
 cancel the deletion never arrived or has been delayed in processing.
 
 New mechanism relies on a time-like generation counter to serialize
 RSCN updates relative to ELS/IO updates.
 
 Cc: sta...@vger.kernel.org
 Signed-off-by: Alexei Potashnik ale...@purestorage.com
 Signed-off-by: Himanshu Madhani himanshu.madh...@qlogic.com
 ---
Reviewed-by: Hannes Reinecke h...@suse.com

Cheers,

Hannes
-- 
Dr. Hannes ReineckezSeries  Storage
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 6/8] qla2xxx: disable scsi_transport_fc registration in target mode

2015-07-27 Thread Hannes Reinecke
On 07/14/2015 10:00 PM, Himanshu Madhani wrote:
 From: Alexei Potashnik ale...@purestorage.com
 
 There are multiple reasons for disabling this:
 
 1. It provides no functional benefit. We pretty much only get a few more
 sysfs entries for each port, but all that information is already
 available from /sys/kernel/debug/target/qla-session-X
 
 2. It already only works in private-loop mode. By disabling we'll be
 getting more uniform behavior with fabric mode.
 
 3. It creates complications for the new PLOGI handling mechanism:
 scsi_transport_fc port deletion timer could race with new session
 from initiator and cause logout after successful login.
 
 Cc: sta...@vger.kernel.org
 Signed-off-by: Alexei Potashnik ale...@purestorage.com
 Signed-off-by: Himanshu Madhani himanshu.madh...@qlogic.com
 ---
Reviewed-by: Hannes Reinecke h...@suse.com

Cheers,

Hannes
-- 
Dr. Hannes ReineckezSeries  Storage
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 7/8] qla2xxx: drop cmds/tmrs arrived while session is being deleted

2015-07-27 Thread Hannes Reinecke
On 07/14/2015 10:00 PM, Himanshu Madhani wrote:
 From: Alexei Potashnik ale...@purestorage.com
 
 If a new initiator (different WWN) shows up on the same fcport, old
 initiator's session is scheduled for deletion. But there is a small
 window between it being marked with QLA_SESS_DELETION_IN_PROGRESS
 and qlt_unret_sess getting called when new session's commands will
 keep finding old session in the fcport map.
 
 This patch drops cmds/tmrs if they find session in the progress of
 being deleted.
 
 Cc: sta...@vger.kernel.org
 Signed-off-by: Alexei Potashnik ale...@purestorage.com
 Acked-by: Quinn Tran quinn.t...@qlogic.com
 Signed-off-by: Himanshu Madhani himanshu.madh...@qlogic.com
 ---
Reviewed-by: Hannes Reinecke h...@suse.com

Cheers,

Hannes
-- 
Dr. Hannes ReineckezSeries  Storage
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/8] qla2xxx: cleanup cmd in qla workqueue before processing TMR

2015-07-27 Thread Hannes Reinecke
On 07/14/2015 10:00 PM, Himanshu Madhani wrote:
 From: Swapnil Nagle swapnil.na...@purestorage.com
 
 Since cmds go into qla_tgt_wq and TMRs don't, it's possible that TMR
 like TASK_ABORT can be queued over the cmd for which it was meant.
 To avoid this race, use a per-port list to keep track of cmds that
 are enqueued to qla_tgt_wq but not yet processed. When a TMR arrives,
 iterate through this list and remove any cmds that match the TMR.
 This patch supports TASK_ABORT and LUN_RESET.
 
 Cc: sta...@vger.kernel.org
 Signed-off-by: Swapnil Nagle swapnil.na...@purestorage.com
 Signed-off-by: Alexei Potashnik ale...@purestorage.com
 Acked-by: Quinn Tran quinn.t...@qlogic.com
 Signed-off-by: Himanshu Madhani himanshu.madh...@qlogic.com
 ---
[ .. ]
 @@ -3374,14 +3462,25 @@ static void qlt_create_sess_from_atio(struct 
 work_struct *work)
   unsigned long flags;
   uint8_t *s_id = op-atio.u.isp24.fcp_hdr.s_id;
  
 + spin_lock_irqsave(vha-cmd_list_lock, flags);
 + list_del(op-cmd_list);
 + spin_unlock_irqrestore(vha-cmd_list_lock, flags);
 +
 + if (op-aborted) {
 + ql_dbg(ql_dbg_tgt_mgt, vha, 0xf083,
 + sess_op with tag %u is aborted\n,
 + op-atio.u.isp24.exchange_addr);
 + goto out_term;
 + }
 +
   ql_dbg(ql_dbg_tgt_mgt, vha, 0xf022,
 - qla_target(%d): Unable to find wwn login
 -  (s_id %x:%x:%x), trying to create it manually\n,
 - vha-vp_idx, s_id[0], s_id[1], s_id[2]);
 + qla_target(%d): Unable to find wwn login
 +  (s_id %x:%x:%x), trying to create it manually\n,
 + vha-vp_idx, s_id[0], s_id[1], s_id[2]);
  
   if (op-atio.u.raw.entry_count  1) {
   ql_dbg(ql_dbg_tgt_mgt, vha, 0xf023,
 - Dropping multy entry atio %p\n, op-atio);
 + Dropping multy entry atio %p\n, op-atio);
   goto out_term;
   }
  
Pointless indentation change.

Other than that:
Reviewed-by: Hannes Reinecke h...@suse.com

Cheers,

Hannes

-- 
Dr. Hannes ReineckezSeries  Storage
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 4/8] qla2xxx: Abort stale cmds on qla_tgt_wq when plogi arrives

2015-07-27 Thread Hannes Reinecke
On 07/14/2015 10:00 PM, Himanshu Madhani wrote:
 From: Alexei Potashnik ale...@purestorage.com
 
 cancel any commands from initiator's s_id that are still waiting
 on qla_tgt_wq when PLOGI arrives.
 
 Cc: sta...@vger.kernel.org
 Signed-off-by: Alexei Potashnik ale...@purestorage.com
 Acked-by: Quinn Tran quinn.t...@qlogic.com
 Signed-off-by: Himanshu Madhani himanshu.madh...@qlogic.com
 ---
Reviewed-by: Hannes Reinecke h...@suse.com

Cheers,

Hannes
-- 
Dr. Hannes ReineckezSeries  Storage
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 3/8] qla2xxx: delay plogi/prli ack until existing sessions are deleted

2015-07-27 Thread Hannes Reinecke
On 07/14/2015 10:00 PM, Himanshu Madhani wrote:
 From: Alexei Potashnik ale...@purestorage.com
 
 - keep qla_tgt_sess object on the session list until it's freed
 
 - modify use of sess-deleted flag to differentiate delayed
   session deletion that can be cancelled from irreversible one:
   QLA_SESS_DELETION_PENDING vs QLA_SESS_DELETION_IN_PROGRESS
 
 - during IN_PROGRESS deletion all newly arrived commands and TMRs will
   be rejected, existing commands and TMRs will be terminated when
   given by the core to the fabric or simply dropped if session logout
   has already happened (logout terminates all existing exchanges)
 
 - new PLOGI will initiate deletion of the following sessions
   (unless deletion is already IN_PROGRESS):
   - with the same port_name (with logout)
   - different port_name, different loop_id but the same port_id
 (with logout)
   - different port_name, different port_id, but the same loop_id
 (without logout)
 
 - additionally each new PLOGI will store imm notify iocb in the
   same port_name session being deleted. When deletion process
   completes this iocb will be acked. Only the most recent PLOGI
   iocb is stored. The older ones will be terminated when replaced.
 
 - new PRLI will initiate deletion of the following sessions
   (unless deletion is already IN_PROGRESS):
   - different port_name, different port_id, but the same loop_id
(without logout)
 
 Cc: sta...@vger.kernel.org
 Signed-off-by: Alexei Potashnik ale...@purestorage.com
 Acked-by: Quinn Tran quinn.t...@qlogic.com
 Signed-off-by: Himanshu Madhani himanshu.madh...@qlogic.com
 ---
Reviewed-by: Hannes Reinecke h...@suse.com

Cheers,

Hannes
-- 
Dr. Hannes ReineckezSeries  Storage
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 8/8] qla2xxx: terminate exchange when command is aborted by LIO

2015-07-27 Thread Hannes Reinecke
On 07/14/2015 10:00 PM, Himanshu Madhani wrote:
 From: Alexei Potashnik ale...@purestorage.com
 
 The newly introduced aborted_task TFO callback has to terminate
 exchange with QLogic driver, since command is being deleted and
 no status will be queued to the driver at a later point.
 
 This patch also moves the burden of releasing one cmd refcount to
 the aborted_task handler.
 
 Changed iSCSI aborted_task logic to satisfy the above requirement.
 
 Cc: sta...@vger.kernel.org
 Signed-off-by: Alexei Potashnik ale...@purestorage.com
 Acked-by: Quinn Tran quinn.t...@qlogic.com
 Signed-off-by: Himanshu Madhani himanshu.madh...@qlogic.com
 ---
Reviewed-by: Hannes Reinecke h...@suse.com

Cheers,

Hannes
-- 
Dr. Hannes ReineckezSeries  Storage
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] [SCSI] mpt2sas, mpt3sas: Abort initialization if no memory I/O resources detected

2015-07-27 Thread Hannes Reinecke
On 07/15/2015 06:49 AM, Sreekanth Reddy wrote:
 Driver crashes if the BIOS do not set up at least one
 memory I/O resource. This failure can happen if the device is too
 slow to respond during POST and is missed by the BIOS, but Linux
 then detects the device later in the boot process.
 
 Changes in v3:
Rearranged the code to remove the code redundancy
 
 Signed-off-by: Sreekanth Reddy sreekanth.re...@avagotech.com
 ---
Reviewed-by: Hannes Reinecke h...@suse.com

Cheers,

Hannes
-- 
Dr. Hannes ReineckezSeries  Storage
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] MAINTAINERS: Update LSILOGIC MPT FUSION DRIVERS (FC/SAS/SPI) maintainers list

2015-07-27 Thread Hannes Reinecke
On 07/15/2015 06:50 AM, Sreekanth Reddy wrote:
 Updating maintainers list for the entry LSILOGIC MPT FUSION DRIVERS in
 MAINTAINERS file
 
 Signed-off-by: Sreekanth Reddy sreekanth.re...@avagotech.com
 ---
  MAINTAINERS | 5 ++---
  1 file changed, 2 insertions(+), 3 deletions(-)
 
 diff --git a/MAINTAINERS b/MAINTAINERS
 index 2d3d55c..c3cd5c9 100644
 --- a/MAINTAINERS
 +++ b/MAINTAINERS
 @@ -6348,10 +6348,9 @@ S: Maintained
  F:   arch/arm/mach-lpc32xx/
  
  LSILOGIC MPT FUSION DRIVERS (FC/SAS/SPI)
 -M:   Nagalakshmi Nandigama nagalakshmi.nandig...@avagotech.com
 -M:   Praveen Krishnamoorthy praveen.krishnamoor...@avagotech.com
 +M:   Sathya Prakash sathya.prak...@avagotech.com
  M:   Sreekanth Reddy sreekanth.re...@avagotech.com
 -M:   Abhijit Mahajan abhijit.maha...@avagotech.com
 +M:   Chaitra Basappa chaitra.basa...@avagotech.com
  L:   mpt-fusionlinux@avagotech.com
  L:   linux-scsi@vger.kernel.org
  W:   http://www.lsilogic.com/support
 
Reviewed-by: Hannes Reinecke h...@suse.com

Cheers,

Hannes
-- 
Dr. Hannes ReineckezSeries  Storage
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/52] atp870u: Major rework

2015-07-27 Thread Hannes Reinecke
On 07/26/2015 10:24 PM, Ondrej Zary wrote:
 This patch series removes the first level of crap from atp870u - the worst
 (in-tree) SCSI driver [1] and then cleans up the code a bit.
 
 First the driver is converted to C from something that looks like a (bad)
 assembler code: use of tmport for all I/O accesses and gotos all over the
 code.
 Then I/O access wrappers are introduced.
 Finally, is870, is880 and is885 are unified and two of them removed.
 
 After that, some cleanups are done. There's still a lot of work left to do.
 
 Tested on AEC-6710D (ATP870IU-C).
 
 [1] http://www.spinics.net/lists/linux-scsi/msg81142.html
 
  atp870u.c | 3707 
 +-
  atp870u.h |4
  2 files changed, 1064 insertions(+), 2647 deletions(-)
 
 Partially (up to patch 34) Acked-by: Christoph Hellwig h...@lst.de
 
For this patchset:

Reviewed-by: Hannes Reinecke h...@suse.com

Cheers,

Hannes
-- 
Dr. Hannes ReineckezSeries  Storage
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html