[PATCH] zfcp: fix use after free bug

2007-11-23 Thread Swen Schillig
From: Heiko Carstens [EMAIL PROTECTED]

zfcp_erp_strategy_check_fsfreq() checks if it is safe to access the
fsf_req associated with the erp_action that gets passed. To test if
it is safe it accesses the fsf_req in order to get its index into
the hash list. This is broken since the fsf_req might be freed already
and the read index has no meaning. It could lead to memory corruption.
Fix this by introducing a new zfcp_reqlist_find_safe() method which
just checks if addresses are equal. This is slower, but only gets
called in case of error recovery.

Signed-off-by: Heiko Carstens [EMAIL PROTECTED]
Signed-off-by: Martin Schwidefsky [EMAIL PROTECTED]
Signed-off-by: Swen Schillig [EMAIL PROTECTED]
---

 drivers/s390/scsi/zfcp_def.h |   14 ++
 drivers/s390/scsi/zfcp_erp.c |3 ++-
 2 files changed, 16 insertions(+), 1 deletion(-)

Index: SHIP_OCT2005/drivers/s390/scsi/zfcp_def.h
===
--- SHIP_OCT2005.orig/drivers/s390/scsi/zfcp_def.h
+++ SHIP_OCT2005/drivers/s390/scsi/zfcp_def.h
@@ -1210,6 +1210,20 @@ zfcp_reqlist_find(struct zfcp_adapter *a
return NULL;
 }
 
+static inline struct zfcp_fsf_req *
+zfcp_reqlist_find_safe(struct zfcp_adapter *adapter, struct zfcp_fsf_req *req)
+{
+   struct zfcp_fsf_req *request;
+   unsigned int idx;
+
+   for (idx = 0; idx  REQUEST_LIST_SIZE; idx++) {
+   list_for_each_entry(request, adapter-req_list[idx], list)
+   if (request == req)
+   return request;
+   }
+   return NULL;
+}
+
 /*
  *  functions needed for reference/usage counting
  */
Index: SHIP_OCT2005/drivers/s390/scsi/zfcp_erp.c
===
--- SHIP_OCT2005.orig/drivers/s390/scsi/zfcp_erp.c
+++ SHIP_OCT2005/drivers/s390/scsi/zfcp_erp.c
@@ -837,7 +837,8 @@ zfcp_erp_strategy_check_fsfreq(struct zf
if (erp_action-fsf_req) {
/* take lock to ensure that request is not deleted meanwhile */
spin_lock(adapter-req_list_lock);
-   if (zfcp_reqlist_find(adapter, erp_action-fsf_req-req_id)) {
+   if (zfcp_reqlist_find_safe(adapter, erp_action-fsf_req) 
+   erp_action-fsf_req-erp_action == erp_action) {
/* fsf_req still exists */
debug_text_event(adapter-erp_dbf, 3, a_ca_req);
debug_event(adapter-erp_dbf, 3, erp_action-fsf_req,
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 0/3] [SCSI/libata] libata EH conversion for ipr SAS

2007-11-23 Thread Jeff Garzik

Brian King wrote:

The following three patches convert ipr to use the new libata EH APIs.
In the process of doing this, I first looked into implementing this
in a similar manner to how libata SAS is done today, which is hooking
into target_alloc/target_destroy to allocate/delete sata ports. While
I was able to get this working after writing my own eh_strategy_handler,
I was doubtful this was the best long term solution. One problem with this
implementation I didn't solve was the fact that libata now invokes EH
for each and every error received. For some devices, such as optical
devices, each not ready response received during a media reload would
result in all the attached SAS devices getting quiesced as well.

My second approach is the attached patch set. In this series I have
created a new libata API which can be used by SAS LLDDs. It introduces
an ata_sas_rphy device object which is created/destroyed by the following API:

ata_sas_rphy_alloc
ata_sas_rphy_add
ata_sas_rphy_delete

When using this API in ipr, I made ipr's scsi_host the parent device of the
SATA rphy. The SATA rphy is then the parent of the allocated scsi_hosts. This
means that each SATA rphy in the SAS topology will have its own scsi_host, 
making
SAS *much* more like all the SATA LLDDs in how it uses libata.

The only issue I ran into with this implementation is that since each SATA
port has its own scsi_host, the adapter cannot rely on scsi core to manage
any LLDD or adapter imposed queue depth. To solve this I added some code to
ipr. Longer term, block layer queue groups might be another way to do this.

I'm still polishing this up, but it is up and running and seems to work with
what testing I've done so far.


I'm generally happy with this, though I am curious what Tejun thinks as 
well.


Once everybody is happy, I think we should collect libata ACKs and then 
push this via the SCSI maintainership route.  That would libsas work in 
parallel, with perhaps in situ tweaks and fixes as the implementation is 
fleshed out.


Jeff



-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.24-rc3-mm1: I/O error, system hangs

2007-11-23 Thread Hannes Reinecke
Hannes Reinecke wrote:
 Laurent Riffard wrote:
 Le 21.11.2007 23:41, Andrew Morton a écrit :
 On Wed, 21 Nov 2007 22:45:22 +0100
 Laurent Riffard [EMAIL PROTECTED] wrote:

 Le 21.11.2007 05:45, Andrew Morton a écrit :
 ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc3/2.6.24-rc3-mm1/
 Hello, 

 My system hangs shortly after I logged in Gnome desktop. SysRq-W shows
 that a bunch of task are blocked in D state, they seem to wait for
 some I/O completion. I can try to hand-copy some data if requested.

 I found these messages in dmesg:

 ~$ grep -C2 end_request dmesg-2.6.24-rc3-mm1 
 EXT3-fs: mounted filesystem with ordered data mode.
 sd 0:0:0:0: [sda] Result: hostbyte=DID_NO_CONNECT 
 driverbyte=DRIVER_OK,SUGGEST_OK
 end_request: I/O error, dev sda, sector 16460
 ReiserFS: sda7: found reiserfs format 3.6 with standard journal
 ReiserFS: sda7: using ordered data mode
 --
 ReiserFS: sda7: Using r5 hash to sort names
 sd 0:0:1:0: [sdb] Result: hostbyte=DID_NO_CONNECT 
 driverbyte=DRIVER_OK,SUGGEST_OK
 end_request: I/O error, dev sdb, sector 19632
 sd 0:0:1:0: [sdb] Result: hostbyte=DID_NO_CONNECT 
 driverbyte=DRIVER_OK,SUGGEST_OK
 end_request: I/O error, dev sdb, sector 40037363
 Adding 1048568k swap on /dev/mapper/vglinux1-lvswap.  Priority:-1 
 extents:1 across:1048568k
 lp0: using parport0 (interrupt-driven).

 These errors occur *only* with 2.6.24-rc3-mm1, they are 100% reproducible.
 2.6.24-rc3 and 2.6.24-rc2-mm1 are fine.

 Maybe something is broken in pata_via driver ?

 Could be - 
 libata-reimplement-ata_acpi_cbl_80wire-using-ata_acpi_gtm_xfermask.patch
 and 
 pata_amd-pata_via-de-couple-programming-of-pio-mwdma-and-udma-timings.patch
 touch pata_via.c.
 None of the above...

 I did a bisection, it spotted git-scsi-misc.patch. 
 I just run 2.6.24-rc3-mm1 + revert-git-scsi-misc.patch, and it works fine.

 I guess commit 8655a546c83fc43f0a73416bbd126d02de7ad6c0 [SCSI] Do not 
 requeue requests if REQ_FAILFAST is set is the real culprit. The other 
 commits are touching documentation or drivers I don't use. I'll try 
 to revert only this one this evening.

 Hmm. Weird. I'll have a look into it. Apparently I'll be returning an error 
 where
 I shouldn't. Checking ...
 
Ok, found it. We are blocking even special commands (ie requests with PREEMPT 
not set)
when FAILFAST is set. Which is clearly wrong. The attached patch fixes this.

James, please apply.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries  Storage
[EMAIL PROTECTED] +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
Fix SPI Domain validation

This fixes a thinko of the FAILFAST handling: when we get
a request with FAILFAST set, we still have to evaluate the
PREEMPT flag to decide if this request should be passed through.

Signed-off-by: Hannes Reinecke [EMAIL PROTECTED]

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 13e7e09..9ec1566 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1284,13 +1284,15 @@ int scsi_prep_state_check(struct scsi_device *sdev, 
struct request *req)
/*
 * If the devices is blocked we defer normal commands.
 */
-   if (!(req-cmd_flags  REQ_PREEMPT))
-   ret = BLKPREP_DEFER;
-   /*
-* Return failfast requests immediately
-*/
-   if (req-cmd_flags  REQ_FAILFAST)
-   ret = BLKPREP_KILL;
+   if (!(req-cmd_flags  REQ_PREEMPT)) {
+   /*
+* Return failfast requests immediately
+*/
+   if (req-cmd_flags  REQ_FAILFAST)
+   ret = BLKPREP_KILL;
+   else
+   ret = BLKPREP_DEFER;
+   }
break;
default:
/*


Re: 2.6.24-rc3-mm1: I/O error, system hangs

2007-11-23 Thread James Bottomley

On Fri, 2007-11-23 at 18:52 +0100, Laurent Riffard wrote:
 Le 23.11.2007 12:38, Hannes Reinecke a écrit :
  Hannes Reinecke wrote:
  Laurent Riffard wrote:
  Le 21.11.2007 23:41, Andrew Morton a écrit :
  On Wed, 21 Nov 2007 22:45:22 +0100
  Laurent Riffard [EMAIL PROTECTED] wrote:
 
  Le 21.11.2007 05:45, Andrew Morton a écrit :
  ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc3/2.6.24-rc3-mm1/
  Hello, 
 
  My system hangs shortly after I logged in Gnome desktop. SysRq-W shows
  that a bunch of task are blocked in D state, they seem to wait for
  some I/O completion. I can try to hand-copy some data if requested.
 
  I found these messages in dmesg:
 
  ~$ grep -C2 end_request dmesg-2.6.24-rc3-mm1 
  EXT3-fs: mounted filesystem with ordered data mode.
  sd 0:0:0:0: [sda] Result: hostbyte=DID_NO_CONNECT 
  driverbyte=DRIVER_OK,SUGGEST_OK
  end_request: I/O error, dev sda, sector 16460
  ReiserFS: sda7: found reiserfs format 3.6 with standard journal
  ReiserFS: sda7: using ordered data mode
  --
  ReiserFS: sda7: Using r5 hash to sort names
  sd 0:0:1:0: [sdb] Result: hostbyte=DID_NO_CONNECT 
  driverbyte=DRIVER_OK,SUGGEST_OK
  end_request: I/O error, dev sdb, sector 19632
  sd 0:0:1:0: [sdb] Result: hostbyte=DID_NO_CONNECT 
  driverbyte=DRIVER_OK,SUGGEST_OK
  end_request: I/O error, dev sdb, sector 40037363
  Adding 1048568k swap on /dev/mapper/vglinux1-lvswap.  Priority:-1 
  extents:1 across:1048568k
  lp0: using parport0 (interrupt-driven).
 
  These errors occur *only* with 2.6.24-rc3-mm1, they are 100% 
  reproducible.
  2.6.24-rc3 and 2.6.24-rc2-mm1 are fine.
 
  Maybe something is broken in pata_via driver ?
 
  Could be - 
  libata-reimplement-ata_acpi_cbl_80wire-using-ata_acpi_gtm_xfermask.patch
  and 
  pata_amd-pata_via-de-couple-programming-of-pio-mwdma-and-udma-timings.patch
  touch pata_via.c.
  None of the above...
 
  I did a bisection, it spotted git-scsi-misc.patch. 
  I just run 2.6.24-rc3-mm1 + revert-git-scsi-misc.patch, and it works fine.
 
  I guess commit 8655a546c83fc43f0a73416bbd126d02de7ad6c0 [SCSI] Do not 
  requeue requests if REQ_FAILFAST is set is the real culprit. The other 
  commits are touching documentation or drivers I don't use. I'll try 
  to revert only this one this evening.
 
 I can confirm : reverting commit 8655a546c83fc43f0a73416bbd126d02de7ad6c0 
 does fix the problem.
 
  Hmm. Weird. I'll have a look into it. Apparently I'll be returning an 
  error where
  I shouldn't. Checking ...
 
  Ok, found it. We are blocking even special commands (ie requests with 
  PREEMPT not set)
  when FAILFAST is set. Which is clearly wrong. The attached patch fixes this.
 
 Sorry, it's not enough. 2.6.24-rc3-mm1 + your patch still hangs with I/O 
 errors.

I think the problem is the way we treat BLOCKED and QUIESCED (the latter
is the state that the domain validation uses and which we cannot kill
fastfail on).  It's definitely wrong to kill fastfail requests when the
state is QUIESCE.

This patch (which is applied on top of Hannes original) separates the
BLOCK and QUIESCE states correctly ... does this fix the problem?

James

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 13e7e09..a7cf23a 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1279,18 +1279,21 @@ int scsi_prep_state_check(struct scsi_device *sdev, 
struct request *req)
rejecting I/O to dead device\n);
ret = BLKPREP_KILL;
break;
-   case SDEV_QUIESCE:
case SDEV_BLOCK:
/*
-* If the devices is blocked we defer normal commands.
-*/
-   if (!(req-cmd_flags  REQ_PREEMPT))
-   ret = BLKPREP_DEFER;
-   /*
 * Return failfast requests immediately
 */
if (req-cmd_flags  REQ_FAILFAST)
ret = BLKPREP_KILL;
+
+   /* fall through */
+
+   case SDEV_QUIESCE:
+   /*
+* If the devices is blocked we defer normal commands.
+*/
+   if (!(req-cmd_flags  REQ_PREEMPT))
+   ret = BLKPREP_DEFER;
break;
default:
/*


-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html