Re: Qlogic driver fix in backports induces target code regression.

2017-06-23 Thread Willy Tarreau
Hi,

On Fri, Jun 23, 2017 at 12:45:17PM -0500, Dr. Greg Wettstein wrote:
> Hi, I hope the week is ending well for everyone.
> 
> We just tracked this issue down and I wanted to bring it to the
> attention of Willy and Ben in particular, since it is has a high
> probability of negatively affecting storage consumers of the LTS
> kernel series.
> 
> The following fix from the mainline kernel has been actively
> backported to the LTS kernel series:
> 
> Commit ef86cb2 ("qla2xxx: Mark port lost when we receive an RSCN for it.")
> 
> This fix to the qla2xxx driver core provokes a regression in the
> target code which will probabilistically hang a kernel which is
> processing remote fibre-channel port deletion events.
> 
> The following commit from the mainline kernel fixes this:
> 
> Commit ba9f6f6 ("qla2xxx: Fix hardware lock/unlock issue causing kernel 
> panic.)
> 
> This latter fix, while being tagged for 3.18+, does not appear as if
> it is being picked up for backporting.
> 
> So a situation arises where target users will get hit by this
> regression when they transition past the patch release which has the
> first 'fix' in it.  We just got done checking 3.10.106 and 3.16.44 and
> this situation is present in these most terminal LTS releases.

Indeed, regarding 3.10 it was merged in 3.10.85.

> We are in the process of validating application of the second 'fix' to
> the LTS series but a review of the patch doesn't suggest the potential
> for negative effects.  The same cannot be said for not having the
> fix.. :-)
> 
> Hopefully this is useful information.

Sure it is. Given that the regression was introduced two years ago in
our stable kernels, I guess there's no rush to revert the faulty fix
and that we'll wait for your feedback on testing the second fix above.
Please keep us informed about your progress and whether we need to
backport this patch.

thanks!
Willy


Re: [PATCH 3.10 141/319] scsi: mpt3sas: Fix secure erase premature termination

2017-02-07 Thread Willy Tarreau
On Tue, Feb 07, 2017 at 06:12:34PM +0100, Willy Tarreau wrote:
> On Tue, Feb 07, 2017 at 09:02:51AM -0800, James Bottomley wrote:
> > On Tue, 2017-02-07 at 07:59 +0100, Willy Tarreau wrote:
> > > Hi James,
> > > 
> > > On Mon, Feb 06, 2017 at 10:38:48PM -0800, James Bottomley wrote:
> > > > On Mon, 2017-02-06 at 23:26 +0100, Willy Tarreau wrote:
> > > (...)
> > > > > We don't have the referenced commit above in 3.10 so we should be
> > > > > safe. Additionally I checked that neither 4.4 nor 3.12 have them 
> > > > > either, so that makes me feel confident that we can skip it in 
> > > > > 3.10 as well.
> > > > 
> > > > The original was also racy with respect to multiple commands, so 
> > > > the above fixed the race as well.
> > > 
> > > OK so I tried to backport it to 3.10. I dropped a few parts which 
> > > were addressing this one marked for stable 4.4+ :
> > > 7ff723a ("scsi: mpt3sas: Unblock device after controller reset")
> > > 
> > > And I got the attached patch. All I know is that it builds. I'd 
> > > appreciate it if someone could confirm its validity, in which case
> > > I'll add it.
> > 
> > The two patches apply without fuzz to your tree and the combination is
> > a far better bug fix than the original regardless of whether 7ff723a
> > exists in your tree or not.  By messing with the patches all you do is
> > add the potential for introducing new bugs for no benefit, so why take
> > risk for no upside?
> 
> Just because I'm suggested to apply this fix which is supposed to fix
> a regression brought by 7ff723a which itself is marked to fix 4.4+ only
> and which doesn't apply to 3.10. So now I'm getting confused because
> you say that these patches apply without fuzz but one part definitely
> is rejected and the other one has to be applied by hand. I want not
> to take a risk but I'm faced with these options :
>   - drop all these patches and stay as 3.10.104 is
>   - merge the "secure erase premature" + the the part of the patch
> that supposedly fixes the regression it introduced
>   - merge this fix + 7ff723a + whatever it depends on (not fond of
> it)
> 
> In all cases I don't even have the hardware to validate anything. I'd
> be more tempted with the first two options. If you think I'm taking
> risks by backporting the relevant part of the fix, I'll simply drop
> them all and leave the code as it is now.

So I could backport the fix marked for 4.4+ (7ff723a) and the one
suggested by Sathya (ffb5845). The context was slightly different
but the changes obvious enough to look good. If everyone is OK, I'll
add these two commits. Here are the backports.

Willy
>From acd34b89fe261c88398e26bd305552eb7808 Mon Sep 17 00:00:00 2001
From: Suganath Prabu S <suganath-prabu.subram...@broadcom.com>
Date: Thu, 17 Nov 2016 16:15:58 +0530
Subject: scsi: mpt3sas: Unblock device after controller reset

commit 7ff723ad0f87feba43dda45fdae71206063dd7d4 upstream.

While issuing any ATA passthrough command to firmware the driver will
block the device. But it will unblock the device only if the I/O
completes through the ISR path. If a controller reset occurs before
command completion the device will remain in blocked state.

Make sure we unblock the device following a controller reset if an ATA
passthrough command was queued.

[mkp: clarified patch description]

Cc: <sta...@vger.kernel.org> # v4.4+
Fixes: ac6c2a93bd07 ("mpt3sas: Fix for SATA drive in blocked state, after diag 
reset")
Signed-off-by: Suganath Prabu S <suganath-prabu.subram...@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.peter...@oracle.com>
[wt: adjust context]
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c 
b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index e414b71..8979403 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -3390,6 +3390,11 @@ _scsih_check_volume_delete_events(struct MPT3SAS_ADAPTER 
*ioc,
le16_to_cpu(event_data->VolDevHandle));
 }
 
+static inline bool ata_12_16_cmd(struct scsi_cmnd *scmd)
+{
+   return (scmd->cmnd[0] == ATA_12 || scmd->cmnd[0] == ATA_16);
+}
+
 /**
  * _scsih_flush_running_cmds - completing outstanding commands.
  * @ioc: per adapter object
@@ -3411,6 +3416,9 @@ _scsih_flush_running_cmds(struct MPT3SAS_ADAPTER *ioc)
if (!scmd)
continue;
count++;
+   if (ata_12_16_cmd(scmd))
+   scsi_internal_dev

Re: [PATCH 3.10 141/319] scsi: mpt3sas: Fix secure erase premature termination

2017-02-07 Thread Willy Tarreau
On Tue, Feb 07, 2017 at 09:02:51AM -0800, James Bottomley wrote:
> On Tue, 2017-02-07 at 07:59 +0100, Willy Tarreau wrote:
> > Hi James,
> > 
> > On Mon, Feb 06, 2017 at 10:38:48PM -0800, James Bottomley wrote:
> > > On Mon, 2017-02-06 at 23:26 +0100, Willy Tarreau wrote:
> > (...)
> > > > We don't have the referenced commit above in 3.10 so we should be
> > > > safe. Additionally I checked that neither 4.4 nor 3.12 have them 
> > > > either, so that makes me feel confident that we can skip it in 
> > > > 3.10 as well.
> > > 
> > > The original was also racy with respect to multiple commands, so 
> > > the above fixed the race as well.
> > 
> > OK so I tried to backport it to 3.10. I dropped a few parts which 
> > were addressing this one marked for stable 4.4+ :
> > 7ff723a ("scsi: mpt3sas: Unblock device after controller reset")
> > 
> > And I got the attached patch. All I know is that it builds. I'd 
> > appreciate it if someone could confirm its validity, in which case
> > I'll add it.
> 
> The two patches apply without fuzz to your tree and the combination is
> a far better bug fix than the original regardless of whether 7ff723a
> exists in your tree or not.  By messing with the patches all you do is
> add the potential for introducing new bugs for no benefit, so why take
> risk for no upside?

Just because I'm suggested to apply this fix which is supposed to fix
a regression brought by 7ff723a which itself is marked to fix 4.4+ only
and which doesn't apply to 3.10. So now I'm getting confused because
you say that these patches apply without fuzz but one part definitely
is rejected and the other one has to be applied by hand. I want not
to take a risk but I'm faced with these options :
  - drop all these patches and stay as 3.10.104 is
  - merge the "secure erase premature" + the the part of the patch
that supposedly fixes the regression it introduced
  - merge this fix + 7ff723a + whatever it depends on (not fond of
it)

In all cases I don't even have the hardware to validate anything. I'd
be more tempted with the first two options. If you think I'm taking
risks by backporting the relevant part of the fix, I'll simply drop
them all and leave the code as it is now.

Thanks,
Willy


Re: [PATCH 3.10 141/319] scsi: mpt3sas: Fix secure erase premature termination

2017-02-06 Thread Willy Tarreau
Hi James,

On Mon, Feb 06, 2017 at 10:38:48PM -0800, James Bottomley wrote:
> On Mon, 2017-02-06 at 23:26 +0100, Willy Tarreau wrote:
(...)
> > We don't have the referenced commit above in 3.10 so we should be 
> > safe. Additionally I checked that neither 4.4 nor 3.12 have them 
> > either, so that makes me feel confident that we can skip it in 3.10
> > as well.
> 
> The original was also racy with respect to multiple commands, so the
> above fixed the race as well.

OK so I tried to backport it to 3.10. I dropped a few parts which were
addressing this one marked for stable 4.4+ :
7ff723a ("scsi: mpt3sas: Unblock device after controller reset")

And I got the attached patch. All I know is that it builds. I'd appreciate
it if someone could confirm its validity, in which case I'll add it.

Thanks,
Willy

---

diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h 
b/drivers/scsi/mpt3sas/mpt3sas_base.h
index 994656c..997e13f 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.h
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.h
@@ -219,6 +219,7 @@ struct MPT3SAS_TARGET {
  * @eedp_enable: eedp support enable bit
  * @eedp_type: 0(type_1), 1(type_2), 2(type_3)
  * @eedp_block_length: block size
+ * @ata_command_pending: SATL passthrough outstanding for device
  */
 struct MPT3SAS_DEVICE {
struct MPT3SAS_TARGET *sas_target;
@@ -227,6 +228,17 @@ struct MPT3SAS_DEVICE {
u8  configured_lun;
u8  block;
u8  tlr_snoop_check;
+   /*
+* Bug workaround for SATL handling: the mpt2/3sas firmware
+* doesn't return BUSY or TASK_SET_FULL for subsequent
+* commands while a SATL pass through is in operation as the
+* spec requires, it simply does nothing with them until the
+* pass through completes, causing them possibly to timeout if
+* the passthrough is a long executing command (like format or
+* secure erase).  This variable allows us to do the right
+* thing while a SATL command is pending.
+*/
+   unsigned long ata_command_pending;
 };
 
 #define MPT3_CMD_NOT_USED  0x8000  /* free */
diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c 
b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index e414b71..db38f70 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -3515,9 +3515,18 @@ _scsih_eedp_error_handling(struct scsi_cmnd *scmd, u16 
ioc_status)
SAM_STAT_CHECK_CONDITION;
 }
 
-static inline bool ata_12_16_cmd(struct scsi_cmnd *scmd)
+static int _scsih_set_satl_pending(struct scsi_cmnd *scmd, bool pending)
 {
-   return (scmd->cmnd[0] == ATA_12 || scmd->cmnd[0] == ATA_16);
+   struct MPT3SAS_DEVICE *priv = scmd->device->hostdata;
+
+   if (scmd->cmnd[0] != ATA_12 && scmd->cmnd[0] != ATA_16)
+   return 0;
+
+   if (pending)
+   return test_and_set_bit(0, >ata_command_pending);
+
+   clear_bit(0, >ata_command_pending);
+   return 0;
 }
 
 /**
@@ -3547,13 +3556,6 @@ _scsih_qcmd_lck(struct scsi_cmnd *scmd, void 
(*done)(struct scsi_cmnd *))
scsi_print_command(scmd);
 #endif
 
-   /*
-* Lock the device for any subsequent command until command is
-* done.
-*/
-   if (ata_12_16_cmd(scmd))
-   scsi_internal_device_block(scmd->device);
-
scmd->scsi_done = done;
sas_device_priv_data = scmd->device->hostdata;
if (!sas_device_priv_data || !sas_device_priv_data->sas_target) {
@@ -3568,6 +3570,19 @@ _scsih_qcmd_lck(struct scsi_cmnd *scmd, void 
(*done)(struct scsi_cmnd *))
return 0;
}
 
+   /*
+* Bug work around for firmware SATL handling.  The loop
+* is based on atomic operations and ensures consistency
+* since we're lockless at this point
+*/
+   do {
+   if (test_bit(0, _device_priv_data->ata_command_pending)) {
+   scmd->result = SAM_STAT_BUSY;
+   scmd->scsi_done(scmd);
+   return 0;
+   }
+   } while (_scsih_set_satl_pending(scmd, true));
+
sas_target_priv_data = sas_device_priv_data->sas_target;
 
/* invalid device handle */
@@ -4057,8 +4072,7 @@ _scsih_io_done(struct MPT3SAS_ADAPTER *ioc, u16 smid, u8 
msix_index, u32 reply)
if (scmd == NULL)
return 1;
 
-   if (ata_12_16_cmd(scmd))
-   scsi_internal_device_unblock(scmd->device, SDEV_RUNNING);
+   _scsih_set_satl_pending(scmd, false);
 
mpi_request = mpt3sas_base_get_msg_frame(ioc, smid);
 



Re: [PATCH 3.10 141/319] scsi: mpt3sas: Fix secure erase premature termination

2017-02-06 Thread Willy Tarreau
Hi Sathya,

On Mon, Feb 06, 2017 at 09:21:44AM -0700, Sathya Prakash Veerichetty wrote:
> Willy,
> I think this patch had a problem and later modified to a different
> blocking mechanism.  Could you please pull in the latest change for this?

Much appreciated, thanks. I've checked and found the patch you're
talking about :

  commit ffb58456589443ca572221fabbdef3db8483a779
  Author: James Bottomley 
  Date:   Sun Jan 1 09:39:24 2017 -0800

scsi: mpt3sas: fix hang on ata passthrough commands

mpt3sas has a firmware failure where it can only handle one pass through
ATA command at a time.  If another comes in, contrary to the SAT
standard, it will hang until the first one completes (causing long
commands like secure erase to timeout).  The original fix was to block
the device when an ATA command came in, but this caused a regression
with

commit 669f044170d8933c3d66d231b69ea97cb8447338
Author: Bart Van Assche 
Date:   Tue Nov 22 16:17:13 2016 -0800

scsi: srp_transport: Move queuecommand() wait code to SCSI core

So fix the original fix of the secure erase timeout by properly
returning SAM_STAT_BUSY like the SAT recommends.  The original patch
also had a concurrency problem since scsih_qcmd is lockless at that
point (this is fixed by using atomic bitops to set and test the flag).

[mkp: addressed feedback wrt. test_bit and fixed whitespace]

Fixes: 18f6084a989ba1b (mpt3sas: Fix secure erase premature termination)
Signed-off-by: James Bottomley 
Acked-by: Sreekanth Reddy 
Reviewed-by: Christoph Hellwig 
Reported-by: Ingo Molnar 
Tested-by: Ingo Molnar 
Signed-off-by: Martin K. Petersen 

We don't have the referenced commit above in 3.10 so we should be safe.
Additionally I checked that neither 4.4 nor 3.12 have them either, so
that makes me feel confident that we can skip it in 3.10 as well.

Thanks!
Willy


[PATCH 3.10 141/319] scsi: mpt3sas: Fix secure erase premature termination

2017-02-05 Thread Willy Tarreau
From: Andrey Grodzovsky <andrey2...@gmail.com>

commit 18f6084a989ba1b38702f9af37a2e4049a924be6 upstream.

This is a work around for a bug with LSI Fusion MPT SAS2 when perfoming
secure erase. Due to the very long time the operation takes, commands
issued during the erase will time out and will trigger execution of the
abort hook. Even though the abort hook is called for the specific
command which timed out, this leads to entire device halt
(scsi_state terminated) and premature termination of the secure erase.

Set device state to busy while ATA passthrough commands are in progress.

[mkp: hand applied to 4.9/scsi-fixes, tweaked patch description]

Signed-off-by: Andrey Grodzovsky <andrey2...@gmail.com>
Acked-by: Sreekanth Reddy <sreekanth.re...@broadcom.com>
Cc: <linux-scsi@vger.kernel.org>
Cc: Sathya Prakash <sathya.prak...@broadcom.com>
Cc: Chaitra P B <chaitra.basa...@broadcom.com>
Cc: Suganath Prabu Subramani <suganath-prabu.subram...@broadcom.com>
Cc: Sreekanth Reddy <sreekanth.re...@broadcom.com>
Cc: Hannes Reinecke <h...@suse.de>
Signed-off-by: Martin K. Petersen <martin.peter...@oracle.com>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c 
b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index f8c4b85..e414b71 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -3515,6 +3515,10 @@ _scsih_eedp_error_handling(struct scsi_cmnd *scmd, u16 
ioc_status)
SAM_STAT_CHECK_CONDITION;
 }
 
+static inline bool ata_12_16_cmd(struct scsi_cmnd *scmd)
+{
+   return (scmd->cmnd[0] == ATA_12 || scmd->cmnd[0] == ATA_16);
+}
 
 /**
  * _scsih_qcmd_lck - main scsi request entry point
@@ -3543,6 +3547,13 @@ _scsih_qcmd_lck(struct scsi_cmnd *scmd, void 
(*done)(struct scsi_cmnd *))
scsi_print_command(scmd);
 #endif
 
+   /*
+* Lock the device for any subsequent command until command is
+* done.
+*/
+   if (ata_12_16_cmd(scmd))
+   scsi_internal_device_block(scmd->device);
+
scmd->scsi_done = done;
sas_device_priv_data = scmd->device->hostdata;
if (!sas_device_priv_data || !sas_device_priv_data->sas_target) {
@@ -4046,6 +4057,9 @@ _scsih_io_done(struct MPT3SAS_ADAPTER *ioc, u16 smid, u8 
msix_index, u32 reply)
if (scmd == NULL)
return 1;
 
+   if (ata_12_16_cmd(scmd))
+   scsi_internal_device_unblock(scmd->device, SDEV_RUNNING);
+
mpi_request = mpt3sas_base_get_msg_frame(ioc, smid);
 
if (mpi_reply == NULL) {
-- 
2.8.0.rc2.1.gbe9624a



Re: [patch] ses: tighten range checks in ses_intf_add()

2015-10-20 Thread Willy Tarreau
Hi Dan,

On Mon, Oct 19, 2015 at 04:48:20PM +0300, Dan Carpenter wrote:
> We test that "type_ptr" is within the buffer but then we read from
> "type_ptr[3]" so we could be reading beyond the end of the buffer.
> 
> Reported-by: "Berry Cheng ??(??)" 
> Signed-off-by: Dan Carpenter 
> ---
> This isn't a complete fix because we still need more range checking in
> all the other places which use type_ptr like ses_get_page2_descriptor().
> We record len as page1_len but we don't use it anywhere...
> 
> I wonder if someone knew the expected format we could make reject too
> short lengths earlier.
> 
> diff --git a/drivers/scsi/ses.c b/drivers/scsi/ses.c
> index dcb0d76..39f69b0 100644
> --- a/drivers/scsi/ses.c
> +++ b/drivers/scsi/ses.c
> @@ -641,7 +641,7 @@ static int ses_intf_add(struct device *cdev,
>   /* begin at the enclosure descriptor */
>   type_ptr = buf + 8;
>   /* skip all the enclosure descriptors */
> - for (i = 0; i < num_enclosures && type_ptr < buf + len; i++) {
> + for (i = 0; i < num_enclosures && type_ptr + 4 < buf + len; i++) {
>   types += type_ptr[2];
>   type_ptr += type_ptr[3] + 4;


why "type_ptr + 4 < buf + len" here ? You're using type_ptr[3] only,
so it's either "type_ptr + 4 <= buf + len" or "type_ptr + 3 < buf + len".

> @@ -649,7 +649,7 @@ static int ses_intf_add(struct device *cdev,
>   ses_dev->page1_types = type_ptr;
>   ses_dev->page1_num_types = types;
>  
> - for (i = 0; i < types && type_ptr < buf + len; i++, type_ptr += 4) {
> + for (i = 0; i < types && type_ptr + 2 < buf + len; i++, type_ptr += 4) {
>   if (type_ptr[0] == ENCLOSURE_COMPONENT_DEVICE ||
>   type_ptr[0] == ENCLOSURE_COMPONENT_ARRAY_DEVICE)
>   components += type_ptr[1];

Same here where I'd expect "type_ptr + 1 < buf + len"

Regards,
Willy

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 01/14] parport: return value of attach and parport_register_driver

2015-04-08 Thread Willy Tarreau
On Wed, Apr 08, 2015 at 03:27:37PM +0300, Dan Carpenter wrote:
 On Wed, Apr 08, 2015 at 05:20:10PM +0530, Sudip Mukherjee wrote:
  On Wed, Apr 08, 2015 at 02:38:32PM +0300, Dan Carpenter wrote:
   1) We can't apply this patch on its own so this way of breaking up the
   patches doesn't work.
  yes, if the first patch is reverted for any reason all the others need
  to be reverted also. so then everything in one single patch?
 
 The problem is that patch 1/1 breaks the build.  The rule is that we
 should be able to apply part of a patch series and nothing breaks.  If
 we apply the patch series out of order than things break that's our
 problem, yes.  But if we apply only 1/1 and it breaks, that's a problem
 with the series.

Yep, keep in mind that git bisect will randomly land in the middle of
any set of patches during a debugging session and it could very well land
on this one. If it breaks, that's a real pain for the people trying to
bisect their bug because suddenly they have to deal with a second bug
totally different from theirs.

Regards,
Willy

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bug 3680] sym53c8xx_2 SMP deadlock on driver load

2007-10-17 Thread Willy Tarreau
On Wed, Oct 17, 2007 at 10:53:06AM -0400, Tony Battersby wrote:
  So we should unconditionally drop the lock (and re-enable
  interrupts) and re-acquire it.
 
 After looking at it carefully, this is true of pci_map_mem, but not
 pci_unmap_mem.  pci_unmap_mem can be called from both -detect and
 -release.  io_request_lock is held in -detect but not in -release.
 So, your patch locks up the system on module unload.
 
 I have put together and tested a new patch which does it correctly,
 while still trying to make only minimal changes.
 If you approve, this can go into the next 2.4.x release.

(...)

 -static void __init pci_unmap_mem(u_long vaddr, u_long size)
 -{
 - if (vaddr)
 +static void __init pci_unmap_mem(u_long vaddr,
 + u_long size,
 + int holding_io_request_lock)

This is marked __init, and pci_unmap_mem() is called from
sym_free_resources(), which in turn is called from sym_detach(),
called from sym53c8xx_release() when unloading module. So the section
may not be there anymore upon unload. I wonder how this can work right
now. I'm surely missing something :-/

Willy

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bug 3680] sym53c8xx_2 SMP deadlock on driver load

2007-10-17 Thread Willy Tarreau
On Wed, Oct 17, 2007 at 11:44:08AM -0400, Tony Battersby wrote:
 In 2.4, include/linux/init.h has the following:
 
 #ifndef MODULE
 #define __init  __attribute__ ((__section__ (.text.init)))
 #else
 #define __init
 #endif
 
 So __init has an effect only if it is built-in.

Ah yes, you're right.

 Apparently 2.6 also
 discards __init sections in modules after loading, but 2.4 doesn't.  Of
 course, it is still a bug.
 
 Do you want me to redo the patch with __init taken out?

It would be better. As a general rule of thumb, an __init function should
not be called from a non __init, otherwise it is very hard to track the
call path.

Thanks,
Willy

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bug 3680] sym53c8xx_2 SMP deadlock on driver load

2007-10-17 Thread Willy Tarreau
On Wed, Oct 17, 2007 at 10:27:47AM -0600, Matthew Wilcox wrote:
 On Wed, Oct 17, 2007 at 10:53:06AM -0400, Tony Battersby wrote:
  After looking at it carefully, this is true of pci_map_mem, but not
  pci_unmap_mem.  pci_unmap_mem can be called from both -detect and
  -release.  io_request_lock is held in -detect but not in -release.
  So, your patch locks up the system on module unload.
 
 My mistake.  I should have checked the iorl was held over -release too.
 
 This is a pretty ugly patch, but I think the three alternatives are
 worse:
 
  - Drop the iorl at the beginning of -detect and reacquire it at exit.
We don't know what else the iorl might be protecting.  Probably
nothing, but I don't want to audit it.
  - Convert sym2 back to old error handling so the midlayer doesn't
acquire the iorl for us in -detect.
  - Acquire the iorl in -release.  We might deadlock on something.
 
 So, sure, apply this patch.

Matthew,

Sincere thanks for your help and review. I'll apply this patch in
2.4.36-pre2 and in next 2.4.35.X. An ugly fix is clearly better than
a massive change at this stage.

Natalie, if you want, you can close the bug right now, I've queued the
patch for both branches.

Best regards,
Willy

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html