Re: [PATCH 3/3] NCR5380: Check for chip presence in NCR5380_init()
On Sun, 30 Oct 2016, Ondrej Zary wrote: > Read back MODE_REG after writing it in NCR5380_init() to check if the > chip is really there. > > This prevents hang when incorrect I/O address was specified by user. Do you know whereabouts in the driver the hang happens? Maybe there is a robustness issue there. Card type detection (and vacant slot detection) is a good idea but I'm not sure how we can detect this chip reliably. -- > > Signed-off-by: Ondrej Zary> --- > drivers/scsi/NCR5380.c |5 + > 1 file changed, 5 insertions(+) > > diff --git a/drivers/scsi/NCR5380.c b/drivers/scsi/NCR5380.c > index 01c0027..ce3156d 100644 > --- a/drivers/scsi/NCR5380.c > +++ b/drivers/scsi/NCR5380.c > @@ -495,6 +495,11 @@ static int NCR5380_init(struct Scsi_Host *instance, int > flags) > > NCR5380_write(INITIATOR_COMMAND_REG, ICR_BASE); > NCR5380_write(MODE_REG, MR_BASE); > + /* check if the chip is really there */ > + if (NCR5380_read(MODE_REG) != MR_BASE) { > + NCR5380_exit(instance); > + return -ENODEV; > + } > NCR5380_write(TARGET_COMMAND_REG, 0); > NCR5380_write(SELECT_ENABLE_REG, 0); > > -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] NCR5380: Check for chip presence in NCR5380_init()
On Sun, 30 Oct 2016, Ondrej Zary wrote: > Read back MODE_REG after writing it in NCR5380_init() to check if the > chip is really there. > > This prevents hang when incorrect I/O address was specified by user. > > Signed-off-by: Ondrej Zary> --- > drivers/scsi/NCR5380.c |5 + > 1 file changed, 5 insertions(+) > > diff --git a/drivers/scsi/NCR5380.c b/drivers/scsi/NCR5380.c > index 01c0027..ce3156d 100644 > --- a/drivers/scsi/NCR5380.c > +++ b/drivers/scsi/NCR5380.c > @@ -495,6 +495,11 @@ static int NCR5380_init(struct Scsi_Host *instance, int > flags) > > NCR5380_write(INITIATOR_COMMAND_REG, ICR_BASE); > NCR5380_write(MODE_REG, MR_BASE); > + /* check if the chip is really there */ > + if (NCR5380_read(MODE_REG) != MR_BASE) { > + NCR5380_exit(instance); > + return -ENODEV; > + } This doesn't belong in the core driver. Only the 5380 ISA drivers have configurable base addresses. Also, MR_BASE == 0, so that test is likely to be ineffectual anyway. This patch doesn't really add any value AFAICT. -- > NCR5380_write(TARGET_COMMAND_REG, 0); > NCR5380_write(SELECT_ENABLE_REG, 0); > > -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] g_NCR5380: Test the IRQ before accepting it
On Sun, 30 Oct 2016, Ondrej Zary wrote: > Trigger an IRQ first with a test IRQ handler to find out if it really > works. Disable the IRQ if not. > > This prevents hang when incorrect IRQ was specified by user. > > Signed-off-by: Ondrej Zary> --- > drivers/scsi/g_NCR5380.c | 32 ++-- > 1 file changed, 30 insertions(+), 2 deletions(-) > > diff --git a/drivers/scsi/g_NCR5380.c b/drivers/scsi/g_NCR5380.c > index 3790ed5..261e168 100644 > --- a/drivers/scsi/g_NCR5380.c > +++ b/drivers/scsi/g_NCR5380.c > @@ -67,6 +67,14 @@ > MODULE_ALIAS("g_NCR5380_mmio"); > MODULE_LICENSE("GPL"); > > +static bool irq_working; > + > +static irqreturn_t test_irq(int irq, void *dev_id) > +{ > + irq_working = true; > + return IRQ_HANDLED; > +} > + > /* > * Configure I/O address of 53C400A or DTC436 by writing magic numbers > * to ports 0x779 and 0x379. > @@ -275,10 +283,30 @@ static int generic_NCR5380_init_one(struct > scsi_host_template *tpnt, > /* set IRQ for HP C2502 */ > if (board == BOARD_HP_C2502) > magic_configure(port_idx, instance->irq, magic); > - if (request_irq(instance->irq, generic_NCR5380_intr, > - 0, "NCR5380", instance)) { > + /* test if the IRQ is working */ > + irq_working = false; > + if (request_irq(instance->irq, test_irq, > + 0, "NCR5380-irqtest", NULL)) { > printk(KERN_WARNING "scsi%d : IRQ%d not free, > interrupts disabled\n", instance->host_no, instance->irq); > instance->irq = NO_IRQ; > + } else { > + NCR5380_trigger_irq(instance); > + NCR5380_read(RESET_PARITY_INTERRUPT_REG); > + free_irq(instance->irq, NULL); > + if (irq_working) { > + if (request_irq(instance->irq, > + generic_NCR5380_intr, 0, > + "NCR5380", instance)) { > + printk(KERN_WARNING "scsi%d : IRQ%d not > free, interrupts disabled\n", > +instance->host_no, > +instance->irq); > + instance->irq = NO_IRQ; > + } > + } else { > + printk(KERN_WARNING "scsi%d : IRQ%d not > working, interrupts disabled\n", > +instance->host_no, instance->irq); > + instance->irq = NO_IRQ; > + } > } > } > > If the user omits to specify an irq, you can just default to IRQ_AUTO. This might result in NO_IRQ, which gives the same result as this patch. And when the user does specify an IRQ, we should trust them. So this compexity doesn't add any value AFAICT. Thanks but no thanks. -- -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] NCR5380: Use probe_irq_*() for IRQ probing
On Sun, 30 Oct 2016, Ondrej Zary wrote: > Use standard probe_irq_on() and probe_irq_off() functions instead of own > implementation. Thanks for doing this. > This prevents warning messages like this in the kernel log: > genirq: Flags mismatch irq 1. (NCR-probe) vs. 0080 (i8042) > > Move the IRQ trigger code to a separate function so it can be used for > other purposes (testing if the IRQ works). > > Also add missing IRQ reset after probe. > > Signed-off-by: Ondrej Zary> --- > drivers/scsi/NCR5380.c | 72 > +- > drivers/scsi/NCR5380.h |3 +- > drivers/scsi/g_NCR5380.c |2 +- > 3 files changed, 29 insertions(+), 48 deletions(-) > > diff --git a/drivers/scsi/NCR5380.c b/drivers/scsi/NCR5380.c > index d849ffa..01c0027 100644 > --- a/drivers/scsi/NCR5380.c > +++ b/drivers/scsi/NCR5380.c > @@ -351,50 +351,13 @@ static void NCR5380_print_phase(struct Scsi_Host > *instance) > } > #endif > > - > -static int probe_irq; > - > -/** > - * probe_intr- helper for IRQ autoprobe > - * @irq: interrupt number > - * @dev_id: unused > - * @regs: unused > - * > - * Set a flag to indicate the IRQ in question was received. This is > - * used by the IRQ probe code. > - */ > - > -static irqreturn_t probe_intr(int irq, void *dev_id) > -{ > - probe_irq = irq; > - return IRQ_HANDLED; > -} > - > -/** > - * NCR5380_probe_irq - find the IRQ of an NCR5380 > - * @instance: NCR5380 controller > - * @possible: bitmask of ISA IRQ lines > - * > - * Autoprobe for the IRQ line used by the NCR5380 by triggering an IRQ > - * and then looking to see what interrupt actually turned up. > - */ > - > -static int __maybe_unused NCR5380_probe_irq(struct Scsi_Host *instance, > - int possible) > +static void NCR5380_trigger_irq(struct Scsi_Host *instance) > { > struct NCR5380_hostdata *hostdata = shost_priv(instance); > - unsigned long timeout; > - int trying_irqs, i, mask; > - > - for (trying_irqs = 0, i = 1, mask = 2; i < 16; ++i, mask <<= 1) > - if ((mask & possible) && (request_irq(i, _intr, 0, > "NCR-probe", NULL) == 0)) > - trying_irqs |= mask; > - > - timeout = jiffies + msecs_to_jiffies(250); > - probe_irq = NO_IRQ; > + unsigned long timeout = jiffies + msecs_to_jiffies(2500); > > /* > - * A interrupt is triggered whenever BSY = false, SEL = true > + * An interrupt is triggered whenever BSY = false, SEL = true >* and a bit set in the SELECT_ENABLE_REG is asserted on the >* SCSI bus. >* > @@ -402,23 +365,40 @@ static int __maybe_unused NCR5380_probe_irq(struct > Scsi_Host *instance, >* (I/O, C/D, and MSG) match those in the TCR, so we must reset that >* to zero. >*/ > - > NCR5380_write(TARGET_COMMAND_REG, 0); > NCR5380_write(SELECT_ENABLE_REG, hostdata->id_mask); > NCR5380_write(OUTPUT_DATA_REG, hostdata->id_mask); > NCR5380_write(INITIATOR_COMMAND_REG, ICR_BASE | ICR_ASSERT_DATA | > ICR_ASSERT_SEL); > > - while (probe_irq == NO_IRQ && time_before(jiffies, timeout)) > + while (!(NCR5380_read(BUS_AND_STATUS_REG) & BASR_IRQ) && > +time_before(jiffies, timeout)) > schedule_timeout_uninterruptible(1); You don't need a 2.5 second timeout here. To raise the interrupt the chip should not need more than a bus settle delay (400ns). Calling msleep(1) should be plenty. Better still, please drop the loop, the delay and the BASR_IRQ test here, and if need be, put the delay in the caller. This routine discards the result of that sequence anyway. > > NCR5380_write(SELECT_ENABLE_REG, 0); > NCR5380_write(INITIATOR_COMMAND_REG, ICR_BASE); > +} > > - for (i = 1, mask = 2; i < 16; ++i, mask <<= 1) > - if (trying_irqs & mask) > - free_irq(i, NULL); > +/** > + * NCR5380_probe_irq - find the IRQ of an NCR5380 > + * @instance: NCR5380 controller > + * > + * Autoprobe for the IRQ line used by the NCR5380 by triggering an IRQ > + * and then looking to see what interrupt actually turned up. > + */ > + > +static int NCR5380_probe_irq(struct Scsi_Host *instance) > +{ > + struct NCR5380_hostdata *hostdata = shost_priv(instance); > + int irqs, irq; > + Please rename irqs to irq_mask. Other drivers like tty/cyclades.c begin with, /* forget possible initially masked and pending IRQ */ irq = probe_irq_off(probe_irq_on()); Perhaps we need that here too? > + irqs = probe_irq_on(); You should clear any pending interrupt on the chip before you try to trigger another one. > + NCR5380_trigger_irq(instance); Please do the msleep(1) here. > + irq = probe_irq_off(irqs); > + NCR5380_read(RESET_PARITY_INTERRUPT_REG); > + if (irq < 0) > + irq = 0; Please use NO_IRQ instead of 0. > > -
RE: iscsi target kernel error log support
Hi, Thanks for reply:) Actually when configure second target, I execute targetcli /iscsi create iqn.2016-08.nokia.lab:omu-0 which will automatically create the 0.0.0.0 3260 listening socket for me. But I think we do not need it at all, because we will delete it later, after that configure own IP(like below 10.180.180.222 ), so I am wondering whether is there any other iscsi target configuration can avoid this unnecessary issue. " targetcli /iscsi/iqn.2016-08.nokia.lab:chu-0/tpg1/portals delete 0.0.0.0 3260 targetcli /iscsi/iqn.2016-08.nokia.lab:chu-0/tpg1/portals create 10.180.180.222 Using default IP port 3260 " Thanks again. Brs, Jin Ke -Original Message- From: Nicholas A. Bellinger [mailto:n...@linux-iscsi.org] Sent: Sunday, October 30, 2016 7:58 AM To: Jin, Ke (Nokia - CN/Hangzhou)Cc: target-de...@vger.kernel.org; linux-scsi@vger.kernel.org; Xiong, Li (Nokia - CN/Hangzhou) ; Tom Herbert ; Andy Grover Subject: Re: iscsi target kernel error log support Hi Jin, Adding Tom CC' for SO_REUSEPORT and Andy CC' for INADDR_ANY + targetcli. On Tue, 2016-10-18 at 01:25 +, Jin, Ke (Nokia - CN/Hangzhou) wrote: > Hi, > Sorry to disturb you. > > I am using targetcli to configure the iscsi target on my linux host. I > find one issue, could you kindly review it and give some comments to > me. > > Issue: (linux kernel 4.4 stable) > > Configure first target: > 10/14/16 17:07:00.538417942: targetcli /iscsi create > iqn.2016-08.nokia.lab:chu-0 > Created target iqn.2016-08.nokia.lab:chu-0. > Created TPG 1. > Global pref auto_add_default_portal=true > Created default portal listening on all IPs (0.0.0.0), port 3260. > 10/14/16 17:07:00.650688229: targetcli /backstores/block create iblock_0 > dev=/dev/sda > Created block storage object iblock_0 using /dev/sda. > 10/14/16 17:07:00.772435404: targetcli > /iscsi/iqn.2016-08.nokia.lab:chu-0/tpg1/luns create > /backstores/block/iblock_0 1 > Created LUN 1. > 10/14/16 17:07:00.894355883: targetcli > /iscsi/iqn.2016-08.nokia.lab:chu-0/tpg1/portals delete 0.0.0.0 3260 > Deleted network portal 0.0.0.0:3260 > 10/14/16 17:07:01.012410786: targetcli > /iscsi/iqn.2016-08.nokia.lab:chu-0/tpg1/portals create 10.180.180.111 > Using default IP port 3260 > Created network portal 10.180.180.111:3260. > 10/14/16 17:07:01.130478681: targetcli > /iscsi/iqn.2016-08.nokia.lab:chu-0/tpg1/acls create > iqn.2016-08.nokia.lab:initiator.chu-0-0 > Created Node ACL for iqn.2016-08.nokia.lab:initiator.chu-0-0 > Created mapped LUN 1. > > Then I configure second target on same host. > targetcli /iscsi create iqn.2016-08.nokia.lab:omu-0 > Created target iqn.2016-08.nokia.lab:omu. > Created TPG 1. > Default portal not created, TPGs within a target cannot share ip:port. > Then I check the kernel log with dmesg, there is one error log > “kernel_bind() failed: -98”. > > So AFAIK, the failure of INADDR_ANY portal binding when !INADDR_ANY portals exist was the original expected behavior. > > Then I begin to check the iscsi kernel module code. linux-stable > \drivers\target\iscsi\ iscsi_target_login.c > int iscsit_setup_np() > { > ret = kernel_setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, > (char *), sizeof(opt)); > } > I think SO_REUSEADDR is not enough in this case (there have two > targets configured), when targetcli /iscsi create > iqn.2016-08.nokia.lab:omu-0, it will create listening socket with > 0,0,0,0:3260, because it already have 10.180.180.111:3260 listening > sockets there in first target, so this cause the kernel_bind() failed: > -98”. So I add SO_REUSEPORT also. It works. > int iscsit_setup_np() > { > ret = kernel_setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, > (char *), sizeof(opt)); > ret = kernel_setsockopt(sock, SOL_SOCKET, SO_REUSEPORT, > (char *), sizeof(opt)); > } > > How do you think, do you think this is the kernel bug or just my > configuration issue, or you already have this patch? > At iscsi-target application level, we're using a iscsi_np kthread listener per ipv4 / ipv6 addr + port. (eg: one listener per iscsi network portal) The network portal can be shared across N target endpoints in /sys/kernel/config/target/iscsi/$IQN/$TPGT/np/*, but the actual listening + accept for each ipv4 / ipv6 addr + port is still done from a single kthread context. So AFAICT for your use-case, adding SO_REUSEPORT to allow concurrent INADDR_ANY + !INADDR_ANY network portals should be OK, at least from an iscsi-target perspective. That is, iscsi-target would consider each as a separate network portal, allowing INADDR_ANY + !INADDR_ANY accepts to be serviced across multiple iscsi_np kthreads. There should not be an inherent issue allowing this, but it would useful to understand if there are any other possible ramifications before enabling by default.
[PATCH 3/3] NCR5380: Check for chip presence in NCR5380_init()
Read back MODE_REG after writing it in NCR5380_init() to check if the chip is really there. This prevents hang when incorrect I/O address was specified by user. Signed-off-by: Ondrej Zary--- drivers/scsi/NCR5380.c |5 + 1 file changed, 5 insertions(+) diff --git a/drivers/scsi/NCR5380.c b/drivers/scsi/NCR5380.c index 01c0027..ce3156d 100644 --- a/drivers/scsi/NCR5380.c +++ b/drivers/scsi/NCR5380.c @@ -495,6 +495,11 @@ static int NCR5380_init(struct Scsi_Host *instance, int flags) NCR5380_write(INITIATOR_COMMAND_REG, ICR_BASE); NCR5380_write(MODE_REG, MR_BASE); + /* check if the chip is really there */ + if (NCR5380_read(MODE_REG) != MR_BASE) { + NCR5380_exit(instance); + return -ENODEV; + } NCR5380_write(TARGET_COMMAND_REG, 0); NCR5380_write(SELECT_ENABLE_REG, 0); -- Ondrej Zary -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/3] g_NCR5380: Test the IRQ before accepting it
Trigger an IRQ first with a test IRQ handler to find out if it really works. Disable the IRQ if not. This prevents hang when incorrect IRQ was specified by user. Signed-off-by: Ondrej Zary--- drivers/scsi/g_NCR5380.c | 32 ++-- 1 file changed, 30 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/g_NCR5380.c b/drivers/scsi/g_NCR5380.c index 3790ed5..261e168 100644 --- a/drivers/scsi/g_NCR5380.c +++ b/drivers/scsi/g_NCR5380.c @@ -67,6 +67,14 @@ MODULE_ALIAS("g_NCR5380_mmio"); MODULE_LICENSE("GPL"); +static bool irq_working; + +static irqreturn_t test_irq(int irq, void *dev_id) +{ + irq_working = true; + return IRQ_HANDLED; +} + /* * Configure I/O address of 53C400A or DTC436 by writing magic numbers * to ports 0x779 and 0x379. @@ -275,10 +283,30 @@ static int generic_NCR5380_init_one(struct scsi_host_template *tpnt, /* set IRQ for HP C2502 */ if (board == BOARD_HP_C2502) magic_configure(port_idx, instance->irq, magic); - if (request_irq(instance->irq, generic_NCR5380_intr, - 0, "NCR5380", instance)) { + /* test if the IRQ is working */ + irq_working = false; + if (request_irq(instance->irq, test_irq, + 0, "NCR5380-irqtest", NULL)) { printk(KERN_WARNING "scsi%d : IRQ%d not free, interrupts disabled\n", instance->host_no, instance->irq); instance->irq = NO_IRQ; + } else { + NCR5380_trigger_irq(instance); + NCR5380_read(RESET_PARITY_INTERRUPT_REG); + free_irq(instance->irq, NULL); + if (irq_working) { + if (request_irq(instance->irq, + generic_NCR5380_intr, 0, + "NCR5380", instance)) { + printk(KERN_WARNING "scsi%d : IRQ%d not free, interrupts disabled\n", + instance->host_no, + instance->irq); + instance->irq = NO_IRQ; + } + } else { + printk(KERN_WARNING "scsi%d : IRQ%d not working, interrupts disabled\n", + instance->host_no, instance->irq); + instance->irq = NO_IRQ; + } } } -- Ondrej Zary -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/3] NCR5380: Use probe_irq_*() for IRQ probing
Use standard probe_irq_on() and probe_irq_off() functions instead of own implementation. This prevents warning messages like this in the kernel log: genirq: Flags mismatch irq 1. (NCR-probe) vs. 0080 (i8042) Move the IRQ trigger code to a separate function so it can be used for other purposes (testing if the IRQ works). Also add missing IRQ reset after probe. Signed-off-by: Ondrej Zary--- drivers/scsi/NCR5380.c | 72 +- drivers/scsi/NCR5380.h |3 +- drivers/scsi/g_NCR5380.c |2 +- 3 files changed, 29 insertions(+), 48 deletions(-) diff --git a/drivers/scsi/NCR5380.c b/drivers/scsi/NCR5380.c index d849ffa..01c0027 100644 --- a/drivers/scsi/NCR5380.c +++ b/drivers/scsi/NCR5380.c @@ -351,50 +351,13 @@ static void NCR5380_print_phase(struct Scsi_Host *instance) } #endif - -static int probe_irq; - -/** - * probe_intr - helper for IRQ autoprobe - * @irq: interrupt number - * @dev_id: unused - * @regs: unused - * - * Set a flag to indicate the IRQ in question was received. This is - * used by the IRQ probe code. - */ - -static irqreturn_t probe_intr(int irq, void *dev_id) -{ - probe_irq = irq; - return IRQ_HANDLED; -} - -/** - * NCR5380_probe_irq - find the IRQ of an NCR5380 - * @instance: NCR5380 controller - * @possible: bitmask of ISA IRQ lines - * - * Autoprobe for the IRQ line used by the NCR5380 by triggering an IRQ - * and then looking to see what interrupt actually turned up. - */ - -static int __maybe_unused NCR5380_probe_irq(struct Scsi_Host *instance, - int possible) +static void NCR5380_trigger_irq(struct Scsi_Host *instance) { struct NCR5380_hostdata *hostdata = shost_priv(instance); - unsigned long timeout; - int trying_irqs, i, mask; - - for (trying_irqs = 0, i = 1, mask = 2; i < 16; ++i, mask <<= 1) - if ((mask & possible) && (request_irq(i, _intr, 0, "NCR-probe", NULL) == 0)) - trying_irqs |= mask; - - timeout = jiffies + msecs_to_jiffies(250); - probe_irq = NO_IRQ; + unsigned long timeout = jiffies + msecs_to_jiffies(2500); /* -* A interrupt is triggered whenever BSY = false, SEL = true +* An interrupt is triggered whenever BSY = false, SEL = true * and a bit set in the SELECT_ENABLE_REG is asserted on the * SCSI bus. * @@ -402,23 +365,40 @@ static int __maybe_unused NCR5380_probe_irq(struct Scsi_Host *instance, * (I/O, C/D, and MSG) match those in the TCR, so we must reset that * to zero. */ - NCR5380_write(TARGET_COMMAND_REG, 0); NCR5380_write(SELECT_ENABLE_REG, hostdata->id_mask); NCR5380_write(OUTPUT_DATA_REG, hostdata->id_mask); NCR5380_write(INITIATOR_COMMAND_REG, ICR_BASE | ICR_ASSERT_DATA | ICR_ASSERT_SEL); - while (probe_irq == NO_IRQ && time_before(jiffies, timeout)) + while (!(NCR5380_read(BUS_AND_STATUS_REG) & BASR_IRQ) && + time_before(jiffies, timeout)) schedule_timeout_uninterruptible(1); NCR5380_write(SELECT_ENABLE_REG, 0); NCR5380_write(INITIATOR_COMMAND_REG, ICR_BASE); +} - for (i = 1, mask = 2; i < 16; ++i, mask <<= 1) - if (trying_irqs & mask) - free_irq(i, NULL); +/** + * NCR5380_probe_irq - find the IRQ of an NCR5380 + * @instance: NCR5380 controller + * + * Autoprobe for the IRQ line used by the NCR5380 by triggering an IRQ + * and then looking to see what interrupt actually turned up. + */ + +static int NCR5380_probe_irq(struct Scsi_Host *instance) +{ + struct NCR5380_hostdata *hostdata = shost_priv(instance); + int irqs, irq; + + irqs = probe_irq_on(); + NCR5380_trigger_irq(instance); + irq = probe_irq_off(irqs); + NCR5380_read(RESET_PARITY_INTERRUPT_REG); + if (irq < 0) + irq = 0; - return probe_irq; + return irq; } /** diff --git a/drivers/scsi/NCR5380.h b/drivers/scsi/NCR5380.h index 3c6ce54..b2fca52 100644 --- a/drivers/scsi/NCR5380.h +++ b/drivers/scsi/NCR5380.h @@ -290,7 +290,8 @@ static inline struct scsi_cmnd *NCR5380_to_scmd(struct NCR5380_cmd *ncmd_ptr) #define NCR5380_dprint_phase(flg, arg) do {} while (0) #endif -static int NCR5380_probe_irq(struct Scsi_Host *instance, int possible); +static void NCR5380_trigger_irq(struct Scsi_Host *instance); +static int NCR5380_probe_irq(struct Scsi_Host *instance); static int NCR5380_init(struct Scsi_Host *instance, int flags); static int NCR5380_maybe_reset_bus(struct Scsi_Host *); static void NCR5380_exit(struct Scsi_Host *instance); diff --git a/drivers/scsi/g_NCR5380.c b/drivers/scsi/g_NCR5380.c index 7299ad9..3790ed5 100644 --- a/drivers/scsi/g_NCR5380.c +++ b/drivers/scsi/g_NCR5380.c @@ -265,7 +265,7 @@ static int generic_NCR5380_init_one(struct
Re: [PATCH] Avoid that SCSI device removal through sysfs triggers a deadlock
On Sun, 2016-10-30 at 19:22 +, Bart Van Assche wrote: > On 10/28/16 19:08, James Bottomley wrote: > > This is a deadlock caused by an inversion issue in kernfs (suicide > > vs > > non-suicide removes); so fixing it in SCSI alone really isn't > > appropriate. I count at least five other subsystems all using this > > mechanism, so they'll all be similarly affected. It looks to be > > fairly > > simply fixable inside kernfs, so please fix it that way. > > Hello James, > > Can you clarify this further? To me this looks like the result of how > the SCSI core works rather than an issue in the kernfs layer. I'm at a bit of a loss, the problem looks clear from the original trace, so I'm not really sure what's not clear to you. The inversion is between the scan mutex and s_active which is the rather fanciful name Tejun gave to the hand rolled mutex in kernfs_node. The reason for the inversion is that s_active is taken when you open a sysfs file, including the delete one. There's a special suidice path to allow that file to be deleted while something else holds the lock. However, if the delete path also takes any lock, and there's a way to get into delete not via writing to sysfs (which is pretty much universally true) then you get an inversion because kernfs_node mutex is also taken when the file is removed, which is why it's not specific to scsi. Since you press the issue, I've got to say I'm not a huge fan of trying to escape from a lock inversion by making some path asynchronous because it usually leads to even more problems on down the road. If there's some problem with the generic fix, there is a way of fixing this in SCSI without introducing asynchronicity. James -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Need some pointers to debug a target hang
On Sat, Oct 29, 2016 at 03:53:25PM -0700, Nicholas A. Bellinger wrote: > Hi Johannes & Zhu, > > On Tue, 2016-10-18 at 23:29 -0700, Nicholas A. Bellinger wrote: > > On Tue, 2016-10-18 at 19:35 +0200, Johannes Thumshirn wrote: > > > On Tue, Oct 18, 2016 at 09:01:34AM +0200, Johannes Thumshirn wrote: > > > > > > [...] > > > > > > > > > > > > > This is likely the missing SCF_ACK_KREF assignment in >= v4.1.y: > > > > > > > > > > http://www.spinics.net/lists/target-devel/msg13530.html > > > > > > Sorry to disappoint you but it didn't fix my issue. Is there any debug > > > data I > > > can give you, or do you have an advice where I could start looking? > > > > > > I have to admit I only tested the patch on our downstream kernel and not > > > the > > > upstream kernel, I'll repeat the tests on 4.8 final tomorrow. > > > > > > > Is it possible to generate vmcore to have a look in crash + gdb..? > > > > > > Just curious if you've been able to verify the above patch for > v4.8.y iscsi-target ports on your specific config(s)..? > > To confirm, using v4.1.y + patch I've not run into any further > se_cmd->cmd_kref leaks and/or hung tasks with iscsi-target ports due to > high backend device I/O latency, resulting in host timeouts + ABORT_TASK > + session reinstatement to occur while waiting for outstanding se_cmd > backend I/O to complete. Unfortunately v4.8.5 which has above patch included (I double checked to be sure) shows the same effects. Here's my "reproducer": VM1 (target) VM2 (Initiator) === dd if=/dev/sda of=/dev/null bs=1 (given sda is the iscsi LUN) Wait a little bit (~10 secs) dmsetup suspend dm-0 watch -n1 'ps aux |\ grep -E "\[iscsi" | grep -v grep' I'll build a kernel with all target stuff builtin and full debug symbols. Let's see if I can find out something. Thanks, Johannes -- Johannes Thumshirn Storage jthumsh...@suse.de+49 911 74053 689 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Avoid that SCSI device removal through sysfs triggers a deadlock
On 10/28/16 19:08, James Bottomley wrote: > This is a deadlock caused by an inversion issue in kernfs (suicide vs > non-suicide removes); so fixing it in SCSI alone really isn't > appropriate. I count at least five other subsystems all using this > mechanism, so they'll all be similarly affected. It looks to be fairly > simply fixable inside kernfs, so please fix it that way. Hello James, Can you clarify this further? To me this looks like the result of how the SCSI core works rather than an issue in the kernfs layer. My interpretation of the deadlock report produced by the lockdep code is as follows: * The SCSI scanning code holds scan_mutex while creating sysfs attributes for a SCSI device. In this case scan_mutex is the outer mutex and s_active the inner locking object. * scsi_remove_host() holds scan_mutex while removing sysfs attributes. Also in this case scan_mutex is the outer mutex and s_active the inner locking object. * During self-removal (sysfs_remove_file_self() being called indirectly by kernfs_fop_write()), kernfs_fop_write() holds s_active while scsi_remove_device() is being called. In this case s_active is the outer locking object and scan_mutex the inner locking object. I think that it is essential that kernfs_fop_write() holds s_active. So to me this looks like a lock inversion issue that cannot be fixed by modifying kernfs only. In other words, the SCSI core has to be modified to fix this. Do you agree with this? Thanks, Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Kernel 4.8.4: INFO: task kworker/u16:8:289 blocked for more than 120 seconds.
On 10/29/2016 5:44 PM, Nicholas A. Bellinger wrote: On Sat, 2016-10-29 at 14:10 -0400, TomK wrote: On 10/29/2016 3:50 AM, Nicholas A. Bellinger wrote: Hi TomK & Co, On Fri, 2016-10-28 at 02:01 -0400, TomK wrote: On 10/26/2016 8:08 AM, TomK wrote: On 10/26/2016 3:20 AM, Nicholas A. Bellinger wrote: 1) As soon as I re-add the bad disk without the patch, I loose the LUN off the ESXi hosts. Same thing happens with the patch. No change. The disk is pulling things down. Worse, the kernel panics and locks me out of the system (http://microdevsys.com/linux-lio/messages-oct-27-2016.txt) : So after groking these logs, the point when ata6 failing scsi_device is holding outstanding I/O beyond ESX FC host side timeouts, manifests itself as ABORT_TASK tag=1122276: Oct 28 00:42:56 mbpc-pc kernel: qla2xxx [:04:00.0]-e872:20: qlt_24xx_atio_pkt_all_vps: qla_target(0): type 6 ox_id 077e Oct 28 00:42:56 mbpc-pc kernel: qla2xxx [:04:00.0]-e872:20: qlt_24xx_atio_pkt_all_vps: qla_target(0): type 6 ox_id 077f Oct 28 00:42:56 mbpc-pc kernel: qla2xxx [:04:00.0]-e872:20: qlt_24xx_atio_pkt_all_vps: qla_target(0): type 6 ox_id 0780 Oct 28 00:42:57 mbpc-pc kernel: qla2xxx [:04:00.0]-e837:20: ABTS_RECV_24XX: instance 0 Oct 28 00:42:57 mbpc-pc kernel: qla2xxx [:04:00.0]-f811:20: qla_target(0): task abort (s_id=1:5:0, tag=1122276, param=0) Oct 28 00:42:57 mbpc-pc kernel: qla2xxx [:04:00.0]-f80f:20: qla_target(0): task abort (tag=1122276) Oct 28 00:42:57 mbpc-pc kernel: ABORT_TASK: Found referenced qla2xxx task_tag: 1122276 and eventually: Oct 28 00:43:18 mbpc-pc kernel: qla2xxx [:04:00.0]-e837:20: ABTS_RECV_24XX: instance 0 Oct 28 00:43:18 mbpc-pc kernel: qla2xxx [:04:00.0]-f811:20: qla_target(0): task abort (s_id=1:5:0, tag=1122276, param=0) Oct 28 00:43:18 mbpc-pc kernel: qla2xxx [:04:00.0]-f881:20: unable to find cmd in driver or LIO for tag 0x111fe4 Oct 28 00:43:18 mbpc-pc kernel: qla2xxx [:04:00.0]-f854:20: qla_target(0): __qlt_24xx_handle_abts() failed: -2 Oct 28 00:43:18 mbpc-pc kernel: qla2xxx [:04:00.0]-e806:20: Sending task mgmt ABTS response (ha=88010f11, atio=88010f123a40, status=4 Oct 28 00:43:18 mbpc-pc kernel: qla2xxx [:04:00.0]-e838:20: ABTS_RESP_24XX: compl_status 31 Oct 28 00:43:18 mbpc-pc kernel: qla2xxx [:04:00.0]-e807:20: Sending retry TERM EXCH CTIO7 (ha=88010f11) Oct 28 00:43:18 mbpc-pc kernel: qla2xxx [:04:00.0]-e806:20: Sending task mgmt ABTS response (ha=88010f11, atio=88010f123a80, status=0 Oct 28 00:43:18 mbpc-pc kernel: qla2xxx [:04:00.0]-e838:20: ABTS_RESP_24XX: compl_status 0 The outstanding se_cmd with tag=1122276 for ata6 completes back to target-core, allowing ABORT_TASK + TMR_FUNCTION_COMPLETE status to progress: Oct 28 00:44:25 mbpc-pc kernel: hpet1: lost 9600 rtc interrupts Oct 28 00:44:29 mbpc-pc kernel: ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 310) Oct 28 00:44:29 mbpc-pc kernel: ata6.00: configured for UDMA/133 Oct 28 00:44:29 mbpc-pc kernel: ata6.00: retrying FLUSH 0xea Emask 0x4 Oct 28 00:44:29 mbpc-pc kernel: ata6.00: device reported invalid CHS sector 0 Oct 28 00:44:29 mbpc-pc kernel: ata6: EH complete Oct 28 00:44:30 mbpc-pc kernel: qla2xxx [:04:00.0]-e81c:20: Sending TERM EXCH CTIO (ha=88010f11) Oct 28 00:44:30 mbpc-pc kernel: qla2xxx [:04:00.0]-f814:20: qla_target(0): terminating exchange for aborted cmd=880099392fa8 (se_cmd=880099392fa8, tag=1122276) Oct 28 00:44:30 mbpc-pc kernel: qla2xxx [:04:00.0]-e81c:20: Sending TERM EXCH CTIO (ha=88010f11) Oct 28 00:44:30 mbpc-pc kernel: qla2xxx [:04:00.0]-e851:20: qla_target(0): Terminating cmd 880099392fa8 with incorrect state 2 Oct 28 00:44:30 mbpc-pc kernel: qla2xxx [:04:00.0]-f81d:20: Intermediate CTIO received (status 6) Oct 28 00:44:30 mbpc-pc kernel: ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 1122276 Oct 28 00:44:30 mbpc-pc kernel: qla2xxx [:04:00.0]-f81d:20: Intermediate CTIO received (status 8) Oct 28 00:44:30 mbpc-pc kernel: qla2xxx [:04:00.0]-f813:20: TM response mcmd (8800b1e2f0f0) status 0x0 state 0x0 Oct 28 00:44:30 mbpc-pc kernel: qla2xxx [:04:00.0]-e806:20: Sending task mgmt ABTS response (ha=88010f11, atio=8800b1e2f300, status=0 Oct 28 00:44:30 mbpc-pc kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1122996 Oct 28 00:44:30 mbpc-pc kernel: qla2xxx [:04:00.0]-f813:20: TM response mcmd (8800b1e2f340) status 0x2 state 0x0 Oct 28 00:44:30 mbpc-pc kernel: qla2xxx [:04:00.0]-e806:20: Sending task mgmt ABTS response (ha=88010f11, atio=8800b1e2f550, status=2 Oct 28 00:44:30 mbpc-pc kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1122204 Oct 28 00:44:30 mbpc-pc kernel: qla2xxx [:04:00.0]-f813:20: TM response mcmd (8800b1e2f590) status 0x2 state 0x0 Oct 28 00:44:30 mbpc-pc kernel: qla2xxx [:04:00.0]-e806:20: Sending task mgmt ABTS
Re: [PATCH] [SCSI] mpt3sas: Fix secure erase premature termination.
On 10/30/2016 01:43 PM, Andrey Grodzovsky wrote: Problem: This is a work around for a bug with LSI Fusion MPT SAS2 when pefroming secure erase. Due to the very long time the operation takes commands issued during the erase will time out and will trigger execution of abort hook. Even though the abort hook is called for the specifc command which timed out this leads to entire device halt (scsi_state terminated) and premature termination of the secured erase. Actually, it is _not_ the erase command which times out, it's the successive commands which time out, as the controller is unable to process them while erase is running. I suspect a bug in the SAT-layer from the mpt3sas firmware, which simply does not return 'busy' for additional commands when erase is in progress. That being said, this issue was obscured prior to implementing asynchronous aborts, as originally a timeout would be invoking SCSI EH, which would wait for all outstanding commands to complete. So by the time SCSI EH was invoked the erase command was already completed, allowing for a successful retry of the failing command. With asynchronous aborts we don't have this option, as the abort will succeed, but the command cannot be retried as the original erase command is still running. In the light of the above I guess we need something like the attached patch. I'm not utterly proud of if, but I guess it's the best we can do for the moment. Cheers, Hannes -- Dr. Hannes Reinecke zSeries & Storage h...@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg) >From 1556746987c3b4c1a1a4705625280b1136554f89 Mon Sep 17 00:00:00 2001 From: Hannes ReineckeDate: Sun, 30 Oct 2016 14:24:44 +0100 Subject: [PATCH] mpt3sas: hack: disable concurrent commands for ATA_16/ATA_12 There's a bug in the mpt3sas driver/firmware which would not return BUSY if it's busy processing requests (eg 'erase') and cannot respond to other commands. Hence these commands will timeout and eventually start the error handler. This patch disallows request processing whenever an ATA_12 or ATA_16 command is received, thereby avoiding this problem. Signed-off-by: Hannes Reinecke --- drivers/scsi/mpt3sas/mpt3sas_scsih.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c index 97987e7..18b9f09 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c @@ -4096,6 +4096,13 @@ scsih_qcmd(struct Scsi_Host *shost, struct scsi_cmnd *scmd) sas_device_priv_data->block) return SCSI_MLQUEUE_DEVICE_BUSY; + /* + * Hack: block the device for any ATA_12/ATA_16 command + */ + if (scmd->cmnd[0] == 0xa1 || scmd->cmnd[0] == 0x85) { + sas_device_priv_data = scmd->device->hostdata; + _scsih_internal_device_block(scmd->device, sas_device_priv_data); + } if (scmd->sc_data_direction == DMA_FROM_DEVICE) mpi_control = MPI2_SCSIIO_CONTROL_READ; else if (scmd->sc_data_direction == DMA_TO_DEVICE) @@ -4835,6 +4842,10 @@ _scsih_io_done(struct MPT3SAS_ADAPTER *ioc, u16 smid, u8 msix_index, u32 reply) out: + if (scmd->cmnd[0] == 0xa1 || scmd->cmnd[0] == 0x85) { + sas_device_priv_data = scmd->device->hostdata; + _scsih_internal_device_unblock(scmd->device, sas_device_priv_data); + } scsi_dma_unmap(scmd); scmd->scsi_done(scmd); -- 2.6.6
Re: [PATCH v2 02/16] scsi: don't use fc_bsg_job::request and fc_bsg_job::reply directly
On Fri, Oct 28, 2016 at 11:53:46AM +0200, Steffen Maier wrote: [...] > > > > > > > @@ -3937,6 +3944,7 @@ fc_bsg_request_handler(struct request_queue *q, > > > > struct Scsi_Host *shost, > > > > struct request *req; > > > > struct fc_bsg_job *job; > > > > enum fc_dispatch_result ret; > > > > + struct fc_bsg_reply *bsg_reply; > > > > > > > > if (!get_device(dev)) > > > > return; > > > > @@ -3973,8 +3981,9 @@ fc_bsg_request_handler(struct request_queue *q, > > > > struct Scsi_Host *shost, > > > > /* check if we have the msgcode value at least */ > > > > if (job->request_len < sizeof(uint32_t)) { > > > > BUG_ON(job->reply_len < sizeof(uint32_t)); > > > > - job->reply->reply_payload_rcv_len = 0; > > > > - job->reply->result = -ENOMSG; > > > > + bsg_reply = job->reply; > > > > + bsg_reply->reply_payload_rcv_len = 0; > > > > + bsg_reply->result = -ENOMSG; > > Compiler optimization re-ordered above two lines and the first pointer > derefence is bsg_reply->result [field offset 0] where bsg_reply is NULL. > The assignment tries to write to memory at address NULL causing the kernel > page fault. > > Does your suggested change for [PATCH v3 02/16], shuffling the > job->request_len checks, address above kernel page fault? This is what I hope at least. I'm sorry but I don't have any experience with s390 and zfcp at all. I still need to get a test environment set up, but all the people knowing how to do are rather busy at the moment. All my tests on x86_64 with FCoE and lpfc haven't had a problem so far. Thanks, Johannes -- Johannes Thumshirn Storage jthumsh...@suse.de+49 911 74053 689 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] scsi_dh_alua: fix wrong scsi_device_put() in alua_rtpg_queue()
On 10/28/2016 05:05 PM, Bart Van Assche wrote: On 10/28/2016 01:21 AM, tang.jun...@zte.com.cn wrote: From: "tang.junhui"scsi_device_put() is called when the conditions pg->rtpg_sdev!=NULL and queue_delayed_work() failure satisfied, actually it should be not to call because scsi_device_get() is not called previous in this scene. Signed-off-by: tang.junhui --- drivers/scsi/device_handler/scsi_dh_alua.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c index 241829e..78081df 100644 --- a/drivers/scsi/device_handler/scsi_dh_alua.c +++ b/drivers/scsi/device_handler/scsi_dh_alua.c @@ -866,6 +866,7 @@ static void alua_rtpg_queue(struct alua_port_group *pg, struct alua_queue_data *qdata, bool force) { int start_queue = 0; + int sdev_geted = 0; unsigned long flags; struct workqueue_struct *alua_wq = kaluad_wq; @@ -884,6 +885,7 @@ static void alua_rtpg_queue(struct alua_port_group *pg, kref_get(>kref); pg->rtpg_sdev = sdev; scsi_device_get(sdev); + sdev_geted = 1; start_queue = 1; } else if (!(pg->flags & ALUA_PG_RUN_RTPG) && force) { pg->flags |= ALUA_PG_RUN_RTPG; @@ -901,7 +903,8 @@ static void alua_rtpg_queue(struct alua_port_group *pg, if (start_queue && !queue_delayed_work(alua_wq, >rtpg_work, msecs_to_jiffies(ALUA_RTPG_DELAY_MSECS))) { - scsi_device_put(sdev); + if (sdev_geted) + scsi_device_put(sdev); kref_put(>kref, release_port_group); } } Hello Tang Junhui, Personally I prefer the patch below because that patch does not introduce a new variable. This is a patch I started testing four weeks ago and it has survived all my ALUA tests so far. Please note that I do not know whether my tests hit the !(pg->flags & ALUA_PG_RUNNING) code path. From: Bart Van Assche Date: Thu, 29 Sep 2016 10:06:51 -0700 Subject: [PATCH] scsi_dh_alua: Fix a reference counting bug Signed-off-by: Bart Van Assche Cc: --- drivers/scsi/device_handler/scsi_dh_alua.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c index 241829e..23835f7 100644 --- a/drivers/scsi/device_handler/scsi_dh_alua.c +++ b/drivers/scsi/device_handler/scsi_dh_alua.c @@ -890,6 +890,7 @@ static void alua_rtpg_queue(struct alua_port_group *pg, /* Do not queue if the worker is already running */ if (!(pg->flags & ALUA_PG_RUNNING)) { kref_get(>kref); + sdev = NULL; start_queue = 1; } } @@ -901,7 +902,8 @@ static void alua_rtpg_queue(struct alua_port_group *pg, if (start_queue && !queue_delayed_work(alua_wq, >rtpg_work, msecs_to_jiffies(ALUA_RTPG_DELAY_MSECS))) { - scsi_device_put(sdev); + if (sdev) + scsi_device_put(sdev); kref_put(>kref, release_port_group); } } Yes, this looks far better. Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes Reinecke zSeries & Storage h...@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg) -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] [SCSI] mpt3sas: Fix secure erase premature termination.
Problem: This is a work around for a bug with LSI Fusion MPT SAS2 when pefroming secure erase. Due to the very long time the operation takes commands issued during the erase will time out and will trigger execution of abort hook. Even though the abort hook is called for the specifc command which timed out this leads to entire device halt (scsi_state terminated) and premature termination of the secured erase. Fix: Set device state to busy while erase in progress to reject any incoming commands until the erase is done. The device is blocked any way during this time and cannot execute any other command. More data and logs can be found here - https://drive.google.com/file/d/0B9ocOHYHbbS1Q3VMdkkzeWFkTjg/view Signed-off-by: Andrey GrodzovskyCc: Cc: Sathya Prakash Cc: Chaitra P B Cc: Suganath Prabu Subramani Cc: Sreekanth Reddy Cc: Hannes Reinecke Cc: --- drivers/scsi/mpt3sas/mpt3sas_scsih.c | 30 ++ 1 file changed, 30 insertions(+) diff --git [PATCH]drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c index 5a97e32..5542dd02 100644 --- [PATCH]drivers/scsi/mpt3sas/mpt3sas_scsih.c +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c @@ -3500,6 +3500,23 @@ _scsih_eedp_error_handling(struct scsi_cmnd *scmd, u16 ioc_status) SAM_STAT_CHECK_CONDITION; } +/** + * This is a work around for a bug with LSI Fusion MPT SAS2 when + * pefroming secure erase. Due to the verly long time the operation + * takes commands issued during the erase will time out and will trigger + * execution of abort hook. This leads to device reset and premature + * termination of the secured erase. + */ +static inline bool disk_erase_command(struct scsi_cmnd *scmd) +{ + /** + * Identify secure erase command according to + * ATA/ATAPI Command Set (ATA8-ACS) p.202 + */ + return scmd->cmd_len == 16 && scmd->cmnd[14] == 0xf4; +} + + /** * _scsih_qcmd - main scsi request entry point @@ -3528,6 +3545,14 @@ _scsih_qcmd(struct Scsi_Host *shost, struct scsi_cmnd *scmd) scsi_print_command(scmd); #endif + /** + * Lock the device for any subsequent command until secured erase + * command is done. + */ + if (disk_erase_command(scmd)) + scsi_internal_device_block(scmd->device); + + sas_device_priv_data = scmd->device->hostdata; if (!sas_device_priv_data || !sas_device_priv_data->sas_target) { scmd->result = DID_NO_CONNECT << 16; @@ -4062,6 +4087,11 @@ _scsih_io_done(struct MPT3SAS_ADAPTER *ioc, u16 smid, u8 msix_index, u32 reply) if (scmd == NULL) return 1; + /* Secured erase is done. Unlock the device */ + if (disk_erase_command(scmd)) + scsi_internal_device_unblock(scmd->device, SDEV_RUNNING); + + mpi_request = mpt3sas_base_get_msg_frame(ioc, smid); if (mpi_reply == NULL) { -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] NCR5380: Use probe_irq_*() for IRQ probing
Use standard probe_irq_on() and probe_irq_off() functions instead of own implementation. This prevents warning messages like this in the kernel log: genirq: Flags mismatch irq 1. (NCR-probe) vs. 0080 (i8042) Move the IRQ trigger code to a separate function so it can be used for other purposes (testing if the IRQ works). Also add missing IRQ reset after probe. Signed-off-by: Ondrej Zary--- drivers/scsi/NCR5380.c | 72 +- drivers/scsi/NCR5380.h |3 +- drivers/scsi/g_NCR5380.c |2 +- 3 files changed, 29 insertions(+), 48 deletions(-) diff --git a/drivers/scsi/NCR5380.c b/drivers/scsi/NCR5380.c index d849ffa..01c0027 100644 --- a/drivers/scsi/NCR5380.c +++ b/drivers/scsi/NCR5380.c @@ -351,50 +351,13 @@ static void NCR5380_print_phase(struct Scsi_Host *instance) } #endif - -static int probe_irq; - -/** - * probe_intr - helper for IRQ autoprobe - * @irq: interrupt number - * @dev_id: unused - * @regs: unused - * - * Set a flag to indicate the IRQ in question was received. This is - * used by the IRQ probe code. - */ - -static irqreturn_t probe_intr(int irq, void *dev_id) -{ - probe_irq = irq; - return IRQ_HANDLED; -} - -/** - * NCR5380_probe_irq - find the IRQ of an NCR5380 - * @instance: NCR5380 controller - * @possible: bitmask of ISA IRQ lines - * - * Autoprobe for the IRQ line used by the NCR5380 by triggering an IRQ - * and then looking to see what interrupt actually turned up. - */ - -static int __maybe_unused NCR5380_probe_irq(struct Scsi_Host *instance, - int possible) +static void NCR5380_trigger_irq(struct Scsi_Host *instance) { struct NCR5380_hostdata *hostdata = shost_priv(instance); - unsigned long timeout; - int trying_irqs, i, mask; - - for (trying_irqs = 0, i = 1, mask = 2; i < 16; ++i, mask <<= 1) - if ((mask & possible) && (request_irq(i, _intr, 0, "NCR-probe", NULL) == 0)) - trying_irqs |= mask; - - timeout = jiffies + msecs_to_jiffies(250); - probe_irq = NO_IRQ; + unsigned long timeout = jiffies + msecs_to_jiffies(2500); /* -* A interrupt is triggered whenever BSY = false, SEL = true +* An interrupt is triggered whenever BSY = false, SEL = true * and a bit set in the SELECT_ENABLE_REG is asserted on the * SCSI bus. * @@ -402,23 +365,40 @@ static int __maybe_unused NCR5380_probe_irq(struct Scsi_Host *instance, * (I/O, C/D, and MSG) match those in the TCR, so we must reset that * to zero. */ - NCR5380_write(TARGET_COMMAND_REG, 0); NCR5380_write(SELECT_ENABLE_REG, hostdata->id_mask); NCR5380_write(OUTPUT_DATA_REG, hostdata->id_mask); NCR5380_write(INITIATOR_COMMAND_REG, ICR_BASE | ICR_ASSERT_DATA | ICR_ASSERT_SEL); - while (probe_irq == NO_IRQ && time_before(jiffies, timeout)) + while (!(NCR5380_read(BUS_AND_STATUS_REG) & BASR_IRQ) && + time_before(jiffies, timeout)) schedule_timeout_uninterruptible(1); NCR5380_write(SELECT_ENABLE_REG, 0); NCR5380_write(INITIATOR_COMMAND_REG, ICR_BASE); +} - for (i = 1, mask = 2; i < 16; ++i, mask <<= 1) - if (trying_irqs & mask) - free_irq(i, NULL); +/** + * NCR5380_probe_irq - find the IRQ of an NCR5380 + * @instance: NCR5380 controller + * + * Autoprobe for the IRQ line used by the NCR5380 by triggering an IRQ + * and then looking to see what interrupt actually turned up. + */ + +static int NCR5380_probe_irq(struct Scsi_Host *instance) +{ + struct NCR5380_hostdata *hostdata = shost_priv(instance); + int irqs, irq; + + irqs = probe_irq_on(); + NCR5380_trigger_irq(instance); + irq = probe_irq_off(irqs); + NCR5380_read(RESET_PARITY_INTERRUPT_REG); + if (irq < 0) + irq = 0; - return probe_irq; + return irq; } /** diff --git a/drivers/scsi/NCR5380.h b/drivers/scsi/NCR5380.h index 3c6ce54..b2fca52 100644 --- a/drivers/scsi/NCR5380.h +++ b/drivers/scsi/NCR5380.h @@ -290,7 +290,8 @@ static inline struct scsi_cmnd *NCR5380_to_scmd(struct NCR5380_cmd *ncmd_ptr) #define NCR5380_dprint_phase(flg, arg) do {} while (0) #endif -static int NCR5380_probe_irq(struct Scsi_Host *instance, int possible); +static void NCR5380_trigger_irq(struct Scsi_Host *instance); +static int NCR5380_probe_irq(struct Scsi_Host *instance); static int NCR5380_init(struct Scsi_Host *instance, int flags); static int NCR5380_maybe_reset_bus(struct Scsi_Host *); static void NCR5380_exit(struct Scsi_Host *instance); diff --git a/drivers/scsi/g_NCR5380.c b/drivers/scsi/g_NCR5380.c index 7299ad9..3790ed5 100644 --- a/drivers/scsi/g_NCR5380.c +++ b/drivers/scsi/g_NCR5380.c @@ -265,7 +265,7 @@ static int generic_NCR5380_init_one(struct
[PATCH 2/2] g_NCR5380: Test the IRQ before accepting it
Trigger an IRQ first with a test IRQ handler to find out if it really works. Disable the IRQ if not. This prevents hang when incorrect IRQ was specified by user. Signed-off-by: Ondrej Zary--- drivers/scsi/g_NCR5380.c | 31 +-- 1 file changed, 29 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/g_NCR5380.c b/drivers/scsi/g_NCR5380.c index 3790ed5..f0c5b10 100644 --- a/drivers/scsi/g_NCR5380.c +++ b/drivers/scsi/g_NCR5380.c @@ -67,6 +67,14 @@ MODULE_ALIAS("g_NCR5380_mmio"); MODULE_LICENSE("GPL"); +static bool irq_working; + +static irqreturn_t test_irq(int irq, void *dev_id) +{ + irq_working = true; + return IRQ_HANDLED; +} + /* * Configure I/O address of 53C400A or DTC436 by writing magic numbers * to ports 0x779 and 0x379. @@ -275,10 +283,29 @@ static int generic_NCR5380_init_one(struct scsi_host_template *tpnt, /* set IRQ for HP C2502 */ if (board == BOARD_HP_C2502) magic_configure(port_idx, instance->irq, magic); - if (request_irq(instance->irq, generic_NCR5380_intr, - 0, "NCR5380", instance)) { + /* test if the IRQ is working */ + irq_working = false; + if (request_irq(instance->irq, test_irq, + 0, "NCR5380-irqtest", NULL)) { printk(KERN_WARNING "scsi%d : IRQ%d not free, interrupts disabled\n", instance->host_no, instance->irq); instance->irq = NO_IRQ; + } else { + NCR5380_trigger_irq(instance); + free_irq(instance->irq, NULL); + if (irq_working) { + if (request_irq(instance->irq, + generic_NCR5380_intr, 0, + "NCR5380", instance)) { + printk(KERN_WARNING "scsi%d : IRQ%d not free, interrupts disabled\n", + instance->host_no, + instance->irq); + instance->irq = NO_IRQ; + } + } else { + printk(KERN_WARNING "scsi%d : IRQ%d not working, interrupts disabled\n", + instance->host_no, instance->irq); + instance->irq = NO_IRQ; + } } } -- Ondrej Zary -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html