Re: [PATCH 3/3] NCR5380: Check for chip presence in NCR5380_init()

2016-10-30 Thread Finn Thain

On Sun, 30 Oct 2016, Ondrej Zary wrote:

> Read back MODE_REG after writing it in NCR5380_init() to check if the
> chip is really there.
> 
> This prevents hang when incorrect I/O address was specified by user.

Do you know whereabouts in the driver the hang happens? Maybe there is a 
robustness issue there.

Card type detection (and vacant slot detection) is a good idea but I'm not 
sure how we can detect this chip reliably.

-- 

> 
> Signed-off-by: Ondrej Zary 
> ---
>  drivers/scsi/NCR5380.c |5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/scsi/NCR5380.c b/drivers/scsi/NCR5380.c
> index 01c0027..ce3156d 100644
> --- a/drivers/scsi/NCR5380.c
> +++ b/drivers/scsi/NCR5380.c
> @@ -495,6 +495,11 @@ static int NCR5380_init(struct Scsi_Host *instance, int 
> flags)
>  
>   NCR5380_write(INITIATOR_COMMAND_REG, ICR_BASE);
>   NCR5380_write(MODE_REG, MR_BASE);
> + /* check if the chip is really there */
> + if (NCR5380_read(MODE_REG) != MR_BASE) {
> + NCR5380_exit(instance);
> + return -ENODEV;
> + }
>   NCR5380_write(TARGET_COMMAND_REG, 0);
>   NCR5380_write(SELECT_ENABLE_REG, 0);
>  
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] NCR5380: Check for chip presence in NCR5380_init()

2016-10-30 Thread Finn Thain

On Sun, 30 Oct 2016, Ondrej Zary wrote:

> Read back MODE_REG after writing it in NCR5380_init() to check if the
> chip is really there.
> 
> This prevents hang when incorrect I/O address was specified by user.
> 
> Signed-off-by: Ondrej Zary 
> ---
>  drivers/scsi/NCR5380.c |5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/scsi/NCR5380.c b/drivers/scsi/NCR5380.c
> index 01c0027..ce3156d 100644
> --- a/drivers/scsi/NCR5380.c
> +++ b/drivers/scsi/NCR5380.c
> @@ -495,6 +495,11 @@ static int NCR5380_init(struct Scsi_Host *instance, int 
> flags)
>  
>   NCR5380_write(INITIATOR_COMMAND_REG, ICR_BASE);
>   NCR5380_write(MODE_REG, MR_BASE);
> + /* check if the chip is really there */
> + if (NCR5380_read(MODE_REG) != MR_BASE) {
> + NCR5380_exit(instance);
> + return -ENODEV;
> + }

This doesn't belong in the core driver. Only the 5380 ISA drivers have 
configurable base addresses.

Also, MR_BASE == 0, so that test is likely to be ineffectual anyway. This 
patch doesn't really add any value AFAICT.

-- 

>   NCR5380_write(TARGET_COMMAND_REG, 0);
>   NCR5380_write(SELECT_ENABLE_REG, 0);
>  
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] g_NCR5380: Test the IRQ before accepting it

2016-10-30 Thread Finn Thain

On Sun, 30 Oct 2016, Ondrej Zary wrote:

> Trigger an IRQ first with a test IRQ handler to find out if it really
> works. Disable the IRQ if not.
> 
> This prevents hang when incorrect IRQ was specified by user.
> 
> Signed-off-by: Ondrej Zary 
> ---
>  drivers/scsi/g_NCR5380.c |   32 ++--
>  1 file changed, 30 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/scsi/g_NCR5380.c b/drivers/scsi/g_NCR5380.c
> index 3790ed5..261e168 100644
> --- a/drivers/scsi/g_NCR5380.c
> +++ b/drivers/scsi/g_NCR5380.c
> @@ -67,6 +67,14 @@
>  MODULE_ALIAS("g_NCR5380_mmio");
>  MODULE_LICENSE("GPL");
>  
> +static bool irq_working;
> +
> +static irqreturn_t test_irq(int irq, void *dev_id)
> +{
> + irq_working = true;
> + return IRQ_HANDLED;
> +}
> +
>  /*
>   * Configure I/O address of 53C400A or DTC436 by writing magic numbers
>   * to ports 0x779 and 0x379.
> @@ -275,10 +283,30 @@ static int generic_NCR5380_init_one(struct 
> scsi_host_template *tpnt,
>   /* set IRQ for HP C2502 */
>   if (board == BOARD_HP_C2502)
>   magic_configure(port_idx, instance->irq, magic);
> - if (request_irq(instance->irq, generic_NCR5380_intr,
> - 0, "NCR5380", instance)) {
> + /* test if the IRQ is working */
> + irq_working = false;
> + if (request_irq(instance->irq, test_irq,
> + 0, "NCR5380-irqtest", NULL)) {
>   printk(KERN_WARNING "scsi%d : IRQ%d not free, 
> interrupts disabled\n", instance->host_no, instance->irq);
>   instance->irq = NO_IRQ;
> + } else {
> + NCR5380_trigger_irq(instance);
> + NCR5380_read(RESET_PARITY_INTERRUPT_REG);
> + free_irq(instance->irq, NULL);
> + if (irq_working) {
> + if (request_irq(instance->irq,
> + generic_NCR5380_intr, 0,
> + "NCR5380", instance)) {
> + printk(KERN_WARNING "scsi%d : IRQ%d not 
> free, interrupts disabled\n",
> +instance->host_no,
> +instance->irq);
> + instance->irq = NO_IRQ;
> + }
> + } else {
> + printk(KERN_WARNING "scsi%d : IRQ%d not 
> working, interrupts disabled\n",
> +instance->host_no, instance->irq);
> + instance->irq = NO_IRQ;
> + }
>   }
>   }
>  
> 

If the user omits to specify an irq, you can just default to IRQ_AUTO. 
This might result in NO_IRQ, which gives the same result as this patch.
 
And when the user does specify an IRQ, we should trust them. So this 
compexity doesn't add any value AFAICT. Thanks but no thanks.

-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] NCR5380: Use probe_irq_*() for IRQ probing

2016-10-30 Thread Finn Thain

On Sun, 30 Oct 2016, Ondrej Zary wrote:

> Use standard probe_irq_on() and probe_irq_off() functions instead of own
> implementation.

Thanks for doing this.

> This prevents warning messages like this in the kernel log:
> genirq: Flags mismatch irq 1.  (NCR-probe) vs. 0080 (i8042)
> 
> Move the IRQ trigger code to a separate function so it can be used for
> other purposes (testing if the IRQ works).
> 
> Also add missing IRQ reset after probe.
> 
> Signed-off-by: Ondrej Zary 
> ---
>  drivers/scsi/NCR5380.c   |   72 
> +-
>  drivers/scsi/NCR5380.h   |3 +-
>  drivers/scsi/g_NCR5380.c |2 +-
>  3 files changed, 29 insertions(+), 48 deletions(-)
> 
> diff --git a/drivers/scsi/NCR5380.c b/drivers/scsi/NCR5380.c
> index d849ffa..01c0027 100644
> --- a/drivers/scsi/NCR5380.c
> +++ b/drivers/scsi/NCR5380.c
> @@ -351,50 +351,13 @@ static void NCR5380_print_phase(struct Scsi_Host 
> *instance)
>  }
>  #endif
>  
> -
> -static int probe_irq;
> -
> -/**
> - * probe_intr-   helper for IRQ autoprobe
> - * @irq: interrupt number
> - * @dev_id: unused
> - * @regs: unused
> - *
> - * Set a flag to indicate the IRQ in question was received. This is
> - * used by the IRQ probe code.
> - */
> -
> -static irqreturn_t probe_intr(int irq, void *dev_id)
> -{
> - probe_irq = irq;
> - return IRQ_HANDLED;
> -}
> -
> -/**
> - * NCR5380_probe_irq -   find the IRQ of an NCR5380
> - * @instance: NCR5380 controller
> - * @possible: bitmask of ISA IRQ lines
> - *
> - * Autoprobe for the IRQ line used by the NCR5380 by triggering an IRQ
> - * and then looking to see what interrupt actually turned up.
> - */
> -
> -static int __maybe_unused NCR5380_probe_irq(struct Scsi_Host *instance,
> - int possible)
> +static void NCR5380_trigger_irq(struct Scsi_Host *instance)
>  {
>   struct NCR5380_hostdata *hostdata = shost_priv(instance);
> - unsigned long timeout;
> - int trying_irqs, i, mask;
> -
> - for (trying_irqs = 0, i = 1, mask = 2; i < 16; ++i, mask <<= 1)
> - if ((mask & possible) && (request_irq(i, _intr, 0, 
> "NCR-probe", NULL) == 0))
> - trying_irqs |= mask;
> -
> - timeout = jiffies + msecs_to_jiffies(250);
> - probe_irq = NO_IRQ;
> + unsigned long timeout = jiffies + msecs_to_jiffies(2500);
>  
>   /*
> -  * A interrupt is triggered whenever BSY = false, SEL = true
> +  * An interrupt is triggered whenever BSY = false, SEL = true
>* and a bit set in the SELECT_ENABLE_REG is asserted on the
>* SCSI bus.
>*
> @@ -402,23 +365,40 @@ static int __maybe_unused NCR5380_probe_irq(struct 
> Scsi_Host *instance,
>* (I/O, C/D, and MSG) match those in the TCR, so we must reset that
>* to zero.
>*/
> -
>   NCR5380_write(TARGET_COMMAND_REG, 0);
>   NCR5380_write(SELECT_ENABLE_REG, hostdata->id_mask);
>   NCR5380_write(OUTPUT_DATA_REG, hostdata->id_mask);
>   NCR5380_write(INITIATOR_COMMAND_REG, ICR_BASE | ICR_ASSERT_DATA | 
> ICR_ASSERT_SEL);
>  
> - while (probe_irq == NO_IRQ && time_before(jiffies, timeout))
> + while (!(NCR5380_read(BUS_AND_STATUS_REG) & BASR_IRQ) &&
> +time_before(jiffies, timeout))
>   schedule_timeout_uninterruptible(1);

You don't need a 2.5 second timeout here. To raise the interrupt the chip 
should not need more than a bus settle delay (400ns). Calling msleep(1) 
should be plenty.

Better still, please drop the loop, the delay and the BASR_IRQ test here, 
and if need be, put the delay in the caller. This routine discards the 
result of that sequence anyway.

>  
>   NCR5380_write(SELECT_ENABLE_REG, 0);
>   NCR5380_write(INITIATOR_COMMAND_REG, ICR_BASE);
> +}
>  
> - for (i = 1, mask = 2; i < 16; ++i, mask <<= 1)
> - if (trying_irqs & mask)
> - free_irq(i, NULL);
> +/**
> + * NCR5380_probe_irq -   find the IRQ of an NCR5380
> + * @instance: NCR5380 controller
> + *
> + * Autoprobe for the IRQ line used by the NCR5380 by triggering an IRQ
> + * and then looking to see what interrupt actually turned up.
> + */
> +
> +static int NCR5380_probe_irq(struct Scsi_Host *instance)
> +{
> + struct NCR5380_hostdata *hostdata = shost_priv(instance);
> + int irqs, irq;
> +

Please rename irqs to irq_mask.

Other drivers like tty/cyclades.c begin with,

/* forget possible initially masked and pending IRQ */
irq = probe_irq_off(probe_irq_on());

Perhaps we need that here too?

> + irqs = probe_irq_on();

You should clear any pending interrupt on the chip before you try to 
trigger another one.

> + NCR5380_trigger_irq(instance);

Please do the msleep(1) here.

> + irq = probe_irq_off(irqs);
> + NCR5380_read(RESET_PARITY_INTERRUPT_REG);
> + if (irq < 0)
> + irq = 0;

Please use NO_IRQ instead of 0.

>  
> - 

RE: iscsi target kernel error log support

2016-10-30 Thread Jin, Ke (Nokia - CN/Hangzhou)
Hi,
Thanks for reply:)

Actually when configure second target, I execute targetcli /iscsi create 
iqn.2016-08.nokia.lab:omu-0 which will automatically create the 0.0.0.0 3260 
listening socket for me.
But I think we do not need it at all, because we will delete it later, after 
that configure own IP(like below 10.180.180.222 ), so I am wondering whether is 
there any other iscsi target configuration can avoid this unnecessary issue.

"
targetcli /iscsi/iqn.2016-08.nokia.lab:chu-0/tpg1/portals delete 0.0.0.0 3260
targetcli /iscsi/iqn.2016-08.nokia.lab:chu-0/tpg1/portals create 10.180.180.222
Using default IP port 3260
"

Thanks again.

Brs,
Jin Ke



-Original Message-
From: Nicholas A. Bellinger [mailto:n...@linux-iscsi.org] 
Sent: Sunday, October 30, 2016 7:58 AM
To: Jin, Ke (Nokia - CN/Hangzhou) 
Cc: target-de...@vger.kernel.org; linux-scsi@vger.kernel.org; Xiong, Li (Nokia 
- CN/Hangzhou) ; Tom Herbert ; Andy 
Grover 
Subject: Re: iscsi target kernel error log support

Hi Jin,

Adding Tom CC' for SO_REUSEPORT and Andy CC' for INADDR_ANY + targetcli.

On Tue, 2016-10-18 at 01:25 +, Jin, Ke (Nokia - CN/Hangzhou) wrote:
> Hi,
> Sorry to disturb you.
>  
> I am using targetcli to configure the iscsi target on my linux host. I
> find one issue, could you kindly review it and give some comments to
> me.
>  
> Issue: (linux kernel 4.4 stable)
>  
> Configure first target:
> 10/14/16 17:07:00.538417942: targetcli /iscsi create 
> iqn.2016-08.nokia.lab:chu-0
> Created target iqn.2016-08.nokia.lab:chu-0.
> Created TPG 1.
> Global pref auto_add_default_portal=true
> Created default portal listening on all IPs (0.0.0.0), port 3260.
> 10/14/16 17:07:00.650688229: targetcli /backstores/block create iblock_0 
> dev=/dev/sda
> Created block storage object iblock_0 using /dev/sda.
> 10/14/16 17:07:00.772435404: targetcli 
> /iscsi/iqn.2016-08.nokia.lab:chu-0/tpg1/luns create 
> /backstores/block/iblock_0 1
> Created LUN 1.
> 10/14/16 17:07:00.894355883: targetcli 
> /iscsi/iqn.2016-08.nokia.lab:chu-0/tpg1/portals delete 0.0.0.0 3260
> Deleted network portal 0.0.0.0:3260
> 10/14/16 17:07:01.012410786: targetcli 
> /iscsi/iqn.2016-08.nokia.lab:chu-0/tpg1/portals create 10.180.180.111
> Using default IP port 3260
> Created network portal 10.180.180.111:3260.
> 10/14/16 17:07:01.130478681: targetcli 
> /iscsi/iqn.2016-08.nokia.lab:chu-0/tpg1/acls create 
> iqn.2016-08.nokia.lab:initiator.chu-0-0
> Created Node ACL for iqn.2016-08.nokia.lab:initiator.chu-0-0
> Created mapped LUN 1.
>  
> Then I configure second target on same host.
> targetcli /iscsi create iqn.2016-08.nokia.lab:omu-0
> Created target iqn.2016-08.nokia.lab:omu.
> Created TPG 1.
> Default portal not created, TPGs within a target cannot share ip:port.
> Then I check the kernel log with dmesg, there is one error log
> “kernel_bind() failed: -98”.
>  
>  

So AFAIK, the failure of INADDR_ANY portal binding when !INADDR_ANY
portals exist was the original expected behavior.

>  
> Then I begin to check the iscsi kernel module code. linux-stable
> \drivers\target\iscsi\ iscsi_target_login.c
> int iscsit_setup_np()
> {
> ret = kernel_setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, 
> (char *), sizeof(opt));
> }
> I think SO_REUSEADDR is not enough in this case (there have two
> targets configured), when targetcli /iscsi create
> iqn.2016-08.nokia.lab:omu-0, it will create listening socket with
> 0,0,0,0:3260, because it already have 10.180.180.111:3260 listening
> sockets there in first target, so this cause the kernel_bind() failed:
> -98”. So I add SO_REUSEPORT also. It works.
> int iscsit_setup_np()
> {
> ret = kernel_setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, 
> (char *), sizeof(opt));
> ret = kernel_setsockopt(sock, SOL_SOCKET, SO_REUSEPORT,
> (char *), sizeof(opt));
> }
>  
> How do you think, do you think this is the kernel bug or just my
> configuration issue, or you already have this patch?
>  

At iscsi-target application level, we're using a iscsi_np kthread
listener per ipv4 / ipv6 addr + port. (eg: one listener per iscsi
network portal)

The network portal can be shared across N target endpoints
in /sys/kernel/config/target/iscsi/$IQN/$TPGT/np/*, but the actual
listening + accept for each ipv4 / ipv6 addr + port is still done from a
single kthread context.

So AFAICT for your use-case, adding SO_REUSEPORT to allow concurrent
INADDR_ANY + !INADDR_ANY network portals should be OK, at least from an
iscsi-target perspective.

That is, iscsi-target would consider each as a separate network portal,
allowing INADDR_ANY + !INADDR_ANY accepts to be serviced across multiple
iscsi_np kthreads.

There should not be an inherent issue allowing this, but it would useful
to understand if there are any other possible ramifications before
enabling by default.


[PATCH 3/3] NCR5380: Check for chip presence in NCR5380_init()

2016-10-30 Thread Ondrej Zary
Read back MODE_REG after writing it in NCR5380_init() to check if the
chip is really there.

This prevents hang when incorrect I/O address was specified by user.

Signed-off-by: Ondrej Zary 
---
 drivers/scsi/NCR5380.c |5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/scsi/NCR5380.c b/drivers/scsi/NCR5380.c
index 01c0027..ce3156d 100644
--- a/drivers/scsi/NCR5380.c
+++ b/drivers/scsi/NCR5380.c
@@ -495,6 +495,11 @@ static int NCR5380_init(struct Scsi_Host *instance, int 
flags)
 
NCR5380_write(INITIATOR_COMMAND_REG, ICR_BASE);
NCR5380_write(MODE_REG, MR_BASE);
+   /* check if the chip is really there */
+   if (NCR5380_read(MODE_REG) != MR_BASE) {
+   NCR5380_exit(instance);
+   return -ENODEV;
+   }
NCR5380_write(TARGET_COMMAND_REG, 0);
NCR5380_write(SELECT_ENABLE_REG, 0);
 
-- 
Ondrej Zary

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/3] g_NCR5380: Test the IRQ before accepting it

2016-10-30 Thread Ondrej Zary
Trigger an IRQ first with a test IRQ handler to find out if it really
works. Disable the IRQ if not.

This prevents hang when incorrect IRQ was specified by user.

Signed-off-by: Ondrej Zary 
---
 drivers/scsi/g_NCR5380.c |   32 ++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/g_NCR5380.c b/drivers/scsi/g_NCR5380.c
index 3790ed5..261e168 100644
--- a/drivers/scsi/g_NCR5380.c
+++ b/drivers/scsi/g_NCR5380.c
@@ -67,6 +67,14 @@
 MODULE_ALIAS("g_NCR5380_mmio");
 MODULE_LICENSE("GPL");
 
+static bool irq_working;
+
+static irqreturn_t test_irq(int irq, void *dev_id)
+{
+   irq_working = true;
+   return IRQ_HANDLED;
+}
+
 /*
  * Configure I/O address of 53C400A or DTC436 by writing magic numbers
  * to ports 0x779 and 0x379.
@@ -275,10 +283,30 @@ static int generic_NCR5380_init_one(struct 
scsi_host_template *tpnt,
/* set IRQ for HP C2502 */
if (board == BOARD_HP_C2502)
magic_configure(port_idx, instance->irq, magic);
-   if (request_irq(instance->irq, generic_NCR5380_intr,
-   0, "NCR5380", instance)) {
+   /* test if the IRQ is working */
+   irq_working = false;
+   if (request_irq(instance->irq, test_irq,
+   0, "NCR5380-irqtest", NULL)) {
printk(KERN_WARNING "scsi%d : IRQ%d not free, 
interrupts disabled\n", instance->host_no, instance->irq);
instance->irq = NO_IRQ;
+   } else {
+   NCR5380_trigger_irq(instance);
+   NCR5380_read(RESET_PARITY_INTERRUPT_REG);
+   free_irq(instance->irq, NULL);
+   if (irq_working) {
+   if (request_irq(instance->irq,
+   generic_NCR5380_intr, 0,
+   "NCR5380", instance)) {
+   printk(KERN_WARNING "scsi%d : IRQ%d not 
free, interrupts disabled\n",
+  instance->host_no,
+  instance->irq);
+   instance->irq = NO_IRQ;
+   }
+   } else {
+   printk(KERN_WARNING "scsi%d : IRQ%d not 
working, interrupts disabled\n",
+  instance->host_no, instance->irq);
+   instance->irq = NO_IRQ;
+   }
}
}
 
-- 
Ondrej Zary

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3] NCR5380: Use probe_irq_*() for IRQ probing

2016-10-30 Thread Ondrej Zary
Use standard probe_irq_on() and probe_irq_off() functions instead of own
implementation.
This prevents warning messages like this in the kernel log:
genirq: Flags mismatch irq 1.  (NCR-probe) vs. 0080 (i8042)

Move the IRQ trigger code to a separate function so it can be used for
other purposes (testing if the IRQ works).

Also add missing IRQ reset after probe.

Signed-off-by: Ondrej Zary 
---
 drivers/scsi/NCR5380.c   |   72 +-
 drivers/scsi/NCR5380.h   |3 +-
 drivers/scsi/g_NCR5380.c |2 +-
 3 files changed, 29 insertions(+), 48 deletions(-)

diff --git a/drivers/scsi/NCR5380.c b/drivers/scsi/NCR5380.c
index d849ffa..01c0027 100644
--- a/drivers/scsi/NCR5380.c
+++ b/drivers/scsi/NCR5380.c
@@ -351,50 +351,13 @@ static void NCR5380_print_phase(struct Scsi_Host 
*instance)
 }
 #endif
 
-
-static int probe_irq;
-
-/**
- * probe_intr  -   helper for IRQ autoprobe
- * @irq: interrupt number
- * @dev_id: unused
- * @regs: unused
- *
- * Set a flag to indicate the IRQ in question was received. This is
- * used by the IRQ probe code.
- */
-
-static irqreturn_t probe_intr(int irq, void *dev_id)
-{
-   probe_irq = irq;
-   return IRQ_HANDLED;
-}
-
-/**
- * NCR5380_probe_irq   -   find the IRQ of an NCR5380
- * @instance: NCR5380 controller
- * @possible: bitmask of ISA IRQ lines
- *
- * Autoprobe for the IRQ line used by the NCR5380 by triggering an IRQ
- * and then looking to see what interrupt actually turned up.
- */
-
-static int __maybe_unused NCR5380_probe_irq(struct Scsi_Host *instance,
-   int possible)
+static void NCR5380_trigger_irq(struct Scsi_Host *instance)
 {
struct NCR5380_hostdata *hostdata = shost_priv(instance);
-   unsigned long timeout;
-   int trying_irqs, i, mask;
-
-   for (trying_irqs = 0, i = 1, mask = 2; i < 16; ++i, mask <<= 1)
-   if ((mask & possible) && (request_irq(i, _intr, 0, 
"NCR-probe", NULL) == 0))
-   trying_irqs |= mask;
-
-   timeout = jiffies + msecs_to_jiffies(250);
-   probe_irq = NO_IRQ;
+   unsigned long timeout = jiffies + msecs_to_jiffies(2500);
 
/*
-* A interrupt is triggered whenever BSY = false, SEL = true
+* An interrupt is triggered whenever BSY = false, SEL = true
 * and a bit set in the SELECT_ENABLE_REG is asserted on the
 * SCSI bus.
 *
@@ -402,23 +365,40 @@ static int __maybe_unused NCR5380_probe_irq(struct 
Scsi_Host *instance,
 * (I/O, C/D, and MSG) match those in the TCR, so we must reset that
 * to zero.
 */
-
NCR5380_write(TARGET_COMMAND_REG, 0);
NCR5380_write(SELECT_ENABLE_REG, hostdata->id_mask);
NCR5380_write(OUTPUT_DATA_REG, hostdata->id_mask);
NCR5380_write(INITIATOR_COMMAND_REG, ICR_BASE | ICR_ASSERT_DATA | 
ICR_ASSERT_SEL);
 
-   while (probe_irq == NO_IRQ && time_before(jiffies, timeout))
+   while (!(NCR5380_read(BUS_AND_STATUS_REG) & BASR_IRQ) &&
+  time_before(jiffies, timeout))
schedule_timeout_uninterruptible(1);
 
NCR5380_write(SELECT_ENABLE_REG, 0);
NCR5380_write(INITIATOR_COMMAND_REG, ICR_BASE);
+}
 
-   for (i = 1, mask = 2; i < 16; ++i, mask <<= 1)
-   if (trying_irqs & mask)
-   free_irq(i, NULL);
+/**
+ * NCR5380_probe_irq   -   find the IRQ of an NCR5380
+ * @instance: NCR5380 controller
+ *
+ * Autoprobe for the IRQ line used by the NCR5380 by triggering an IRQ
+ * and then looking to see what interrupt actually turned up.
+ */
+
+static int NCR5380_probe_irq(struct Scsi_Host *instance)
+{
+   struct NCR5380_hostdata *hostdata = shost_priv(instance);
+   int irqs, irq;
+
+   irqs = probe_irq_on();
+   NCR5380_trigger_irq(instance);
+   irq = probe_irq_off(irqs);
+   NCR5380_read(RESET_PARITY_INTERRUPT_REG);
+   if (irq < 0)
+   irq = 0;
 
-   return probe_irq;
+   return irq;
 }
 
 /**
diff --git a/drivers/scsi/NCR5380.h b/drivers/scsi/NCR5380.h
index 3c6ce54..b2fca52 100644
--- a/drivers/scsi/NCR5380.h
+++ b/drivers/scsi/NCR5380.h
@@ -290,7 +290,8 @@ static inline struct scsi_cmnd *NCR5380_to_scmd(struct 
NCR5380_cmd *ncmd_ptr)
 #define NCR5380_dprint_phase(flg, arg) do {} while (0)
 #endif
 
-static int NCR5380_probe_irq(struct Scsi_Host *instance, int possible);
+static void NCR5380_trigger_irq(struct Scsi_Host *instance);
+static int NCR5380_probe_irq(struct Scsi_Host *instance);
 static int NCR5380_init(struct Scsi_Host *instance, int flags);
 static int NCR5380_maybe_reset_bus(struct Scsi_Host *);
 static void NCR5380_exit(struct Scsi_Host *instance);
diff --git a/drivers/scsi/g_NCR5380.c b/drivers/scsi/g_NCR5380.c
index 7299ad9..3790ed5 100644
--- a/drivers/scsi/g_NCR5380.c
+++ b/drivers/scsi/g_NCR5380.c
@@ -265,7 +265,7 @@ static int generic_NCR5380_init_one(struct 

Re: [PATCH] Avoid that SCSI device removal through sysfs triggers a deadlock

2016-10-30 Thread James Bottomley
On Sun, 2016-10-30 at 19:22 +, Bart Van Assche wrote:
> On 10/28/16 19:08, James Bottomley wrote:
> > This is a deadlock caused by an inversion issue in kernfs (suicide
> > vs
> > non-suicide removes); so fixing it in SCSI alone really isn't
> > appropriate.  I count at least five other subsystems all using this
> > mechanism, so they'll all be similarly affected.  It looks to be
> > fairly
> > simply fixable inside kernfs, so please fix it that way.
> 
> Hello James,
> 
> Can you clarify this further? To me this looks like the result of how
> the SCSI core works rather than an issue in the kernfs layer.

I'm at a bit of a loss, the problem looks clear from the original
trace, so I'm not really sure what's not clear to you.

The inversion is between the scan mutex and s_active which is the
rather fanciful name Tejun gave to the hand rolled mutex in
kernfs_node.

The reason for the inversion is that s_active is taken when you open a
sysfs file, including the delete one.  There's a special suidice path
to allow that file to be deleted while something else holds the lock. 
 However, if the delete path also takes any lock, and there's a way to
get into delete not via writing to sysfs (which is pretty much
universally true) then you get an inversion because kernfs_node mutex
is also taken when the file is removed, which is why it's not specific
to scsi.

Since you press the issue, I've got to say I'm not a huge fan of trying
to escape from a lock inversion by making some path asynchronous
because it usually leads to even more problems on down the road.  If
there's some problem with the generic fix, there is a way of fixing
this in SCSI without introducing asynchronicity.

James

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Need some pointers to debug a target hang

2016-10-30 Thread Johannes Thumshirn
On Sat, Oct 29, 2016 at 03:53:25PM -0700, Nicholas A. Bellinger wrote:
> Hi Johannes & Zhu,
> 
> On Tue, 2016-10-18 at 23:29 -0700, Nicholas A. Bellinger wrote:
> > On Tue, 2016-10-18 at 19:35 +0200, Johannes Thumshirn wrote:
> > > On Tue, Oct 18, 2016 at 09:01:34AM +0200, Johannes Thumshirn wrote:
> > > 
> > > [...]
> > > 
> > > > > 
> > > > > This is likely the missing SCF_ACK_KREF assignment in >= v4.1.y:
> > > > > 
> > > > > http://www.spinics.net/lists/target-devel/msg13530.html
> > > 
> > > Sorry to disappoint you but it didn't fix my issue. Is there any debug 
> > > data I
> > > can give you, or do you have an advice where I could start looking?
> > > 
> > > I have to admit I only tested the patch on our downstream kernel and not 
> > > the
> > > upstream kernel, I'll repeat the tests on 4.8 final tomorrow.
> > > 
> > 
> > Is it possible to generate vmcore to have a look in crash + gdb..?
> > 
> > 
> 
> Just curious if you've been able to verify the above patch for
> v4.8.y iscsi-target ports on your specific config(s)..?
> 
> To confirm, using v4.1.y + patch I've not run into any further
> se_cmd->cmd_kref leaks and/or hung tasks with iscsi-target ports due to
> high backend device I/O latency, resulting in host timeouts + ABORT_TASK
> + session reinstatement to occur while waiting for outstanding se_cmd
> backend I/O to complete.

Unfortunately v4.8.5 which has above patch included (I double checked to be
sure) shows the same effects. 

Here's my "reproducer":

 VM1 (target)   VM2 (Initiator)
    ===
dd if=/dev/sda of=/dev/null bs=1 (given sda 
is the iscsi LUN)

Wait a little bit (~10 secs)

 dmsetup suspend dm-0
 watch -n1 'ps aux |\
  grep -E "\[iscsi" | grep -v grep'

I'll build a kernel with all target stuff builtin and full debug symbols.
Let's see if I can find out something.

Thanks,
Johannes

-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Avoid that SCSI device removal through sysfs triggers a deadlock

2016-10-30 Thread Bart Van Assche
On 10/28/16 19:08, James Bottomley wrote:
> This is a deadlock caused by an inversion issue in kernfs (suicide vs
> non-suicide removes); so fixing it in SCSI alone really isn't
> appropriate.  I count at least five other subsystems all using this
> mechanism, so they'll all be similarly affected.  It looks to be fairly
> simply fixable inside kernfs, so please fix it that way.

Hello James,

Can you clarify this further? To me this looks like the result of how 
the SCSI core works rather than an issue in the kernfs layer. My 
interpretation of the deadlock report produced by the lockdep code is as 
follows:
* The SCSI scanning code holds scan_mutex while creating sysfs
   attributes for a SCSI device. In this case scan_mutex is the outer
   mutex and s_active the inner locking object.
* scsi_remove_host() holds scan_mutex while removing sysfs attributes.
   Also in this case scan_mutex is the outer mutex and s_active the
   inner locking object.
* During self-removal (sysfs_remove_file_self() being called indirectly
   by kernfs_fop_write()), kernfs_fop_write() holds s_active while
   scsi_remove_device() is being called. In this case s_active is the
   outer locking object and scan_mutex the inner locking object.

I think that it is essential that kernfs_fop_write() holds s_active. So 
to me this looks like a lock inversion issue that cannot be fixed by 
modifying kernfs only. In other words, the SCSI core has to be modified 
to fix this. Do you agree with this?

Thanks,

Bart.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel 4.8.4: INFO: task kworker/u16:8:289 blocked for more than 120 seconds.

2016-10-30 Thread TomK

On 10/29/2016 5:44 PM, Nicholas A. Bellinger wrote:

On Sat, 2016-10-29 at 14:10 -0400, TomK wrote:

On 10/29/2016 3:50 AM, Nicholas A. Bellinger wrote:

Hi TomK & Co,

On Fri, 2016-10-28 at 02:01 -0400, TomK wrote:

On 10/26/2016 8:08 AM, TomK wrote:

On 10/26/2016 3:20 AM, Nicholas A. Bellinger wrote:








1) As soon as I re-add the bad disk without the patch, I loose the LUN
off the ESXi hosts.  Same thing happens with the patch.  No change.  The
disk is pulling things down.  Worse, the kernel panics and locks me out
of the system (http://microdevsys.com/linux-lio/messages-oct-27-2016.txt) :



So after groking these logs, the point when ata6 failing scsi_device is
holding outstanding I/O beyond ESX FC host side timeouts, manifests
itself as ABORT_TASK tag=1122276:

Oct 28 00:42:56 mbpc-pc kernel: qla2xxx [:04:00.0]-e872:20: 
qlt_24xx_atio_pkt_all_vps: qla_target(0): type 6 ox_id 077e
Oct 28 00:42:56 mbpc-pc kernel: qla2xxx [:04:00.0]-e872:20: 
qlt_24xx_atio_pkt_all_vps: qla_target(0): type 6 ox_id 077f
Oct 28 00:42:56 mbpc-pc kernel: qla2xxx [:04:00.0]-e872:20: 
qlt_24xx_atio_pkt_all_vps: qla_target(0): type 6 ox_id 0780
Oct 28 00:42:57 mbpc-pc kernel: qla2xxx [:04:00.0]-e837:20: ABTS_RECV_24XX: 
instance 0
Oct 28 00:42:57 mbpc-pc kernel: qla2xxx [:04:00.0]-f811:20: qla_target(0): 
task abort (s_id=1:5:0, tag=1122276, param=0)
Oct 28 00:42:57 mbpc-pc kernel: qla2xxx [:04:00.0]-f80f:20: qla_target(0): 
task abort (tag=1122276)
Oct 28 00:42:57 mbpc-pc kernel: ABORT_TASK: Found referenced qla2xxx task_tag: 
1122276

and eventually:

Oct 28 00:43:18 mbpc-pc kernel: qla2xxx [:04:00.0]-e837:20: ABTS_RECV_24XX: 
instance 0
Oct 28 00:43:18 mbpc-pc kernel: qla2xxx [:04:00.0]-f811:20: qla_target(0): 
task abort (s_id=1:5:0, tag=1122276, param=0)
Oct 28 00:43:18 mbpc-pc kernel: qla2xxx [:04:00.0]-f881:20: unable to find 
cmd in driver or LIO for tag 0x111fe4
Oct 28 00:43:18 mbpc-pc kernel: qla2xxx [:04:00.0]-f854:20: qla_target(0): 
__qlt_24xx_handle_abts() failed: -2
Oct 28 00:43:18 mbpc-pc kernel: qla2xxx [:04:00.0]-e806:20: Sending task 
mgmt ABTS response (ha=88010f11, atio=88010f123a40, status=4
Oct 28 00:43:18 mbpc-pc kernel: qla2xxx [:04:00.0]-e838:20: ABTS_RESP_24XX: 
compl_status 31
Oct 28 00:43:18 mbpc-pc kernel: qla2xxx [:04:00.0]-e807:20: Sending retry 
TERM EXCH CTIO7 (ha=88010f11)
Oct 28 00:43:18 mbpc-pc kernel: qla2xxx [:04:00.0]-e806:20: Sending task 
mgmt ABTS response (ha=88010f11, atio=88010f123a80, status=0
Oct 28 00:43:18 mbpc-pc kernel: qla2xxx [:04:00.0]-e838:20: ABTS_RESP_24XX: 
compl_status 0

The outstanding se_cmd with tag=1122276 for ata6 completes back to
target-core, allowing ABORT_TASK + TMR_FUNCTION_COMPLETE status to
progress:

Oct 28 00:44:25 mbpc-pc kernel: hpet1: lost 9600 rtc interrupts
Oct 28 00:44:29 mbpc-pc kernel: ata6: SATA link up 1.5 Gbps (SStatus 113 
SControl 310)
Oct 28 00:44:29 mbpc-pc kernel: ata6.00: configured for UDMA/133
Oct 28 00:44:29 mbpc-pc kernel: ata6.00: retrying FLUSH 0xea Emask 0x4
Oct 28 00:44:29 mbpc-pc kernel: ata6.00: device reported invalid CHS sector 0
Oct 28 00:44:29 mbpc-pc kernel: ata6: EH complete
Oct 28 00:44:30 mbpc-pc kernel: qla2xxx [:04:00.0]-e81c:20: Sending TERM 
EXCH CTIO (ha=88010f11)
Oct 28 00:44:30 mbpc-pc kernel: qla2xxx [:04:00.0]-f814:20: qla_target(0): 
terminating exchange for aborted cmd=880099392fa8 (se_cmd=880099392fa8, 
tag=1122276)
Oct 28 00:44:30 mbpc-pc kernel: qla2xxx [:04:00.0]-e81c:20: Sending TERM 
EXCH CTIO (ha=88010f11)
Oct 28 00:44:30 mbpc-pc kernel: qla2xxx [:04:00.0]-e851:20: qla_target(0): 
Terminating cmd 880099392fa8 with incorrect state 2
Oct 28 00:44:30 mbpc-pc kernel: qla2xxx [:04:00.0]-f81d:20: Intermediate 
CTIO received (status 6)
Oct 28 00:44:30 mbpc-pc kernel: ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for 
ref_tag: 1122276
Oct 28 00:44:30 mbpc-pc kernel: qla2xxx [:04:00.0]-f81d:20: Intermediate 
CTIO received (status 8)
Oct 28 00:44:30 mbpc-pc kernel: qla2xxx [:04:00.0]-f813:20: TM response 
mcmd (8800b1e2f0f0) status 0x0 state 0x0
Oct 28 00:44:30 mbpc-pc kernel: qla2xxx [:04:00.0]-e806:20: Sending task 
mgmt ABTS response (ha=88010f11, atio=8800b1e2f300, status=0
Oct 28 00:44:30 mbpc-pc kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for 
ref_tag: 1122996
Oct 28 00:44:30 mbpc-pc kernel: qla2xxx [:04:00.0]-f813:20: TM response 
mcmd (8800b1e2f340) status 0x2 state 0x0
Oct 28 00:44:30 mbpc-pc kernel: qla2xxx [:04:00.0]-e806:20: Sending task 
mgmt ABTS response (ha=88010f11, atio=8800b1e2f550, status=2
Oct 28 00:44:30 mbpc-pc kernel: ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for 
ref_tag: 1122204
Oct 28 00:44:30 mbpc-pc kernel: qla2xxx [:04:00.0]-f813:20: TM response 
mcmd (8800b1e2f590) status 0x2 state 0x0
Oct 28 00:44:30 mbpc-pc kernel: qla2xxx [:04:00.0]-e806:20: Sending task 
mgmt ABTS 

Re: [PATCH] [SCSI] mpt3sas: Fix secure erase premature termination.

2016-10-30 Thread Hannes Reinecke

On 10/30/2016 01:43 PM, Andrey Grodzovsky wrote:

Problem:
This is a work around for a bug with LSI Fusion MPT SAS2 when
pefroming secure erase. Due to the very long time the operation
takes commands issued during the erase will time out and will trigger
execution of abort hook. Even though the abort hook is called for
the specifc command which timed out this leads to entire device halt
(scsi_state terminated) and premature termination of the secured erase.

Actually, it is _not_ the erase command which times out, it's the 
successive commands which time out, as the controller is unable to 
process them while erase is running.
I suspect a bug in the SAT-layer from the mpt3sas firmware, which simply 
does not return 'busy' for additional commands when erase is in progress.
That being said, this issue was obscured prior to implementing 
asynchronous aborts, as originally a timeout would be invoking SCSI EH, 
which would wait for all outstanding commands to complete.
So by the time SCSI EH was invoked the erase command was already 
completed, allowing for a successful retry of the failing command.
With asynchronous aborts we don't have this option, as the abort will 
succeed, but the command cannot be retried as the original erase command 
is still running.


In the light of the above I guess we need something like the attached 
patch. I'm not utterly proud of if, but I guess it's the best we can do 
for the moment.


Cheers,

Hannes
--
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
>From 1556746987c3b4c1a1a4705625280b1136554f89 Mon Sep 17 00:00:00 2001
From: Hannes Reinecke 
Date: Sun, 30 Oct 2016 14:24:44 +0100
Subject: [PATCH] mpt3sas: hack: disable concurrent commands for ATA_16/ATA_12

There's a bug in the mpt3sas driver/firmware which would not return
BUSY if it's busy processing requests (eg 'erase') and cannot
respond to other commands. Hence these commands will timeout
and eventually start the error handler.
This patch disallows request processing whenever an ATA_12 or
ATA_16 command is received, thereby avoiding this problem.

Signed-off-by: Hannes Reinecke 
---
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 97987e7..18b9f09 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -4096,6 +4096,13 @@ scsih_qcmd(struct Scsi_Host *shost, struct scsi_cmnd *scmd)
 	sas_device_priv_data->block)
 		return SCSI_MLQUEUE_DEVICE_BUSY;
 
+	/*
+	 * Hack: block the device for any ATA_12/ATA_16 command
+	 */
+	if (scmd->cmnd[0] == 0xa1 || scmd->cmnd[0] == 0x85) {
+		sas_device_priv_data = scmd->device->hostdata;
+		_scsih_internal_device_block(scmd->device, sas_device_priv_data);
+	}
 	if (scmd->sc_data_direction == DMA_FROM_DEVICE)
 		mpi_control = MPI2_SCSIIO_CONTROL_READ;
 	else if (scmd->sc_data_direction == DMA_TO_DEVICE)
@@ -4835,6 +4842,10 @@ _scsih_io_done(struct MPT3SAS_ADAPTER *ioc, u16 smid, u8 msix_index, u32 reply)
 
  out:
 
+	if (scmd->cmnd[0] == 0xa1 || scmd->cmnd[0] == 0x85) {
+		sas_device_priv_data = scmd->device->hostdata;
+		_scsih_internal_device_unblock(scmd->device, sas_device_priv_data);
+	}
 	scsi_dma_unmap(scmd);
 
 	scmd->scsi_done(scmd);
-- 
2.6.6



Re: [PATCH v2 02/16] scsi: don't use fc_bsg_job::request and fc_bsg_job::reply directly

2016-10-30 Thread Johannes Thumshirn
On Fri, Oct 28, 2016 at 11:53:46AM +0200, Steffen Maier wrote:

[...]

> > > 
> > > > @@ -3937,6 +3944,7 @@ fc_bsg_request_handler(struct request_queue *q, 
> > > > struct Scsi_Host *shost,
> > > > struct request *req;
> > > > struct fc_bsg_job *job;
> > > > enum fc_dispatch_result ret;
> > > > +   struct fc_bsg_reply *bsg_reply;
> > > > 
> > > > if (!get_device(dev))
> > > > return;
> > > > @@ -3973,8 +3981,9 @@ fc_bsg_request_handler(struct request_queue *q, 
> > > > struct Scsi_Host *shost,
> > > > /* check if we have the msgcode value at least */
> > > > if (job->request_len < sizeof(uint32_t)) {
> > > > BUG_ON(job->reply_len < sizeof(uint32_t));
> > > > -   job->reply->reply_payload_rcv_len = 0;
> > > > -   job->reply->result = -ENOMSG;
> > > > +   bsg_reply = job->reply;
> > > > +   bsg_reply->reply_payload_rcv_len = 0;
> > > > +   bsg_reply->result = -ENOMSG;
> 
> Compiler optimization re-ordered above two lines and the first pointer
> derefence is bsg_reply->result [field offset 0] where bsg_reply is NULL.
> The assignment tries to write to memory at address NULL causing the kernel
> page fault.
> 
> Does your suggested change for [PATCH v3 02/16], shuffling the
> job->request_len checks, address above kernel page fault?

This is what I hope at least. I'm sorry but I don't have any experience with 
s390
and zfcp at all. I still need to get a test environment set up, but all the
people knowing how to do are rather busy at the moment. All my tests on x86_64
with FCoE and lpfc haven't had a problem so far.

Thanks,
Johannes
-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi_dh_alua: fix wrong scsi_device_put() in alua_rtpg_queue()

2016-10-30 Thread Hannes Reinecke

On 10/28/2016 05:05 PM, Bart Van Assche wrote:

On 10/28/2016 01:21 AM, tang.jun...@zte.com.cn wrote:

From: "tang.junhui" 

scsi_device_put() is called when the conditions pg->rtpg_sdev!=NULL
and queue_delayed_work() failure satisfied, actually it should be not
to call because scsi_device_get() is not called previous in this scene.

Signed-off-by: tang.junhui 
---
 drivers/scsi/device_handler/scsi_dh_alua.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c 
b/drivers/scsi/device_handler/scsi_dh_alua.c
index 241829e..78081df 100644
--- a/drivers/scsi/device_handler/scsi_dh_alua.c
+++ b/drivers/scsi/device_handler/scsi_dh_alua.c
@@ -866,6 +866,7 @@ static void alua_rtpg_queue(struct alua_port_group *pg,
struct alua_queue_data *qdata, bool force)
 {
int start_queue = 0;
+   int sdev_geted = 0;
unsigned long flags;
struct workqueue_struct *alua_wq = kaluad_wq;

@@ -884,6 +885,7 @@ static void alua_rtpg_queue(struct alua_port_group *pg,
kref_get(>kref);
pg->rtpg_sdev = sdev;
scsi_device_get(sdev);
+   sdev_geted = 1;
start_queue = 1;
} else if (!(pg->flags & ALUA_PG_RUN_RTPG) && force) {
pg->flags |= ALUA_PG_RUN_RTPG;
@@ -901,7 +903,8 @@ static void alua_rtpg_queue(struct alua_port_group *pg,
if (start_queue &&
!queue_delayed_work(alua_wq, >rtpg_work,
msecs_to_jiffies(ALUA_RTPG_DELAY_MSECS))) {
-   scsi_device_put(sdev);
+   if (sdev_geted)
+   scsi_device_put(sdev);
kref_put(>kref, release_port_group);
}
 }


Hello Tang Junhui,

Personally I prefer the patch below because that patch does not introduce
a new variable. This is a patch I started testing four weeks ago and it has
survived all my ALUA tests so far. Please note that I do not know whether
my tests hit the !(pg->flags & ALUA_PG_RUNNING) code path.

From: Bart Van Assche 
Date: Thu, 29 Sep 2016 10:06:51 -0700
Subject: [PATCH] scsi_dh_alua: Fix a reference counting bug

Signed-off-by: Bart Van Assche 
Cc: 
---
 drivers/scsi/device_handler/scsi_dh_alua.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c 
b/drivers/scsi/device_handler/scsi_dh_alua.c
index 241829e..23835f7 100644
--- a/drivers/scsi/device_handler/scsi_dh_alua.c
+++ b/drivers/scsi/device_handler/scsi_dh_alua.c
@@ -890,6 +890,7 @@ static void alua_rtpg_queue(struct alua_port_group *pg,
/* Do not queue if the worker is already running */
if (!(pg->flags & ALUA_PG_RUNNING)) {
kref_get(>kref);
+   sdev = NULL;
start_queue = 1;
}
}
@@ -901,7 +902,8 @@ static void alua_rtpg_queue(struct alua_port_group *pg,
if (start_queue &&
!queue_delayed_work(alua_wq, >rtpg_work,
msecs_to_jiffies(ALUA_RTPG_DELAY_MSECS))) {
-   scsi_device_put(sdev);
+   if (sdev)
+   scsi_device_put(sdev);
kref_put(>kref, release_port_group);
}
 }


Yes, this looks far better.

Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] [SCSI] mpt3sas: Fix secure erase premature termination.

2016-10-30 Thread Andrey Grodzovsky
Problem:
This is a work around for a bug with LSI Fusion MPT SAS2 when
pefroming secure erase. Due to the very long time the operation
takes commands issued during the erase will time out and will trigger
execution of abort hook. Even though the abort hook is called for
the specifc command which timed out this leads to entire device halt
(scsi_state terminated) and premature termination of the secured erase.

Fix:
Set device state to busy while erase in progress to reject any incoming
commands until the erase is done. The device is blocked any way during
this time and cannot execute any other command.
More data and logs can be found here -
https://drive.google.com/file/d/0B9ocOHYHbbS1Q3VMdkkzeWFkTjg/view

Signed-off-by: Andrey Grodzovsky 
Cc: 
Cc: Sathya Prakash 
Cc: Chaitra P B 
Cc: Suganath Prabu Subramani 
Cc: Sreekanth Reddy 
Cc: Hannes Reinecke 
Cc: 
---
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 30 ++
 1 file changed, 30 insertions(+)

diff --git [PATCH]drivers/scsi/mpt3sas/mpt3sas_scsih.c 
b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 5a97e32..5542dd02 100644
--- [PATCH]drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -3500,6 +3500,23 @@ _scsih_eedp_error_handling(struct scsi_cmnd *scmd, u16 
ioc_status)
SAM_STAT_CHECK_CONDITION;
 }
 
+/**
+ * This is a work around for a bug with LSI Fusion MPT SAS2 when
+ * pefroming secure erase. Due to the verly long time the operation
+ * takes commands issued during the erase will time out and will trigger
+ * execution of abort hook. This leads to device reset and premature
+ * termination of the secured erase.
+ */
+static inline bool disk_erase_command(struct scsi_cmnd *scmd)
+{
+   /**
+   * Identify secure erase command according to
+   * ATA/ATAPI Command Set (ATA8-ACS) p.202
+   */
+   return scmd->cmd_len == 16 && scmd->cmnd[14] == 0xf4;
+}
+
+
 
 /**
  * _scsih_qcmd - main scsi request entry point
@@ -3528,6 +3545,14 @@ _scsih_qcmd(struct Scsi_Host *shost, struct scsi_cmnd 
*scmd)
scsi_print_command(scmd);
 #endif
 
+   /**
+   * Lock the device for any subsequent command until secured erase
+   * command is done.
+   */
+   if (disk_erase_command(scmd))
+   scsi_internal_device_block(scmd->device);
+
+
sas_device_priv_data = scmd->device->hostdata;
if (!sas_device_priv_data || !sas_device_priv_data->sas_target) {
scmd->result = DID_NO_CONNECT << 16;
@@ -4062,6 +4087,11 @@ _scsih_io_done(struct MPT3SAS_ADAPTER *ioc, u16 smid, u8 
msix_index, u32 reply)
if (scmd == NULL)
return 1;
 
+   /* Secured erase is done. Unlock the device */
+   if (disk_erase_command(scmd))
+   scsi_internal_device_unblock(scmd->device, SDEV_RUNNING);
+
+
mpi_request = mpt3sas_base_get_msg_frame(ioc, smid);
 
if (mpi_reply == NULL) {
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] NCR5380: Use probe_irq_*() for IRQ probing

2016-10-30 Thread Ondrej Zary
Use standard probe_irq_on() and probe_irq_off() functions instead of own
implementation.
This prevents warning messages like this in the kernel log:
genirq: Flags mismatch irq 1.  (NCR-probe) vs. 0080 (i8042)

Move the IRQ trigger code to a separate function so it can be used for
other purposes (testing if the IRQ works).

Also add missing IRQ reset after probe.

Signed-off-by: Ondrej Zary 
---
 drivers/scsi/NCR5380.c   |   72 +-
 drivers/scsi/NCR5380.h   |3 +-
 drivers/scsi/g_NCR5380.c |2 +-
 3 files changed, 29 insertions(+), 48 deletions(-)

diff --git a/drivers/scsi/NCR5380.c b/drivers/scsi/NCR5380.c
index d849ffa..01c0027 100644
--- a/drivers/scsi/NCR5380.c
+++ b/drivers/scsi/NCR5380.c
@@ -351,50 +351,13 @@ static void NCR5380_print_phase(struct Scsi_Host 
*instance)
 }
 #endif
 
-
-static int probe_irq;
-
-/**
- * probe_intr  -   helper for IRQ autoprobe
- * @irq: interrupt number
- * @dev_id: unused
- * @regs: unused
- *
- * Set a flag to indicate the IRQ in question was received. This is
- * used by the IRQ probe code.
- */
-
-static irqreturn_t probe_intr(int irq, void *dev_id)
-{
-   probe_irq = irq;
-   return IRQ_HANDLED;
-}
-
-/**
- * NCR5380_probe_irq   -   find the IRQ of an NCR5380
- * @instance: NCR5380 controller
- * @possible: bitmask of ISA IRQ lines
- *
- * Autoprobe for the IRQ line used by the NCR5380 by triggering an IRQ
- * and then looking to see what interrupt actually turned up.
- */
-
-static int __maybe_unused NCR5380_probe_irq(struct Scsi_Host *instance,
-   int possible)
+static void NCR5380_trigger_irq(struct Scsi_Host *instance)
 {
struct NCR5380_hostdata *hostdata = shost_priv(instance);
-   unsigned long timeout;
-   int trying_irqs, i, mask;
-
-   for (trying_irqs = 0, i = 1, mask = 2; i < 16; ++i, mask <<= 1)
-   if ((mask & possible) && (request_irq(i, _intr, 0, 
"NCR-probe", NULL) == 0))
-   trying_irqs |= mask;
-
-   timeout = jiffies + msecs_to_jiffies(250);
-   probe_irq = NO_IRQ;
+   unsigned long timeout = jiffies + msecs_to_jiffies(2500);
 
/*
-* A interrupt is triggered whenever BSY = false, SEL = true
+* An interrupt is triggered whenever BSY = false, SEL = true
 * and a bit set in the SELECT_ENABLE_REG is asserted on the
 * SCSI bus.
 *
@@ -402,23 +365,40 @@ static int __maybe_unused NCR5380_probe_irq(struct 
Scsi_Host *instance,
 * (I/O, C/D, and MSG) match those in the TCR, so we must reset that
 * to zero.
 */
-
NCR5380_write(TARGET_COMMAND_REG, 0);
NCR5380_write(SELECT_ENABLE_REG, hostdata->id_mask);
NCR5380_write(OUTPUT_DATA_REG, hostdata->id_mask);
NCR5380_write(INITIATOR_COMMAND_REG, ICR_BASE | ICR_ASSERT_DATA | 
ICR_ASSERT_SEL);
 
-   while (probe_irq == NO_IRQ && time_before(jiffies, timeout))
+   while (!(NCR5380_read(BUS_AND_STATUS_REG) & BASR_IRQ) &&
+  time_before(jiffies, timeout))
schedule_timeout_uninterruptible(1);
 
NCR5380_write(SELECT_ENABLE_REG, 0);
NCR5380_write(INITIATOR_COMMAND_REG, ICR_BASE);
+}
 
-   for (i = 1, mask = 2; i < 16; ++i, mask <<= 1)
-   if (trying_irqs & mask)
-   free_irq(i, NULL);
+/**
+ * NCR5380_probe_irq   -   find the IRQ of an NCR5380
+ * @instance: NCR5380 controller
+ *
+ * Autoprobe for the IRQ line used by the NCR5380 by triggering an IRQ
+ * and then looking to see what interrupt actually turned up.
+ */
+
+static int NCR5380_probe_irq(struct Scsi_Host *instance)
+{
+   struct NCR5380_hostdata *hostdata = shost_priv(instance);
+   int irqs, irq;
+
+   irqs = probe_irq_on();
+   NCR5380_trigger_irq(instance);
+   irq = probe_irq_off(irqs);
+   NCR5380_read(RESET_PARITY_INTERRUPT_REG);
+   if (irq < 0)
+   irq = 0;
 
-   return probe_irq;
+   return irq;
 }
 
 /**
diff --git a/drivers/scsi/NCR5380.h b/drivers/scsi/NCR5380.h
index 3c6ce54..b2fca52 100644
--- a/drivers/scsi/NCR5380.h
+++ b/drivers/scsi/NCR5380.h
@@ -290,7 +290,8 @@ static inline struct scsi_cmnd *NCR5380_to_scmd(struct 
NCR5380_cmd *ncmd_ptr)
 #define NCR5380_dprint_phase(flg, arg) do {} while (0)
 #endif
 
-static int NCR5380_probe_irq(struct Scsi_Host *instance, int possible);
+static void NCR5380_trigger_irq(struct Scsi_Host *instance);
+static int NCR5380_probe_irq(struct Scsi_Host *instance);
 static int NCR5380_init(struct Scsi_Host *instance, int flags);
 static int NCR5380_maybe_reset_bus(struct Scsi_Host *);
 static void NCR5380_exit(struct Scsi_Host *instance);
diff --git a/drivers/scsi/g_NCR5380.c b/drivers/scsi/g_NCR5380.c
index 7299ad9..3790ed5 100644
--- a/drivers/scsi/g_NCR5380.c
+++ b/drivers/scsi/g_NCR5380.c
@@ -265,7 +265,7 @@ static int generic_NCR5380_init_one(struct 

[PATCH 2/2] g_NCR5380: Test the IRQ before accepting it

2016-10-30 Thread Ondrej Zary
Trigger an IRQ first with a test IRQ handler to find out if it really
works. Disable the IRQ if not.
This prevents hang when incorrect IRQ was specified by user.

Signed-off-by: Ondrej Zary 
---
 drivers/scsi/g_NCR5380.c |   31 +--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/g_NCR5380.c b/drivers/scsi/g_NCR5380.c
index 3790ed5..f0c5b10 100644
--- a/drivers/scsi/g_NCR5380.c
+++ b/drivers/scsi/g_NCR5380.c
@@ -67,6 +67,14 @@
 MODULE_ALIAS("g_NCR5380_mmio");
 MODULE_LICENSE("GPL");
 
+static bool irq_working;
+
+static irqreturn_t test_irq(int irq, void *dev_id)
+{
+   irq_working = true;
+   return IRQ_HANDLED;
+}
+
 /*
  * Configure I/O address of 53C400A or DTC436 by writing magic numbers
  * to ports 0x779 and 0x379.
@@ -275,10 +283,29 @@ static int generic_NCR5380_init_one(struct 
scsi_host_template *tpnt,
/* set IRQ for HP C2502 */
if (board == BOARD_HP_C2502)
magic_configure(port_idx, instance->irq, magic);
-   if (request_irq(instance->irq, generic_NCR5380_intr,
-   0, "NCR5380", instance)) {
+   /* test if the IRQ is working */
+   irq_working = false;
+   if (request_irq(instance->irq, test_irq,
+   0, "NCR5380-irqtest", NULL)) {
printk(KERN_WARNING "scsi%d : IRQ%d not free, 
interrupts disabled\n", instance->host_no, instance->irq);
instance->irq = NO_IRQ;
+   } else {
+   NCR5380_trigger_irq(instance);
+   free_irq(instance->irq, NULL);
+   if (irq_working) {
+   if (request_irq(instance->irq,
+   generic_NCR5380_intr, 0,
+   "NCR5380", instance)) {
+   printk(KERN_WARNING "scsi%d : IRQ%d not 
free, interrupts disabled\n",
+  instance->host_no,
+  instance->irq);
+   instance->irq = NO_IRQ;
+   }
+   } else {
+   printk(KERN_WARNING "scsi%d : IRQ%d not 
working, interrupts disabled\n",
+  instance->host_no, instance->irq);
+   instance->irq = NO_IRQ;
+   }
}
}
 
-- 
Ondrej Zary

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html