Re: [PATCH v2] Avoid that ATA error handling can trigger a kernel hang or oops

2018-02-27 Thread Martin K. Petersen

Bart,

> Avoid that the recently introduced call_rcu() call in the SCSI core
> triggers a double call_rcu() call.

Applied to 4.16/scsi-fixes. Thank you!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH v2] Avoid that ATA error handling can trigger a kernel hang or oops

2018-02-27 Thread Damien Le Moal
On 2018/02/27 10:53, Bart Van Assche wrote:
> On Thu, 2018-02-22 at 11:30 -0800, Bart Van Assche wrote:
>> Avoid that the recently introduced call_rcu() call in the SCSI core
>> triggers a double call_rcu() call.
>> [ ... ]
> 
> Can anyone review this patch? Multiple users have confirmed independently
> that this patch fixes the double call_rcu() issue for them.
> 
> Thanks,
> 
> Bart.

Please feel free to add:

Reviewed-by: Damien Le Moal 
Tested-by: Damien Le Moal 


-- 
Damien Le Moal
Western Digital Research

Re: [PATCH v2] Avoid that ATA error handling can trigger a kernel hang or oops

2018-02-27 Thread Bart Van Assche
On Thu, 2018-02-22 at 11:30 -0800, Bart Van Assche wrote:
> Avoid that the recently introduced call_rcu() call in the SCSI core
> triggers a double call_rcu() call.
> [ ... ]

Can anyone review this patch? Multiple users have confirmed independently
that this patch fixes the double call_rcu() issue for them.

Thanks,

Bart.





Re: [PATCH v2] Avoid that ATA error handling can trigger a kernel hang or oops

2018-02-26 Thread Salvatore Bonaccorso
Hi,

On Fri, Feb 23, 2018 at 04:46:30PM +0100, Natanael Copa wrote:
> On Thu, 22 Feb 2018 11:30:20 -0800
> Bart Van Assche wrote:
> 
> > Avoid that the recently introduced call_rcu() call in the SCSI core
> > triggers a double call_rcu() call.
> 
> This patch also prevents my machine from hanging. However, the ATA
> error messages that I previously saw are gone, so I don't know whether
> the underlying problem is gone or the error messages are simply no
> longer reported.

For reference, the patch has also been confirmed to solve the
problem for Jean-Francois Pirus in https://bugs.debian.org/891467
(on top of 4.15.4).

Regards,
Salvatore


Re: [PATCH v2] Avoid that ATA error handling can trigger a kernel hang or oops

2018-02-23 Thread Natanael Copa
On Thu, 22 Feb 2018 11:30:20 -0800
Bart Van Assche wrote:

> Avoid that the recently introduced call_rcu() call in the SCSI core
> triggers a double call_rcu() call.

This patch also prevents my machine from hanging. However, the ATA
error messages that I previously saw are gone, so I don't know whether
the underlying problem is gone or the error messages are simply no
longer reported.

-nc


[PATCH v2] Avoid that ATA error handling can trigger a kernel hang or oops

2018-02-22 Thread Bart Van Assche
Avoid that the recently introduced call_rcu() call in the SCSI core
triggers a double call_rcu() call.

Reported-by: Natanael Copa 
Reported-by: Damien Le Moal 
References: https://bugzilla.kernel.org/show_bug.cgi?id=198861
Fixes: 3bd6f43f5cb3 ("scsi: core: Ensure that the SCSI error handler gets woken up")
Signed-off-by: Bart Van Assche 
Cc: Natanael Copa 
Cc: Damien Le Moal 
Cc: Alexandre Oliva 
Cc: Pavel Tikhomirov 
Cc: Hannes Reinecke 
Cc: Johannes Thumshirn 
Cc: 
---
 drivers/scsi/hosts.c  | 3 ---
 drivers/scsi/scsi_error.c | 5 +++--
 drivers/scsi/scsi_lib.c   | 2 ++
 include/scsi/scsi_cmnd.h  | 3 +++
 include/scsi/scsi_host.h  | 2 --
 5 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index a0a7e4ff255c..7279d3d2e941 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -334,8 +334,6 @@ static void scsi_host_dev_release(struct device *dev)
 	if (shost->work_q)
 		destroy_workqueue(shost->work_q);
 
-	destroy_rcu_head(&shost->rcu);
-
 	if (shost->shost_state == SHOST_CREATED) {
 		/*
 		 * Free the shost_dev device name here if scsi_host_alloc()
@@ -404,7 +402,6 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize)
 	INIT_LIST_HEAD(&shost->starved_list);
 	init_waitqueue_head(&shost->host_wait);
 	mutex_init(&shost->scan_mutex);
-	init_rcu_head(&shost->rcu);
 
 	index = ida_simple_get(&host_index_ida, 0, 0, GFP_KERNEL);
 	if (index < 0)
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 96f988a7efda..9b0242f84407 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -229,7 +229,8 @@ static void scsi_eh_reset(struct scsi_cmnd *scmd)
 
 static void scsi_eh_inc_host_failed(struct rcu_head *head)
 {
-	struct Scsi_Host *shost = container_of(head, typeof(*shost), rcu);
+	struct scsi_cmnd *scmd = container_of(head, typeof(*scmd), rcu);
+	struct Scsi_Host *shost = scmd->device->host;
 	unsigned long flags;
 
 	spin_lock_irqsave(shost->host_lock, flags);
@@ -265,7 +266,7 @@ void scsi_eh_scmd_add(struct scsi_cmnd *scmd)
 	 * Ensure that all tasks observe the host state change before the
 	 * host_failed change.
 	 */
-	call_rcu(&shost->rcu, scsi_eh_inc_host_failed);
+	call_rcu(&scmd->rcu, scsi_eh_inc_host_failed);
 }
 
 /**
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index aea5a1ae318b..e1ca2160aa40 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -671,6 +671,7 @@ static bool scsi_end_request(struct request *req, blk_status_t error,
 	if (!blk_rq_is_scsi(req)) {
 		WARN_ON_ONCE(!(cmd->flags & SCMD_INITIALIZED));
 		cmd->flags &= ~SCMD_INITIALIZED;
+		destroy_rcu_head(&cmd->rcu);
 	}
 
 	if (req->mq_ctx) {
@@ -1151,6 +1152,7 @@ static void scsi_initialize_rq(struct request *rq)
 	struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(rq);
 
 	scsi_req_init(&cmd->req);
+	init_rcu_head(&cmd->rcu);
 	cmd->jiffies_at_alloc = jiffies;
 	cmd->retries = 0;
 }
diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
index d8d4a902a88d..2280b2351739 100644
--- a/include/scsi/scsi_cmnd.h
+++ b/include/scsi/scsi_cmnd.h
@@ -68,6 +68,9 @@ struct scsi_cmnd {
 	struct list_head list;  /* scsi_cmnd participates in queue lists */
 	struct list_head eh_entry; /* entry for the host eh_cmd_q */
 	struct delayed_work abort_work;
+
+	struct rcu_head rcu;
+
 	int eh_eflags;	/* Used by error handlr */
 
 	/*
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index 19317585ae48..7aedb6776b4f 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -577,8 +577,6 @@ struct Scsi_Host {
 		struct blk_mq_tag_set	tag_set;
 	};
 
-	struct rcu_head rcu;
-
 	atomic_t host_busy;	/* commands actually active on low-level */
 	atomic_t host_blocked;
 
-- 
2.16.2
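
For readers unfamiliar with the failure mode named in the commit message,
here is a rough userspace sketch of why a single rcu_head per host can be
handed to call_rcu() twice, and why one rcu_head per command avoids that.
Nothing below is kernel code: every toy_* name, the "queued" flag and the
-1 return value are assumptions made for illustration only (the real
call_rcu() returns void and does not detect the problem by itself).

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the kernel's struct rcu_head; NOT the real definition. */
struct toy_rcu_head {
	struct toy_rcu_head *next;
	void (*func)(struct toy_rcu_head *head);
	int queued;	/* hypothetical flag: set while the head is pending */
};

static struct toy_rcu_head *pending;	/* list of queued callbacks */

/*
 * Queue a callback. Returns -1 if the head is already queued, which is how
 * this sketch models a "double call_rcu()". Illustration only.
 */
static int toy_call_rcu(struct toy_rcu_head *head,
			void (*func)(struct toy_rcu_head *head))
{
	if (head->queued)
		return -1;
	head->queued = 1;
	head->func = func;
	head->next = pending;
	pending = head;
	return 0;
}

/* Pretend a grace period elapsed: dequeue and run every pending callback. */
static void toy_run_callbacks(void)
{
	while (pending) {
		struct toy_rcu_head *head = pending;

		assert(head->queued);
		pending = head->next;
		head->queued = 0;
		head->func(head);
	}
}

struct toy_host {
	int host_failed;
	struct toy_rcu_head rcu;	/* pre-patch layout: one head per host */
};

struct toy_cmnd {
	struct toy_host *host;
	struct toy_rcu_head rcu;	/* post-patch layout: one head per command */
};

#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Pre-patch style callback: the head is embedded in the host. */
static void inc_failed_via_host(struct toy_rcu_head *head)
{
	toy_container_of(head, struct toy_host, rcu)->host_failed++;
}

/* Post-patch style callback: recover the command, then follow it to the host. */
static void inc_failed_via_cmnd(struct toy_rcu_head *head)
{
	struct toy_cmnd *cmd = toy_container_of(head, struct toy_cmnd, rcu);

	cmd->host->host_failed++;
}

/*
 * Two commands on the same host fail before any callback has run.
 * Returns how many queue attempts were rejected as duplicates.
 */
int fail_two_commands(struct toy_host *host, int per_command_head)
{
	struct toy_cmnd a = { .host = host }, b = { .host = host };
	int rejected = 0;

	if (per_command_head) {
		rejected += toy_call_rcu(&a.rcu, inc_failed_via_cmnd) < 0;
		rejected += toy_call_rcu(&b.rcu, inc_failed_via_cmnd) < 0;
	} else {
		rejected += toy_call_rcu(&host->rcu, inc_failed_via_host) < 0;
		rejected += toy_call_rcu(&host->rcu, inc_failed_via_host) < 0;
	}
	toy_run_callbacks();
	return rejected;
}
```

With the shared (per-host) head, fail_two_commands() reports one rejected
attempt and host_failed ends up at 1 although two commands failed. With
per-command heads nothing is rejected and host_failed reaches 2. The second
callback mirrors the container_of() change in scsi_eh_inc_host_failed()
above, under the simplifying assumption that a command points directly at
its host.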