RE: [PATCH 10/16] aacraid: Terminate kthread on controller fw assert
> -Original Message- > From: Johannes Thumshirn [mailto:jthumsh...@suse.de] > Sent: Thursday, February 16, 2017 1:31 AM > To: Raghava Aditya Renukunta > ; j...@linux.vnet.ibm.com; > martin.peter...@oracle.com; linux-scsi@vger.kernel.org > Cc: Dave Carroll ; Gana Sridaran > ; Scott Benesh > ; dan.carpen...@oracle.com > Subject: Re: [PATCH 10/16] aacraid: Terminate kthread on controller fw > assert > > EXTERNAL EMAIL > > > On 02/15/2017 11:22 PM, Raghava Aditya Renukunta wrote: > >> > >> This look a bit scary. Can't the kthread be converted to a workqueue so > >> we could call cancel_work_sync()? > > > > Could you please elaborate on the reasons why this fix is scary? > > I understand that killing a thread is not standard (for any reason), > > and if there are other nuanced issues I would like to understand them. > > I'm actually concerned that this could have all kinds of side effects. > But this is just a gut feeling. I see some drm drivers are doing the > same, so it might be possible, but IMHO this is not a good design. > > And IIRC kthreads do have more downsides (i.e. CPU hotplugging and > issues with kernel live patching). > > I think most kthreads (haven't looked too close to the aacraid kthread I > must admit, but I'll be doing so) can be converted to either workqueues > or timers (or a combination of both). > > Thanks, > Johannes Makes sense, and I agree. With that being said I will withdraw this patch and resend it out in different patch series once we rework aac_command_thread into a work queue/timers. Regards, Raghava Aditya > -- > Johannes Thumshirn Storage > jthumsh...@suse.de+49 911 74053 689 > SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg > GF: Felix Imendörffer, Jane Smithard, Graham Norton > HRB 21284 (AG Nürnberg) > Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
Re: [PATCH 10/16] aacraid: Terminate kthread on controller fw assert
On 02/15/2017 11:22 PM, Raghava Aditya Renukunta wrote: >> >> This look a bit scary. Can't the kthread be converted to a workqueue so >> we could call cancel_work_sync()? > > Could you please elaborate on the reasons why this fix is scary? > I understand that killing a thread is not standard (for any reason), > and if there are other nuanced issues I would like to understand them. I'm actually concerned that this could have all kinds of side effects. But this is just a gut feeling. I see some drm drivers are doing the same, so it might be possible, but IMHO this is not a good design. And IIRC kthreads do have more downsides (i.e. CPU hotplugging and issues with kernel live patching). I think most kthreads (haven't looked too close to the aacraid kthread I must admit, but I'll be doing so) can be converted to either workqueues or timers (or a combination of both). Thanks, Johannes -- Johannes Thumshirn Storage jthumsh...@suse.de+49 911 74053 689 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
RE: [PATCH 10/16] aacraid: Terminate kthread on controller fw assert
> -Original Message- > From: Johannes Thumshirn [mailto:jthumsh...@suse.de] > Sent: Wednesday, February 15, 2017 12:44 AM > To: Raghava Aditya Renukunta > ; j...@linux.vnet.ibm.com; > martin.peter...@oracle.com; linux-scsi@vger.kernel.org > Cc: Dave Carroll ; Gana Sridaran > ; Scott Benesh > ; dan.carpen...@oracle.com > Subject: Re: [PATCH 10/16] aacraid: Terminate kthread on controller fw > assert > > EXTERNAL EMAIL > > > On 02/14/2017 09:44 PM, Raghava Aditya Renukunta wrote: > > When the command thread performs a periodic time sync and the > firmware is > > going through an assert during that time, the command thread waits for > the > > response that would never arrive. The SCSI Mid layer's error handler would > > eventually reset the controller, but the eh_handler just issues a > > "thread stop" to the command thread which is stuck on a semaphore and > the > > eh_thread would in turn goes to sleep waiting for the command_thread to > > respond to the stop which never happens. > > > > Fixed by allowing SIGTERM for the command thread, and during the > eh_reset > > call, sends termination signal to the command thread. As a follow-up, the > > eh_reset handler would take care of the controller reset. > > > > Signed-off-by: Raghava Aditya Renukunta > > > Reviewed-by: David Carroll > > --- > > This look a bit scary. Can't the kthread be converted to a workqueue so > we could call cancel_work_sync()? Could you please elaborate on the reasons why this fix is scary? I understand that killing a thread is not standard (for any reason), and if there are other nuanced issues I would like to understand them. Thank you, Raghava Aditya > > -- > Johannes Thumshirn Storage > jthumsh...@suse.de+49 911 74053 689 > SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg > GF: Felix Imendörffer, Jane Smithard, Graham Norton > HRB 21284 (AG Nürnberg) > Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
Re: [PATCH 10/16] aacraid: Terminate kthread on controller fw assert
On 02/14/2017 09:44 PM, Raghava Aditya Renukunta wrote: > When the command thread performs a periodic time sync and the firmware is > going through an assert during that time, the command thread waits for the > response that would never arrive. The SCSI Mid layer's error handler would > eventually reset the controller, but the eh_handler just issues a > "thread stop" to the command thread which is stuck on a semaphore and the > eh_thread would in turn goes to sleep waiting for the command_thread to > respond to the stop which never happens. > > Fixed by allowing SIGTERM for the command thread, and during the eh_reset > call, sends termination signal to the command thread. As a follow-up, the > eh_reset handler would take care of the controller reset. > > Signed-off-by: Raghava Aditya Renukunta > > Reviewed-by: David Carroll > --- This look a bit scary. Can't the kthread be converted to a workqueue so we could call cancel_work_sync()? -- Johannes Thumshirn Storage jthumsh...@suse.de+49 911 74053 689 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
[PATCH 10/16] aacraid: Terminate kthread on controller fw assert
When the command thread performs a periodic time sync and the firmware is going through an assert during that time, the command thread waits for the response that would never arrive. The SCSI Mid layer's error handler would eventually reset the controller, but the eh_handler just issues a "thread stop" to the command thread which is stuck on a semaphore and the eh_thread would in turn goes to sleep waiting for the command_thread to respond to the stop which never happens. Fixed by allowing SIGTERM for the command thread, and during the eh_reset call, sends termination signal to the command thread. As a follow-up, the eh_reset handler would take care of the controller reset. Signed-off-by: Raghava Aditya Renukunta Reviewed-by: David Carroll --- drivers/scsi/aacraid/commsup.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/scsi/aacraid/commsup.c b/drivers/scsi/aacraid/commsup.c index 78588e4..0ee91d0 100644 --- a/drivers/scsi/aacraid/commsup.c +++ b/drivers/scsi/aacraid/commsup.c @@ -1519,8 +1519,15 @@ static int _aac_reset_adapter(struct aac_dev *aac, int forced, u8 reset_type) scsi_block_requests(host); aac_adapter_disable_int(aac); if (aac->thread->pid != current->pid) { + struct task_struct *tsk; + spin_unlock_irq(host->host_lock); + tsk = pid_task(find_vpid(aac->thread->pid), PIDTYPE_PID); + if (tsk) + send_sig(SIGTERM, tsk, 1); kthread_stop(aac->thread); + + dev_info(&aac->pdev->dev, "Command Thread Terminated\n"); jafo = 1; } -- 2.7.4