RE: [PATCH 10/16] aacraid: Terminate kthread on controller fw assert
> -Original Message- > From: Johannes Thumshirn [mailto:jthumsh...@suse.de] > Sent: Thursday, February 16, 2017 1:31 AM > To: Raghava Aditya Renukunta > ; j...@linux.vnet.ibm.com; > martin.peter...@oracle.com; linux-scsi@vger.kernel.org > Cc: Dave Carroll ; Gana Sridaran > ; Scott Benesh > ; dan.carpen...@oracle.com > Subject: Re: [PATCH 10/16] aacraid: Terminate kthread on controller fw > assert > > EXTERNAL EMAIL > > > On 02/15/2017 11:22 PM, Raghava Aditya Renukunta wrote: > >> > >> This look a bit scary. Can't the kthread be converted to a workqueue so > >> we could call cancel_work_sync()? > > > > Could you please elaborate on the reasons why this fix is scary? > > I understand that killing a thread is not standard (for any reason), > > and if there are other nuanced issues I would like to understand them. > > I'm actually concerned that this could have all kinds of side effects. > But this is just a gut feeling. I see some drm drivers are doing the > same, so it might be possible, but IMHO this is not a good design. > > And IIRC kthreads do have more downsides (i.e. CPU hotplugging and > issues with kernel live patching). > > I think most kthreads (haven't looked too close to the aacraid kthread I > must admit, but I'll be doing so) can be converted to either workqueues > or timers (or a combination of both). > > Thanks, > Johannes Makes sense, and I agree. With that being said I will withdraw this patch and resend it out in different patch series once we rework aac_command_thread into a work queue/timers. Regards, Raghava Aditya > -- > Johannes Thumshirn Storage > jthumsh...@suse.de+49 911 74053 689 > SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg > GF: Felix Imendörffer, Jane Smithard, Graham Norton > HRB 21284 (AG Nürnberg) > Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
Re: [PATCH 10/16] aacraid: Terminate kthread on controller fw assert
On 02/15/2017 11:22 PM, Raghava Aditya Renukunta wrote: >> >> This look a bit scary. Can't the kthread be converted to a workqueue so >> we could call cancel_work_sync()? > > Could you please elaborate on the reasons why this fix is scary? > I understand that killing a thread is not standard (for any reason), > and if there are other nuanced issues I would like to understand them. I'm actually concerned that this could have all kinds of side effects. But this is just a gut feeling. I see some drm drivers are doing the same, so it might be possible, but IMHO this is not a good design. And IIRC kthreads do have more downsides (i.e. CPU hotplugging and issues with kernel live patching). I think most kthreads (haven't looked too close to the aacraid kthread I must admit, but I'll be doing so) can be converted to either workqueues or timers (or a combination of both). Thanks, Johannes -- Johannes Thumshirn Storage jthumsh...@suse.de+49 911 74053 689 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
RE: [PATCH 10/16] aacraid: Terminate kthread on controller fw assert
> -Original Message- > From: Johannes Thumshirn [mailto:jthumsh...@suse.de] > Sent: Wednesday, February 15, 2017 12:44 AM > To: Raghava Aditya Renukunta > ; j...@linux.vnet.ibm.com; > martin.peter...@oracle.com; linux-scsi@vger.kernel.org > Cc: Dave Carroll ; Gana Sridaran > ; Scott Benesh > ; dan.carpen...@oracle.com > Subject: Re: [PATCH 10/16] aacraid: Terminate kthread on controller fw > assert > > EXTERNAL EMAIL > > > On 02/14/2017 09:44 PM, Raghava Aditya Renukunta wrote: > > When the command thread performs a periodic time sync and the > firmware is > > going through an assert during that time, the command thread waits for > the > > response that would never arrive. The SCSI Mid layer's error handler would > > eventually reset the controller, but the eh_handler just issues a > > "thread stop" to the command thread which is stuck on a semaphore and > the > > eh_thread would in turn goes to sleep waiting for the command_thread to > > respond to the stop which never happens. > > > > Fixed by allowing SIGTERM for the command thread, and during the > eh_reset > > call, sends termination signal to the command thread. As a follow-up, the > > eh_reset handler would take care of the controller reset. > > > > Signed-off-by: Raghava Aditya Renukunta > > > Reviewed-by: David Carroll > > --- > > This look a bit scary. Can't the kthread be converted to a workqueue so > we could call cancel_work_sync()? Could you please elaborate on the reasons why this fix is scary? I understand that killing a thread is not standard (for any reason), and if there are other nuanced issues I would like to understand them. Thank you, Raghava Aditya > > -- > Johannes Thumshirn Storage > jthumsh...@suse.de+49 911 74053 689 > SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg > GF: Felix Imendörffer, Jane Smithard, Graham Norton > HRB 21284 (AG Nürnberg) > Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
Re: [PATCH 10/16] aacraid: Terminate kthread on controller fw assert
On 02/14/2017 09:44 PM, Raghava Aditya Renukunta wrote: > When the command thread performs a periodic time sync and the firmware is > going through an assert during that time, the command thread waits for the > response that would never arrive. The SCSI Mid layer's error handler would > eventually reset the controller, but the eh_handler just issues a > "thread stop" to the command thread which is stuck on a semaphore and the > eh_thread would in turn goes to sleep waiting for the command_thread to > respond to the stop which never happens. > > Fixed by allowing SIGTERM for the command thread, and during the eh_reset > call, sends termination signal to the command thread. As a follow-up, the > eh_reset handler would take care of the controller reset. > > Signed-off-by: Raghava Aditya Renukunta > > Reviewed-by: David Carroll > --- This look a bit scary. Can't the kthread be converted to a workqueue so we could call cancel_work_sync()? -- Johannes Thumshirn Storage jthumsh...@suse.de+49 911 74053 689 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850