Re: [PATCH v2] qla2xxx: Fix race condition between iocb timeout and initialisation
Himanshu, > Thanks again for the patch. Testing was successful. > > Acked-by: Himanshu MadhaniApplied to 4.17/scsi-fixes. Thanks! -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH v2] qla2xxx: Fix race condition between iocb timeout and initialisation
Hi Ben, > On Mar 21, 2018, at 10:58 AM, Ben Hutchings> wrote: > > On Wed, 2018-03-21 at 17:55 +, Madhani, Himanshu wrote: >> Hi Ben, >> >>> On Mar 21, 2018, at 10:45 AM, Ben Hutchings >>> wrote: >>> >>> On Wed, 2018-03-21 at 17:17 +, Madhani, Himanshu wrote: Hi Ben, > On Mar 20, 2018, at 2:36 PM, Ben Hutchings > wrote: > > qla2x00_init_timer() calls add_timer() on the iocb timeout timer, > which means the timeout function pointer and any data that the > function depends on must be initialised beforehand. > > Move this initialisation before each call to qla2x00_init_timer(). In > some cases qla2x00_init_timer() initialises a completion structure > needed by the timeout function, so move the call to add_timer() after > that. >>> >>> [...] What motivated this patch? are you debugging any crash which was helped by moving code around? >>> >>> I saw the recent fix that added a call to del_timer(), noticed the lack >>> of synchronisation and then kept looking further. >>> What I see from this patch is that its moving iocb.cmd.timeout field before qla2x00_init_timer(), which are completely different from each other. >>> >>> How are they "completely different"? qla2x00_init_timer() starts a >>> timer that will call qla2x00_sp_timeout(), which in turn uses the >>> iocb_cmd.timeout function pointer. We should not assume that any >>> initialisation code after the call to add_timer() will run before the >>> timer expires, since the scheduler might run other tasks for the whole >>> time. It's unlikely but not impossible. >>> >> >> I see. I used “completely different” due to the lack of better wording. >> >> I wanted to get context and understand if there was any issue that would >> have helped with these changes. >> >> your explanation does help and I agree that there is window where timer() >> will >> pop before timeout is initialized. >> >> Let me put this patch in our test cycle for couple days and will respond on >> this one > > Thanks. > > Ben. > > -- > Ben Hutchings > Software Developer, Codethink Ltd. Thanks again for the patch. Testing was successful. Acked-by: Himanshu Madhani Thanks, - Himanshu
Re: [PATCH v2] qla2xxx: Fix race condition between iocb timeout and initialisation
Hi Ben, > On Mar 21, 2018, at 10:58 AM, Ben Hutchings> wrote: > > On Wed, 2018-03-21 at 17:55 +, Madhani, Himanshu wrote: >> Hi Ben, >> >>> On Mar 21, 2018, at 10:45 AM, Ben Hutchings >>> wrote: >>> >>> On Wed, 2018-03-21 at 17:17 +, Madhani, Himanshu wrote: Hi Ben, > On Mar 20, 2018, at 2:36 PM, Ben Hutchings > wrote: > > qla2x00_init_timer() calls add_timer() on the iocb timeout timer, > which means the timeout function pointer and any data that the > function depends on must be initialised beforehand. > > Move this initialisation before each call to qla2x00_init_timer(). In > some cases qla2x00_init_timer() initialises a completion structure > needed by the timeout function, so move the call to add_timer() after > that. >>> >>> [...] What motivated this patch? are you debugging any crash which was helped by moving code around? >>> >>> I saw the recent fix that added a call to del_timer(), noticed the lack >>> of synchronisation and then kept looking further. >>> What I see from this patch is that its moving iocb.cmd.timeout field before qla2x00_init_timer(), which are completely different from each other. >>> >>> How are they "completely different"? qla2x00_init_timer() starts a >>> timer that will call qla2x00_sp_timeout(), which in turn uses the >>> iocb_cmd.timeout function pointer. We should not assume that any >>> initialisation code after the call to add_timer() will run before the >>> timer expires, since the scheduler might run other tasks for the whole >>> time. It's unlikely but not impossible. >>> >> >> I see. I used “completely different” due to the lack of better wording. >> >> I wanted to get context and understand if there was any issue that would >> have helped with these changes. >> >> your explanation does help and I agree that there is window where timer() >> will >> pop before timeout is initialized. >> >> Let me put this patch in our test cycle for couple days and will respond on >> this one > > Thanks. > Just to update, we are still going thru review and testing and will take some extra cycle to conclude. I will update as soon as we are able to reach conclusion. > Ben. > > -- > Ben Hutchings > Software Developer, Codethink Ltd. Thanks, - Himanshu
Re: [PATCH v2] qla2xxx: Fix race condition between iocb timeout and initialisation
On Wed, 2018-03-21 at 17:55 +, Madhani, Himanshu wrote: > Hi Ben, > > > On Mar 21, 2018, at 10:45 AM, Ben Hutchings> > wrote: > > > > On Wed, 2018-03-21 at 17:17 +, Madhani, Himanshu wrote: > > > Hi Ben, > > > > > > > On Mar 20, 2018, at 2:36 PM, Ben Hutchings > > > > wrote: > > > > > > > > qla2x00_init_timer() calls add_timer() on the iocb timeout timer, > > > > which means the timeout function pointer and any data that the > > > > function depends on must be initialised beforehand. > > > > > > > > Move this initialisation before each call to qla2x00_init_timer(). In > > > > some cases qla2x00_init_timer() initialises a completion structure > > > > needed by the timeout function, so move the call to add_timer() after > > > > that. > > > > [...] > > > What motivated this patch? are you debugging any crash which was > > > helped by moving code around? > > > > I saw the recent fix that added a call to del_timer(), noticed the lack > > of synchronisation and then kept looking further. > > > > > What I see from this patch is that its moving iocb.cmd.timeout field > > > before qla2x00_init_timer(), > > > which are completely different from each other. > > > > How are they "completely different"? qla2x00_init_timer() starts a > > timer that will call qla2x00_sp_timeout(), which in turn uses the > > iocb_cmd.timeout function pointer. We should not assume that any > > initialisation code after the call to add_timer() will run before the > > timer expires, since the scheduler might run other tasks for the whole > > time. It's unlikely but not impossible. > > > > I see. I used “completely different” due to the lack of better wording. > > I wanted to get context and understand if there was any issue that would > have helped with these changes. > > your explanation does help and I agree that there is window where timer() > will > pop before timeout is initialized. > > Let me put this patch in our test cycle for couple days and will respond on > this one Thanks. Ben. -- Ben Hutchings Software Developer, Codethink Ltd.
Re: [PATCH v2] qla2xxx: Fix race condition between iocb timeout and initialisation
Hi Ben, > On Mar 21, 2018, at 10:45 AM, Ben Hutchings> wrote: > > On Wed, 2018-03-21 at 17:17 +, Madhani, Himanshu wrote: >> Hi Ben, >> >>> On Mar 20, 2018, at 2:36 PM, Ben Hutchings >>> wrote: >>> >>> qla2x00_init_timer() calls add_timer() on the iocb timeout timer, >>> which means the timeout function pointer and any data that the >>> function depends on must be initialised beforehand. >>> >>> Move this initialisation before each call to qla2x00_init_timer(). In >>> some cases qla2x00_init_timer() initialises a completion structure >>> needed by the timeout function, so move the call to add_timer() after >>> that. > [...] >> What motivated this patch? are you debugging any crash which was >> helped by moving code around? > > I saw the recent fix that added a call to del_timer(), noticed the lack > of synchronisation and then kept looking further. > >> What I see from this patch is that its moving iocb.cmd.timeout field >> before qla2x00_init_timer(), >> which are completely different from each other. > > How are they "completely different"? qla2x00_init_timer() starts a > timer that will call qla2x00_sp_timeout(), which in turn uses the > iocb_cmd.timeout function pointer. We should not assume that any > initialisation code after the call to add_timer() will run before the > timer expires, since the scheduler might run other tasks for the whole > time. It's unlikely but not impossible. > I see. I used “completely different” due to the lack of better wording. I wanted to get context and understand if there was any issue that would have helped with these changes. your explanation does help and I agree that there is window where timer() will pop before timeout is initialized. Let me put this patch in our test cycle for couple days and will respond on this one > Ben. > > -- > Ben Hutchings > Software Developer, Codethink Ltd. > Thanks, - Himanshu
Re: [PATCH v2] qla2xxx: Fix race condition between iocb timeout and initialisation
On Wed, 2018-03-21 at 17:17 +, Madhani, Himanshu wrote: > Hi Ben, > > > On Mar 20, 2018, at 2:36 PM, Ben Hutchings> > wrote: > > > > qla2x00_init_timer() calls add_timer() on the iocb timeout timer, > > which means the timeout function pointer and any data that the > > function depends on must be initialised beforehand. > > > > Move this initialisation before each call to qla2x00_init_timer(). In > > some cases qla2x00_init_timer() initialises a completion structure > > needed by the timeout function, so move the call to add_timer() after > > that. [...] > What motivated this patch? are you debugging any crash which was > helped by moving code around? I saw the recent fix that added a call to del_timer(), noticed the lack of synchronisation and then kept looking further. > What I see from this patch is that its moving iocb.cmd.timeout field > before qla2x00_init_timer(), > which are completely different from each other. How are they "completely different"? qla2x00_init_timer() starts a timer that will call qla2x00_sp_timeout(), which in turn uses the iocb_cmd.timeout function pointer. We should not assume that any initialisation code after the call to add_timer() will run before the timer expires, since the scheduler might run other tasks for the whole time. It's unlikely but not impossible. Ben. -- Ben Hutchings Software Developer, Codethink Ltd.
Re: [PATCH v2] qla2xxx: Fix race condition between iocb timeout and initialisation
Hi Ben, > On Mar 20, 2018, at 2:36 PM, Ben Hutchings> wrote: > > qla2x00_init_timer() calls add_timer() on the iocb timeout timer, > which means the timeout function pointer and any data that the > function depends on must be initialised beforehand. > > Move this initialisation before each call to qla2x00_init_timer(). In > some cases qla2x00_init_timer() initialises a completion structure > needed by the timeout function, so move the call to add_timer() after > that. > > Signed-off-by: Ben Hutchings > --- > Changes from v1: > > Rebased to remove textual dependency on a patch I haven't yet sent > (oops). > > Ben. > > drivers/scsi/qla2xxx/qla_gs.c | 12 +++- > drivers/scsi/qla2xxx/qla_init.c | 37 +++-- > drivers/scsi/qla2xxx/qla_inline.h | 2 +- > drivers/scsi/qla2xxx/qla_iocb.c | 8 +--- > drivers/scsi/qla2xxx/qla_mbx.c| 8 > drivers/scsi/qla2xxx/qla_mid.c| 2 +- > drivers/scsi/qla2xxx/qla_mr.c | 5 +++-- > drivers/scsi/qla2xxx/qla_target.c | 2 +- > 8 files changed, 45 insertions(+), 31 deletions(-) > > diff --git a/drivers/scsi/qla2xxx/qla_gs.c b/drivers/scsi/qla2xxx/qla_gs.c > index 403fa096f8c8..fcde5ea203c0 100644 > --- a/drivers/scsi/qla2xxx/qla_gs.c > +++ b/drivers/scsi/qla2xxx/qla_gs.c > @@ -3796,6 +3796,7 @@ int qla24xx_async_gffid(scsi_qla_host_t *vha, fc_port_t > *fcport) > sp->gen1 = fcport->rscn_gen; > sp->gen2 = fcport->login_gen; > > + sp->u.iocb_cmd.timeout = qla2x00_async_iocb_timeout; > qla2x00_init_timer(sp, qla2x00_get_async_timeout(vha) + 2); > > /* CT_IU preamble */ > @@ -3814,7 +3815,6 @@ int qla24xx_async_gffid(scsi_qla_host_t *vha, fc_port_t > *fcport) > sp->u.iocb_cmd.u.ctarg.rsp_size = GFF_ID_RSP_SIZE; > sp->u.iocb_cmd.u.ctarg.nport_handle = NPH_SNS; > > - sp->u.iocb_cmd.timeout = qla2x00_async_iocb_timeout; > sp->done = qla24xx_async_gffid_sp_done; > > rval = qla2x00_start_sp(sp); > @@ -4121,6 +4121,8 @@ static int qla24xx_async_gnnft(scsi_qla_host_t *vha, > struct srb *sp, > sp->name = "gnnft"; > sp->gen1 = vha->hw->base_qpair->chip_reset; > sp->gen2 = fc4_type; > + > + sp->u.iocb_cmd.timeout = qla2x00_async_iocb_timeout; > qla2x00_init_timer(sp, qla2x00_get_async_timeout(vha) + 2); > > memset(sp->u.iocb_cmd.u.ctarg.rsp, 0, sp->u.iocb_cmd.u.ctarg.rsp_size); > @@ -4137,7 +4139,6 @@ static int qla24xx_async_gnnft(scsi_qla_host_t *vha, > struct srb *sp, > sp->u.iocb_cmd.u.ctarg.req_size = GNN_FT_REQ_SIZE; > sp->u.iocb_cmd.u.ctarg.nport_handle = NPH_SNS; > > - sp->u.iocb_cmd.timeout = qla2x00_async_iocb_timeout; > sp->done = qla2x00_async_gpnft_gnnft_sp_done; > > rval = qla2x00_start_sp(sp); > @@ -4210,6 +4211,8 @@ int qla24xx_async_gpnft(scsi_qla_host_t *vha, u8 > fc4_type) > sp->name = "gpnft"; > sp->gen1 = vha->hw->base_qpair->chip_reset; > sp->gen2 = fc4_type; > + > + sp->u.iocb_cmd.timeout = qla2x00_async_iocb_timeout; > qla2x00_init_timer(sp, qla2x00_get_async_timeout(vha) + 2); > > sp->u.iocb_cmd.u.ctarg.req = dma_zalloc_coherent(>hw->pdev->dev, > @@ -4248,7 +4251,6 @@ int qla24xx_async_gpnft(scsi_qla_host_t *vha, u8 > fc4_type) > sp->u.iocb_cmd.u.ctarg.rsp_size = rspsz; > sp->u.iocb_cmd.u.ctarg.nport_handle = NPH_SNS; > > - sp->u.iocb_cmd.timeout = qla2x00_async_iocb_timeout; > sp->done = qla2x00_async_gpnft_gnnft_sp_done; > > rval = qla2x00_start_sp(sp); > @@ -4356,6 +4358,7 @@ int qla24xx_async_gnnid(scsi_qla_host_t *vha, fc_port_t > *fcport) > sp->gen1 = fcport->rscn_gen; > sp->gen2 = fcport->login_gen; > > + sp->u.iocb_cmd.timeout = qla2x00_async_iocb_timeout; > qla2x00_init_timer(sp, qla2x00_get_async_timeout(vha) + 2); > > /* CT_IU preamble */ > @@ -4377,7 +4380,6 @@ int qla24xx_async_gnnid(scsi_qla_host_t *vha, fc_port_t > *fcport) > sp->u.iocb_cmd.u.ctarg.rsp_size = GNN_ID_RSP_SIZE; > sp->u.iocb_cmd.u.ctarg.nport_handle = NPH_SNS; > > - sp->u.iocb_cmd.timeout = qla2x00_async_iocb_timeout; > sp->done = qla2x00_async_gnnid_sp_done; > > rval = qla2x00_start_sp(sp); > @@ -4493,6 +4495,7 @@ int qla24xx_async_gfpnid(scsi_qla_host_t *vha, > fc_port_t *fcport) > sp->gen1 = fcport->rscn_gen; > sp->gen2 = fcport->login_gen; > > + sp->u.iocb_cmd.timeout = qla2x00_async_iocb_timeout; > qla2x00_init_timer(sp, qla2x00_get_async_timeout(vha) + 2); > > /* CT_IU preamble */ > @@ -4514,7 +4517,6 @@ int qla24xx_async_gfpnid(scsi_qla_host_t *vha, > fc_port_t *fcport) > sp->u.iocb_cmd.u.ctarg.rsp_size = GFPN_ID_RSP_SIZE; > sp->u.iocb_cmd.u.ctarg.nport_handle = NPH_SNS; > > - sp->u.iocb_cmd.timeout = qla2x00_async_iocb_timeout; > sp->done = qla2x00_async_gfpnid_sp_done; > > rval = qla2x00_start_sp(sp); > diff --git