Re: [PATCH v2] qla2xxx: Fix race condition between iocb timeout and initialisation

2018-04-09 Thread Martin K. Petersen

Himanshu,

> Thanks again for the patch. Testing was successful. 
>
> Acked-by: Himanshu Madhani 

Applied to 4.17/scsi-fixes. Thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH v2] qla2xxx: Fix race condition between iocb timeout and initialisation

2018-03-29 Thread Madhani, Himanshu
Hi Ben, 

> On Mar 21, 2018, at 10:58 AM, Ben Hutchings  
> wrote:
> 
> On Wed, 2018-03-21 at 17:55 +, Madhani, Himanshu wrote:
>> Hi Ben, 
>> 
>>> On Mar 21, 2018, at 10:45 AM, Ben Hutchings  
>>> wrote:
>>> 
>>> On Wed, 2018-03-21 at 17:17 +, Madhani, Himanshu wrote:
 Hi Ben, 
 
> On Mar 20, 2018, at 2:36 PM, Ben Hutchings 
>  wrote:
> 
> qla2x00_init_timer() calls add_timer() on the iocb timeout timer,
> which means the timeout function pointer and any data that the
> function depends on must be initialised beforehand.
> 
> Move this initialisation before each call to qla2x00_init_timer().  In
> some cases qla2x00_init_timer() initialises a completion structure
> needed by the timeout function, so move the call to add_timer() after
> that.
>>> 
>>> [...]
 What motivated this patch? are you debugging any crash which was
 helped by moving code around? 
>>> 
>>> I saw the recent fix that added a call to del_timer(), noticed the lack
>>> of synchronisation and then kept looking further.
>>> 
 What I see from this patch is that its moving iocb.cmd.timeout field
 before qla2x00_init_timer(),
 which are completely different from each other. 
>>> 
>>> How are they "completely different"?  qla2x00_init_timer() starts a
>>> timer that will call qla2x00_sp_timeout(), which in turn uses the
>>> iocb_cmd.timeout function pointer.  We should not assume that any
>>> initialisation code after the call to add_timer() will run before the
>>> timer expires, since the scheduler might run other tasks for the whole
>>> time.  It's unlikely but not impossible.
>>> 
>> 
>> I see. I used “completely different” due to the lack of better wording. 
>> 
>> I wanted to get context and understand if there was any issue that would
>> have helped with these changes. 
>> 
>> your explanation does help and I agree that there is window where timer() 
>> will 
>> pop before timeout is initialized. 
>> 
>> Let me put this patch in our test cycle for couple days and will respond on 
>> this one
> 
> Thanks.
> 
> Ben.
> 
> -- 
> Ben Hutchings
> Software Developer, Codethink Ltd.

Thanks again for the patch. Testing was successful. 

Acked-by: Himanshu Madhani 

Thanks,
- Himanshu



Re: [PATCH v2] qla2xxx: Fix race condition between iocb timeout and initialisation

2018-03-27 Thread Madhani, Himanshu
Hi Ben, 

> On Mar 21, 2018, at 10:58 AM, Ben Hutchings  
> wrote:
> 
> On Wed, 2018-03-21 at 17:55 +, Madhani, Himanshu wrote:
>> Hi Ben, 
>> 
>>> On Mar 21, 2018, at 10:45 AM, Ben Hutchings  
>>> wrote:
>>> 
>>> On Wed, 2018-03-21 at 17:17 +, Madhani, Himanshu wrote:
 Hi Ben, 
 
> On Mar 20, 2018, at 2:36 PM, Ben Hutchings 
>  wrote:
> 
> qla2x00_init_timer() calls add_timer() on the iocb timeout timer,
> which means the timeout function pointer and any data that the
> function depends on must be initialised beforehand.
> 
> Move this initialisation before each call to qla2x00_init_timer().  In
> some cases qla2x00_init_timer() initialises a completion structure
> needed by the timeout function, so move the call to add_timer() after
> that.
>>> 
>>> [...]
 What motivated this patch? are you debugging any crash which was
 helped by moving code around? 
>>> 
>>> I saw the recent fix that added a call to del_timer(), noticed the lack
>>> of synchronisation and then kept looking further.
>>> 
 What I see from this patch is that its moving iocb.cmd.timeout field
 before qla2x00_init_timer(),
 which are completely different from each other. 
>>> 
>>> How are they "completely different"?  qla2x00_init_timer() starts a
>>> timer that will call qla2x00_sp_timeout(), which in turn uses the
>>> iocb_cmd.timeout function pointer.  We should not assume that any
>>> initialisation code after the call to add_timer() will run before the
>>> timer expires, since the scheduler might run other tasks for the whole
>>> time.  It's unlikely but not impossible.
>>> 
>> 
>> I see. I used “completely different” due to the lack of better wording. 
>> 
>> I wanted to get context and understand if there was any issue that would
>> have helped with these changes. 
>> 
>> your explanation does help and I agree that there is window where timer() 
>> will 
>> pop before timeout is initialized. 
>> 
>> Let me put this patch in our test cycle for couple days and will respond on 
>> this one
> 
> Thanks.
> 

Just to update, we are still going thru review and testing and will take some 
extra cycle to conclude. 
I will update as soon as we are able to reach conclusion.

> Ben.
> 
> -- 
> Ben Hutchings
> Software Developer, Codethink Ltd.

Thanks,
- Himanshu



Re: [PATCH v2] qla2xxx: Fix race condition between iocb timeout and initialisation

2018-03-21 Thread Ben Hutchings
On Wed, 2018-03-21 at 17:55 +, Madhani, Himanshu wrote:
> Hi Ben, 
> 
> > On Mar 21, 2018, at 10:45 AM, Ben Hutchings  
> > wrote:
> > 
> > On Wed, 2018-03-21 at 17:17 +, Madhani, Himanshu wrote:
> > > Hi Ben, 
> > > 
> > > > On Mar 20, 2018, at 2:36 PM, Ben Hutchings 
> > > >  wrote:
> > > > 
> > > > qla2x00_init_timer() calls add_timer() on the iocb timeout timer,
> > > > which means the timeout function pointer and any data that the
> > > > function depends on must be initialised beforehand.
> > > > 
> > > > Move this initialisation before each call to qla2x00_init_timer().  In
> > > > some cases qla2x00_init_timer() initialises a completion structure
> > > > needed by the timeout function, so move the call to add_timer() after
> > > > that.
> > 
> > [...]
> > > What motivated this patch? are you debugging any crash which was
> > > helped by moving code around? 
> > 
> > I saw the recent fix that added a call to del_timer(), noticed the lack
> > of synchronisation and then kept looking further.
> > 
> > > What I see from this patch is that its moving iocb.cmd.timeout field
> > > before qla2x00_init_timer(),
> > > which are completely different from each other. 
> > 
> > How are they "completely different"?  qla2x00_init_timer() starts a
> > timer that will call qla2x00_sp_timeout(), which in turn uses the
> > iocb_cmd.timeout function pointer.  We should not assume that any
> > initialisation code after the call to add_timer() will run before the
> > timer expires, since the scheduler might run other tasks for the whole
> > time.  It's unlikely but not impossible.
> > 
> 
> I see. I used “completely different” due to the lack of better wording. 
> 
> I wanted to get context and understand if there was any issue that would
> have helped with these changes. 
> 
> your explanation does help and I agree that there is window where timer() 
> will 
> pop before timeout is initialized. 
> 
> Let me put this patch in our test cycle for couple days and will respond on 
> this one

Thanks.

Ben.

-- 
Ben Hutchings
Software Developer, Codethink Ltd.



Re: [PATCH v2] qla2xxx: Fix race condition between iocb timeout and initialisation

2018-03-21 Thread Madhani, Himanshu
Hi Ben, 

> On Mar 21, 2018, at 10:45 AM, Ben Hutchings  
> wrote:
> 
> On Wed, 2018-03-21 at 17:17 +, Madhani, Himanshu wrote:
>> Hi Ben, 
>> 
>>> On Mar 20, 2018, at 2:36 PM, Ben Hutchings  
>>> wrote:
>>> 
>>> qla2x00_init_timer() calls add_timer() on the iocb timeout timer,
>>> which means the timeout function pointer and any data that the
>>> function depends on must be initialised beforehand.
>>> 
>>> Move this initialisation before each call to qla2x00_init_timer().  In
>>> some cases qla2x00_init_timer() initialises a completion structure
>>> needed by the timeout function, so move the call to add_timer() after
>>> that.
> [...]
>> What motivated this patch? are you debugging any crash which was
>> helped by moving code around? 
> 
> I saw the recent fix that added a call to del_timer(), noticed the lack
> of synchronisation and then kept looking further.
> 
>> What I see from this patch is that its moving iocb.cmd.timeout field
>> before qla2x00_init_timer(),
>> which are completely different from each other. 
> 
> How are they "completely different"?  qla2x00_init_timer() starts a
> timer that will call qla2x00_sp_timeout(), which in turn uses the
> iocb_cmd.timeout function pointer.  We should not assume that any
> initialisation code after the call to add_timer() will run before the
> timer expires, since the scheduler might run other tasks for the whole
> time.  It's unlikely but not impossible.
> 

I see. I used “completely different” due to the lack of better wording. 

I wanted to get context and understand if there was any issue that would
have helped with these changes. 

your explanation does help and I agree that there is window where timer() will 
pop before timeout is initialized. 

Let me put this patch in our test cycle for couple days and will respond on 
this one
 
> Ben.
> 
> -- 
> Ben Hutchings
> Software Developer, Codethink Ltd.
> 

Thanks,
- Himanshu



Re: [PATCH v2] qla2xxx: Fix race condition between iocb timeout and initialisation

2018-03-21 Thread Ben Hutchings
On Wed, 2018-03-21 at 17:17 +, Madhani, Himanshu wrote:
> Hi Ben, 
> 
> > On Mar 20, 2018, at 2:36 PM, Ben Hutchings  
> > wrote:
> > 
> > qla2x00_init_timer() calls add_timer() on the iocb timeout timer,
> > which means the timeout function pointer and any data that the
> > function depends on must be initialised beforehand.
> > 
> > Move this initialisation before each call to qla2x00_init_timer().  In
> > some cases qla2x00_init_timer() initialises a completion structure
> > needed by the timeout function, so move the call to add_timer() after
> > that.
[...]
> What motivated this patch? are you debugging any crash which was
> helped by moving code around? 

I saw the recent fix that added a call to del_timer(), noticed the lack
of synchronisation and then kept looking further.

> What I see from this patch is that its moving iocb.cmd.timeout field
> before qla2x00_init_timer(),
> which are completely different from each other. 

How are they "completely different"?  qla2x00_init_timer() starts a
timer that will call qla2x00_sp_timeout(), which in turn uses the
iocb_cmd.timeout function pointer.  We should not assume that any
initialisation code after the call to add_timer() will run before the
timer expires, since the scheduler might run other tasks for the whole
time.  It's unlikely but not impossible.

Ben.

-- 
Ben Hutchings
Software Developer, Codethink Ltd.



Re: [PATCH v2] qla2xxx: Fix race condition between iocb timeout and initialisation

2018-03-21 Thread Madhani, Himanshu
Hi Ben, 

> On Mar 20, 2018, at 2:36 PM, Ben Hutchings  
> wrote:
> 
> qla2x00_init_timer() calls add_timer() on the iocb timeout timer,
> which means the timeout function pointer and any data that the
> function depends on must be initialised beforehand.
> 
> Move this initialisation before each call to qla2x00_init_timer().  In
> some cases qla2x00_init_timer() initialises a completion structure
> needed by the timeout function, so move the call to add_timer() after
> that.
> 
> Signed-off-by: Ben Hutchings 
> ---
> Changes from v1:
> 
> Rebased to remove textual dependency on a patch I haven't yet sent
> (oops).
> 
> Ben.
> 
> drivers/scsi/qla2xxx/qla_gs.c | 12 +++-
> drivers/scsi/qla2xxx/qla_init.c   | 37 +++--
> drivers/scsi/qla2xxx/qla_inline.h |  2 +-
> drivers/scsi/qla2xxx/qla_iocb.c   |  8 +---
> drivers/scsi/qla2xxx/qla_mbx.c|  8 
> drivers/scsi/qla2xxx/qla_mid.c|  2 +-
> drivers/scsi/qla2xxx/qla_mr.c |  5 +++--
> drivers/scsi/qla2xxx/qla_target.c |  2 +-
> 8 files changed, 45 insertions(+), 31 deletions(-)
> 
> diff --git a/drivers/scsi/qla2xxx/qla_gs.c b/drivers/scsi/qla2xxx/qla_gs.c
> index 403fa096f8c8..fcde5ea203c0 100644
> --- a/drivers/scsi/qla2xxx/qla_gs.c
> +++ b/drivers/scsi/qla2xxx/qla_gs.c
> @@ -3796,6 +3796,7 @@ int qla24xx_async_gffid(scsi_qla_host_t *vha, fc_port_t 
> *fcport)
>   sp->gen1 = fcport->rscn_gen;
>   sp->gen2 = fcport->login_gen;
> 
> + sp->u.iocb_cmd.timeout = qla2x00_async_iocb_timeout;
>   qla2x00_init_timer(sp, qla2x00_get_async_timeout(vha) + 2);
> 
>   /* CT_IU preamble  */
> @@ -3814,7 +3815,6 @@ int qla24xx_async_gffid(scsi_qla_host_t *vha, fc_port_t 
> *fcport)
>   sp->u.iocb_cmd.u.ctarg.rsp_size = GFF_ID_RSP_SIZE;
>   sp->u.iocb_cmd.u.ctarg.nport_handle = NPH_SNS;
> 
> - sp->u.iocb_cmd.timeout = qla2x00_async_iocb_timeout;
>   sp->done = qla24xx_async_gffid_sp_done;
> 
>   rval = qla2x00_start_sp(sp);
> @@ -4121,6 +4121,8 @@ static int qla24xx_async_gnnft(scsi_qla_host_t *vha, 
> struct srb *sp,
>   sp->name = "gnnft";
>   sp->gen1 = vha->hw->base_qpair->chip_reset;
>   sp->gen2 = fc4_type;
> +
> + sp->u.iocb_cmd.timeout = qla2x00_async_iocb_timeout;
>   qla2x00_init_timer(sp, qla2x00_get_async_timeout(vha) + 2);
> 
>   memset(sp->u.iocb_cmd.u.ctarg.rsp, 0, sp->u.iocb_cmd.u.ctarg.rsp_size);
> @@ -4137,7 +4139,6 @@ static int qla24xx_async_gnnft(scsi_qla_host_t *vha, 
> struct srb *sp,
>   sp->u.iocb_cmd.u.ctarg.req_size = GNN_FT_REQ_SIZE;
>   sp->u.iocb_cmd.u.ctarg.nport_handle = NPH_SNS;
> 
> - sp->u.iocb_cmd.timeout = qla2x00_async_iocb_timeout;
>   sp->done = qla2x00_async_gpnft_gnnft_sp_done;
> 
>   rval = qla2x00_start_sp(sp);
> @@ -4210,6 +4211,8 @@ int qla24xx_async_gpnft(scsi_qla_host_t *vha, u8 
> fc4_type)
>   sp->name = "gpnft";
>   sp->gen1 = vha->hw->base_qpair->chip_reset;
>   sp->gen2 = fc4_type;
> +
> + sp->u.iocb_cmd.timeout = qla2x00_async_iocb_timeout;
>   qla2x00_init_timer(sp, qla2x00_get_async_timeout(vha) + 2);
> 
>   sp->u.iocb_cmd.u.ctarg.req = dma_zalloc_coherent(>hw->pdev->dev,
> @@ -4248,7 +4251,6 @@ int qla24xx_async_gpnft(scsi_qla_host_t *vha, u8 
> fc4_type)
>   sp->u.iocb_cmd.u.ctarg.rsp_size = rspsz;
>   sp->u.iocb_cmd.u.ctarg.nport_handle = NPH_SNS;
> 
> - sp->u.iocb_cmd.timeout = qla2x00_async_iocb_timeout;
>   sp->done = qla2x00_async_gpnft_gnnft_sp_done;
> 
>   rval = qla2x00_start_sp(sp);
> @@ -4356,6 +4358,7 @@ int qla24xx_async_gnnid(scsi_qla_host_t *vha, fc_port_t 
> *fcport)
>   sp->gen1 = fcport->rscn_gen;
>   sp->gen2 = fcport->login_gen;
> 
> + sp->u.iocb_cmd.timeout = qla2x00_async_iocb_timeout;
>   qla2x00_init_timer(sp, qla2x00_get_async_timeout(vha) + 2);
> 
>   /* CT_IU preamble  */
> @@ -4377,7 +4380,6 @@ int qla24xx_async_gnnid(scsi_qla_host_t *vha, fc_port_t 
> *fcport)
>   sp->u.iocb_cmd.u.ctarg.rsp_size = GNN_ID_RSP_SIZE;
>   sp->u.iocb_cmd.u.ctarg.nport_handle = NPH_SNS;
> 
> - sp->u.iocb_cmd.timeout = qla2x00_async_iocb_timeout;
>   sp->done = qla2x00_async_gnnid_sp_done;
> 
>   rval = qla2x00_start_sp(sp);
> @@ -4493,6 +4495,7 @@ int qla24xx_async_gfpnid(scsi_qla_host_t *vha, 
> fc_port_t *fcport)
>   sp->gen1 = fcport->rscn_gen;
>   sp->gen2 = fcport->login_gen;
> 
> + sp->u.iocb_cmd.timeout = qla2x00_async_iocb_timeout;
>   qla2x00_init_timer(sp, qla2x00_get_async_timeout(vha) + 2);
> 
>   /* CT_IU preamble  */
> @@ -4514,7 +4517,6 @@ int qla24xx_async_gfpnid(scsi_qla_host_t *vha, 
> fc_port_t *fcport)
>   sp->u.iocb_cmd.u.ctarg.rsp_size = GFPN_ID_RSP_SIZE;
>   sp->u.iocb_cmd.u.ctarg.nport_handle = NPH_SNS;
> 
> - sp->u.iocb_cmd.timeout = qla2x00_async_iocb_timeout;
>   sp->done = qla2x00_async_gfpnid_sp_done;
> 
>   rval = qla2x00_start_sp(sp);
> diff --git