Re: [PATCH rdma-next 4/5] RDMA/hns: Add reset process for RoCE in hip08

2018-05-23 Thread Wei Hu (Xavier)


On 2018/5/23 11:47, Jason Gunthorpe wrote:
> On Wed, May 23, 2018 at 10:54:54AM +0800, Wei Hu (Xavier) wrote:
>>
>> On 2018/5/23 4:26, Jason Gunthorpe wrote:
>>> On Fri, May 18, 2018 at 03:23:00PM +0800, Wei Hu (Xavier) wrote:
 On 2018/5/18 12:15, Jason Gunthorpe wrote:
> On Fri, May 18, 2018 at 11:28:11AM +0800, Wei Hu (Xavier) wrote:
>> On 2018/5/17 23:14, Jason Gunthorpe wrote:
>>> On Thu, May 17, 2018 at 04:02:52PM +0800, Wei Hu (Xavier) wrote:
 diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c 
 b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
 index 86ef15f..e1c44a6 100644
 +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
 @@ -774,6 +774,9 @@ static int hns_roce_cmq_send(struct hns_roce_dev 
 *hr_dev,
int ret = 0;
int ntc;
  
 +  if (hr_dev->is_reset)
 +  return 0;
 +
spin_lock_bh(>lock);
  
if (num > hns_roce_cmq_space(csq)) {
 @@ -4790,6 +4793,7 @@ static int hns_roce_hw_v2_init_instance(struct 
 hnae3_handle *handle)
return 0;
  
  error_failed_get_cfg:
 +  handle->priv = NULL;
kfree(hr_dev->priv);
  
  error_failed_kzalloc:
 @@ -4803,14 +4807,70 @@ static void 
 hns_roce_hw_v2_uninit_instance(struct hnae3_handle *handle,
  {
struct hns_roce_dev *hr_dev = (struct hns_roce_dev 
 *)handle->priv;
  
 +  if (!hr_dev)
 +  return;
 +
hns_roce_exit(hr_dev);
 +  handle->priv = NULL;
kfree(hr_dev->priv);
ib_dealloc_device(_dev->ib_dev);
  }
>>> Why are these hunks here? If init fails then uninit should not be
>>> called, so why meddle with priv?
>> In hns_roce_hw_v2_init_instance function, we evaluate handle->priv with 
>> hr_dev,
>> We want clear the value in hns_roce_hw_v2_uninit_instance function.
>> So we can ensure no problem in RoCE driver.
> What problem could happen?
>
> I keep removing unnecessary sets to null and checks of null, so please
> don't add them if they cannot happen.
>
> Eg uninit should never be called with a null priv, that is a serious
> logic mis-design someplace if it happens.
>
> Jason
 NIC driver call the registered reset_notify() function to finish the
 part of RoCE reset process.
 In RoCE driver,  when hnae3_reset_notify_type is HNAE3_UNINIT_CLIENT,
 we call hns_roce_hw_v2_uninit_instance(handle, false) to release the
 resources.
 when hnae3_reset_notify_type is HNAE3_INIT_CLIENT, we call
 hns_roce_hw_v2_init_instance.
 if hns_roce_hw_v2_init_instance failed, we should ensure no problem in
 the other callback
 function registered by RoCE driver.
>>> Don't design things like this.
>>>
>>> init/uninit are paired - do not call something uninit if it can be
>>> called after init fails, or better, arrange to prevent that so things
>>> are sane.
>>>
>>> Jason
>>>
>>> .
>> The current RoCE driver registered 3 callback function to NIC driver as
>> belows:
>> 1.init_instance/uninit_instance are paired.
>> 2.In reset_notify function,  RoCE dirver still call
>> init_instance/uninit_instance function.
>> but NIC driver does not perceive the behavior.  We need to judge in RoCE
>> driver.
Hi, Jason
I will send v2, thanks.
Regards
Wei Hu

> fix the nic driver
>
> Jason
>
> .
>




Re: [PATCH rdma-next 4/5] RDMA/hns: Add reset process for RoCE in hip08

2018-05-23 Thread Wei Hu (Xavier)


On 2018/5/23 11:47, Jason Gunthorpe wrote:
> On Wed, May 23, 2018 at 10:54:54AM +0800, Wei Hu (Xavier) wrote:
>>
>> On 2018/5/23 4:26, Jason Gunthorpe wrote:
>>> On Fri, May 18, 2018 at 03:23:00PM +0800, Wei Hu (Xavier) wrote:
 On 2018/5/18 12:15, Jason Gunthorpe wrote:
> On Fri, May 18, 2018 at 11:28:11AM +0800, Wei Hu (Xavier) wrote:
>> On 2018/5/17 23:14, Jason Gunthorpe wrote:
>>> On Thu, May 17, 2018 at 04:02:52PM +0800, Wei Hu (Xavier) wrote:
 diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c 
 b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
 index 86ef15f..e1c44a6 100644
 +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
 @@ -774,6 +774,9 @@ static int hns_roce_cmq_send(struct hns_roce_dev 
 *hr_dev,
int ret = 0;
int ntc;
  
 +  if (hr_dev->is_reset)
 +  return 0;
 +
spin_lock_bh(>lock);
  
if (num > hns_roce_cmq_space(csq)) {
 @@ -4790,6 +4793,7 @@ static int hns_roce_hw_v2_init_instance(struct 
 hnae3_handle *handle)
return 0;
  
  error_failed_get_cfg:
 +  handle->priv = NULL;
kfree(hr_dev->priv);
  
  error_failed_kzalloc:
 @@ -4803,14 +4807,70 @@ static void 
 hns_roce_hw_v2_uninit_instance(struct hnae3_handle *handle,
  {
struct hns_roce_dev *hr_dev = (struct hns_roce_dev 
 *)handle->priv;
  
 +  if (!hr_dev)
 +  return;
 +
hns_roce_exit(hr_dev);
 +  handle->priv = NULL;
kfree(hr_dev->priv);
ib_dealloc_device(_dev->ib_dev);
  }
>>> Why are these hunks here? If init fails then uninit should not be
>>> called, so why meddle with priv?
>> In hns_roce_hw_v2_init_instance function, we evaluate handle->priv with 
>> hr_dev,
>> We want clear the value in hns_roce_hw_v2_uninit_instance function.
>> So we can ensure no problem in RoCE driver.
> What problem could happen?
>
> I keep removing unnecessary sets to null and checks of null, so please
> don't add them if they cannot happen.
>
> Eg uninit should never be called with a null priv, that is a serious
> logic mis-design someplace if it happens.
>
> Jason
 NIC driver call the registered reset_notify() function to finish the
 part of RoCE reset process.
 In RoCE driver,  when hnae3_reset_notify_type is HNAE3_UNINIT_CLIENT,
 we call hns_roce_hw_v2_uninit_instance(handle, false) to release the
 resources.
 when hnae3_reset_notify_type is HNAE3_INIT_CLIENT, we call
 hns_roce_hw_v2_init_instance.
 if hns_roce_hw_v2_init_instance failed, we should ensure no problem in
 the other callback
 function registered by RoCE driver.
>>> Don't design things like this.
>>>
>>> init/uninit are paired - do not call something uninit if it can be
>>> called after init fails, or better, arrange to prevent that so things
>>> are sane.
>>>
>>> Jason
>>>
>>> .
>> The current RoCE driver registered 3 callback function to NIC driver as
>> belows:
>> 1.init_instance/uninit_instance are paired.
>> 2.In reset_notify function,  RoCE dirver still call
>> init_instance/uninit_instance function.
>> but NIC driver does not perceive the behavior.  We need to judge in RoCE
>> driver.
Hi, Jason
I will send v2, thanks.
Regards
Wei Hu

> fix the nic driver
>
> Jason
>
> .
>




Re: [PATCH rdma-next 4/5] RDMA/hns: Add reset process for RoCE in hip08

2018-05-22 Thread Wei Hu (Xavier)


On 2018/5/23 10:54, Wei Hu (Xavier) wrote:
>
> On 2018/5/23 4:26, Jason Gunthorpe wrote:
>> On Fri, May 18, 2018 at 03:23:00PM +0800, Wei Hu (Xavier) wrote:
>>> On 2018/5/18 12:15, Jason Gunthorpe wrote:
 On Fri, May 18, 2018 at 11:28:11AM +0800, Wei Hu (Xavier) wrote:
> On 2018/5/17 23:14, Jason Gunthorpe wrote:
>> On Thu, May 17, 2018 at 04:02:52PM +0800, Wei Hu (Xavier) wrote:
>>> diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c 
>>> b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>>> index 86ef15f..e1c44a6 100644
>>> +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>>> @@ -774,6 +774,9 @@ static int hns_roce_cmq_send(struct hns_roce_dev 
>>> *hr_dev,
>>> int ret = 0;
>>> int ntc;
>>>  
>>> +   if (hr_dev->is_reset)
>>> +   return 0;
>>> +
>>> spin_lock_bh(>lock);
>>>  
>>> if (num > hns_roce_cmq_space(csq)) {
>>> @@ -4790,6 +4793,7 @@ static int hns_roce_hw_v2_init_instance(struct 
>>> hnae3_handle *handle)
>>> return 0;
>>>  
>>>  error_failed_get_cfg:
>>> +   handle->priv = NULL;
>>> kfree(hr_dev->priv);
>>>  
>>>  error_failed_kzalloc:
>>> @@ -4803,14 +4807,70 @@ static void 
>>> hns_roce_hw_v2_uninit_instance(struct hnae3_handle *handle,
>>>  {
>>> struct hns_roce_dev *hr_dev = (struct hns_roce_dev 
>>> *)handle->priv;
>>>  
>>> +   if (!hr_dev)
>>> +   return;
>>> +
>>> hns_roce_exit(hr_dev);
>>> +   handle->priv = NULL;
>>> kfree(hr_dev->priv);
>>> ib_dealloc_device(_dev->ib_dev);
>>>  }
>> Why are these hunks here? If init fails then uninit should not be
>> called, so why meddle with priv?
> In hns_roce_hw_v2_init_instance function, we evaluate handle->priv with 
> hr_dev,
> We want clear the value in hns_roce_hw_v2_uninit_instance function.
> So we can ensure no problem in RoCE driver.
 What problem could happen?

 I keep removing unnecessary sets to null and checks of null, so please
 don't add them if they cannot happen.

 Eg uninit should never be called with a null priv, that is a serious
 logic mis-design someplace if it happens.

 Jason
>>> NIC driver call the registered reset_notify() function to finish the
>>> part of RoCE reset process.
>>> In RoCE driver,  when hnae3_reset_notify_type is HNAE3_UNINIT_CLIENT,
>>> we call hns_roce_hw_v2_uninit_instance(handle, false) to release the
>>> resources.
>>> when hnae3_reset_notify_type is HNAE3_INIT_CLIENT, we call
>>> hns_roce_hw_v2_init_instance.
>>> if hns_roce_hw_v2_init_instance failed, we should ensure no problem in
>>> the other callback
>>> function registered by RoCE driver.
>> Don't design things like this.
>>
>> init/uninit are paired - do not call something uninit if it can be
>> called after init fails, or better, arrange to prevent that so things
>> are sane.
>>
>> Jason
>>
>> .
> The current RoCE driver registered 3 callback function to NIC driver as
> belows:
> 1.init_instance/uninit_instance are paired.
> 2.In reset_notify function,  RoCE dirver still call
> init_instance/uninit_instance function.
> but NIC driver does not perceive the behavior.  We need to judge in RoCE
> driver.
>
> static const struct hnae3_client_ops hns_roce_hw_v2_ops = {
> .init_instance = hns_roce_hw_v2_init_instance,
> .uninit_instance = hns_roce_hw_v2_uninit_instance,
> .reset_notify = hns_roce_hw_v2_reset_notify,
> };
struct hnae3_handle is defined in NIC driver, and handle->priv is used
for RoCE driver,
NIC driver will not use this member handle->priv.

struct hnae3_handle {
struct hnae3_client *client;
struct pci_dev *pdev;
void *priv;
struct hnae3_ae_algo *ae_algo;  /* the class who provides this handle */
u64 flags; /* Indicate the capabilities for this handle*/

unsigned long last_reset_time;
enum hnae3_reset_type reset_level;

union {
struct net_device *netdev; /* first member */
struct hnae3_knic_private_info kinfo;
struct hnae3_unic_private_info uinfo;
struct hnae3_roce_private_info rinfo;
};

u32 numa_node_mask;/* for multi-chip support */
};
> Wei Hu
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
> .
>




Re: [PATCH rdma-next 4/5] RDMA/hns: Add reset process for RoCE in hip08

2018-05-22 Thread Wei Hu (Xavier)


On 2018/5/23 10:54, Wei Hu (Xavier) wrote:
>
> On 2018/5/23 4:26, Jason Gunthorpe wrote:
>> On Fri, May 18, 2018 at 03:23:00PM +0800, Wei Hu (Xavier) wrote:
>>> On 2018/5/18 12:15, Jason Gunthorpe wrote:
 On Fri, May 18, 2018 at 11:28:11AM +0800, Wei Hu (Xavier) wrote:
> On 2018/5/17 23:14, Jason Gunthorpe wrote:
>> On Thu, May 17, 2018 at 04:02:52PM +0800, Wei Hu (Xavier) wrote:
>>> diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c 
>>> b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>>> index 86ef15f..e1c44a6 100644
>>> +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>>> @@ -774,6 +774,9 @@ static int hns_roce_cmq_send(struct hns_roce_dev 
>>> *hr_dev,
>>> int ret = 0;
>>> int ntc;
>>>  
>>> +   if (hr_dev->is_reset)
>>> +   return 0;
>>> +
>>> spin_lock_bh(>lock);
>>>  
>>> if (num > hns_roce_cmq_space(csq)) {
>>> @@ -4790,6 +4793,7 @@ static int hns_roce_hw_v2_init_instance(struct 
>>> hnae3_handle *handle)
>>> return 0;
>>>  
>>>  error_failed_get_cfg:
>>> +   handle->priv = NULL;
>>> kfree(hr_dev->priv);
>>>  
>>>  error_failed_kzalloc:
>>> @@ -4803,14 +4807,70 @@ static void 
>>> hns_roce_hw_v2_uninit_instance(struct hnae3_handle *handle,
>>>  {
>>> struct hns_roce_dev *hr_dev = (struct hns_roce_dev 
>>> *)handle->priv;
>>>  
>>> +   if (!hr_dev)
>>> +   return;
>>> +
>>> hns_roce_exit(hr_dev);
>>> +   handle->priv = NULL;
>>> kfree(hr_dev->priv);
>>> ib_dealloc_device(_dev->ib_dev);
>>>  }
>> Why are these hunks here? If init fails then uninit should not be
>> called, so why meddle with priv?
> In hns_roce_hw_v2_init_instance function, we evaluate handle->priv with 
> hr_dev,
> We want clear the value in hns_roce_hw_v2_uninit_instance function.
> So we can ensure no problem in RoCE driver.
 What problem could happen?

 I keep removing unnecessary sets to null and checks of null, so please
 don't add them if they cannot happen.

 Eg uninit should never be called with a null priv, that is a serious
 logic mis-design someplace if it happens.

 Jason
>>> NIC driver call the registered reset_notify() function to finish the
>>> part of RoCE reset process.
>>> In RoCE driver,  when hnae3_reset_notify_type is HNAE3_UNINIT_CLIENT,
>>> we call hns_roce_hw_v2_uninit_instance(handle, false) to release the
>>> resources.
>>> when hnae3_reset_notify_type is HNAE3_INIT_CLIENT, we call
>>> hns_roce_hw_v2_init_instance.
>>> if hns_roce_hw_v2_init_instance failed, we should ensure no problem in
>>> the other callback
>>> function registered by RoCE driver.
>> Don't design things like this.
>>
>> init/uninit are paired - do not call something uninit if it can be
>> called after init fails, or better, arrange to prevent that so things
>> are sane.
>>
>> Jason
>>
>> .
> The current RoCE driver registered 3 callback function to NIC driver as
> belows:
> 1.init_instance/uninit_instance are paired.
> 2.In reset_notify function,  RoCE dirver still call
> init_instance/uninit_instance function.
> but NIC driver does not perceive the behavior.  We need to judge in RoCE
> driver.
>
> static const struct hnae3_client_ops hns_roce_hw_v2_ops = {
> .init_instance = hns_roce_hw_v2_init_instance,
> .uninit_instance = hns_roce_hw_v2_uninit_instance,
> .reset_notify = hns_roce_hw_v2_reset_notify,
> };
struct hnae3_handle is defined in NIC driver, and handle->priv is used
for RoCE driver,
NIC driver will not use this member handle->priv.

struct hnae3_handle {
struct hnae3_client *client;
struct pci_dev *pdev;
void *priv;
struct hnae3_ae_algo *ae_algo;  /* the class who provides this handle */
u64 flags; /* Indicate the capabilities for this handle*/

unsigned long last_reset_time;
enum hnae3_reset_type reset_level;

union {
struct net_device *netdev; /* first member */
struct hnae3_knic_private_info kinfo;
struct hnae3_unic_private_info uinfo;
struct hnae3_roce_private_info rinfo;
};

u32 numa_node_mask;/* for multi-chip support */
};
> Wei Hu
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
> .
>




Re: [PATCH rdma-next 4/5] RDMA/hns: Add reset process for RoCE in hip08

2018-05-22 Thread Jason Gunthorpe
On Wed, May 23, 2018 at 10:54:54AM +0800, Wei Hu (Xavier) wrote:
> 
> 
> On 2018/5/23 4:26, Jason Gunthorpe wrote:
> > On Fri, May 18, 2018 at 03:23:00PM +0800, Wei Hu (Xavier) wrote:
> >>
> >> On 2018/5/18 12:15, Jason Gunthorpe wrote:
> >>> On Fri, May 18, 2018 at 11:28:11AM +0800, Wei Hu (Xavier) wrote:
>  On 2018/5/17 23:14, Jason Gunthorpe wrote:
> > On Thu, May 17, 2018 at 04:02:52PM +0800, Wei Hu (Xavier) wrote:
> >> diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c 
> >> b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> >> index 86ef15f..e1c44a6 100644
> >> +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> >> @@ -774,6 +774,9 @@ static int hns_roce_cmq_send(struct hns_roce_dev 
> >> *hr_dev,
> >>int ret = 0;
> >>int ntc;
> >>  
> >> +  if (hr_dev->is_reset)
> >> +  return 0;
> >> +
> >>spin_lock_bh(>lock);
> >>  
> >>if (num > hns_roce_cmq_space(csq)) {
> >> @@ -4790,6 +4793,7 @@ static int hns_roce_hw_v2_init_instance(struct 
> >> hnae3_handle *handle)
> >>return 0;
> >>  
> >>  error_failed_get_cfg:
> >> +  handle->priv = NULL;
> >>kfree(hr_dev->priv);
> >>  
> >>  error_failed_kzalloc:
> >> @@ -4803,14 +4807,70 @@ static void 
> >> hns_roce_hw_v2_uninit_instance(struct hnae3_handle *handle,
> >>  {
> >>struct hns_roce_dev *hr_dev = (struct hns_roce_dev 
> >> *)handle->priv;
> >>  
> >> +  if (!hr_dev)
> >> +  return;
> >> +
> >>hns_roce_exit(hr_dev);
> >> +  handle->priv = NULL;
> >>kfree(hr_dev->priv);
> >>ib_dealloc_device(_dev->ib_dev);
> >>  }
> > Why are these hunks here? If init fails then uninit should not be
> > called, so why meddle with priv?
>  In hns_roce_hw_v2_init_instance function, we evaluate handle->priv with 
>  hr_dev,
>  We want clear the value in hns_roce_hw_v2_uninit_instance function.
>  So we can ensure no problem in RoCE driver.
> >>> What problem could happen?
> >>>
> >>> I keep removing unnecessary sets to null and checks of null, so please
> >>> don't add them if they cannot happen.
> >>>
> >>> Eg uninit should never be called with a null priv, that is a serious
> >>> logic mis-design someplace if it happens.
> >>>
> >>> Jason
> >> NIC driver call the registered reset_notify() function to finish the
> >> part of RoCE reset process.
> >> In RoCE driver,  when hnae3_reset_notify_type is HNAE3_UNINIT_CLIENT,
> >> we call hns_roce_hw_v2_uninit_instance(handle, false) to release the
> >> resources.
> >> when hnae3_reset_notify_type is HNAE3_INIT_CLIENT, we call
> >> hns_roce_hw_v2_init_instance.
> >> if hns_roce_hw_v2_init_instance failed, we should ensure no problem in
> >> the other callback
> >> function registered by RoCE driver.
> > Don't design things like this.
> >
> > init/uninit are paired - do not call something uninit if it can be
> > called after init fails, or better, arrange to prevent that so things
> > are sane.
> >
> > Jason
> >
> > .
> The current RoCE driver registered 3 callback function to NIC driver as
> belows:
> 1.init_instance/uninit_instance are paired.
> 2.In reset_notify function,  RoCE dirver still call
> init_instance/uninit_instance function.
> but NIC driver does not perceive the behavior.  We need to judge in RoCE
> driver.

fix the nic driver

Jason


Re: [PATCH rdma-next 4/5] RDMA/hns: Add reset process for RoCE in hip08

2018-05-22 Thread Jason Gunthorpe
On Wed, May 23, 2018 at 10:54:54AM +0800, Wei Hu (Xavier) wrote:
> 
> 
> On 2018/5/23 4:26, Jason Gunthorpe wrote:
> > On Fri, May 18, 2018 at 03:23:00PM +0800, Wei Hu (Xavier) wrote:
> >>
> >> On 2018/5/18 12:15, Jason Gunthorpe wrote:
> >>> On Fri, May 18, 2018 at 11:28:11AM +0800, Wei Hu (Xavier) wrote:
>  On 2018/5/17 23:14, Jason Gunthorpe wrote:
> > On Thu, May 17, 2018 at 04:02:52PM +0800, Wei Hu (Xavier) wrote:
> >> diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c 
> >> b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> >> index 86ef15f..e1c44a6 100644
> >> +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> >> @@ -774,6 +774,9 @@ static int hns_roce_cmq_send(struct hns_roce_dev 
> >> *hr_dev,
> >>int ret = 0;
> >>int ntc;
> >>  
> >> +  if (hr_dev->is_reset)
> >> +  return 0;
> >> +
> >>spin_lock_bh(>lock);
> >>  
> >>if (num > hns_roce_cmq_space(csq)) {
> >> @@ -4790,6 +4793,7 @@ static int hns_roce_hw_v2_init_instance(struct 
> >> hnae3_handle *handle)
> >>return 0;
> >>  
> >>  error_failed_get_cfg:
> >> +  handle->priv = NULL;
> >>kfree(hr_dev->priv);
> >>  
> >>  error_failed_kzalloc:
> >> @@ -4803,14 +4807,70 @@ static void 
> >> hns_roce_hw_v2_uninit_instance(struct hnae3_handle *handle,
> >>  {
> >>struct hns_roce_dev *hr_dev = (struct hns_roce_dev 
> >> *)handle->priv;
> >>  
> >> +  if (!hr_dev)
> >> +  return;
> >> +
> >>hns_roce_exit(hr_dev);
> >> +  handle->priv = NULL;
> >>kfree(hr_dev->priv);
> >>ib_dealloc_device(_dev->ib_dev);
> >>  }
> > Why are these hunks here? If init fails then uninit should not be
> > called, so why meddle with priv?
>  In hns_roce_hw_v2_init_instance function, we evaluate handle->priv with 
>  hr_dev,
>  We want clear the value in hns_roce_hw_v2_uninit_instance function.
>  So we can ensure no problem in RoCE driver.
> >>> What problem could happen?
> >>>
> >>> I keep removing unnecessary sets to null and checks of null, so please
> >>> don't add them if they cannot happen.
> >>>
> >>> Eg uninit should never be called with a null priv, that is a serious
> >>> logic mis-design someplace if it happens.
> >>>
> >>> Jason
> >> NIC driver call the registered reset_notify() function to finish the
> >> part of RoCE reset process.
> >> In RoCE driver,  when hnae3_reset_notify_type is HNAE3_UNINIT_CLIENT,
> >> we call hns_roce_hw_v2_uninit_instance(handle, false) to release the
> >> resources.
> >> when hnae3_reset_notify_type is HNAE3_INIT_CLIENT, we call
> >> hns_roce_hw_v2_init_instance.
> >> if hns_roce_hw_v2_init_instance failed, we should ensure no problem in
> >> the other callback
> >> function registered by RoCE driver.
> > Don't design things like this.
> >
> > init/uninit are paired - do not call something uninit if it can be
> > called after init fails, or better, arrange to prevent that so things
> > are sane.
> >
> > Jason
> >
> > .
> The current RoCE driver registered 3 callback function to NIC driver as
> belows:
> 1.init_instance/uninit_instance are paired.
> 2.In reset_notify function,  RoCE dirver still call
> init_instance/uninit_instance function.
> but NIC driver does not perceive the behavior.  We need to judge in RoCE
> driver.

fix the nic driver

Jason


Re: [PATCH rdma-next 4/5] RDMA/hns: Add reset process for RoCE in hip08

2018-05-22 Thread Wei Hu (Xavier)


On 2018/5/23 4:26, Jason Gunthorpe wrote:
> On Fri, May 18, 2018 at 03:23:00PM +0800, Wei Hu (Xavier) wrote:
>>
>> On 2018/5/18 12:15, Jason Gunthorpe wrote:
>>> On Fri, May 18, 2018 at 11:28:11AM +0800, Wei Hu (Xavier) wrote:
 On 2018/5/17 23:14, Jason Gunthorpe wrote:
> On Thu, May 17, 2018 at 04:02:52PM +0800, Wei Hu (Xavier) wrote:
>> diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c 
>> b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>> index 86ef15f..e1c44a6 100644
>> +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>> @@ -774,6 +774,9 @@ static int hns_roce_cmq_send(struct hns_roce_dev 
>> *hr_dev,
>>  int ret = 0;
>>  int ntc;
>>  
>> +if (hr_dev->is_reset)
>> +return 0;
>> +
>>  spin_lock_bh(>lock);
>>  
>>  if (num > hns_roce_cmq_space(csq)) {
>> @@ -4790,6 +4793,7 @@ static int hns_roce_hw_v2_init_instance(struct 
>> hnae3_handle *handle)
>>  return 0;
>>  
>>  error_failed_get_cfg:
>> +handle->priv = NULL;
>>  kfree(hr_dev->priv);
>>  
>>  error_failed_kzalloc:
>> @@ -4803,14 +4807,70 @@ static void 
>> hns_roce_hw_v2_uninit_instance(struct hnae3_handle *handle,
>>  {
>>  struct hns_roce_dev *hr_dev = (struct hns_roce_dev 
>> *)handle->priv;
>>  
>> +if (!hr_dev)
>> +return;
>> +
>>  hns_roce_exit(hr_dev);
>> +handle->priv = NULL;
>>  kfree(hr_dev->priv);
>>  ib_dealloc_device(_dev->ib_dev);
>>  }
> Why are these hunks here? If init fails then uninit should not be
> called, so why meddle with priv?
 In hns_roce_hw_v2_init_instance function, we evaluate handle->priv with 
 hr_dev,
 We want clear the value in hns_roce_hw_v2_uninit_instance function.
 So we can ensure no problem in RoCE driver.
>>> What problem could happen?
>>>
>>> I keep removing unnecessary sets to null and checks of null, so please
>>> don't add them if they cannot happen.
>>>
>>> Eg uninit should never be called with a null priv, that is a serious
>>> logic mis-design someplace if it happens.
>>>
>>> Jason
>> NIC driver call the registered reset_notify() function to finish the
>> part of RoCE reset process.
>> In RoCE driver,  when hnae3_reset_notify_type is HNAE3_UNINIT_CLIENT,
>> we call hns_roce_hw_v2_uninit_instance(handle, false) to release the
>> resources.
>> when hnae3_reset_notify_type is HNAE3_INIT_CLIENT, we call
>> hns_roce_hw_v2_init_instance.
>> if hns_roce_hw_v2_init_instance failed, we should ensure no problem in
>> the other callback
>> function registered by RoCE driver.
> Don't design things like this.
>
> init/uninit are paired - do not call something uninit if it can be
> called after init fails, or better, arrange to prevent that so things
> are sane.
>
> Jason
>
> .
The current RoCE driver registered 3 callback function to NIC driver as
belows:
1.init_instance/uninit_instance are paired.
2.In reset_notify function,  RoCE dirver still call
init_instance/uninit_instance function.
but NIC driver does not perceive the behavior.  We need to judge in RoCE
driver.

static const struct hnae3_client_ops hns_roce_hw_v2_ops = {
.init_instance = hns_roce_hw_v2_init_instance,
.uninit_instance = hns_roce_hw_v2_uninit_instance,
.reset_notify = hns_roce_hw_v2_reset_notify,
};

Wei Hu
>




Re: [PATCH rdma-next 4/5] RDMA/hns: Add reset process for RoCE in hip08

2018-05-22 Thread Wei Hu (Xavier)


On 2018/5/23 4:26, Jason Gunthorpe wrote:
> On Fri, May 18, 2018 at 03:23:00PM +0800, Wei Hu (Xavier) wrote:
>>
>> On 2018/5/18 12:15, Jason Gunthorpe wrote:
>>> On Fri, May 18, 2018 at 11:28:11AM +0800, Wei Hu (Xavier) wrote:
 On 2018/5/17 23:14, Jason Gunthorpe wrote:
> On Thu, May 17, 2018 at 04:02:52PM +0800, Wei Hu (Xavier) wrote:
>> diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c 
>> b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>> index 86ef15f..e1c44a6 100644
>> +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>> @@ -774,6 +774,9 @@ static int hns_roce_cmq_send(struct hns_roce_dev 
>> *hr_dev,
>>  int ret = 0;
>>  int ntc;
>>  
>> +if (hr_dev->is_reset)
>> +return 0;
>> +
>>  spin_lock_bh(>lock);
>>  
>>  if (num > hns_roce_cmq_space(csq)) {
>> @@ -4790,6 +4793,7 @@ static int hns_roce_hw_v2_init_instance(struct 
>> hnae3_handle *handle)
>>  return 0;
>>  
>>  error_failed_get_cfg:
>> +handle->priv = NULL;
>>  kfree(hr_dev->priv);
>>  
>>  error_failed_kzalloc:
>> @@ -4803,14 +4807,70 @@ static void 
>> hns_roce_hw_v2_uninit_instance(struct hnae3_handle *handle,
>>  {
>>  struct hns_roce_dev *hr_dev = (struct hns_roce_dev 
>> *)handle->priv;
>>  
>> +if (!hr_dev)
>> +return;
>> +
>>  hns_roce_exit(hr_dev);
>> +handle->priv = NULL;
>>  kfree(hr_dev->priv);
>>  ib_dealloc_device(_dev->ib_dev);
>>  }
> Why are these hunks here? If init fails then uninit should not be
> called, so why meddle with priv?
 In hns_roce_hw_v2_init_instance function, we evaluate handle->priv with 
 hr_dev,
 We want clear the value in hns_roce_hw_v2_uninit_instance function.
 So we can ensure no problem in RoCE driver.
>>> What problem could happen?
>>>
>>> I keep removing unnecessary sets to null and checks of null, so please
>>> don't add them if they cannot happen.
>>>
>>> Eg uninit should never be called with a null priv, that is a serious
>>> logic mis-design someplace if it happens.
>>>
>>> Jason
>> NIC driver call the registered reset_notify() function to finish the
>> part of RoCE reset process.
>> In RoCE driver,  when hnae3_reset_notify_type is HNAE3_UNINIT_CLIENT,
>> we call hns_roce_hw_v2_uninit_instance(handle, false) to release the
>> resources.
>> when hnae3_reset_notify_type is HNAE3_INIT_CLIENT, we call
>> hns_roce_hw_v2_init_instance.
>> if hns_roce_hw_v2_init_instance failed, we should ensure no problem in
>> the other callback
>> function registered by RoCE driver.
> Don't design things like this.
>
> init/uninit are paired - do not call something uninit if it can be
> called after init fails, or better, arrange to prevent that so things
> are sane.
>
> Jason
>
> .
The current RoCE driver registered 3 callback function to NIC driver as
belows:
1.init_instance/uninit_instance are paired.
2.In reset_notify function,  RoCE dirver still call
init_instance/uninit_instance function.
but NIC driver does not perceive the behavior.  We need to judge in RoCE
driver.

static const struct hnae3_client_ops hns_roce_hw_v2_ops = {
.init_instance = hns_roce_hw_v2_init_instance,
.uninit_instance = hns_roce_hw_v2_uninit_instance,
.reset_notify = hns_roce_hw_v2_reset_notify,
};

Wei Hu
>




Re: [PATCH rdma-next 4/5] RDMA/hns: Add reset process for RoCE in hip08

2018-05-22 Thread Jason Gunthorpe
On Fri, May 18, 2018 at 03:23:00PM +0800, Wei Hu (Xavier) wrote:
> 
> 
> On 2018/5/18 12:15, Jason Gunthorpe wrote:
> > On Fri, May 18, 2018 at 11:28:11AM +0800, Wei Hu (Xavier) wrote:
> >>
> >> On 2018/5/17 23:14, Jason Gunthorpe wrote:
> >>> On Thu, May 17, 2018 at 04:02:52PM +0800, Wei Hu (Xavier) wrote:
>  diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c 
>  b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>  index 86ef15f..e1c44a6 100644
>  +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>  @@ -774,6 +774,9 @@ static int hns_roce_cmq_send(struct hns_roce_dev 
>  *hr_dev,
>   int ret = 0;
>   int ntc;
>   
>  +if (hr_dev->is_reset)
>  +return 0;
>  +
>   spin_lock_bh(>lock);
>   
>   if (num > hns_roce_cmq_space(csq)) {
>  @@ -4790,6 +4793,7 @@ static int hns_roce_hw_v2_init_instance(struct 
>  hnae3_handle *handle)
>   return 0;
>   
>   error_failed_get_cfg:
>  +handle->priv = NULL;
>   kfree(hr_dev->priv);
>   
>   error_failed_kzalloc:
>  @@ -4803,14 +4807,70 @@ static void 
>  hns_roce_hw_v2_uninit_instance(struct hnae3_handle *handle,
>   {
>   struct hns_roce_dev *hr_dev = (struct hns_roce_dev 
>  *)handle->priv;
>   
>  +if (!hr_dev)
>  +return;
>  +
>   hns_roce_exit(hr_dev);
>  +handle->priv = NULL;
>   kfree(hr_dev->priv);
>   ib_dealloc_device(_dev->ib_dev);
>   }
> >>> Why are these hunks here? If init fails then uninit should not be
> >>> called, so why meddle with priv?
> >> In hns_roce_hw_v2_init_instance function, we evaluate handle->priv with 
> >> hr_dev,
> >> We want clear the value in hns_roce_hw_v2_uninit_instance function.
> >> So we can ensure no problem in RoCE driver.
> > What problem could happen?
> >
> > I keep removing unnecessary sets to null and checks of null, so please
> > don't add them if they cannot happen.
> >
> > Eg uninit should never be called with a null priv, that is a serious
> > logic mis-design someplace if it happens.
> >
> > Jason
> NIC driver call the registered reset_notify() function to finish the
> part of RoCE reset process.
> In RoCE driver,  when hnae3_reset_notify_type is HNAE3_UNINIT_CLIENT,
> we call hns_roce_hw_v2_uninit_instance(handle, false) to release the
> resources.
> when hnae3_reset_notify_type is HNAE3_INIT_CLIENT, we call
> hns_roce_hw_v2_init_instance.
> if hns_roce_hw_v2_init_instance failed, we should ensure no problem in
> the other callback
> function registered by RoCE driver.

Don't design things like this.

init/uninit are paired - do not call something uninit if it can be
called after init fails, or better, arrange to prevent that so things
are sane.

Jason


Re: [PATCH rdma-next 4/5] RDMA/hns: Add reset process for RoCE in hip08

2018-05-22 Thread Jason Gunthorpe
On Fri, May 18, 2018 at 03:23:00PM +0800, Wei Hu (Xavier) wrote:
> 
> 
> On 2018/5/18 12:15, Jason Gunthorpe wrote:
> > On Fri, May 18, 2018 at 11:28:11AM +0800, Wei Hu (Xavier) wrote:
> >>
> >> On 2018/5/17 23:14, Jason Gunthorpe wrote:
> >>> On Thu, May 17, 2018 at 04:02:52PM +0800, Wei Hu (Xavier) wrote:
>  diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c 
>  b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>  index 86ef15f..e1c44a6 100644
>  +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>  @@ -774,6 +774,9 @@ static int hns_roce_cmq_send(struct hns_roce_dev 
>  *hr_dev,
>   int ret = 0;
>   int ntc;
>   
>  +if (hr_dev->is_reset)
>  +return 0;
>  +
>   spin_lock_bh(>lock);
>   
>   if (num > hns_roce_cmq_space(csq)) {
>  @@ -4790,6 +4793,7 @@ static int hns_roce_hw_v2_init_instance(struct 
>  hnae3_handle *handle)
>   return 0;
>   
>   error_failed_get_cfg:
>  +handle->priv = NULL;
>   kfree(hr_dev->priv);
>   
>   error_failed_kzalloc:
>  @@ -4803,14 +4807,70 @@ static void 
>  hns_roce_hw_v2_uninit_instance(struct hnae3_handle *handle,
>   {
>   struct hns_roce_dev *hr_dev = (struct hns_roce_dev 
>  *)handle->priv;
>   
>  +if (!hr_dev)
>  +return;
>  +
>   hns_roce_exit(hr_dev);
>  +handle->priv = NULL;
>   kfree(hr_dev->priv);
>   ib_dealloc_device(_dev->ib_dev);
>   }
> >>> Why are these hunks here? If init fails then uninit should not be
> >>> called, so why meddle with priv?
> >> In hns_roce_hw_v2_init_instance function, we evaluate handle->priv with 
> >> hr_dev,
> >> We want clear the value in hns_roce_hw_v2_uninit_instance function.
> >> So we can ensure no problem in RoCE driver.
> > What problem could happen?
> >
> > I keep removing unnecessary sets to null and checks of null, so please
> > don't add them if they cannot happen.
> >
> > Eg uninit should never be called with a null priv, that is a serious
> > logic mis-design someplace if it happens.
> >
> > Jason
> NIC driver call the registered reset_notify() function to finish the
> part of RoCE reset process.
> In RoCE driver,  when hnae3_reset_notify_type is HNAE3_UNINIT_CLIENT,
> we call hns_roce_hw_v2_uninit_instance(handle, false) to release the
> resources.
> when hnae3_reset_notify_type is HNAE3_INIT_CLIENT, we call
> hns_roce_hw_v2_init_instance.
> if hns_roce_hw_v2_init_instance failed, we should ensure no problem in
> the other callback
> function registered by RoCE driver.

Don't design things like this.

init/uninit are paired - do not call something uninit if it can be
called after init fails, or better, arrange to prevent that so things
are sane.

Jason


Re: [PATCH rdma-next 4/5] RDMA/hns: Add reset process for RoCE in hip08

2018-05-18 Thread Wei Hu (Xavier)


On 2018/5/18 12:15, Jason Gunthorpe wrote:
> On Fri, May 18, 2018 at 11:28:11AM +0800, Wei Hu (Xavier) wrote:
>>
>> On 2018/5/17 23:14, Jason Gunthorpe wrote:
>>> On Thu, May 17, 2018 at 04:02:52PM +0800, Wei Hu (Xavier) wrote:
 diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c 
 b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
 index 86ef15f..e1c44a6 100644
 +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
 @@ -774,6 +774,9 @@ static int hns_roce_cmq_send(struct hns_roce_dev 
 *hr_dev,
int ret = 0;
int ntc;
  
 +  if (hr_dev->is_reset)
 +  return 0;
 +
spin_lock_bh(>lock);
  
if (num > hns_roce_cmq_space(csq)) {
 @@ -4790,6 +4793,7 @@ static int hns_roce_hw_v2_init_instance(struct 
 hnae3_handle *handle)
return 0;
  
  error_failed_get_cfg:
 +  handle->priv = NULL;
kfree(hr_dev->priv);
  
  error_failed_kzalloc:
 @@ -4803,14 +4807,70 @@ static void hns_roce_hw_v2_uninit_instance(struct 
 hnae3_handle *handle,
  {
struct hns_roce_dev *hr_dev = (struct hns_roce_dev *)handle->priv;
  
 +  if (!hr_dev)
 +  return;
 +
hns_roce_exit(hr_dev);
 +  handle->priv = NULL;
kfree(hr_dev->priv);
ib_dealloc_device(_dev->ib_dev);
  }
>>> Why are these hunks here? If init fails then uninit should not be
>>> called, so why meddle with priv?
>> In hns_roce_hw_v2_init_instance function, we evaluate handle->priv with 
>> hr_dev,
>> We want clear the value in hns_roce_hw_v2_uninit_instance function.
>> So we can ensure no problem in RoCE driver.
> What problem could happen?
>
> I keep removing unnecessary sets to null and checks of null, so please
> don't add them if they cannot happen.
>
> Eg uninit should never be called with a null priv, that is a serious
> logic mis-design someplace if it happens.
>
> Jason
NIC driver call the registered reset_notify() function to finish the
part of RoCE reset process.
In RoCE driver,  when hnae3_reset_notify_type is HNAE3_UNINIT_CLIENT,
we call hns_roce_hw_v2_uninit_instance(handle, false) to release the
resources.
when hnae3_reset_notify_type is HNAE3_INIT_CLIENT, we call
hns_roce_hw_v2_init_instance.
if hns_roce_hw_v2_init_instance failed, we should ensure no problem in
the other callback
function registered by RoCE driver.
 
The related RoCE driver:
static int hns_roce_hw_v2_reset_notify_uninit(struct hnae3_handle *handle)
{
msleep(100);
hns_roce_hw_v2_uninit_instance(handle, false);
return 0;
}

static int hns_roce_hw_v2_reset_notify(struct hnae3_handle *handle,
   enum hnae3_reset_notify_type type)
{
int ret = 0;

switch (type) {
case HNAE3_DOWN_CLIENT:
ret = hns_roce_hw_v2_reset_notify_down(handle);
break;
case HNAE3_INIT_CLIENT:
ret = hns_roce_hw_v2_init_instance(handle);
break;
case HNAE3_UNINIT_CLIENT:
ret = hns_roce_hw_v2_reset_notify_uninit(handle);
break;
default:
break;
}

return ret;
}

The related NIC driver:

static int hclge_notify_roce_client(struct hclge_dev *hdev,
enum hnae3_reset_notify_type type)
{
struct hnae3_client *client = hdev->roce_client;
struct hnae3_handle *handle;
int ret = 0;
u16 i;

if (!client)
return 0;

if (!client->ops->reset_notify)
return -EOPNOTSUPP;

for (i = 0; i < hdev->num_vmdq_vport + 1; i++) {
handle = >vport[i].roce;
ret = client->ops->reset_notify(handle, type);
if (ret) {
dev_err(>pdev->dev,
"notify roce client failed %d", ret);
return ret;
}
}

return ret;
}

static void hclge_reset(struct hclge_dev *hdev)
{
struct hnae3_handle *handle;

/* perform reset of the stack & ae device for a client */
handle = >vport[0].nic;

hclge_notify_roce_client(hdev, HNAE3_DOWN_CLIENT);
hclge_notify_roce_client(hdev, HNAE3_UNINIT_CLIENT);

rtnl_lock();
hclge_notify_client(hdev, HNAE3_DOWN_CLIENT);

if (!hclge_reset_wait(hdev)) {
hclge_notify_client(hdev, HNAE3_UNINIT_CLIENT);
hclge_reset_ae_dev(hdev->ae_dev);
hclge_notify_client(hdev, HNAE3_INIT_CLIENT);

hclge_clear_reset_cause(hdev);
} else {
/* schedule again to check pending resets later */
set_bit(hdev->reset_type, >reset_pending);
hclge_reset_task_schedule(hdev);
}

hclge_notify_client(hdev, HNAE3_UP_CLIENT);
handle->last_reset_time = jiffies;
rtnl_unlock();

hclge_notify_roce_client(hdev, HNAE3_INIT_CLIENT);
hclge_notify_roce_client(hdev, HNAE3_UP_CLIENT);
}

Thanks, Jason

> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>




Re: [PATCH rdma-next 4/5] RDMA/hns: Add reset process for RoCE in hip08

2018-05-18 Thread Wei Hu (Xavier)


On 2018/5/18 12:15, Jason Gunthorpe wrote:
> On Fri, May 18, 2018 at 11:28:11AM +0800, Wei Hu (Xavier) wrote:
>>
>> On 2018/5/17 23:14, Jason Gunthorpe wrote:
>>> On Thu, May 17, 2018 at 04:02:52PM +0800, Wei Hu (Xavier) wrote:
 diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c 
 b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
 index 86ef15f..e1c44a6 100644
 +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
 @@ -774,6 +774,9 @@ static int hns_roce_cmq_send(struct hns_roce_dev 
 *hr_dev,
int ret = 0;
int ntc;
  
 +  if (hr_dev->is_reset)
 +  return 0;
 +
spin_lock_bh(>lock);
  
if (num > hns_roce_cmq_space(csq)) {
 @@ -4790,6 +4793,7 @@ static int hns_roce_hw_v2_init_instance(struct 
 hnae3_handle *handle)
return 0;
  
  error_failed_get_cfg:
 +  handle->priv = NULL;
kfree(hr_dev->priv);
  
  error_failed_kzalloc:
 @@ -4803,14 +4807,70 @@ static void hns_roce_hw_v2_uninit_instance(struct 
 hnae3_handle *handle,
  {
struct hns_roce_dev *hr_dev = (struct hns_roce_dev *)handle->priv;
  
 +  if (!hr_dev)
 +  return;
 +
hns_roce_exit(hr_dev);
 +  handle->priv = NULL;
kfree(hr_dev->priv);
ib_dealloc_device(_dev->ib_dev);
  }
>>> Why are these hunks here? If init fails then uninit should not be
>>> called, so why meddle with priv?
>> In hns_roce_hw_v2_init_instance function, we evaluate handle->priv with 
>> hr_dev,
>> We want clear the value in hns_roce_hw_v2_uninit_instance function.
>> So we can ensure no problem in RoCE driver.
> What problem could happen?
>
> I keep removing unnecessary sets to null and checks of null, so please
> don't add them if they cannot happen.
>
> Eg uninit should never be called with a null priv, that is a serious
> logic mis-design someplace if it happens.
>
> Jason
NIC driver call the registered reset_notify() function to finish the
part of RoCE reset process.
In RoCE driver,  when hnae3_reset_notify_type is HNAE3_UNINIT_CLIENT,
we call hns_roce_hw_v2_uninit_instance(handle, false) to release the
resources.
when hnae3_reset_notify_type is HNAE3_INIT_CLIENT, we call
hns_roce_hw_v2_init_instance.
if hns_roce_hw_v2_init_instance failed, we should ensure no problem in
the other callback
function registered by RoCE driver.
 
The related RoCE driver:
static int hns_roce_hw_v2_reset_notify_uninit(struct hnae3_handle *handle)
{
msleep(100);
hns_roce_hw_v2_uninit_instance(handle, false);
return 0;
}

static int hns_roce_hw_v2_reset_notify(struct hnae3_handle *handle,
   enum hnae3_reset_notify_type type)
{
int ret = 0;

switch (type) {
case HNAE3_DOWN_CLIENT:
ret = hns_roce_hw_v2_reset_notify_down(handle);
break;
case HNAE3_INIT_CLIENT:
ret = hns_roce_hw_v2_init_instance(handle);
break;
case HNAE3_UNINIT_CLIENT:
ret = hns_roce_hw_v2_reset_notify_uninit(handle);
break;
default:
break;
}

return ret;
}

The related NIC driver:

static int hclge_notify_roce_client(struct hclge_dev *hdev,
enum hnae3_reset_notify_type type)
{
struct hnae3_client *client = hdev->roce_client;
struct hnae3_handle *handle;
int ret = 0;
u16 i;

if (!client)
return 0;

if (!client->ops->reset_notify)
return -EOPNOTSUPP;

for (i = 0; i < hdev->num_vmdq_vport + 1; i++) {
handle = >vport[i].roce;
ret = client->ops->reset_notify(handle, type);
if (ret) {
dev_err(>pdev->dev,
"notify roce client failed %d", ret);
return ret;
}
}

return ret;
}

static void hclge_reset(struct hclge_dev *hdev)
{
struct hnae3_handle *handle;

/* perform reset of the stack & ae device for a client */
handle = >vport[0].nic;

hclge_notify_roce_client(hdev, HNAE3_DOWN_CLIENT);
hclge_notify_roce_client(hdev, HNAE3_UNINIT_CLIENT);

rtnl_lock();
hclge_notify_client(hdev, HNAE3_DOWN_CLIENT);

if (!hclge_reset_wait(hdev)) {
hclge_notify_client(hdev, HNAE3_UNINIT_CLIENT);
hclge_reset_ae_dev(hdev->ae_dev);
hclge_notify_client(hdev, HNAE3_INIT_CLIENT);

hclge_clear_reset_cause(hdev);
} else {
/* schedule again to check pending resets later */
set_bit(hdev->reset_type, >reset_pending);
hclge_reset_task_schedule(hdev);
}

hclge_notify_client(hdev, HNAE3_UP_CLIENT);
handle->last_reset_time = jiffies;
rtnl_unlock();

hclge_notify_roce_client(hdev, HNAE3_INIT_CLIENT);
hclge_notify_roce_client(hdev, HNAE3_UP_CLIENT);
}

Thanks, Jason

> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>




Re: [PATCH rdma-next 4/5] RDMA/hns: Add reset process for RoCE in hip08

2018-05-17 Thread Jason Gunthorpe
On Fri, May 18, 2018 at 11:28:11AM +0800, Wei Hu (Xavier) wrote:
> 
> 
> On 2018/5/17 23:14, Jason Gunthorpe wrote:
> > On Thu, May 17, 2018 at 04:02:52PM +0800, Wei Hu (Xavier) wrote:
> >> diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c 
> >> b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> >> index 86ef15f..e1c44a6 100644
> >> +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> >> @@ -774,6 +774,9 @@ static int hns_roce_cmq_send(struct hns_roce_dev 
> >> *hr_dev,
> >>int ret = 0;
> >>int ntc;
> >>  
> >> +  if (hr_dev->is_reset)
> >> +  return 0;
> >> +
> >>spin_lock_bh(>lock);
> >>  
> >>if (num > hns_roce_cmq_space(csq)) {
> >> @@ -4790,6 +4793,7 @@ static int hns_roce_hw_v2_init_instance(struct 
> >> hnae3_handle *handle)
> >>return 0;
> >>  
> >>  error_failed_get_cfg:
> >> +  handle->priv = NULL;
> >>kfree(hr_dev->priv);
> >>  
> >>  error_failed_kzalloc:
> >> @@ -4803,14 +4807,70 @@ static void hns_roce_hw_v2_uninit_instance(struct 
> >> hnae3_handle *handle,
> >>  {
> >>struct hns_roce_dev *hr_dev = (struct hns_roce_dev *)handle->priv;
> >>  
> >> +  if (!hr_dev)
> >> +  return;
> >> +
> >>hns_roce_exit(hr_dev);
> >> +  handle->priv = NULL;
> >>kfree(hr_dev->priv);
> >>ib_dealloc_device(_dev->ib_dev);
> >>  }
> > Why are these hunks here? If init fails then uninit should not be
> > called, so why meddle with priv?
> In hns_roce_hw_v2_init_instance function, we evaluate handle->priv with 
> hr_dev,
> We want clear the value in hns_roce_hw_v2_uninit_instance function.
> So we can ensure no problem in RoCE driver.

What problem could happen?

I keep removing unnecessary sets to null and checks of null, so please
don't add them if they cannot happen.

Eg uninit should never be called with a null priv, that is a serious
logic mis-design someplace if it happens.

Jason


Re: [PATCH rdma-next 4/5] RDMA/hns: Add reset process for RoCE in hip08

2018-05-17 Thread Jason Gunthorpe
On Fri, May 18, 2018 at 11:28:11AM +0800, Wei Hu (Xavier) wrote:
> 
> 
> On 2018/5/17 23:14, Jason Gunthorpe wrote:
> > On Thu, May 17, 2018 at 04:02:52PM +0800, Wei Hu (Xavier) wrote:
> >> diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c 
> >> b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> >> index 86ef15f..e1c44a6 100644
> >> +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> >> @@ -774,6 +774,9 @@ static int hns_roce_cmq_send(struct hns_roce_dev 
> >> *hr_dev,
> >>int ret = 0;
> >>int ntc;
> >>  
> >> +  if (hr_dev->is_reset)
> >> +  return 0;
> >> +
> >>spin_lock_bh(>lock);
> >>  
> >>if (num > hns_roce_cmq_space(csq)) {
> >> @@ -4790,6 +4793,7 @@ static int hns_roce_hw_v2_init_instance(struct 
> >> hnae3_handle *handle)
> >>return 0;
> >>  
> >>  error_failed_get_cfg:
> >> +  handle->priv = NULL;
> >>kfree(hr_dev->priv);
> >>  
> >>  error_failed_kzalloc:
> >> @@ -4803,14 +4807,70 @@ static void hns_roce_hw_v2_uninit_instance(struct 
> >> hnae3_handle *handle,
> >>  {
> >>struct hns_roce_dev *hr_dev = (struct hns_roce_dev *)handle->priv;
> >>  
> >> +  if (!hr_dev)
> >> +  return;
> >> +
> >>hns_roce_exit(hr_dev);
> >> +  handle->priv = NULL;
> >>kfree(hr_dev->priv);
> >>ib_dealloc_device(_dev->ib_dev);
> >>  }
> > Why are these hunks here? If init fails then uninit should not be
> > called, so why meddle with priv?
> In hns_roce_hw_v2_init_instance function, we evaluate handle->priv with 
> hr_dev,
> We want clear the value in hns_roce_hw_v2_uninit_instance function.
> So we can ensure no problem in RoCE driver.

What problem could happen?

I keep removing unnecessary sets to null and checks of null, so please
don't add them if they cannot happen.

Eg uninit should never be called with a null priv, that is a serious
logic mis-design someplace if it happens.

Jason


Re: [PATCH rdma-next 4/5] RDMA/hns: Add reset process for RoCE in hip08

2018-05-17 Thread Wei Hu (Xavier)


On 2018/5/17 23:14, Jason Gunthorpe wrote:
> On Thu, May 17, 2018 at 04:02:52PM +0800, Wei Hu (Xavier) wrote:
>> diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c 
>> b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>> index 86ef15f..e1c44a6 100644
>> +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>> @@ -774,6 +774,9 @@ static int hns_roce_cmq_send(struct hns_roce_dev *hr_dev,
>>  int ret = 0;
>>  int ntc;
>>  
>> +if (hr_dev->is_reset)
>> +return 0;
>> +
>>  spin_lock_bh(>lock);
>>  
>>  if (num > hns_roce_cmq_space(csq)) {
>> @@ -4790,6 +4793,7 @@ static int hns_roce_hw_v2_init_instance(struct 
>> hnae3_handle *handle)
>>  return 0;
>>  
>>  error_failed_get_cfg:
>> +handle->priv = NULL;
>>  kfree(hr_dev->priv);
>>  
>>  error_failed_kzalloc:
>> @@ -4803,14 +4807,70 @@ static void hns_roce_hw_v2_uninit_instance(struct 
>> hnae3_handle *handle,
>>  {
>>  struct hns_roce_dev *hr_dev = (struct hns_roce_dev *)handle->priv;
>>  
>> +if (!hr_dev)
>> +return;
>> +
>>  hns_roce_exit(hr_dev);
>> +handle->priv = NULL;
>>  kfree(hr_dev->priv);
>>  ib_dealloc_device(_dev->ib_dev);
>>  }
> Why are these hunks here? If init fails then uninit should not be
> called, so why meddle with priv?
In hns_roce_hw_v2_init_instance function, we evaluate handle->priv with 
hr_dev,
We want clear the value in hns_roce_hw_v2_uninit_instance function.
So we can ensure no problem in RoCE driver.


static int hns_roce_hw_v2_init_instance(struct hnae3_handle *handle)
{
struct hns_roce_dev *hr_dev;
int ret;

hr_dev = (struct hns_roce_dev *)ib_alloc_device(sizeof(*hr_dev));
if (!hr_dev)
return -ENOMEM;

   ...// other code

handle->priv = hr_dev;

// other code

return 0;

error_xxx:
handle->priv = NULL;
...// other code

error_:
ib_dealloc_device(_dev->ib_dev);

return ret;
}

static void hns_roce_hw_v2_uninit_instance(struct hnae3_handle *handle,
   bool reset)
{
struct hns_roce_dev *hr_dev = (struct hns_roce_dev *)handle->priv;

if (!hr_dev)
return;

hns_roce_exit(hr_dev);
handle->priv = NULL;
kfree(hr_dev->priv);
ib_dealloc_device(_dev->ib_dev);
}

>
> Jason
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>




Re: [PATCH rdma-next 4/5] RDMA/hns: Add reset process for RoCE in hip08

2018-05-17 Thread Wei Hu (Xavier)


On 2018/5/17 23:14, Jason Gunthorpe wrote:
> On Thu, May 17, 2018 at 04:02:52PM +0800, Wei Hu (Xavier) wrote:
>> diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c 
>> b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>> index 86ef15f..e1c44a6 100644
>> +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>> @@ -774,6 +774,9 @@ static int hns_roce_cmq_send(struct hns_roce_dev *hr_dev,
>>  int ret = 0;
>>  int ntc;
>>  
>> +if (hr_dev->is_reset)
>> +return 0;
>> +
>>  spin_lock_bh(>lock);
>>  
>>  if (num > hns_roce_cmq_space(csq)) {
>> @@ -4790,6 +4793,7 @@ static int hns_roce_hw_v2_init_instance(struct 
>> hnae3_handle *handle)
>>  return 0;
>>  
>>  error_failed_get_cfg:
>> +handle->priv = NULL;
>>  kfree(hr_dev->priv);
>>  
>>  error_failed_kzalloc:
>> @@ -4803,14 +4807,70 @@ static void hns_roce_hw_v2_uninit_instance(struct 
>> hnae3_handle *handle,
>>  {
>>  struct hns_roce_dev *hr_dev = (struct hns_roce_dev *)handle->priv;
>>  
>> +if (!hr_dev)
>> +return;
>> +
>>  hns_roce_exit(hr_dev);
>> +handle->priv = NULL;
>>  kfree(hr_dev->priv);
>>  ib_dealloc_device(_dev->ib_dev);
>>  }
> Why are these hunks here? If init fails then uninit should not be
> called, so why meddle with priv?
In hns_roce_hw_v2_init_instance function, we evaluate handle->priv with 
hr_dev,
We want clear the value in hns_roce_hw_v2_uninit_instance function.
So we can ensure no problem in RoCE driver.


static int hns_roce_hw_v2_init_instance(struct hnae3_handle *handle)
{
struct hns_roce_dev *hr_dev;
int ret;

hr_dev = (struct hns_roce_dev *)ib_alloc_device(sizeof(*hr_dev));
if (!hr_dev)
return -ENOMEM;

   ...// other code

handle->priv = hr_dev;

// other code

return 0;

error_xxx:
handle->priv = NULL;
...// other code

error_:
ib_dealloc_device(_dev->ib_dev);

return ret;
}

static void hns_roce_hw_v2_uninit_instance(struct hnae3_handle *handle,
   bool reset)
{
struct hns_roce_dev *hr_dev = (struct hns_roce_dev *)handle->priv;

if (!hr_dev)
return;

hns_roce_exit(hr_dev);
handle->priv = NULL;
kfree(hr_dev->priv);
ib_dealloc_device(_dev->ib_dev);
}

>
> Jason
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>




Re: [PATCH rdma-next 4/5] RDMA/hns: Add reset process for RoCE in hip08

2018-05-17 Thread Jason Gunthorpe
On Thu, May 17, 2018 at 04:02:52PM +0800, Wei Hu (Xavier) wrote:
> diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c 
> b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> index 86ef15f..e1c44a6 100644
> +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> @@ -774,6 +774,9 @@ static int hns_roce_cmq_send(struct hns_roce_dev *hr_dev,
>   int ret = 0;
>   int ntc;
>  
> + if (hr_dev->is_reset)
> + return 0;
> +
>   spin_lock_bh(>lock);
>  
>   if (num > hns_roce_cmq_space(csq)) {
> @@ -4790,6 +4793,7 @@ static int hns_roce_hw_v2_init_instance(struct 
> hnae3_handle *handle)
>   return 0;
>  
>  error_failed_get_cfg:
> + handle->priv = NULL;
>   kfree(hr_dev->priv);
>  
>  error_failed_kzalloc:
> @@ -4803,14 +4807,70 @@ static void hns_roce_hw_v2_uninit_instance(struct 
> hnae3_handle *handle,
>  {
>   struct hns_roce_dev *hr_dev = (struct hns_roce_dev *)handle->priv;
>  
> + if (!hr_dev)
> + return;
> +
>   hns_roce_exit(hr_dev);
> + handle->priv = NULL;
>   kfree(hr_dev->priv);
>   ib_dealloc_device(_dev->ib_dev);
>  }

Why are these hunks here? If init fails then uninit should not be
called, so why meddle with priv?

Jason


Re: [PATCH rdma-next 4/5] RDMA/hns: Add reset process for RoCE in hip08

2018-05-17 Thread Jason Gunthorpe
On Thu, May 17, 2018 at 04:02:52PM +0800, Wei Hu (Xavier) wrote:
> diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c 
> b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> index 86ef15f..e1c44a6 100644
> +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
> @@ -774,6 +774,9 @@ static int hns_roce_cmq_send(struct hns_roce_dev *hr_dev,
>   int ret = 0;
>   int ntc;
>  
> + if (hr_dev->is_reset)
> + return 0;
> +
>   spin_lock_bh(>lock);
>  
>   if (num > hns_roce_cmq_space(csq)) {
> @@ -4790,6 +4793,7 @@ static int hns_roce_hw_v2_init_instance(struct 
> hnae3_handle *handle)
>   return 0;
>  
>  error_failed_get_cfg:
> + handle->priv = NULL;
>   kfree(hr_dev->priv);
>  
>  error_failed_kzalloc:
> @@ -4803,14 +4807,70 @@ static void hns_roce_hw_v2_uninit_instance(struct 
> hnae3_handle *handle,
>  {
>   struct hns_roce_dev *hr_dev = (struct hns_roce_dev *)handle->priv;
>  
> + if (!hr_dev)
> + return;
> +
>   hns_roce_exit(hr_dev);
> + handle->priv = NULL;
>   kfree(hr_dev->priv);
>   ib_dealloc_device(_dev->ib_dev);
>  }

Why are these hunks here? If init fails then uninit should not be
called, so why meddle with priv?

Jason