Re: [ovs-dev] [PATCH net] net: openvswitch: fix upcall counter access before allocation

2023-06-06 Thread Aaron Conole
Eelco Chaudron  writes:

> Currently, the per cpu upcall counters are allocated after the vport is
> created and inserted into the system. This could lead to the datapath
> accessing the counters before they are allocated resulting in a kernel
> Oops.
>
> Here is an example:
>
>   PID: 59693TASK: 0005f4f51500  CPU: 0COMMAND: "ovs-vswitchd"
>#0 [8a39b5b0] __switch_to at b70f0629f2f4
>#1 [8a39b5d0] __schedule at b70f0629f5cc
>#2 [8a39b650] preempt_schedule_common at b70f0629fa60
>#3 [8a39b670] dynamic_might_resched at b70f0629fb58
>#4 [8a39b680] mutex_lock_killable at b70f062a1388
>#5 [8a39b6a0] pcpu_alloc at b70f0594460c
>#6 [8a39b750] __alloc_percpu_gfp at b70f05944e68
>#7 [8a39b760] ovs_vport_cmd_new at b70ee6961b90 [openvswitch]
>...
>
>   PID: 58682TASK: 0005b2f0bf00  CPU: 0COMMAND: "kworker/0:3"
>#0 [8a5d2f40] machine_kexec at b70f056a0758
>#1 [8a5d2f70] __crash_kexec at b70f057e2994
>#2 [8a5d3100] crash_kexec at b70f057e2ad8
>#3 [8a5d3120] die at b70f0628234c
>#4 [8a5d31e0] die_kernel_fault at b70f062828a8
>#5 [8a5d3210] __do_kernel_fault at b70f056a31f4
>#6 [8a5d3240] do_bad_area at b70f056a32a4
>#7 [8a5d3260] do_translation_fault at b70f062a9710
>#8 [8a5d3270] do_mem_abort at b70f056a2f74
>#9 [8a5d32a0] el1_abort at b70f06297dac
>   #10 [8a5d32d0] el1h_64_sync_handler at b70f06299b24
>   #11 [8a5d3410] el1h_64_sync at b70f056812dc
>   #12 [8a5d3430] ovs_dp_upcall at b70ee6963c84 [openvswitch]
>   #13 [8a5d3470] ovs_dp_process_packet at b70ee6963fdc 
> [openvswitch]
>   #14 [8a5d34f0] ovs_vport_receive at b70ee6972c78 [openvswitch]
>   #15 [8a5d36f0] netdev_port_receive at b70ee6973948 [openvswitch]
>   #16 [8a5d3720] netdev_frame_hook at b70ee6973a28 [openvswitch]
>   #17 [8a5d3730] __netif_receive_skb_core.constprop.0 at 
> b70f06079f90
>
> We moved the per cpu upcall counter allocation to the existing vport
> alloc and free functions to solve this.
>
> Fixes: 95637d91fefd ("net: openvswitch: release vport resources on failure")
> Fixes: 1933ea365aa7 ("net: openvswitch: Add support to count upcall packets")
> Signed-off-by: Eelco Chaudron 
> ---

Acked-by: Aaron Conole 

This is a particularly difficult one to reproduce.  Thanks for posting
the fix.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net] net: openvswitch: fix upcall counter access before allocation

2023-06-06 Thread Eelco Chaudron



On 6 Jun 2023, at 13:43, Paolo Abeni wrote:

> On Mon, 2023-06-05 at 16:38 +0200, Simon Horman wrote:
>> On Mon, Jun 05, 2023 at 03:53:59PM +0200, Eelco Chaudron wrote:
 Yeah, I see that. And I might have done the same thing.
 But, OTOH, this change is making the error path more complex
 (or at least more prone to error).

 In any case, the fix looks good.
>>>
>>> Thanks, just to clarify if we get no further feedback on this
>>> patch, do you prefer a v2 with the error path changes?
>>
>> Thanks Eelco,
>>
>> Yes, that is my preference.
>
> I concur with Simon: Eelco, please post a v2 including the error path
> changes.

Our mails crossed, patch should be in your inbox ;)

//Eelco

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net] net: openvswitch: fix upcall counter access before allocation

2023-06-06 Thread Simon Horman
On Mon, Jun 05, 2023 at 03:53:59PM +0200, Eelco Chaudron wrote:
> 
> 
> On 5 Jun 2023, at 15:07, Simon Horman wrote:
> 
> > On Mon, Jun 05, 2023 at 02:54:35PM +0200, Eelco Chaudron wrote:
> >>
> >>
> >> On 5 Jun 2023, at 14:41, Simon Horman wrote:
> >>
> >>> On Mon, Jun 05, 2023 at 10:59:50AM +0200, Eelco Chaudron wrote:
>  Currently, the per cpu upcall counters are allocated after the vport is
>  created and inserted into the system. This could lead to the datapath
>  accessing the counters before they are allocated resulting in a kernel
>  Oops.
> 
>  Here is an example:
> 
>    PID: 59693TASK: 0005f4f51500  CPU: 0COMMAND: "ovs-vswitchd"
> #0 [8a39b5b0] __switch_to at b70f0629f2f4
> #1 [8a39b5d0] __schedule at b70f0629f5cc
> #2 [8a39b650] preempt_schedule_common at b70f0629fa60
> #3 [8a39b670] dynamic_might_resched at b70f0629fb58
> #4 [8a39b680] mutex_lock_killable at b70f062a1388
> #5 [8a39b6a0] pcpu_alloc at b70f0594460c
> #6 [8a39b750] __alloc_percpu_gfp at b70f05944e68
> #7 [8a39b760] ovs_vport_cmd_new at b70ee6961b90 
>  [openvswitch]
> ...
> 
>    PID: 58682TASK: 0005b2f0bf00  CPU: 0COMMAND: "kworker/0:3"
> #0 [8a5d2f40] machine_kexec at b70f056a0758
> #1 [8a5d2f70] __crash_kexec at b70f057e2994
> #2 [8a5d3100] crash_kexec at b70f057e2ad8
> #3 [8a5d3120] die at b70f0628234c
> #4 [8a5d31e0] die_kernel_fault at b70f062828a8
> #5 [8a5d3210] __do_kernel_fault at b70f056a31f4
> #6 [8a5d3240] do_bad_area at b70f056a32a4
> #7 [8a5d3260] do_translation_fault at b70f062a9710
> #8 [8a5d3270] do_mem_abort at b70f056a2f74
> #9 [8a5d32a0] el1_abort at b70f06297dac
>    #10 [8a5d32d0] el1h_64_sync_handler at b70f06299b24
>    #11 [8a5d3410] el1h_64_sync at b70f056812dc
>    #12 [8a5d3430] ovs_dp_upcall at b70ee6963c84 [openvswitch]
>    #13 [8a5d3470] ovs_dp_process_packet at b70ee6963fdc 
>  [openvswitch]
>    #14 [8a5d34f0] ovs_vport_receive at b70ee6972c78 
>  [openvswitch]
>    #15 [8a5d36f0] netdev_port_receive at b70ee6973948 
>  [openvswitch]
>    #16 [8a5d3720] netdev_frame_hook at b70ee6973a28 
>  [openvswitch]
>    #17 [8a5d3730] __netif_receive_skb_core.constprop.0 at 
>  b70f06079f90
> 
>  We moved the per cpu upcall counter allocation to the existing vport
>  alloc and free functions to solve this.
> 
>  Fixes: 95637d91fefd ("net: openvswitch: release vport resources on 
>  failure")
>  Fixes: 1933ea365aa7 ("net: openvswitch: Add support to count upcall 
>  packets")
>  Signed-off-by: Eelco Chaudron 
>  ---
>   net/openvswitch/datapath.c |   19 ---
>   net/openvswitch/vport.c|8 
>   2 files changed, 8 insertions(+), 19 deletions(-)
> 
>  diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
>  index fcee6012293b..58f530f60172 100644
>  --- a/net/openvswitch/datapath.c
>  +++ b/net/openvswitch/datapath.c
>  @@ -236,9 +236,6 @@ void ovs_dp_detach_port(struct vport *p)
>   /* First drop references to device. */
>   hlist_del_rcu(>dp_hash_node);
> 
>  -/* Free percpu memory */
>  -free_percpu(p->upcall_stats);
>  -
>   /* Then destroy it. */
>   ovs_vport_del(p);
>   }
>  @@ -1858,12 +1855,6 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, 
>  struct genl_info *info)
>   goto err_destroy_portids;
>   }
> 
>  -vport->upcall_stats = netdev_alloc_pcpu_stats(struct 
>  vport_upcall_stats_percpu);
>  -if (!vport->upcall_stats) {
>  -err = -ENOMEM;
>  -goto err_destroy_vport;
>  -}
>  -
>   err = ovs_dp_cmd_fill_info(dp, reply, info->snd_portid,
>  info->snd_seq, 0, OVS_DP_CMD_NEW);
>   BUG_ON(err < 0);
>  @@ -1876,8 +1867,6 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, 
>  struct genl_info *info)
>   ovs_notify(_datapath_genl_family, reply, info);
>   return 0;
> 
>  -err_destroy_vport:
>  -ovs_dp_detach_port(vport);
>   err_destroy_portids:
>   kfree(rcu_dereference_raw(dp->upcall_portids));
>   err_unlock_and_destroy_meters:
>  @@ -2322,12 +2311,6 @@ static int ovs_vport_cmd_new(struct sk_buff *skb, 
>  struct genl_info *info)
>   goto 

Re: [ovs-dev] [PATCH net] net: openvswitch: fix upcall counter access before allocation

2023-06-06 Thread Paolo Abeni
On Mon, 2023-06-05 at 16:38 +0200, Simon Horman wrote:
> On Mon, Jun 05, 2023 at 03:53:59PM +0200, Eelco Chaudron wrote:
> > > Yeah, I see that. And I might have done the same thing.
> > > But, OTOH, this change is making the error path more complex
> > > (or at least more prone to error).
> > > 
> > > In any case, the fix looks good.
> > 
> > Thanks, just to clarify if we get no further feedback on this
> > patch, do you prefer a v2 with the error path changes?
> 
> Thanks Eelco,
> 
> Yes, that is my preference.

I concur with Simon: Eelco, please post a v2 including the error path
changes.

Thanks!

Paolo

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net] net: openvswitch: fix upcall counter access before allocation

2023-06-05 Thread Eelco Chaudron


On 5 Jun 2023, at 15:07, Simon Horman wrote:

> On Mon, Jun 05, 2023 at 02:54:35PM +0200, Eelco Chaudron wrote:
>>
>>
>> On 5 Jun 2023, at 14:41, Simon Horman wrote:
>>
>>> On Mon, Jun 05, 2023 at 10:59:50AM +0200, Eelco Chaudron wrote:
 Currently, the per cpu upcall counters are allocated after the vport is
 created and inserted into the system. This could lead to the datapath
 accessing the counters before they are allocated resulting in a kernel
 Oops.

 Here is an example:

   PID: 59693TASK: 0005f4f51500  CPU: 0COMMAND: "ovs-vswitchd"
#0 [8a39b5b0] __switch_to at b70f0629f2f4
#1 [8a39b5d0] __schedule at b70f0629f5cc
#2 [8a39b650] preempt_schedule_common at b70f0629fa60
#3 [8a39b670] dynamic_might_resched at b70f0629fb58
#4 [8a39b680] mutex_lock_killable at b70f062a1388
#5 [8a39b6a0] pcpu_alloc at b70f0594460c
#6 [8a39b750] __alloc_percpu_gfp at b70f05944e68
#7 [8a39b760] ovs_vport_cmd_new at b70ee6961b90 
 [openvswitch]
...

   PID: 58682TASK: 0005b2f0bf00  CPU: 0COMMAND: "kworker/0:3"
#0 [8a5d2f40] machine_kexec at b70f056a0758
#1 [8a5d2f70] __crash_kexec at b70f057e2994
#2 [8a5d3100] crash_kexec at b70f057e2ad8
#3 [8a5d3120] die at b70f0628234c
#4 [8a5d31e0] die_kernel_fault at b70f062828a8
#5 [8a5d3210] __do_kernel_fault at b70f056a31f4
#6 [8a5d3240] do_bad_area at b70f056a32a4
#7 [8a5d3260] do_translation_fault at b70f062a9710
#8 [8a5d3270] do_mem_abort at b70f056a2f74
#9 [8a5d32a0] el1_abort at b70f06297dac
   #10 [8a5d32d0] el1h_64_sync_handler at b70f06299b24
   #11 [8a5d3410] el1h_64_sync at b70f056812dc
   #12 [8a5d3430] ovs_dp_upcall at b70ee6963c84 [openvswitch]
   #13 [8a5d3470] ovs_dp_process_packet at b70ee6963fdc 
 [openvswitch]
   #14 [8a5d34f0] ovs_vport_receive at b70ee6972c78 
 [openvswitch]
   #15 [8a5d36f0] netdev_port_receive at b70ee6973948 
 [openvswitch]
   #16 [8a5d3720] netdev_frame_hook at b70ee6973a28 
 [openvswitch]
   #17 [8a5d3730] __netif_receive_skb_core.constprop.0 at 
 b70f06079f90

 We moved the per cpu upcall counter allocation to the existing vport
 alloc and free functions to solve this.

 Fixes: 95637d91fefd ("net: openvswitch: release vport resources on 
 failure")
 Fixes: 1933ea365aa7 ("net: openvswitch: Add support to count upcall 
 packets")
 Signed-off-by: Eelco Chaudron 
 ---
  net/openvswitch/datapath.c |   19 ---
  net/openvswitch/vport.c|8 
  2 files changed, 8 insertions(+), 19 deletions(-)

 diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
 index fcee6012293b..58f530f60172 100644
 --- a/net/openvswitch/datapath.c
 +++ b/net/openvswitch/datapath.c
 @@ -236,9 +236,6 @@ void ovs_dp_detach_port(struct vport *p)
/* First drop references to device. */
hlist_del_rcu(>dp_hash_node);

 -  /* Free percpu memory */
 -  free_percpu(p->upcall_stats);
 -
/* Then destroy it. */
ovs_vport_del(p);
  }
 @@ -1858,12 +1855,6 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, 
 struct genl_info *info)
goto err_destroy_portids;
}

 -  vport->upcall_stats = netdev_alloc_pcpu_stats(struct 
 vport_upcall_stats_percpu);
 -  if (!vport->upcall_stats) {
 -  err = -ENOMEM;
 -  goto err_destroy_vport;
 -  }
 -
err = ovs_dp_cmd_fill_info(dp, reply, info->snd_portid,
   info->snd_seq, 0, OVS_DP_CMD_NEW);
BUG_ON(err < 0);
 @@ -1876,8 +1867,6 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, 
 struct genl_info *info)
ovs_notify(_datapath_genl_family, reply, info);
return 0;

 -err_destroy_vport:
 -  ovs_dp_detach_port(vport);
  err_destroy_portids:
kfree(rcu_dereference_raw(dp->upcall_portids));
  err_unlock_and_destroy_meters:
 @@ -2322,12 +2311,6 @@ static int ovs_vport_cmd_new(struct sk_buff *skb, 
 struct genl_info *info)
goto exit_unlock_free;
}

 -  vport->upcall_stats = netdev_alloc_pcpu_stats(struct 
 vport_upcall_stats_percpu);
 -  if (!vport->upcall_stats) {
 -  err = -ENOMEM;
 -  goto exit_unlock_free_vport;
 -  }
 -
err = ovs_vport_cmd_fill_info(vport, reply, genl_info_net(info),
  info->snd_portid, info->snd_seq, 0,
  

Re: [ovs-dev] [PATCH net] net: openvswitch: fix upcall counter access before allocation

2023-06-05 Thread Simon Horman
On Mon, Jun 05, 2023 at 02:54:35PM +0200, Eelco Chaudron wrote:
> 
> 
> On 5 Jun 2023, at 14:41, Simon Horman wrote:
> 
> > On Mon, Jun 05, 2023 at 10:59:50AM +0200, Eelco Chaudron wrote:
> >> Currently, the per cpu upcall counters are allocated after the vport is
> >> created and inserted into the system. This could lead to the datapath
> >> accessing the counters before they are allocated resulting in a kernel
> >> Oops.
> >>
> >> Here is an example:
> >>
> >>   PID: 59693TASK: 0005f4f51500  CPU: 0COMMAND: "ovs-vswitchd"
> >>#0 [8a39b5b0] __switch_to at b70f0629f2f4
> >>#1 [8a39b5d0] __schedule at b70f0629f5cc
> >>#2 [8a39b650] preempt_schedule_common at b70f0629fa60
> >>#3 [8a39b670] dynamic_might_resched at b70f0629fb58
> >>#4 [8a39b680] mutex_lock_killable at b70f062a1388
> >>#5 [8a39b6a0] pcpu_alloc at b70f0594460c
> >>#6 [8a39b750] __alloc_percpu_gfp at b70f05944e68
> >>#7 [8a39b760] ovs_vport_cmd_new at b70ee6961b90 
> >> [openvswitch]
> >>...
> >>
> >>   PID: 58682TASK: 0005b2f0bf00  CPU: 0COMMAND: "kworker/0:3"
> >>#0 [8a5d2f40] machine_kexec at b70f056a0758
> >>#1 [8a5d2f70] __crash_kexec at b70f057e2994
> >>#2 [8a5d3100] crash_kexec at b70f057e2ad8
> >>#3 [8a5d3120] die at b70f0628234c
> >>#4 [8a5d31e0] die_kernel_fault at b70f062828a8
> >>#5 [8a5d3210] __do_kernel_fault at b70f056a31f4
> >>#6 [8a5d3240] do_bad_area at b70f056a32a4
> >>#7 [8a5d3260] do_translation_fault at b70f062a9710
> >>#8 [8a5d3270] do_mem_abort at b70f056a2f74
> >>#9 [8a5d32a0] el1_abort at b70f06297dac
> >>   #10 [8a5d32d0] el1h_64_sync_handler at b70f06299b24
> >>   #11 [8a5d3410] el1h_64_sync at b70f056812dc
> >>   #12 [8a5d3430] ovs_dp_upcall at b70ee6963c84 [openvswitch]
> >>   #13 [8a5d3470] ovs_dp_process_packet at b70ee6963fdc 
> >> [openvswitch]
> >>   #14 [8a5d34f0] ovs_vport_receive at b70ee6972c78 
> >> [openvswitch]
> >>   #15 [8a5d36f0] netdev_port_receive at b70ee6973948 
> >> [openvswitch]
> >>   #16 [8a5d3720] netdev_frame_hook at b70ee6973a28 
> >> [openvswitch]
> >>   #17 [8a5d3730] __netif_receive_skb_core.constprop.0 at 
> >> b70f06079f90
> >>
> >> We moved the per cpu upcall counter allocation to the existing vport
> >> alloc and free functions to solve this.
> >>
> >> Fixes: 95637d91fefd ("net: openvswitch: release vport resources on 
> >> failure")
> >> Fixes: 1933ea365aa7 ("net: openvswitch: Add support to count upcall 
> >> packets")
> >> Signed-off-by: Eelco Chaudron 
> >> ---
> >>  net/openvswitch/datapath.c |   19 ---
> >>  net/openvswitch/vport.c|8 
> >>  2 files changed, 8 insertions(+), 19 deletions(-)
> >>
> >> diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
> >> index fcee6012293b..58f530f60172 100644
> >> --- a/net/openvswitch/datapath.c
> >> +++ b/net/openvswitch/datapath.c
> >> @@ -236,9 +236,6 @@ void ovs_dp_detach_port(struct vport *p)
> >>/* First drop references to device. */
> >>hlist_del_rcu(>dp_hash_node);
> >>
> >> -  /* Free percpu memory */
> >> -  free_percpu(p->upcall_stats);
> >> -
> >>/* Then destroy it. */
> >>ovs_vport_del(p);
> >>  }
> >> @@ -1858,12 +1855,6 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, 
> >> struct genl_info *info)
> >>goto err_destroy_portids;
> >>}
> >>
> >> -  vport->upcall_stats = netdev_alloc_pcpu_stats(struct 
> >> vport_upcall_stats_percpu);
> >> -  if (!vport->upcall_stats) {
> >> -  err = -ENOMEM;
> >> -  goto err_destroy_vport;
> >> -  }
> >> -
> >>err = ovs_dp_cmd_fill_info(dp, reply, info->snd_portid,
> >>   info->snd_seq, 0, OVS_DP_CMD_NEW);
> >>BUG_ON(err < 0);
> >> @@ -1876,8 +1867,6 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, 
> >> struct genl_info *info)
> >>ovs_notify(_datapath_genl_family, reply, info);
> >>return 0;
> >>
> >> -err_destroy_vport:
> >> -  ovs_dp_detach_port(vport);
> >>  err_destroy_portids:
> >>kfree(rcu_dereference_raw(dp->upcall_portids));
> >>  err_unlock_and_destroy_meters:
> >> @@ -2322,12 +2311,6 @@ static int ovs_vport_cmd_new(struct sk_buff *skb, 
> >> struct genl_info *info)
> >>goto exit_unlock_free;
> >>}
> >>
> >> -  vport->upcall_stats = netdev_alloc_pcpu_stats(struct 
> >> vport_upcall_stats_percpu);
> >> -  if (!vport->upcall_stats) {
> >> -  err = -ENOMEM;
> >> -  goto exit_unlock_free_vport;
> >> -  }
> >> -
> >>err = ovs_vport_cmd_fill_info(vport, reply, genl_info_net(info),
> >>  info->snd_portid, info->snd_seq, 0,
> >>  OVS_VPORT_CMD_NEW, 

Re: [ovs-dev] [PATCH net] net: openvswitch: fix upcall counter access before allocation

2023-06-05 Thread Eelco Chaudron


On 5 Jun 2023, at 14:41, Simon Horman wrote:

> On Mon, Jun 05, 2023 at 10:59:50AM +0200, Eelco Chaudron wrote:
>> Currently, the per cpu upcall counters are allocated after the vport is
>> created and inserted into the system. This could lead to the datapath
>> accessing the counters before they are allocated resulting in a kernel
>> Oops.
>>
>> Here is an example:
>>
>>   PID: 59693TASK: 0005f4f51500  CPU: 0COMMAND: "ovs-vswitchd"
>>#0 [8a39b5b0] __switch_to at b70f0629f2f4
>>#1 [8a39b5d0] __schedule at b70f0629f5cc
>>#2 [8a39b650] preempt_schedule_common at b70f0629fa60
>>#3 [8a39b670] dynamic_might_resched at b70f0629fb58
>>#4 [8a39b680] mutex_lock_killable at b70f062a1388
>>#5 [8a39b6a0] pcpu_alloc at b70f0594460c
>>#6 [8a39b750] __alloc_percpu_gfp at b70f05944e68
>>#7 [8a39b760] ovs_vport_cmd_new at b70ee6961b90 [openvswitch]
>>...
>>
>>   PID: 58682TASK: 0005b2f0bf00  CPU: 0COMMAND: "kworker/0:3"
>>#0 [8a5d2f40] machine_kexec at b70f056a0758
>>#1 [8a5d2f70] __crash_kexec at b70f057e2994
>>#2 [8a5d3100] crash_kexec at b70f057e2ad8
>>#3 [8a5d3120] die at b70f0628234c
>>#4 [8a5d31e0] die_kernel_fault at b70f062828a8
>>#5 [8a5d3210] __do_kernel_fault at b70f056a31f4
>>#6 [8a5d3240] do_bad_area at b70f056a32a4
>>#7 [8a5d3260] do_translation_fault at b70f062a9710
>>#8 [8a5d3270] do_mem_abort at b70f056a2f74
>>#9 [8a5d32a0] el1_abort at b70f06297dac
>>   #10 [8a5d32d0] el1h_64_sync_handler at b70f06299b24
>>   #11 [8a5d3410] el1h_64_sync at b70f056812dc
>>   #12 [8a5d3430] ovs_dp_upcall at b70ee6963c84 [openvswitch]
>>   #13 [8a5d3470] ovs_dp_process_packet at b70ee6963fdc 
>> [openvswitch]
>>   #14 [8a5d34f0] ovs_vport_receive at b70ee6972c78 [openvswitch]
>>   #15 [8a5d36f0] netdev_port_receive at b70ee6973948 
>> [openvswitch]
>>   #16 [8a5d3720] netdev_frame_hook at b70ee6973a28 [openvswitch]
>>   #17 [8a5d3730] __netif_receive_skb_core.constprop.0 at 
>> b70f06079f90
>>
>> We moved the per cpu upcall counter allocation to the existing vport
>> alloc and free functions to solve this.
>>
>> Fixes: 95637d91fefd ("net: openvswitch: release vport resources on failure")
>> Fixes: 1933ea365aa7 ("net: openvswitch: Add support to count upcall packets")
>> Signed-off-by: Eelco Chaudron 
>> ---
>>  net/openvswitch/datapath.c |   19 ---
>>  net/openvswitch/vport.c|8 
>>  2 files changed, 8 insertions(+), 19 deletions(-)
>>
>> diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
>> index fcee6012293b..58f530f60172 100644
>> --- a/net/openvswitch/datapath.c
>> +++ b/net/openvswitch/datapath.c
>> @@ -236,9 +236,6 @@ void ovs_dp_detach_port(struct vport *p)
>>  /* First drop references to device. */
>>  hlist_del_rcu(>dp_hash_node);
>>
>> -/* Free percpu memory */
>> -free_percpu(p->upcall_stats);
>> -
>>  /* Then destroy it. */
>>  ovs_vport_del(p);
>>  }
>> @@ -1858,12 +1855,6 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct 
>> genl_info *info)
>>  goto err_destroy_portids;
>>  }
>>
>> -vport->upcall_stats = netdev_alloc_pcpu_stats(struct 
>> vport_upcall_stats_percpu);
>> -if (!vport->upcall_stats) {
>> -err = -ENOMEM;
>> -goto err_destroy_vport;
>> -}
>> -
>>  err = ovs_dp_cmd_fill_info(dp, reply, info->snd_portid,
>> info->snd_seq, 0, OVS_DP_CMD_NEW);
>>  BUG_ON(err < 0);
>> @@ -1876,8 +1867,6 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct 
>> genl_info *info)
>>  ovs_notify(_datapath_genl_family, reply, info);
>>  return 0;
>>
>> -err_destroy_vport:
>> -ovs_dp_detach_port(vport);
>>  err_destroy_portids:
>>  kfree(rcu_dereference_raw(dp->upcall_portids));
>>  err_unlock_and_destroy_meters:
>> @@ -2322,12 +2311,6 @@ static int ovs_vport_cmd_new(struct sk_buff *skb, 
>> struct genl_info *info)
>>  goto exit_unlock_free;
>>  }
>>
>> -vport->upcall_stats = netdev_alloc_pcpu_stats(struct 
>> vport_upcall_stats_percpu);
>> -if (!vport->upcall_stats) {
>> -err = -ENOMEM;
>> -goto exit_unlock_free_vport;
>> -}
>> -
>>  err = ovs_vport_cmd_fill_info(vport, reply, genl_info_net(info),
>>info->snd_portid, info->snd_seq, 0,
>>OVS_VPORT_CMD_NEW, GFP_KERNEL);
>> @@ -2345,8 +2328,6 @@ static int ovs_vport_cmd_new(struct sk_buff *skb, 
>> struct genl_info *info)
>>  ovs_notify(_vport_genl_family, reply, info);
>>  return 0;
>>
>> -exit_unlock_free_vport:
>> -ovs_dp_detach_port(vport);
>> 

Re: [ovs-dev] [PATCH net] net: openvswitch: fix upcall counter access before allocation

2023-06-05 Thread Simon Horman
On Mon, Jun 05, 2023 at 10:59:50AM +0200, Eelco Chaudron wrote:
> Currently, the per cpu upcall counters are allocated after the vport is
> created and inserted into the system. This could lead to the datapath
> accessing the counters before they are allocated resulting in a kernel
> Oops.
> 
> Here is an example:
> 
>   PID: 59693TASK: 0005f4f51500  CPU: 0COMMAND: "ovs-vswitchd"
>#0 [8a39b5b0] __switch_to at b70f0629f2f4
>#1 [8a39b5d0] __schedule at b70f0629f5cc
>#2 [8a39b650] preempt_schedule_common at b70f0629fa60
>#3 [8a39b670] dynamic_might_resched at b70f0629fb58
>#4 [8a39b680] mutex_lock_killable at b70f062a1388
>#5 [8a39b6a0] pcpu_alloc at b70f0594460c
>#6 [8a39b750] __alloc_percpu_gfp at b70f05944e68
>#7 [8a39b760] ovs_vport_cmd_new at b70ee6961b90 [openvswitch]
>...
> 
>   PID: 58682TASK: 0005b2f0bf00  CPU: 0COMMAND: "kworker/0:3"
>#0 [8a5d2f40] machine_kexec at b70f056a0758
>#1 [8a5d2f70] __crash_kexec at b70f057e2994
>#2 [8a5d3100] crash_kexec at b70f057e2ad8
>#3 [8a5d3120] die at b70f0628234c
>#4 [8a5d31e0] die_kernel_fault at b70f062828a8
>#5 [8a5d3210] __do_kernel_fault at b70f056a31f4
>#6 [8a5d3240] do_bad_area at b70f056a32a4
>#7 [8a5d3260] do_translation_fault at b70f062a9710
>#8 [8a5d3270] do_mem_abort at b70f056a2f74
>#9 [8a5d32a0] el1_abort at b70f06297dac
>   #10 [8a5d32d0] el1h_64_sync_handler at b70f06299b24
>   #11 [8a5d3410] el1h_64_sync at b70f056812dc
>   #12 [8a5d3430] ovs_dp_upcall at b70ee6963c84 [openvswitch]
>   #13 [8a5d3470] ovs_dp_process_packet at b70ee6963fdc 
> [openvswitch]
>   #14 [8a5d34f0] ovs_vport_receive at b70ee6972c78 [openvswitch]
>   #15 [8a5d36f0] netdev_port_receive at b70ee6973948 [openvswitch]
>   #16 [8a5d3720] netdev_frame_hook at b70ee6973a28 [openvswitch]
>   #17 [8a5d3730] __netif_receive_skb_core.constprop.0 at 
> b70f06079f90
> 
> We moved the per cpu upcall counter allocation to the existing vport
> alloc and free functions to solve this.
> 
> Fixes: 95637d91fefd ("net: openvswitch: release vport resources on failure")
> Fixes: 1933ea365aa7 ("net: openvswitch: Add support to count upcall packets")
> Signed-off-by: Eelco Chaudron 
> ---
>  net/openvswitch/datapath.c |   19 ---
>  net/openvswitch/vport.c|8 
>  2 files changed, 8 insertions(+), 19 deletions(-)
> 
> diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
> index fcee6012293b..58f530f60172 100644
> --- a/net/openvswitch/datapath.c
> +++ b/net/openvswitch/datapath.c
> @@ -236,9 +236,6 @@ void ovs_dp_detach_port(struct vport *p)
>   /* First drop references to device. */
>   hlist_del_rcu(>dp_hash_node);
>  
> - /* Free percpu memory */
> - free_percpu(p->upcall_stats);
> -
>   /* Then destroy it. */
>   ovs_vport_del(p);
>  }
> @@ -1858,12 +1855,6 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct 
> genl_info *info)
>   goto err_destroy_portids;
>   }
>  
> - vport->upcall_stats = netdev_alloc_pcpu_stats(struct 
> vport_upcall_stats_percpu);
> - if (!vport->upcall_stats) {
> - err = -ENOMEM;
> - goto err_destroy_vport;
> - }
> -
>   err = ovs_dp_cmd_fill_info(dp, reply, info->snd_portid,
>  info->snd_seq, 0, OVS_DP_CMD_NEW);
>   BUG_ON(err < 0);
> @@ -1876,8 +1867,6 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct 
> genl_info *info)
>   ovs_notify(_datapath_genl_family, reply, info);
>   return 0;
>  
> -err_destroy_vport:
> - ovs_dp_detach_port(vport);
>  err_destroy_portids:
>   kfree(rcu_dereference_raw(dp->upcall_portids));
>  err_unlock_and_destroy_meters:
> @@ -2322,12 +2311,6 @@ static int ovs_vport_cmd_new(struct sk_buff *skb, 
> struct genl_info *info)
>   goto exit_unlock_free;
>   }
>  
> - vport->upcall_stats = netdev_alloc_pcpu_stats(struct 
> vport_upcall_stats_percpu);
> - if (!vport->upcall_stats) {
> - err = -ENOMEM;
> - goto exit_unlock_free_vport;
> - }
> -
>   err = ovs_vport_cmd_fill_info(vport, reply, genl_info_net(info),
> info->snd_portid, info->snd_seq, 0,
> OVS_VPORT_CMD_NEW, GFP_KERNEL);
> @@ -2345,8 +2328,6 @@ static int ovs_vport_cmd_new(struct sk_buff *skb, 
> struct genl_info *info)
>   ovs_notify(_vport_genl_family, reply, info);
>   return 0;
>  
> -exit_unlock_free_vport:
> - ovs_dp_detach_port(vport);
>  exit_unlock_free:
>   ovs_unlock();
>   kfree_skb(reply);
> diff --git a/net/openvswitch/vport.c