Re: [ovs-dev] [RFC] dpdk: Allow retaining cap_sys_rawio privileges

2023-02-27 Thread Ilya Maximets
On 2/22/23 18:07, Aaron Conole wrote:
> Open vSwitch generally tries to let the underlying operating system
> managed the low level details of hardware, for example DMA mapping,
> bus arbitration, etc.  However, when using DPDK, the underlying
> operating system yields control of many of these details to userspace
> for management.
> 
> In the case of some DPDK port drivers, configuring rte_flow or even
> allocating resources may require access to iopl/ioperm calls, which
> are guarded by the CAP_SYS_RAWIO privilege on linux systems.  These
> calls are dangerous, and can allow a process to completely compromise
> a system.  However, they are needed in the case of some userspace
> driver code which manages the hardware (for example, the mlx
> implementation of backend support for rte_flow).
> 
> Here, we create an opt-in flag passed to the command line to allow
> this access.  We need to do this before ever accessing the database,
> because we want to drop all privileges asap, and cannot wait for
> a connection to the database to be established and functional before
> dropping.  There may be distribution specific ways to do capability
> management as well (using for example, systemd), but they are not
> as universal to the vswitchd as a flag.
> 
> Signed-off-by: Aaron Conole 
> ---
>  NEWS   |  4 
>  lib/daemon-unix.c  | 31 ++-
>  lib/daemon.c   |  2 +-
>  lib/daemon.h   |  4 ++--
>  ovsdb/ovsdb-client.c   |  6 +++---
>  ovsdb/ovsdb-server.c   |  4 ++--
>  tests/test-netflow.c   |  2 +-
>  tests/test-sflow.c |  2 +-
>  tests/test-unixctl.c   |  2 +-
>  utilities/ovs-ofctl.c  |  4 ++--
>  utilities/ovs-testcontroller.c |  4 ++--
>  vswitchd/ovs-vswitchd.8.in |  8 
>  vswitchd/ovs-vswitchd.c| 11 ++-
>  13 files changed, 59 insertions(+), 25 deletions(-)
> 

...

> diff --git a/vswitchd/ovs-vswitchd.c b/vswitchd/ovs-vswitchd.c
> index 407bfc60eb..f62d1ad751 100644
> --- a/vswitchd/ovs-vswitchd.c
> +++ b/vswitchd/ovs-vswitchd.c
> @@ -60,6 +60,9 @@ VLOG_DEFINE_THIS_MODULE(vswitchd);
>   * the kernel from paging any of its memory to disk. */
>  static bool want_mlockall;
>  
> +/* --hw-access: If set, retains CAP_SYS_RAWIO privileges.  */
> +static bool hw_access;

The comment is outdated.  And we may also want to rename the variable
itself to match the option.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC] dpdk: Allow retaining cap_sys_rawio privileges

2023-02-27 Thread Aaron Conole
Gaetan Rivet  writes:

>> -Original Message-
>> From: Aaron Conole mailto:acon...@redhat.com>>
>> Date: Wednesday 22 February 2023 at 18:11
>> To: "d...@openvswitch.org <mailto:d...@openvswitch.org>"
>> mailto:d...@openvswitch.org>>
>> Cc: Eli Britstein mailto:el...@nvidia.com>>,
>> Gaetan Rivet mailto:gaet...@nvidia.com>>, Ilya
>> Maximets mailto:i.maxim...@ovn.org>>, Maxime
>> Coquelin > <mailto:maxime.coque...@redhat.com>>, Jason Gunthorpe
>> mailto:j...@nvidia.com>>, Majd Dibbiny
>> mailto:m...@nvidia.com>>, David Marchand
>> mailto:david.march...@redhat.com>>
>> Subject: Re: [ovs-dev] [RFC] dpdk: Allow retaining cap_sys_rawio privileges
>>
>> Apologies - I mis-typed Gaetan's email when I entered it into my mail
>> file. CC'd correctly on this email (but I can resend the patch, if you
>> think it is better).
>>
>> Aaron Conole mailto:acon...@redhat.com>> writes:
>>
>> > Open vSwitch generally tries to let the underlying operating system
>> > managed the low level details of hardware, for example DMA mapping,
>> > bus arbitration, etc. However, when using DPDK, the underlying
>> > operating system yields control of many of these details to userspace
>> > for management.
>> >
>> > In the case of some DPDK port drivers, configuring rte_flow or even
>> > allocating resources may require access to iopl/ioperm calls, which
>> > are guarded by the CAP_SYS_RAWIO privilege on linux systems. These
>> > calls are dangerous, and can allow a process to completely compromise
>> > a system. However, they are needed in the case of some userspace
>> > driver code which manages the hardware (for example, the mlx
>> > implementation of backend support for rte_flow).
>> >
>> > Here, we create an opt-in flag passed to the command line to allow
>> > this access. We need to do this before ever accessing the database,
>> > because we want to drop all privileges asap, and cannot wait for
>> > a connection to the database to be established and functional before
>> > dropping. There may be distribution specific ways to do capability
>> > management as well (using for example, systemd), but they are not
>> > as universal to the vswitchd as a flag.
>> >
>> > Signed-off-by: Aaron Conole > > <mailto:acon...@redhat.com>>
>> > ---
>
> Hello Aaron,
>
> Thank you for proposing this change.
>
> If users want to use mlx5 ports with OVS without being root, this capability 
> will be required.
> Adding a vswitchd option to enable it seems the simplest way to offer some 
> control.

Yes, this seems to be the best way as far as I can tell.

> If vendor-specific logic was allowed, I could add a function to detect 
> Mellanox ports
> and enable this option in that case. Otherwise we can document as much as 
> possible,
> but hopefully the errors will be made clear from the DPDK side because it will
> be hard to explain those errors without vendor-specific code.

Unfortunately, there isn't a good place to put such vendor specific
logic.  I thought about it a bit.  The vswitchd can't assume that just
because an mlx5 card is present that it will be used as part of DPDK
port.  We cannot wait until connecting to DB to preserve this
capability, either, since the privileges drops would happen *after* a
stable DB connection and read (which is far too long).

> Regarding the implementation, I had a few comments:

Great!

>> @@ -877,11 +890,11 @@ daemon_become_new_user__(bool access_datapath)
>>   * However, there in case the user switch needs to be done
>>   * before daemonize_start(), the following API can be used.  */
>>  void
>> -daemon_become_new_user(bool access_datapath)
>> +daemon_become_new_user(bool access_datapath, bool access_hardware_ports)
>>  {
>>  assert_single_threaded();
>>  if (switch_user) {
>> -daemon_become_new_user__(access_datapath);
>> +daemon_become_new_user__(access_datapath, access_hardware_ports);
>>  /* daemonize_start() should not switch user again. */
>>  switch_user = false;
>>  }
>
> Grepping for daemon_become_new_user, I see the following that might need a 
> change:
> lib/daemon-windows.c:529:daemon_become_new_user(bool access_datapath 
> OVS_UNUSED)

Yes, I think it needs to change here as well.  My appveyor run for some
reason passed
(https://ci.appveyor.com/project/orgcandman/ovs/builds/46307713) and
that is concerning.

>> diff --git a/vswitchd/ovs-vswitchd.c b/vswitchd/ovs-

Re: [ovs-dev] [RFC] dpdk: Allow retaining cap_sys_rawio privileges

2023-02-23 Thread Gaetan Rivet via dev
> -Original Message-
> From: Aaron Conole mailto:acon...@redhat.com>>
> Date: Wednesday 22 February 2023 at 18:11
> To: "d...@openvswitch.org <mailto:d...@openvswitch.org>" 
> mailto:d...@openvswitch.org>>
> Cc: Eli Britstein mailto:el...@nvidia.com>>, Gaetan Rivet 
> mailto:gaet...@nvidia.com>>, Ilya Maximets 
> mailto:i.maxim...@ovn.org>>, Maxime Coquelin 
> mailto:maxime.coque...@redhat.com>>, Jason  
> Gunthorpe mailto:j...@nvidia.com>>, Majd Dibbiny 
> mailto:m...@nvidia.com>>, David Marchand 
> mailto:david.march...@redhat.com>>
> Subject: Re: [ovs-dev] [RFC] dpdk: Allow retaining cap_sys_rawio privileges
>
> Apologies - I mis-typed Gaetan's email when I entered it into my mail
> file. CC'd correctly on this email (but I can resend the patch, if you
> think it is better).
>
> Aaron Conole mailto:acon...@redhat.com>> writes:
>
> > Open vSwitch generally tries to let the underlying operating system
> > managed the low level details of hardware, for example DMA mapping,
> > bus arbitration, etc. However, when using DPDK, the underlying
> > operating system yields control of many of these details to userspace
> > for management.
> >
> > In the case of some DPDK port drivers, configuring rte_flow or even
> > allocating resources may require access to iopl/ioperm calls, which
> > are guarded by the CAP_SYS_RAWIO privilege on linux systems. These
> > calls are dangerous, and can allow a process to completely compromise
> > a system. However, they are needed in the case of some userspace
> > driver code which manages the hardware (for example, the mlx
> > implementation of backend support for rte_flow).
> >
> > Here, we create an opt-in flag passed to the command line to allow
> > this access. We need to do this before ever accessing the database,
> > because we want to drop all privileges asap, and cannot wait for
> > a connection to the database to be established and functional before
> > dropping. There may be distribution specific ways to do capability
> > management as well (using for example, systemd), but they are not
> > as universal to the vswitchd as a flag.
> >
> > Signed-off-by: Aaron Conole mailto:acon...@redhat.com>>
> > ---

Hello Aaron,

Thank you for proposing this change.

If users want to use mlx5 ports with OVS without being root, this capability 
will be required.
Adding a vswitchd option to enable it seems the simplest way to offer some 
control.

If vendor-specific logic was allowed, I could add a function to detect Mellanox 
ports
and enable this option in that case. Otherwise we can document as much as 
possible,
but hopefully the errors will be made clear from the DPDK side because it will
be hard to explain those errors without vendor-specific code.

Regarding the implementation, I had a few comments:
> @@ -877,11 +890,11 @@ daemon_become_new_user__(bool access_datapath)
>   * However, there in case the user switch needs to be done
>   * before daemonize_start(), the following API can be used.  */
>  void
> -daemon_become_new_user(bool access_datapath)
> +daemon_become_new_user(bool access_datapath, bool access_hardware_ports)
>  {
>  assert_single_threaded();
>  if (switch_user) {
> -daemon_become_new_user__(access_datapath);
> +daemon_become_new_user__(access_datapath, access_hardware_ports);
>  /* daemonize_start() should not switch user again. */
>  switch_user = false;
>  }

Grepping for daemon_become_new_user, I see the following that might need a 
change:
lib/daemon-windows.c:529:daemon_become_new_user(bool access_datapath OVS_UNUSED)

> diff --git a/vswitchd/ovs-vswitchd.c b/vswitchd/ovs-vswitchd.c
> index 407bfc60e..f62d1ad75 100644
> --- a/vswitchd/ovs-vswitchd.c
> +++ b/vswitchd/ovs-vswitchd.c
> @@ -60,6 +60,9 @@ VLOG_DEFINE_THIS_MODULE(vswitchd);
>   * the kernel from paging any of its memory to disk. */
>  static bool want_mlockall;
>
> +/* --hw-access: If set, retains CAP_SYS_RAWIO privileges.  */
> +static bool hw_access;
> +
>  static unixctl_cb_func ovs_vswitchd_exit;
>
>  static char *parse_options(int argc, char *argv[], char **unixctl_path);
> @@ -89,7 +92,7 @@ main(int argc, char *argv[])
>  remote = parse_options(argc, argv, _path);
>  fatal_ignore_sigpipe();
>
> -daemonize_start(true);
> +daemonize_start(true, true);
 
Here I think it should be daemonize_start(true, hw_access);

Best regards,

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC] dpdk: Allow retaining cap_sys_rawio privileges

2023-02-23 Thread Gaetan Rivet via dev
> -Original Message-
> From: Robin Jarry mailto:rja...@redhat.com>>
> Date: Thursday 23 February 2023 at 22:43
> To: Gaetan Rivet mailto:gaet...@nvidia.com>>, Aaron 
> Conole mailto:acon...@redhat.com>>
> Cc: "d...@openvswitch.org <mailto:d...@openvswitch.org>" 
> mailto:d...@openvswitch.org>>, Eli Britstein 
> mailto:el...@nvidia.com>>, Ilya Maximets 
> mailto:i.maxim...@ovn.org>>, Maxime Coquelin 
> mailto:maxime.coque...@redhat.com>>, Jason 
> Gunthorpe mailto:j...@nvidia.com>>, Majd Dibbiny 
> mailto:m...@nvidia.com>>, David Marchand 
> mailto:david.march...@redhat.com>>, Gaetan Rivet 
> mailto:grive@u256.  net>>, Eelco Chaudron 
> mailto:echau...@redhat.com>>
> Subject: Re: [ovs-dev] [RFC] dpdk: Allow retaining cap_sys_rawio privileges
>
>
> Salut Gaëtan,
>
>
> Gaetan Rivet, Feb 23, 2023 at 22:33:
> > I've looked at your patch Robin and the offloads you insert in
> > dpdk_cp_prot_add_flow use the following:
> >
> > const struct rte_flow_attr attr = { .ingress = 1 };
> >
> > implicitly setting transfer and group to 0. If either of those had
> > been non-zero instead, cap_sys_rawio would be required.
>
>
> Oh I was not aware that this would change anything. Is there some
> document/code snippet/anything that explains why is that so? Is that
> specific to the mlx5 driver?
>
>
> Thanks!

You can find some scarce info there: 
https://doc.dpdk.org/guides/platform/mlx5.html#linux-environment
Check out section 5.5.1.5. "Run as Non-Root".

This doc is incomplete, which is one of the RC of these threads.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC] dpdk: Allow retaining cap_sys_rawio privileges

2023-02-23 Thread Robin Jarry
Salut Gaëtan,

Gaetan Rivet, Feb 23, 2023 at 22:33:
> I've looked at your patch Robin and the offloads you insert in
> dpdk_cp_prot_add_flow use the following:
>
> const struct rte_flow_attr attr = { .ingress = 1 };
>
> implicitly setting transfer and group to 0. If either of those had
> been non-zero instead, cap_sys_rawio would be required.

Oh I was not aware that this would change anything. Is there some
document/code snippet/anything that explains why is that so? Is that
specific to the mlx5 driver?

Thanks!

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC] dpdk: Allow retaining cap_sys_rawio privileges

2023-02-23 Thread Gaetan Rivet via dev
> -Original Message-
> From: Robin Jarry mailto:rja...@redhat.com>>
> Date: Thursday 23 February 2023 at 22:14
> To: Aaron Conole mailto:acon...@redhat.com>>
> Cc: "d...@openvswitch.org <mailto:d...@openvswitch.org>" 
> mailto:d...@openvswitch.org>>, Eli Britstein 
> mailto:el...@nvidia.com>>, Gaetan Rivet 
> mailto:gaet...@nvidia.com>>, Ilya Maximets 
> >, Maxime Coquelin 
> mailto:maxime.coque...@redhat.com>>, Jason 
> Gunthorpe mailto:j...@nvidia.com>>, Majd Dibbiny 
> mailto:m...@nvidia.com>>, David Marchand 
> mailto:david.march...@redhat.com>>, Gaetan Rivet 
> mailto:gr...@u256.net>>, Eelco Chaudron  <mailto:echau...@redhat.com>>
> Subject: Re: [ovs-dev] [RFC] dpdk: Allow retaining cap_sys_rawio privileges
>
>
> External email: Use caution opening links or attachments
>
>
>
>
> Aaron Conole, Feb 23, 2023 at 22:09:
> > Thanks for taking a look. You're saying that you tested without this
> > patch applied, yes? That could be. I only know of one hardware which
> > requires CAP_SYS_RAWIO for rte_flow to function.
>
>
> Yes that is correct, I tested *without* this patch applied and with
> a non-root user (ovs-vswitchd linked with libcap-ng).
>
>
> ovs-ctl --ovs-user="openvswitch:hugetlbfs" start
>
>
> The basic RTE flow rules (matching of the ether type field and redirect
> to a specific queue) were created without errors returned with both NICs
> I had available (Intel X710 and Mellanox ConnectX-5 Ex)
>
>
> cp-protection: redirected lacp traffic to rx queue 1
> cp-protection: redirected other traffic to rx queue 0

Hello,

I've looked at your patch Robin and the offloads you insert in 
dpdk_cp_prot_add_flow
use the following:

const struct rte_flow_attr attr = { .ingress = 1 };

implicitly setting transfer and group to 0. If either of those had been 
non-zero instead,
cap_sys_rawio would be required.

Otherwise thank you very much Aaron for you patch, I was reading it and will
comment directly to it.

Best regards,
Gaetan 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC] dpdk: Allow retaining cap_sys_rawio privileges

2023-02-23 Thread Robin Jarry
Aaron Conole, Feb 23, 2023 at 22:09:
> Thanks for taking a look.  You're saying that you tested without this
> patch applied, yes?  That could be.  I only know of one hardware which
> requires CAP_SYS_RAWIO for rte_flow to function.

Yes that is correct, I tested *without* this patch applied and with
a non-root user (ovs-vswitchd linked with libcap-ng).

  ovs-ctl --ovs-user="openvswitch:hugetlbfs" start

The basic RTE flow rules (matching of the ether type field and redirect
to a specific queue) were created without errors returned with both NICs
I had available (Intel X710 and Mellanox ConnectX-5 Ex)

 cp-protection: redirected lacp traffic to rx queue 1
 cp-protection: redirected other traffic to rx queue 0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC] dpdk: Allow retaining cap_sys_rawio privileges

2023-02-23 Thread Aaron Conole
"Robin Jarry"  writes:

> Aaron Conole, Feb 22, 2023 at 18:07:
>> Open vSwitch generally tries to let the underlying operating system
>> managed the low level details of hardware, for example DMA mapping,
>> bus arbitration, etc.  However, when using DPDK, the underlying
>> operating system yields control of many of these details to userspace
>> for management.
>>
>> In the case of some DPDK port drivers, configuring rte_flow or even
>> allocating resources may require access to iopl/ioperm calls, which
>> are guarded by the CAP_SYS_RAWIO privilege on linux systems.  These
>> calls are dangerous, and can allow a process to completely compromise
>> a system.  However, they are needed in the case of some userspace
>> driver code which manages the hardware (for example, the mlx
>> implementation of backend support for rte_flow).
>>
>> Here, we create an opt-in flag passed to the command line to allow
>> this access.  We need to do this before ever accessing the database,
>> because we want to drop all privileges asap, and cannot wait for
>> a connection to the database to be established and functional before
>> dropping.  There may be distribution specific ways to do capability
>> management as well (using for example, systemd), but they are not
>> as universal to the vswitchd as a flag.
>>
>> Signed-off-by: Aaron Conole 
>
> Hi Aaron,
>
> I briefly tested the injection of a basic RTE flow rule (see my control
> plane protection patch here for more details[1]). With a non-root user,
> there are no permission issues that I can see. I have tested with both
> the i40e (X710) and mlx5 (ConnectX-5 Ex) drivers.

Thanks for taking a look.  You're saying that you tested without this
patch applied, yes?  That could be.  I only know of one hardware which
requires CAP_SYS_RAWIO for rte_flow to function.

> Maybe CAP_SYS_RAWIO is only required for more advanced flows but I am
> surprised that I didn't encounter the issue for neither a vfio-pci based
> driver nor with a bifurcated driver.
>
> [1]: 
> http://patchwork.ozlabs.org/project/openvswitch/patch/20230222154321.23421-1-rja...@redhat.com/

I'm surprised as well.  Maybe someone from Mellanox can comment?

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC] dpdk: Allow retaining cap_sys_rawio privileges

2023-02-23 Thread Robin Jarry
Aaron Conole, Feb 22, 2023 at 18:07:
> Open vSwitch generally tries to let the underlying operating system
> managed the low level details of hardware, for example DMA mapping,
> bus arbitration, etc.  However, when using DPDK, the underlying
> operating system yields control of many of these details to userspace
> for management.
>
> In the case of some DPDK port drivers, configuring rte_flow or even
> allocating resources may require access to iopl/ioperm calls, which
> are guarded by the CAP_SYS_RAWIO privilege on linux systems.  These
> calls are dangerous, and can allow a process to completely compromise
> a system.  However, they are needed in the case of some userspace
> driver code which manages the hardware (for example, the mlx
> implementation of backend support for rte_flow).
>
> Here, we create an opt-in flag passed to the command line to allow
> this access.  We need to do this before ever accessing the database,
> because we want to drop all privileges asap, and cannot wait for
> a connection to the database to be established and functional before
> dropping.  There may be distribution specific ways to do capability
> management as well (using for example, systemd), but they are not
> as universal to the vswitchd as a flag.
>
> Signed-off-by: Aaron Conole 

Hi Aaron,

I briefly tested the injection of a basic RTE flow rule (see my control
plane protection patch here for more details[1]). With a non-root user,
there are no permission issues that I can see. I have tested with both
the i40e (X710) and mlx5 (ConnectX-5 Ex) drivers.

Maybe CAP_SYS_RAWIO is only required for more advanced flows but I am
surprised that I didn't encounter the issue for neither a vfio-pci based
driver nor with a bifurcated driver.

[1]: 
http://patchwork.ozlabs.org/project/openvswitch/patch/20230222154321.23421-1-rja...@redhat.com/

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC] dpdk: Allow retaining cap_sys_rawio privileges

2023-02-22 Thread 0-day Robot
Bleep bloop.  Greetings Aaron Conole, I am a robot and I have tried out your 
patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: Line is 103 characters long (recommended limit is 79)
#112 FILE: lib/daemon-unix.c:838:
VLOG_WARN("hw port access requested, but no userspace 
ioport support.  Dropping.");

Lines checked: 390, Warnings: 1, Errors: 0


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC] dpdk: Allow retaining cap_sys_rawio privileges

2023-02-22 Thread Aaron Conole
Apologies - I mis-typed Gaetan's email when I entered it into my mail
file.  CC'd correctly on this email (but I can resend the patch, if you
think it is better).

Aaron Conole  writes:

> Open vSwitch generally tries to let the underlying operating system
> managed the low level details of hardware, for example DMA mapping,
> bus arbitration, etc.  However, when using DPDK, the underlying
> operating system yields control of many of these details to userspace
> for management.
>
> In the case of some DPDK port drivers, configuring rte_flow or even
> allocating resources may require access to iopl/ioperm calls, which
> are guarded by the CAP_SYS_RAWIO privilege on linux systems.  These
> calls are dangerous, and can allow a process to completely compromise
> a system.  However, they are needed in the case of some userspace
> driver code which manages the hardware (for example, the mlx
> implementation of backend support for rte_flow).
>
> Here, we create an opt-in flag passed to the command line to allow
> this access.  We need to do this before ever accessing the database,
> because we want to drop all privileges asap, and cannot wait for
> a connection to the database to be established and functional before
> dropping.  There may be distribution specific ways to do capability
> management as well (using for example, systemd), but they are not
> as universal to the vswitchd as a flag.
>
> Signed-off-by: Aaron Conole 
> ---
>  NEWS   |  4 
>  lib/daemon-unix.c  | 31 ++-
>  lib/daemon.c   |  2 +-
>  lib/daemon.h   |  4 ++--
>  ovsdb/ovsdb-client.c   |  6 +++---
>  ovsdb/ovsdb-server.c   |  4 ++--
>  tests/test-netflow.c   |  2 +-
>  tests/test-sflow.c |  2 +-
>  tests/test-unixctl.c   |  2 +-
>  utilities/ovs-ofctl.c  |  4 ++--
>  utilities/ovs-testcontroller.c |  4 ++--
>  vswitchd/ovs-vswitchd.8.in |  8 
>  vswitchd/ovs-vswitchd.c| 11 ++-
>  13 files changed, 59 insertions(+), 25 deletions(-)
>
> diff --git a/NEWS b/NEWS
> index 85b3496214..65f35dcdd5 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -10,6 +10,10 @@ Post-v3.1.0
> in order to create OVSDB sockets with access mode of 0770.
> - QoS:
>   * Added new configuration option 'jitter' for a linux-netem QoS type.
> +   - DPDK:
> + * ovs-vswitchd will keep the CAP_SYS_RAWIO capability when started
> +   with the --hw-rawio-access command line option.  This allows the
> +   process extra privileges when mapping physical interconnect memory.
>  
>  
>  v3.1.0 - 16 Feb 2023
> diff --git a/lib/daemon-unix.c b/lib/daemon-unix.c
> index 1a7ba427d7..8b895a48de 100644
> --- a/lib/daemon-unix.c
> +++ b/lib/daemon-unix.c
> @@ -88,7 +88,8 @@ static bool switch_user = false;
>  static uid_t uid;
>  static gid_t gid;
>  static char *user = NULL;
> -static void daemon_become_new_user__(bool access_datapath);
> +static void daemon_become_new_user__(bool access_datapath,
> + bool access_hardware_ports);
>  
>  static void check_already_running(void);
>  static int lock_pidfile(FILE *, int command);
> @@ -443,13 +444,13 @@ monitor_daemon(pid_t daemon_pid)
>   * daemonize_complete()) or that it failed to start up (by exiting with a
>   * nonzero exit code). */
>  void
> -daemonize_start(bool access_datapath)
> +daemonize_start(bool access_datapath, bool access_hardware_ports)
>  {
>  assert_single_threaded();
>  daemonize_fd = -1;
>  
>  if (switch_user) {
> -daemon_become_new_user__(access_datapath);
> +daemon_become_new_user__(access_datapath, access_hardware_ports);
>  switch_user = false;
>  }
>  
> @@ -807,7 +808,8 @@ daemon_become_new_user_unix(void)
>  /* Linux specific implementation of daemon_become_new_user()
>   * using libcap-ng.   */
>  static void
> -daemon_become_new_user_linux(bool access_datapath OVS_UNUSED)
> +daemon_become_new_user_linux(bool access_datapath OVS_UNUSED,
> + bool access_hardware_ports OVS_UNUSED)
>  {
>  #if defined __linux__ &&  HAVE_LIBCAPNG
>  int ret;
> @@ -826,7 +828,17 @@ daemon_become_new_user_linux(bool access_datapath 
> OVS_UNUSED)
>  if (access_datapath && !ret) {
>  ret = capng_update(CAPNG_ADD, cap_sets, CAP_NET_ADMIN)
>|| capng_update(CAPNG_ADD, cap_sets, CAP_NET_RAW)
> -  || capng_update(CAPNG_ADD, cap_sets, 
> CAP_NET_BROADCAST);
> +  || capng_update(CAPNG_ADD, cap_sets, CAP_NET_BROADCAST)
> +#ifdef DPDK_NETDEV
> +  || (access_hardware_ports &&
> +  capng_update(CAPNG_ADD, cap_sets, CAP_SYS_RAWIO))
> +#else
> +;
> +if (access_hardware_ports) {
> +VLOG_WARN("hw port access requested, but no userspace 
> ioport support.  

[ovs-dev] [RFC] dpdk: Allow retaining cap_sys_rawio privileges

2023-02-22 Thread Aaron Conole
Open vSwitch generally tries to let the underlying operating system
managed the low level details of hardware, for example DMA mapping,
bus arbitration, etc.  However, when using DPDK, the underlying
operating system yields control of many of these details to userspace
for management.

In the case of some DPDK port drivers, configuring rte_flow or even
allocating resources may require access to iopl/ioperm calls, which
are guarded by the CAP_SYS_RAWIO privilege on linux systems.  These
calls are dangerous, and can allow a process to completely compromise
a system.  However, they are needed in the case of some userspace
driver code which manages the hardware (for example, the mlx
implementation of backend support for rte_flow).

Here, we create an opt-in flag passed to the command line to allow
this access.  We need to do this before ever accessing the database,
because we want to drop all privileges asap, and cannot wait for
a connection to the database to be established and functional before
dropping.  There may be distribution specific ways to do capability
management as well (using for example, systemd), but they are not
as universal to the vswitchd as a flag.

Signed-off-by: Aaron Conole 
---
 NEWS   |  4 
 lib/daemon-unix.c  | 31 ++-
 lib/daemon.c   |  2 +-
 lib/daemon.h   |  4 ++--
 ovsdb/ovsdb-client.c   |  6 +++---
 ovsdb/ovsdb-server.c   |  4 ++--
 tests/test-netflow.c   |  2 +-
 tests/test-sflow.c |  2 +-
 tests/test-unixctl.c   |  2 +-
 utilities/ovs-ofctl.c  |  4 ++--
 utilities/ovs-testcontroller.c |  4 ++--
 vswitchd/ovs-vswitchd.8.in |  8 
 vswitchd/ovs-vswitchd.c| 11 ++-
 13 files changed, 59 insertions(+), 25 deletions(-)

diff --git a/NEWS b/NEWS
index 85b3496214..65f35dcdd5 100644
--- a/NEWS
+++ b/NEWS
@@ -10,6 +10,10 @@ Post-v3.1.0
in order to create OVSDB sockets with access mode of 0770.
- QoS:
  * Added new configuration option 'jitter' for a linux-netem QoS type.
+   - DPDK:
+ * ovs-vswitchd will keep the CAP_SYS_RAWIO capability when started
+   with the --hw-rawio-access command line option.  This allows the
+   process extra privileges when mapping physical interconnect memory.
 
 
 v3.1.0 - 16 Feb 2023
diff --git a/lib/daemon-unix.c b/lib/daemon-unix.c
index 1a7ba427d7..8b895a48de 100644
--- a/lib/daemon-unix.c
+++ b/lib/daemon-unix.c
@@ -88,7 +88,8 @@ static bool switch_user = false;
 static uid_t uid;
 static gid_t gid;
 static char *user = NULL;
-static void daemon_become_new_user__(bool access_datapath);
+static void daemon_become_new_user__(bool access_datapath,
+ bool access_hardware_ports);
 
 static void check_already_running(void);
 static int lock_pidfile(FILE *, int command);
@@ -443,13 +444,13 @@ monitor_daemon(pid_t daemon_pid)
  * daemonize_complete()) or that it failed to start up (by exiting with a
  * nonzero exit code). */
 void
-daemonize_start(bool access_datapath)
+daemonize_start(bool access_datapath, bool access_hardware_ports)
 {
 assert_single_threaded();
 daemonize_fd = -1;
 
 if (switch_user) {
-daemon_become_new_user__(access_datapath);
+daemon_become_new_user__(access_datapath, access_hardware_ports);
 switch_user = false;
 }
 
@@ -807,7 +808,8 @@ daemon_become_new_user_unix(void)
 /* Linux specific implementation of daemon_become_new_user()
  * using libcap-ng.   */
 static void
-daemon_become_new_user_linux(bool access_datapath OVS_UNUSED)
+daemon_become_new_user_linux(bool access_datapath OVS_UNUSED,
+ bool access_hardware_ports OVS_UNUSED)
 {
 #if defined __linux__ &&  HAVE_LIBCAPNG
 int ret;
@@ -826,7 +828,17 @@ daemon_become_new_user_linux(bool access_datapath 
OVS_UNUSED)
 if (access_datapath && !ret) {
 ret = capng_update(CAPNG_ADD, cap_sets, CAP_NET_ADMIN)
   || capng_update(CAPNG_ADD, cap_sets, CAP_NET_RAW)
-  || capng_update(CAPNG_ADD, cap_sets, CAP_NET_BROADCAST);
+  || capng_update(CAPNG_ADD, cap_sets, CAP_NET_BROADCAST)
+#ifdef DPDK_NETDEV
+  || (access_hardware_ports &&
+  capng_update(CAPNG_ADD, cap_sets, CAP_SYS_RAWIO))
+#else
+;
+if (access_hardware_ports) {
+VLOG_WARN("hw port access requested, but no userspace 
ioport support.  Dropping.");
+}
+#endif
+;
 }
 } else {
 ret = -1;
@@ -854,7 +866,7 @@ daemon_become_new_user_linux(bool access_datapath 
OVS_UNUSED)
 }
 
 static void
-daemon_become_new_user__(bool access_datapath)
+daemon_become_new_user__(bool access_datapath, bool access_hardware_ports)
 {
 /* If vlog file has been created, change its owner to the