David Marchand <[email protected]> writes:

> On Tue, Mar 7, 2023 at 2:06 PM Aaron Conole <[email protected]> wrote:
>>
>> Open vSwitch generally tries to let the underlying operating system
>> managed the low level details of hardware, for example DMA mapping,
>> bus arbitration, etc.  However, when using DPDK, the underlying
>> operating system yields control of many of these details to userspace
>> for management.
>>
>> In the case of some DPDK port drivers, configuring rte_flow or even
>> allocating resources may require access to iopl/ioperm calls, which
>> are guarded by the CAP_SYS_RAWIO privilege on linux systems.  These
>> calls are dangerous, and can allow a process to completely compromise
>> a system.  However, they are needed in the case of some userspace
>> driver code which manages the hardware (for example, the mlx
>> implementation of backend support for rte_flow).
>>
>> Here, we create an opt-in flag passed to the command line to allow
>> this access.  We need to do this before ever accessing the database,
>> because we want to drop all privileges asap, and cannot wait for
>> a connection to the database to be established and functional before
>> dropping.  There may be distribution specific ways to do capability
>> management as well (using for example, systemd), but they are not
>> as universal to the vswitchd as a flag.
>>
>> Reviewed-by: Simon Horman <[email protected]>
>> Signed-off-by: Aaron Conole <[email protected]>
>> ---
>>  NEWS                           |  4 ++++
>>  lib/daemon-unix.c              | 29 +++++++++++++++++++++--------
>>  lib/daemon-windows.c           |  3 ++-
>>  lib/daemon.c                   |  2 +-
>>  lib/daemon.h                   |  4 ++--
>>  ovsdb/ovsdb-client.c           |  6 +++---
>>  ovsdb/ovsdb-server.c           |  4 ++--
>>  tests/test-netflow.c           |  2 +-
>>  tests/test-sflow.c             |  2 +-
>>  tests/test-unixctl.c           |  2 +-
>>  utilities/ovs-ofctl.c          |  4 ++--
>>  utilities/ovs-testcontroller.c |  4 ++--
>>  vswitchd/ovs-vswitchd.8.in     |  8 ++++++++
>>  vswitchd/ovs-vswitchd.c        | 11 ++++++++++-
>>  14 files changed, 60 insertions(+), 25 deletions(-)
>>
>> diff --git a/NEWS b/NEWS
>> index 85b3496214..65f35dcdd5 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -10,6 +10,10 @@ Post-v3.1.0
>>         in order to create OVSDB sockets with access mode of 0770.
>>     - QoS:
>>       * Added new configuration option 'jitter' for a linux-netem QoS type.
>> +   - DPDK:
>> +     * ovs-vswitchd will keep the CAP_SYS_RAWIO capability when started
>> +       with the --hw-rawio-access command line option.  This allows the
>> +       process extra privileges when mapping physical interconnect memory.
>>
>>
>>  v3.1.0 - 16 Feb 2023
>> diff --git a/lib/daemon-unix.c b/lib/daemon-unix.c
>> index 1a7ba427d7..a080facddc 100644
>> --- a/lib/daemon-unix.c
>> +++ b/lib/daemon-unix.c
>> @@ -88,7 +88,8 @@ static bool switch_user = false;
>>  static uid_t uid;
>>  static gid_t gid;
>>  static char *user = NULL;
>> -static void daemon_become_new_user__(bool access_datapath);
>> +static void daemon_become_new_user__(bool access_datapath,
>> +                                     bool access_hardware_ports);
>>
>>  static void check_already_running(void);
>>  static int lock_pidfile(FILE *, int command);
>> @@ -443,13 +444,13 @@ monitor_daemon(pid_t daemon_pid)
>>   * daemonize_complete()) or that it failed to start up (by exiting with a
>>   * nonzero exit code). */
>>  void
>> -daemonize_start(bool access_datapath)
>> +daemonize_start(bool access_datapath, bool access_hardware_ports)
>>  {
>>      assert_single_threaded();
>>      daemonize_fd = -1;
>>
>>      if (switch_user) {
>> -        daemon_become_new_user__(access_datapath);
>> +        daemon_become_new_user__(access_datapath, access_hardware_ports);
>>          switch_user = false;
>>      }
>>
>> @@ -807,7 +808,8 @@ daemon_become_new_user_unix(void)
>>  /* Linux specific implementation of daemon_become_new_user()
>>   * using libcap-ng.   */
>>  static void
>> -daemon_become_new_user_linux(bool access_datapath OVS_UNUSED)
>> +daemon_become_new_user_linux(bool access_datapath OVS_UNUSED,
>> +                             bool access_hardware_ports OVS_UNUSED)
>>  {
>>  #if defined __linux__ &&  HAVE_LIBCAPNG
>>      int ret;
>> @@ -827,6 +829,16 @@ daemon_become_new_user_linux(bool access_datapath 
>> OVS_UNUSED)
>>                  ret = capng_update(CAPNG_ADD, cap_sets, CAP_NET_ADMIN)
>>                        || capng_update(CAPNG_ADD, cap_sets, CAP_NET_RAW)
>>                        || capng_update(CAPNG_ADD, cap_sets, 
>> CAP_NET_BROADCAST);
>> +#ifdef DPDK_NETDEV
>> +                if (access_hardware_ports && !ret) {
>> +                    ret = capng_update(CAPNG_ADD, cap_sets, CAP_SYS_RAWIO);
>> +                    VLOG_INFO("CAP_SYS_RAWIO enabled.");
>> +                }
>> +#else
>> +                if (access_hardware_ports) {
>> +                    VLOG_WARN("Dropped CAP_SYS_RAWIO request (no 
>> drivers).");
>
> Does it mean we drop the capability? or do we drop the drop?
> I don't really have a better wording but the current log had me
> reading it a few times.

I'll rewrite the log using Flavio's wording.

>> +                }
>> +#endif
>>              }
>>          } else {
>>              ret = -1;
>> @@ -854,7 +866,7 @@ daemon_become_new_user_linux(bool access_datapath 
>> OVS_UNUSED)
>>  }
>>
>>  static void
>> -daemon_become_new_user__(bool access_datapath)
>> +daemon_become_new_user__(bool access_datapath, bool access_hardware_ports)
>>  {
>>      /* If vlog file has been created, change its owner to the non-root user
>>       * as specifed by the --user option.  */
>> @@ -862,7 +874,8 @@ daemon_become_new_user__(bool access_datapath)
>>
>>      if (LINUX) {
>>          if (LIBCAPNG) {
>> -            daemon_become_new_user_linux(access_datapath);
>> +            daemon_become_new_user_linux(access_datapath,
>> +                                         access_hardware_ports);
>>          } else {
>>              VLOG_FATAL("%s: fail to downgrade user using libcap-ng. "
>>                         "(libcap-ng is not configured at compile time), "
>> @@ -877,11 +890,11 @@ daemon_become_new_user__(bool access_datapath)
>>   * However, there in case the user switch needs to be done
>>   * before daemonize_start(), the following API can be used.  */
>>  void
>> -daemon_become_new_user(bool access_datapath)
>> +daemon_become_new_user(bool access_datapath, bool access_hardware_ports)
>>  {
>>      assert_single_threaded();
>>      if (switch_user) {
>> -        daemon_become_new_user__(access_datapath);
>> +        daemon_become_new_user__(access_datapath, access_hardware_ports);
>>          /* daemonize_start() should not switch user again. */
>>          switch_user = false;
>>      }
>> diff --git a/lib/daemon-windows.c b/lib/daemon-windows.c
>> index 7e5f264f5b..f8055197eb 100644
>> --- a/lib/daemon-windows.c
>> +++ b/lib/daemon-windows.c
>> @@ -526,7 +526,8 @@ daemonize_complete(void)
>>  }
>>
>>  void
>> -daemon_become_new_user(bool access_datapath OVS_UNUSED)
>> +daemon_become_new_user(bool access_datapath OVS_UNUSED,
>> +                       bool access_hardware_ports OVS_UNUSED)
>>  {
>>  }
>>
>
> I think there is something missing for Windows.
> lib/daemon-windows.c:daemonize_start(bool access_datapath OVS_UNUSED)
>
> The CI will confirm this :-).

I wish it would give error instead of warning.  :)

It passed just fine, but yes, does give the following:

"declared formal parameter list different from definition" but not the
symbol name.  Thanks for catching it in review.

>> diff --git a/lib/daemon.c b/lib/daemon.c
>> index 3249c5ab4b..1e1c019eb1 100644
>> --- a/lib/daemon.c
>> +++ b/lib/daemon.c
>> @@ -48,7 +48,7 @@ get_detach(void)
>>  void
>>  daemonize(void)
>>  {
>> -    daemonize_start(false);
>> +    daemonize_start(false, false);
>>      daemonize_complete();
>>  }
>>
>> diff --git a/lib/daemon.h b/lib/daemon.h
>> index 0941574963..42372d1463 100644
>> --- a/lib/daemon.h
>> +++ b/lib/daemon.h
>> @@ -167,10 +167,10 @@ void set_detach(void);
>>  bool get_detach(void);
>>  void daemon_save_fd(int fd);
>>  void daemonize(void);
>> -void daemonize_start(bool access_datapath);
>> +void daemonize_start(bool access_datapath, bool access_hardware_ports);
>>  void daemonize_complete(void);
>>  void daemon_set_new_user(const char * user_spec);
>> -void daemon_become_new_user(bool access_datapath);
>> +void daemon_become_new_user(bool access_datapath, bool 
>> access_hardware_ports);
>>  void daemon_usage(void);
>>  void daemon_disable_self_confinement(void);
>>  bool daemon_should_self_confine(void);
>> diff --git a/ovsdb/ovsdb-client.c b/ovsdb/ovsdb-client.c
>> index f1b8d64910..bae2c5f041 100644
>> --- a/ovsdb/ovsdb-client.c
>> +++ b/ovsdb/ovsdb-client.c
>> @@ -250,7 +250,7 @@ main(int argc, char *argv[])
>>      parse_options(argc, argv);
>>      fatal_ignore_sigpipe();
>>
>> -    daemon_become_new_user(false);
>> +    daemon_become_new_user(false, false);
>>      if (optind >= argc) {
>>          ovs_fatal(0, "missing command name; use --help for help");
>>      }
>> @@ -1392,7 +1392,7 @@ do_monitor__(struct jsonrpc *rpc, const char *database,
>>
>>      daemon_save_fd(STDOUT_FILENO);
>>      daemon_save_fd(STDERR_FILENO);
>> -    daemonize_start(false);
>> +    daemonize_start(false, false);
>>      if (get_detach()) {
>>          int error;
>>
>> @@ -2276,7 +2276,7 @@ do_lock(struct jsonrpc *rpc, const char *method, const 
>> char *lock)
>>                                          getting a reply of the previous
>>                                          request. */
>>      daemon_save_fd(STDOUT_FILENO);
>> -    daemonize_start(false);
>> +    daemonize_start(false, false);
>>      lock_req_init(&lock_req, method, lock);
>>
>>      if (get_detach()) {
>> diff --git a/ovsdb/ovsdb-server.c b/ovsdb/ovsdb-server.c
>> index 33ca4910d7..4fea2dbda7 100644
>> --- a/ovsdb/ovsdb-server.c
>> +++ b/ovsdb/ovsdb-server.c
>> @@ -341,7 +341,7 @@ main(int argc, char *argv[])
>>                    &run_command, &sync_from, &sync_exclude, &active);
>>      is_backup = sync_from && !active;
>>
>> -    daemon_become_new_user(false);
>> +    daemon_become_new_user(false, false);
>>
>>      /* Create and initialize 'config_tmpfile' as a temporary file to hold
>>       * ovsdb-server's most basic configuration, and then save our initial
>> @@ -359,7 +359,7 @@ main(int argc, char *argv[])
>>      save_config__(config_tmpfile, &remotes, &db_filenames, sync_from,
>>                    sync_exclude, is_backup);
>>
>> -    daemonize_start(false);
>> +    daemonize_start(false, false);
>>
>>      /* Load the saved config. */
>>      load_config(config_tmpfile, &remotes, &db_filenames, &sync_from,
>> diff --git a/tests/test-netflow.c b/tests/test-netflow.c
>> index d2322d4509..7f89cfcae0 100644
>> --- a/tests/test-netflow.c
>> +++ b/tests/test-netflow.c
>> @@ -195,7 +195,7 @@ test_netflow_main(int argc, char *argv[])
>>      }
>>
>>      daemon_save_fd(STDOUT_FILENO);
>> -    daemonize_start(false);
>> +    daemonize_start(false, false);
>>
>>      error = unixctl_server_create(NULL, &server);
>>      if (error) {
>> diff --git a/tests/test-sflow.c b/tests/test-sflow.c
>> index 460d4d6c54..3c617bdd16 100644
>> --- a/tests/test-sflow.c
>> +++ b/tests/test-sflow.c
>> @@ -709,7 +709,7 @@ test_sflow_main(int argc, char *argv[])
>>      }
>>
>>      daemon_save_fd(STDOUT_FILENO);
>> -    daemonize_start(false);
>> +    daemonize_start(false, false);
>>
>>      error = unixctl_server_create(NULL, &server);
>>      if (error) {
>> diff --git a/tests/test-unixctl.c b/tests/test-unixctl.c
>> index 3eadf54cd9..9e89827895 100644
>> --- a/tests/test-unixctl.c
>> +++ b/tests/test-unixctl.c
>> @@ -83,7 +83,7 @@ test_unixctl_main(int argc, char *argv[])
>>      fatal_ignore_sigpipe();
>>      parse_options(&argc, &argv, &unixctl_path);
>>
>> -    daemonize_start(false);
>> +    daemonize_start(false, false);
>>      int retval = unixctl_server_create(unixctl_path, &unixctl);
>>      if (retval) {
>>          exit(EXIT_FAILURE);
>> diff --git a/utilities/ovs-ofctl.c b/utilities/ovs-ofctl.c
>> index eabec18a36..f81f5f759a 100644
>> --- a/utilities/ovs-ofctl.c
>> +++ b/utilities/ovs-ofctl.c
>> @@ -173,7 +173,7 @@ main(int argc, char *argv[])
>>      ctx.argc = argc - optind;
>>      ctx.argv = argv + optind;
>>
>> -    daemon_become_new_user(false);
>> +    daemon_become_new_user(false, false);
>>      if (read_only) {
>>          ovs_cmdl_run_command_read_only(&ctx, get_all_commands());
>>      } else {
>> @@ -2127,7 +2127,7 @@ monitor_vconn(struct vconn *vconn, bool 
>> reply_to_echo_requests,
>>      int error;
>>
>>      daemon_save_fd(STDERR_FILENO);
>> -    daemonize_start(false);
>> +    daemonize_start(false, false);
>>      error = unixctl_server_create(unixctl_path, &server);
>>      if (error) {
>>          ovs_fatal(error, "failed to create unixctl server");
>> diff --git a/utilities/ovs-testcontroller.c b/utilities/ovs-testcontroller.c
>> index b489ff5fc7..9f2fbfdf51 100644
>> --- a/utilities/ovs-testcontroller.c
>> +++ b/utilities/ovs-testcontroller.c
>> @@ -109,7 +109,7 @@ main(int argc, char *argv[])
>>      parse_options(argc, argv);
>>      fatal_ignore_sigpipe();
>>
>> -    daemon_become_new_user(false);
>> +    daemon_become_new_user(false, false);
>>
>>      if (argc - optind < 1) {
>>          ovs_fatal(0, "at least one vconn argument required; "
>> @@ -148,7 +148,7 @@ main(int argc, char *argv[])
>>          ovs_fatal(0, "no active or passive switch connections");
>>      }
>>
>> -    daemonize_start(false);
>> +    daemonize_start(false, false);
>>
>>      retval = unixctl_server_create(unixctl_path, &unixctl);
>>      if (retval) {
>> diff --git a/vswitchd/ovs-vswitchd.8.in b/vswitchd/ovs-vswitchd.8.in
>> index 9569265fcb..a6a4a24606 100644
>> --- a/vswitchd/ovs-vswitchd.8.in
>> +++ b/vswitchd/ovs-vswitchd.8.in
>> @@ -81,6 +81,14 @@ unavailable or unsuccessful.
>>  .SS "DPDK Options"
>>  For details on initializing \fBovs\-vswitchd\fR to use DPDK ports,
>>  refer to the documentation or \fBovs\-vswitchd.conf.db\fR(5).
>> +.SS "DPDK HW Access Options"
>> +.IP "\fB\-\-hw\-rawio\-access\fR"
>> +Tells \fBovs\-vswitchd\fR to retain the \fBCAP_SYS_RAWIO\fR capability,
>> +to allow userspace drivers access to raw hardware memory.  This will
>> +also allow the \fBovs\-vswitchd\fR daemon to call \fBiopl()\fR and
>> +\fBioperm()\fR functions to set port access.  This is a \fBvery\fR
>
> The main usecase for this OVS option would be mlx5 + full hwoffload in
> DPDK, so if giving an example in the manual, I'd rather mention it.
>
> On the other hand, iopl() is (was?) used in DPDK for x86 + virtio pmd
> with virtio pre v1.
> And here, I don't expect many people plugging OVS on top of this.
>
> Do you have another case in mind for iopl()/ioperm()?

No other case.  The main one I know of is mlx5 + full offload, and that
does call ioperm/iopl, iiuc.

> The rest lgtm.
>
> Thanks for working on this Aaron.
>
>> +powerful capability, so generally only enable as needed for specific
>> +hardware.
>>  .SS "Daemon Options"
>>  .ds DD \
>>  \fBovs\-vswitchd\fR detaches only after it has connected to the \

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to