On Tue, Mar 7, 2023 at 2:06 PM Aaron Conole <[email protected]> wrote:
>
> Open vSwitch generally tries to let the underlying operating system
> managed the low level details of hardware, for example DMA mapping,
> bus arbitration, etc. However, when using DPDK, the underlying
> operating system yields control of many of these details to userspace
> for management.
>
> In the case of some DPDK port drivers, configuring rte_flow or even
> allocating resources may require access to iopl/ioperm calls, which
> are guarded by the CAP_SYS_RAWIO privilege on linux systems. These
> calls are dangerous, and can allow a process to completely compromise
> a system. However, they are needed in the case of some userspace
> driver code which manages the hardware (for example, the mlx
> implementation of backend support for rte_flow).
>
> Here, we create an opt-in flag passed to the command line to allow
> this access. We need to do this before ever accessing the database,
> because we want to drop all privileges asap, and cannot wait for
> a connection to the database to be established and functional before
> dropping. There may be distribution specific ways to do capability
> management as well (using for example, systemd), but they are not
> as universal to the vswitchd as a flag.
>
> Reviewed-by: Simon Horman <[email protected]>
> Signed-off-by: Aaron Conole <[email protected]>
> ---
> NEWS | 4 ++++
> lib/daemon-unix.c | 29 +++++++++++++++++++++--------
> lib/daemon-windows.c | 3 ++-
> lib/daemon.c | 2 +-
> lib/daemon.h | 4 ++--
> ovsdb/ovsdb-client.c | 6 +++---
> ovsdb/ovsdb-server.c | 4 ++--
> tests/test-netflow.c | 2 +-
> tests/test-sflow.c | 2 +-
> tests/test-unixctl.c | 2 +-
> utilities/ovs-ofctl.c | 4 ++--
> utilities/ovs-testcontroller.c | 4 ++--
> vswitchd/ovs-vswitchd.8.in | 8 ++++++++
> vswitchd/ovs-vswitchd.c | 11 ++++++++++-
> 14 files changed, 60 insertions(+), 25 deletions(-)
>
> diff --git a/NEWS b/NEWS
> index 85b3496214..65f35dcdd5 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -10,6 +10,10 @@ Post-v3.1.0
> in order to create OVSDB sockets with access mode of 0770.
> - QoS:
> * Added new configuration option 'jitter' for a linux-netem QoS type.
> + - DPDK:
> + * ovs-vswitchd will keep the CAP_SYS_RAWIO capability when started
> + with the --hw-rawio-access command line option. This allows the
> + process extra privileges when mapping physical interconnect memory.
>
>
> v3.1.0 - 16 Feb 2023
> diff --git a/lib/daemon-unix.c b/lib/daemon-unix.c
> index 1a7ba427d7..a080facddc 100644
> --- a/lib/daemon-unix.c
> +++ b/lib/daemon-unix.c
> @@ -88,7 +88,8 @@ static bool switch_user = false;
> static uid_t uid;
> static gid_t gid;
> static char *user = NULL;
> -static void daemon_become_new_user__(bool access_datapath);
> +static void daemon_become_new_user__(bool access_datapath,
> + bool access_hardware_ports);
>
> static void check_already_running(void);
> static int lock_pidfile(FILE *, int command);
> @@ -443,13 +444,13 @@ monitor_daemon(pid_t daemon_pid)
> * daemonize_complete()) or that it failed to start up (by exiting with a
> * nonzero exit code). */
> void
> -daemonize_start(bool access_datapath)
> +daemonize_start(bool access_datapath, bool access_hardware_ports)
> {
> assert_single_threaded();
> daemonize_fd = -1;
>
> if (switch_user) {
> - daemon_become_new_user__(access_datapath);
> + daemon_become_new_user__(access_datapath, access_hardware_ports);
> switch_user = false;
> }
>
> @@ -807,7 +808,8 @@ daemon_become_new_user_unix(void)
> /* Linux specific implementation of daemon_become_new_user()
> * using libcap-ng. */
> static void
> -daemon_become_new_user_linux(bool access_datapath OVS_UNUSED)
> +daemon_become_new_user_linux(bool access_datapath OVS_UNUSED,
> + bool access_hardware_ports OVS_UNUSED)
> {
> #if defined __linux__ && HAVE_LIBCAPNG
> int ret;
> @@ -827,6 +829,16 @@ daemon_become_new_user_linux(bool access_datapath
> OVS_UNUSED)
> ret = capng_update(CAPNG_ADD, cap_sets, CAP_NET_ADMIN)
> || capng_update(CAPNG_ADD, cap_sets, CAP_NET_RAW)
> || capng_update(CAPNG_ADD, cap_sets,
> CAP_NET_BROADCAST);
> +#ifdef DPDK_NETDEV
> + if (access_hardware_ports && !ret) {
> + ret = capng_update(CAPNG_ADD, cap_sets, CAP_SYS_RAWIO);
> + VLOG_INFO("CAP_SYS_RAWIO enabled.");
> + }
> +#else
> + if (access_hardware_ports) {
> + VLOG_WARN("Dropped CAP_SYS_RAWIO request (no drivers).");
Does it mean we drop the capability? or do we drop the drop?
I don't really have a better wording but the current log had me
reading it a few times.
> + }
> +#endif
> }
> } else {
> ret = -1;
> @@ -854,7 +866,7 @@ daemon_become_new_user_linux(bool access_datapath
> OVS_UNUSED)
> }
>
> static void
> -daemon_become_new_user__(bool access_datapath)
> +daemon_become_new_user__(bool access_datapath, bool access_hardware_ports)
> {
> /* If vlog file has been created, change its owner to the non-root user
> * as specifed by the --user option. */
> @@ -862,7 +874,8 @@ daemon_become_new_user__(bool access_datapath)
>
> if (LINUX) {
> if (LIBCAPNG) {
> - daemon_become_new_user_linux(access_datapath);
> + daemon_become_new_user_linux(access_datapath,
> + access_hardware_ports);
> } else {
> VLOG_FATAL("%s: fail to downgrade user using libcap-ng. "
> "(libcap-ng is not configured at compile time), "
> @@ -877,11 +890,11 @@ daemon_become_new_user__(bool access_datapath)
> * However, there in case the user switch needs to be done
> * before daemonize_start(), the following API can be used. */
> void
> -daemon_become_new_user(bool access_datapath)
> +daemon_become_new_user(bool access_datapath, bool access_hardware_ports)
> {
> assert_single_threaded();
> if (switch_user) {
> - daemon_become_new_user__(access_datapath);
> + daemon_become_new_user__(access_datapath, access_hardware_ports);
> /* daemonize_start() should not switch user again. */
> switch_user = false;
> }
> diff --git a/lib/daemon-windows.c b/lib/daemon-windows.c
> index 7e5f264f5b..f8055197eb 100644
> --- a/lib/daemon-windows.c
> +++ b/lib/daemon-windows.c
> @@ -526,7 +526,8 @@ daemonize_complete(void)
> }
>
> void
> -daemon_become_new_user(bool access_datapath OVS_UNUSED)
> +daemon_become_new_user(bool access_datapath OVS_UNUSED,
> + bool access_hardware_ports OVS_UNUSED)
> {
> }
>
I think there is something missing for Windows.
lib/daemon-windows.c:daemonize_start(bool access_datapath OVS_UNUSED)
The CI will confirm this :-).
> diff --git a/lib/daemon.c b/lib/daemon.c
> index 3249c5ab4b..1e1c019eb1 100644
> --- a/lib/daemon.c
> +++ b/lib/daemon.c
> @@ -48,7 +48,7 @@ get_detach(void)
> void
> daemonize(void)
> {
> - daemonize_start(false);
> + daemonize_start(false, false);
> daemonize_complete();
> }
>
> diff --git a/lib/daemon.h b/lib/daemon.h
> index 0941574963..42372d1463 100644
> --- a/lib/daemon.h
> +++ b/lib/daemon.h
> @@ -167,10 +167,10 @@ void set_detach(void);
> bool get_detach(void);
> void daemon_save_fd(int fd);
> void daemonize(void);
> -void daemonize_start(bool access_datapath);
> +void daemonize_start(bool access_datapath, bool access_hardware_ports);
> void daemonize_complete(void);
> void daemon_set_new_user(const char * user_spec);
> -void daemon_become_new_user(bool access_datapath);
> +void daemon_become_new_user(bool access_datapath, bool
> access_hardware_ports);
> void daemon_usage(void);
> void daemon_disable_self_confinement(void);
> bool daemon_should_self_confine(void);
> diff --git a/ovsdb/ovsdb-client.c b/ovsdb/ovsdb-client.c
> index f1b8d64910..bae2c5f041 100644
> --- a/ovsdb/ovsdb-client.c
> +++ b/ovsdb/ovsdb-client.c
> @@ -250,7 +250,7 @@ main(int argc, char *argv[])
> parse_options(argc, argv);
> fatal_ignore_sigpipe();
>
> - daemon_become_new_user(false);
> + daemon_become_new_user(false, false);
> if (optind >= argc) {
> ovs_fatal(0, "missing command name; use --help for help");
> }
> @@ -1392,7 +1392,7 @@ do_monitor__(struct jsonrpc *rpc, const char *database,
>
> daemon_save_fd(STDOUT_FILENO);
> daemon_save_fd(STDERR_FILENO);
> - daemonize_start(false);
> + daemonize_start(false, false);
> if (get_detach()) {
> int error;
>
> @@ -2276,7 +2276,7 @@ do_lock(struct jsonrpc *rpc, const char *method, const
> char *lock)
> getting a reply of the previous
> request. */
> daemon_save_fd(STDOUT_FILENO);
> - daemonize_start(false);
> + daemonize_start(false, false);
> lock_req_init(&lock_req, method, lock);
>
> if (get_detach()) {
> diff --git a/ovsdb/ovsdb-server.c b/ovsdb/ovsdb-server.c
> index 33ca4910d7..4fea2dbda7 100644
> --- a/ovsdb/ovsdb-server.c
> +++ b/ovsdb/ovsdb-server.c
> @@ -341,7 +341,7 @@ main(int argc, char *argv[])
> &run_command, &sync_from, &sync_exclude, &active);
> is_backup = sync_from && !active;
>
> - daemon_become_new_user(false);
> + daemon_become_new_user(false, false);
>
> /* Create and initialize 'config_tmpfile' as a temporary file to hold
> * ovsdb-server's most basic configuration, and then save our initial
> @@ -359,7 +359,7 @@ main(int argc, char *argv[])
> save_config__(config_tmpfile, &remotes, &db_filenames, sync_from,
> sync_exclude, is_backup);
>
> - daemonize_start(false);
> + daemonize_start(false, false);
>
> /* Load the saved config. */
> load_config(config_tmpfile, &remotes, &db_filenames, &sync_from,
> diff --git a/tests/test-netflow.c b/tests/test-netflow.c
> index d2322d4509..7f89cfcae0 100644
> --- a/tests/test-netflow.c
> +++ b/tests/test-netflow.c
> @@ -195,7 +195,7 @@ test_netflow_main(int argc, char *argv[])
> }
>
> daemon_save_fd(STDOUT_FILENO);
> - daemonize_start(false);
> + daemonize_start(false, false);
>
> error = unixctl_server_create(NULL, &server);
> if (error) {
> diff --git a/tests/test-sflow.c b/tests/test-sflow.c
> index 460d4d6c54..3c617bdd16 100644
> --- a/tests/test-sflow.c
> +++ b/tests/test-sflow.c
> @@ -709,7 +709,7 @@ test_sflow_main(int argc, char *argv[])
> }
>
> daemon_save_fd(STDOUT_FILENO);
> - daemonize_start(false);
> + daemonize_start(false, false);
>
> error = unixctl_server_create(NULL, &server);
> if (error) {
> diff --git a/tests/test-unixctl.c b/tests/test-unixctl.c
> index 3eadf54cd9..9e89827895 100644
> --- a/tests/test-unixctl.c
> +++ b/tests/test-unixctl.c
> @@ -83,7 +83,7 @@ test_unixctl_main(int argc, char *argv[])
> fatal_ignore_sigpipe();
> parse_options(&argc, &argv, &unixctl_path);
>
> - daemonize_start(false);
> + daemonize_start(false, false);
> int retval = unixctl_server_create(unixctl_path, &unixctl);
> if (retval) {
> exit(EXIT_FAILURE);
> diff --git a/utilities/ovs-ofctl.c b/utilities/ovs-ofctl.c
> index eabec18a36..f81f5f759a 100644
> --- a/utilities/ovs-ofctl.c
> +++ b/utilities/ovs-ofctl.c
> @@ -173,7 +173,7 @@ main(int argc, char *argv[])
> ctx.argc = argc - optind;
> ctx.argv = argv + optind;
>
> - daemon_become_new_user(false);
> + daemon_become_new_user(false, false);
> if (read_only) {
> ovs_cmdl_run_command_read_only(&ctx, get_all_commands());
> } else {
> @@ -2127,7 +2127,7 @@ monitor_vconn(struct vconn *vconn, bool
> reply_to_echo_requests,
> int error;
>
> daemon_save_fd(STDERR_FILENO);
> - daemonize_start(false);
> + daemonize_start(false, false);
> error = unixctl_server_create(unixctl_path, &server);
> if (error) {
> ovs_fatal(error, "failed to create unixctl server");
> diff --git a/utilities/ovs-testcontroller.c b/utilities/ovs-testcontroller.c
> index b489ff5fc7..9f2fbfdf51 100644
> --- a/utilities/ovs-testcontroller.c
> +++ b/utilities/ovs-testcontroller.c
> @@ -109,7 +109,7 @@ main(int argc, char *argv[])
> parse_options(argc, argv);
> fatal_ignore_sigpipe();
>
> - daemon_become_new_user(false);
> + daemon_become_new_user(false, false);
>
> if (argc - optind < 1) {
> ovs_fatal(0, "at least one vconn argument required; "
> @@ -148,7 +148,7 @@ main(int argc, char *argv[])
> ovs_fatal(0, "no active or passive switch connections");
> }
>
> - daemonize_start(false);
> + daemonize_start(false, false);
>
> retval = unixctl_server_create(unixctl_path, &unixctl);
> if (retval) {
> diff --git a/vswitchd/ovs-vswitchd.8.in b/vswitchd/ovs-vswitchd.8.in
> index 9569265fcb..a6a4a24606 100644
> --- a/vswitchd/ovs-vswitchd.8.in
> +++ b/vswitchd/ovs-vswitchd.8.in
> @@ -81,6 +81,14 @@ unavailable or unsuccessful.
> .SS "DPDK Options"
> For details on initializing \fBovs\-vswitchd\fR to use DPDK ports,
> refer to the documentation or \fBovs\-vswitchd.conf.db\fR(5).
> +.SS "DPDK HW Access Options"
> +.IP "\fB\-\-hw\-rawio\-access\fR"
> +Tells \fBovs\-vswitchd\fR to retain the \fBCAP_SYS_RAWIO\fR capability,
> +to allow userspace drivers access to raw hardware memory. This will
> +also allow the \fBovs\-vswitchd\fR daemon to call \fBiopl()\fR and
> +\fBioperm()\fR functions to set port access. This is a \fBvery\fR
The main usecase for this OVS option would be mlx5 + full hwoffload in
DPDK, so if giving an example in the manual, I'd rather mention it.
On the other hand, iopl() is (was?) used in DPDK for x86 + virtio pmd
with virtio pre v1.
And here, I don't expect many people plugging OVS on top of this.
Do you have another case in mind for iopl()/ioperm()?
The rest lgtm.
Thanks for working on this Aaron.
> +powerful capability, so generally only enable as needed for specific
> +hardware.
> .SS "Daemon Options"
> .ds DD \
> \fBovs\-vswitchd\fR detaches only after it has connected to the \
--
David Marchand
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev