[libvirt PATCH] logging: lockdown the systemd service configuration

2023-09-26 Thread Daniel P . Berrangé
The 'systemd-analyze security' command looks at the unit file
configuration and reports on any settings which increase the
attack surface for the daemon. Since most systemd units are
fairly minimalist, this is generally informing us about settings
that we never put any thought into using before.

In its current configuration it reports

  # systemd-analyze security virtlogd.service
  ...snip...
  → Overall exposure level for virtlogd.service: 9.6 UNSAFE 

which is pretty terrible as a score.

If we apply all of the recommendations that appear possible
without (knowingly) breaking functionality it reports:

  # systemd-analyze security virtlogd.service
  ...snip...
  → Overall exposure level for virtlogd.service: 2.2 OK 

which is a pretty decent improvement.

Some of the settings we would like to enable require a systemd
version that is newer than that available in our oldest distro
target - RHEL-8 at v239.

NB, RestrictSUIDSGID is technically newer than 239, but RHEL-8
backported it, and other distros we target have it by default.

Remaining recommendations are

✗ CapabilityBoundingSet=~CAP_(DAC_*|FOWNER|IPC_OWNER)

  We block FOWNER/IPC_OWNER, but can't block the two DAC
  capabilities. Historically apps/users might point QEMU
  to log files in $HOME, pre-created with their own user
  ID.

✗ IPAddressDeny=

  Not required since RestrictAddressFamilies blocks IP
  usage. Ignoring this avoids the overhead of creating
  a traffic filter than will never be used.

✗ NoNewPrivileges=

  Highly desirable, but cannot enable it yet, because it
  will block the ability to transition to the virtlogd_t
  SELinux domain during execve. The SELinux policy needs
  fixing to permit this transition under NNP first.

✗ PrivateTmp=

  There is a decent chance people have VMs configured
  with a serial port logfile pointing at /tmp. We would
  cause a regression to use private /tmp for logging

✗ PrivateUsers=

  This would put virtlogd inside a user namespace where
  its root is in fact unprivileged. Same problem as the
  User= setting below

✗ ProcSubset=

  Libraries we link to might read certain non-PID related
  files from /proc

✗ ProtectClock=

  Requires v245

✗ ProtectHome=

  Same problem as PrivateTmp=. There's a decent chance
  that someone has a VM configured to write a logfile
  to /home

✗ ProtectHostname=

  Requires v241

✗ ProtectKernelLogs

  Requires v244

✗ ProtectProc

  Requires v247

✗ ProtectSystem=

  We only set it to 'full', as 'strict' is not viable for
  our required usage

✗ RootDirectory=/RootImage=

  We are not capable of running inside a custom chroot
  given needs to write log files to arbitrary places

✗ RestrictAddressFamilies=~AF_UNIX

  We need AF_UNIX to communicate with other libvirt daemons

✗ SystemCallFilter=~@resources

  We link to libvirt.so which links to libnuma.so which has
  a constructor that calls set_mempolicy. This is highly
  undesirable todo during a constructor.

✗ User=/DynamicUser=

  This is highly desirable, but we currently read/write
  logs as root, and directories we're told to write into
  could be anywhere. So using a non-root user would have
  a major risk of regressions for applications and also
  have upgrade implications

Signed-off-by: Daniel P. Berrangé 
---
 src/logging/virtlogd.service.in | 94 +
 1 file changed, 94 insertions(+)

diff --git a/src/logging/virtlogd.service.in b/src/logging/virtlogd.service.in
index 8e245ddb43..9e3838ff34 100644
--- a/src/logging/virtlogd.service.in
+++ b/src/logging/virtlogd.service.in
@@ -20,5 +20,99 @@ OOMScoreAdjust=-900
 # per systemd recommendations
 LimitNOFILE=1024:524288
 
+CapabilityBoundingSet=~CAP_AUDIT_CONTROL
+CapabilityBoundingSet=~CAP_AUDIT_READ
+CapabilityBoundingSet=~CAP_AUDIT_WRITE
+CapabilityBoundingSet=~CAP_BLOCK_SUSPEND
+CapabilityBoundingSet=~CAP_CHOWN
+# Mgmt app/user might have pre-created log files that we're
+# told to open and write to, or be storing them in otherwise
+# inaccessible locations like $HOME. So we need to ignore
+# DAC permission checks.
+#CapabilityBoundingSet=~CAP_DAC_OVERRIDE
+#CapabilityBoundingSet=~CAP_DAC_READ_SEARCH
+CapabilityBoundingSet=~CAP_FOWNER
+CapabilityBoundingSet=~CAP_FSETID
+CapabilityBoundingSet=~CAP_IPC_LOCK
+CapabilityBoundingSet=~CAP_IPC_OWNER
+CapabilityBoundingSet=~CAP_KILL
+CapabilityBoundingSet=~CAP_LEASE
+CapabilityBoundingSet=~CAP_LINUX_IMMUTABLE
+CapabilityBoundingSet=~CAP_MAC_ADMIN
+CapabilityBoundingSet=~CAP_MAC_OVERRIDE
+CapabilityBoundingSet=~CAP_MKNOD
+CapabilityBoundingSet=~CAP_NET_ADMIN
+CapabilityBoundingSet=~CAP_NET_BIND_SERVICE
+CapabilityBoundingSet=~CAP_NET_BROADCAST
+CapabilityBoundingSet=~CAP_NET_RAW
+CapabilityBoundingSet=~CAP_SETFCAP
+CapabilityBoundingSet=~CAP_SETPCAP
+CapabilityBoundingSet=~CAP_SETGID
+CapabilityBoundingSet=~CAP_SETUID
+CapabilityBoundingSet=~CAP_SYSLOG
+CapabilityBoundingSet=~CAP_SYS_ADMIN
+CapabilityBoundingSet=~CAP_SYS_BOOT
+CapabilityBoundingSet=~CAP_SYS_CHROOT

Re: [libvirt PATCH 0/2] Add vdpablock and nbdkit to NEWS

2023-09-26 Thread Jonathon Jongsma

On 9/19/23 3:47 PM, Jonathon Jongsma wrote:



Jonathon Jongsma (2):
   news: document support for vdpa block devices
   news: document nbdkit support for network disks

  NEWS.rst | 18 ++
  1 file changed, 18 insertions(+)



ping



Re: [libvirt PATCH v2] util: fix success return for virProcessKillPainfullyDelay()

2023-09-26 Thread Ján Tomko

On a Monday in 2023, Jonathon Jongsma wrote:

virProcessKillPainfullyDelay() currently almost always returns 1 or -1,
even though the documentation indicates that it should return 0 if the
process was terminated gracefully. But the computation of the return
code is faulty and the only case that it currently returns 0 is when it
is called with the pid of a process that does not exist.

Since no callers ever even distinguish between the 0 and 1 response
codes, simply get rid of the distinction and return 0 for both cases.

Signed-off-by: Jonathon Jongsma 
---

Change in v2:
- just drop the distinction between 0 and 1 and always return 0.
  Suggested by Ján Tomko

src/util/virprocess.c | 7 +++
1 file changed, 3 insertions(+), 4 deletions(-)



Reviewed-by: Ján Tomko 

Jano


signature.asc
Description: PGP signature


Re: [PATCH] test: Fix testNodeGetFreePages

2023-09-26 Thread Ján Tomko

On a Tuesday in 2023, Martin Kletzander wrote:

The function is supposed to return the number of items filled into the
array and not zero.  Also change the initialization of the "randomness"
to be based on the startCell so that the values are different for each
cell even for separate calls.

Signed-off-by: Martin Kletzander 
---
src/test/test_driver.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)



Reviewed-by: Ján Tomko 

Jano


signature.asc
Description: PGP signature


Re: [PATCH] test: Fix testNodeGetFreePages

2023-09-26 Thread Michal Prívozník
On 9/26/23 14:20, Martin Kletzander wrote:
> The function is supposed to return the number of items filled into the
> array and not zero.  Also change the initialization of the "randomness"
> to be based on the startCell so that the values are different for each
> cell even for separate calls.
> 
> Signed-off-by: Martin Kletzander 
> ---
>  src/test/test_driver.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/src/test/test_driver.c b/src/test/test_driver.c
> index c962aa74786e..998d102ddc5a 100644
> --- a/src/test/test_driver.c
> +++ b/src/test/test_driver.c
> @@ -4503,7 +4503,7 @@ testNodeGetFreePages(virConnectPtr conn G_GNUC_UNUSED,
>   unsigned int flags)
>  {
>  size_t i = 0, j = 0;
> -int x = 6;
> +int x = startCell * 6;

So startCell is now used within the function and thus its G_GNUC_UNUSED
annotation should be dropped.

>  
>  virCheckFlags(0, -1);
>  
> @@ -4514,7 +4514,7 @@ testNodeGetFreePages(virConnectPtr conn G_GNUC_UNUSED,
>  }
>  }
>  
> -return 0;
> +return cellCount * npages;
>  }
>  
>  static int testDomainCreateWithFlags(virDomainPtr domain, unsigned int flags)

Reviewed-by: Michal Privoznik 

Michal



[PATCH] virsh: Account for return values in virNodeGetFreePages

2023-09-26 Thread Martin Kletzander
The function returns how many array items were filled in, but virsh
never checked for anything other than errors.  Just to make sure this
does not report invalid data, even though the only possibility would be
reporting 0 free pages, check the returned data so that possible errors
are detected.

Signed-off-by: Martin Kletzander 
---
 tools/virsh-host.c | 28 ++--
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/tools/virsh-host.c b/tools/virsh-host.c
index 21aca5f6dc83..411648197895 100644
--- a/tools/virsh-host.c
+++ b/tools/virsh-host.c
@@ -328,6 +328,8 @@ cmdFreepages(vshControl *ctl, const vshCmd *cmd)
 bool cellno = vshCommandOptBool(cmd, "cellno");
 bool pagesz = vshCommandOptBool(cmd, "pagesize");
 virshControl *priv = ctl->privData;
+bool pagesize_missing = false;
+int rv = -1;
 
 VSH_EXCLUSIVE_OPTIONS_VAR(all, cellno);
 
@@ -407,16 +409,22 @@ cmdFreepages(vshControl *ctl, const vshCmd *cmd)
 goto cleanup;
 }
 
-if (virNodeGetFreePages(priv->conn, npages, pagesize,
-cell, 1, counts, 0) < 0)
+rv = virNodeGetFreePages(priv->conn, npages, pagesize,
+ cell, 1, counts, 0);
+if (rv < 0)
 goto cleanup;
 
+if (rv < npages) {
+pagesize_missing = true;
+vshError(ctl, _("Did not get all free page data for node 
%1$d"), cell);
+continue;
+}
+
 vshPrint(ctl, _("Node %1$d:\n"), cell);
 for (j = 0; j < npages; j++)
 vshPrint(ctl, "%uKiB: %lld\n", pagesize[j], counts[j]);
 vshPrint(ctl, "%c", '\n');
 }
-
 } else {
 if (!cellno) {
 vshError(ctl, "%s", _("missing cellno argument"));
@@ -443,14 +451,22 @@ cmdFreepages(vshControl *ctl, const vshCmd *cmd)
 
 counts = g_new0(unsigned long long, 1);
 
-if (virNodeGetFreePages(priv->conn, 1, pagesize,
-cell, 1, counts, 0) < 0)
+rv = virNodeGetFreePages(priv->conn, 1, pagesize,
+ cell, 1, counts, 0);
+if (rv < 0)
 goto cleanup;
 
+if (rv == 0) {
+vshError(ctl,
+ "Could not get count of free %uKiB pages, no data 
returned",
+ *pagesize);
+goto cleanup;
+}
+
 vshPrint(ctl, "%uKiB: %lld\n", *pagesize, counts[0]);
 }
 
-ret = true;
+ret = !pagesize_missing;
  cleanup:
 VIR_FREE(nodes);
 return ret;
-- 
2.42.0



Re: [PATCH] test: Fix testNodeGetFreePages

2023-09-26 Thread Martin Kletzander

On Tue, Sep 26, 2023 at 02:20:43PM +0200, Martin Kletzander wrote:

The function is supposed to return the number of items filled into the
array and not zero.  Also change the initialization of the "randomness"
to be based on the startCell so that the values are different for each
cell even for separate calls.



And for this to be really true consider the following also squashed in
(already done locally):

diff --git c/src/test/test_driver.c i/src/test/test_driver.c
index 998d102ddc5a..a352fcb7b070 100644
--- c/src/test/test_driver.c
+++ i/src/test/test_driver.c
@@ -4503,11 +4503,12 @@ testNodeGetFreePages(virConnectPtr conn G_GNUC_UNUSED,
  unsigned int flags)
 {
 size_t i = 0, j = 0;
-int x = startCell * 6;

 virCheckFlags(0, -1);

 for (i = 0; i < cellCount; i++) {
+int x = (startCell + i) * 6;
+
 for (j = 0; j < npages; j++) {
 x = x * 2 + 7;
 counts[(i * npages) +  j] = x;
--


Signed-off-by: Martin Kletzander 
---
src/test/test_driver.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/test/test_driver.c b/src/test/test_driver.c
index c962aa74786e..998d102ddc5a 100644
--- a/src/test/test_driver.c
+++ b/src/test/test_driver.c
@@ -4503,7 +4503,7 @@ testNodeGetFreePages(virConnectPtr conn G_GNUC_UNUSED,
 unsigned int flags)
{
size_t i = 0, j = 0;
-int x = 6;
+int x = startCell * 6;

virCheckFlags(0, -1);

@@ -4514,7 +4514,7 @@ testNodeGetFreePages(virConnectPtr conn G_GNUC_UNUSED,
}
}

-return 0;
+return cellCount * npages;
}

static int testDomainCreateWithFlags(virDomainPtr domain, unsigned int flags)
--
2.42.0



signature.asc
Description: PGP signature


Re: [libvirt PATCH 26/42] systemd: Switch virtchd to common templates

2023-09-26 Thread Andrea Bolognani
On Tue, Sep 26, 2023 at 01:14:33PM +0100, Daniel P. Berrangé wrote:
> On Tue, Sep 26, 2023 at 07:02:19AM -0500, Andrea Bolognani wrote:
> > I don't think it helps much with not storing additional data inside
> > the build system, unless we want to store the contents of the various
> > common snippets in separate files? Something like
> >
> >   common_service = fs.read('common_service.inc')
> >   unit_conf = configuration_data({
> > 'common_service' = common_service,
> >   })
> >
> > We'd have to fake fs.read() because it was introduced in 0.57 though.
> > And we'd have to run the contents of the common parts through
> > variable substitution anyway, because they will contain a bunch of
> > lines like
> >
> >   Also=@service@.socket
> >   Also=@service@-ro.socket
> >   Also=@service@-admin.socket
> >
> > I'm not sure the result would look much better, but I can give it a
> > try.
>
> Don't try to do any of this in meson.  We should just have a standalone
> python script that can combine the daemon specific unit file contents
> with the common unit file contents. eg
>
>   scripts/merge-unit-file.py \
>  src/qemu/virtqemud.service.in \
>  src/rpc/virtd.service.in \
>  build/src/virtqemud.service

It feels a bit silly to shell out to Python to perform what is
ultimately a bunch of variable substitutions, as if that wasn't part
of Meson's core feature set... But I'll give it a try and see how it
turns out.

Can you please take a look at the remaining patches in the meantime,
and provide feedback on the changes that are made to the various
services and sockets as part of them? Thanks in advance :)

-- 
Andrea Bolognani / Red Hat / Virtualization



Re: [libvirt PATCH v3 00/10] external snapshot revert fixes

2023-09-26 Thread Peter Krempa
On Mon, Sep 18, 2023 at 15:29:17 +0200, Pavel Hrdina wrote:
> This fixes reverting external snapshots to not error out in cases where
> it should work and makes it correctly load the memory state when
> reverting to snapshot of running VM.
> 
> This discards v2 completely and makes changes to v1:
> 
> - moves qemuSaveImageStartProcess to qemu_process as
>   qemuProcessStartWithMemoryState
> 
> - change it to use cookie from memory state file instead of
>   using cookie from snapshot xml
> 
> - comments improvements
> 
> - introduces new helpers to start and stop decompression process
> 
> - reintroduces  capability
> 
> Pavel Hrdina (10):
>   qemu_saveimage: extract starting process to qemuSaveImageStartProcess
>   qemu_saveimage: introduce helpers to decompress memory state file
>   qemu_saveimage: move qemuSaveImageStartProcess to qemu_process
>   qemuProcessStartWithMemoryState: allow setting reason for audit log
>   qemuProcessStartWithMemoryState: add snapshot argument
>   qemuProcessStartWithMemoryState: make it possible to use without data
>   qemu_snapshot: fix reverting external snapshot when not all disks are
> included
>   qemu_snapshot: correctly load the saved memory state file
>   capabilities: report full external snapshot support
>   NEWS: document support for reverting external snapshots

Reviewed-by: Peter Krempa 



Re: [libvirt PATCH v3 06/10] qemuProcessStartWithMemoryState: make it possible to use without data

2023-09-26 Thread Peter Krempa
On Mon, Sep 18, 2023 at 15:29:23 +0200, Pavel Hrdina wrote:
> When used with internal snapshots there is no memory state file so we
> have no data to load and decompression is not needed.
> 
> Signed-off-by: Pavel Hrdina 
> ---
>  src/qemu/qemu_process.c | 23 +--
>  1 file changed, 17 insertions(+), 6 deletions(-)
> 
> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
> index c8430bf7b7..f96918073f 100644
> --- a/src/qemu/qemu_process.c
> +++ b/src/qemu/qemu_process.c
> @@ -8165,7 +8165,7 @@ qemuProcessStart(virConnectPtr conn,
>   * @fd: FD pointer of memory state file
>   * @path: path to memory state file
>   * @snapshot: internal snapshot to load when starting QEMU process or NULL
> - * @data: data from memory state file
> + * @data: data from memory state file or NULL
>   * @asyncJob: type of asynchronous job
>   * @start_flags: flags to start QEMU process with
>   * @reason: audit log reason
> @@ -8175,7 +8175,8 @@ qemuProcessStart(virConnectPtr conn,
>   * is correctly decompressed so it can be loaded by QEMU process.
>   *
>   * When reverting to internal snapshot callers needs to pass @snapshot as 
> well
> - * correctly start QEMU process.
> + * correctly start QEMU process. In addition there is no memory state file so
> + * it's safe to pass NULL as @data.

This does not fully address my comment from previous patch. It says that
@data 'can' be NULL, not that it MUST be inull if @snapshot is non-null.



Re: [libvirt PATCH 35/42] systemd: Replace Requires with BindTo+After for sockets

2023-09-26 Thread Andrea Bolognani
On Tue, Sep 26, 2023 at 01:36:39PM +0100, Daniel P. Berrangé wrote:
> On Tue, Sep 26, 2023 at 04:09:17AM -0500, Andrea Bolognani wrote:
> > On Tue, Sep 26, 2023 at 09:44:52AM +0100, Daniel P. Berrangé wrote:
> > > On Mon, Sep 25, 2023 at 08:58:33PM +0200, Andrea Bolognani wrote:
> > > > This is the strongest relationship that can be declared between
> > > > two units, and causes the service to be terminated immediately
> > > > if any of its sockets disappear. This is the behavior we want.
> > >
> > > IIUC, this prevents running the service with /only/ the main
> > > socket, and ro/admin sockets disabled. Running without the
> > > ro socket in particular was something we wanted to allow to
> > > reduce exposure to unprivileged services (there have been
> > > a number of CVEs where the read-only socket was the way in)
> >
> > This doesn't work today either AFAICT, since the ro/admin sockets are
> > marked as Required by the various services.
>
> Doh, yes, I've confirmed. I'm sure it used to work, but we must have
> broken it at some point as we tweaked the deps countless times over
> to finese the setup.
>
> > If we want to support this configuration, then we need
> >
> >   # foo.service
> >   [Unit]
> >   BindsTo=foo.socket
> >   Wants=foo-ro.socket
> >   Wants=foo-admin.socket
> >   After=foo.socket
> >
> > In the default scenario, things will work just the same as they do
> > here, but it will also be possible to mask foo{-ro,-admin}.socket to
> > obtain the hardened setup you describe.
>
> Or we just decide to keep life simple, and if people want to harden
> things they can change permissions on the socket via a system unit
> override locally.

I don't think this is any more complicated than the version that uses
BindsTo/After for all sockets, and it shouldn't make things any worse
for people who stick with the defaults, so I don't mind trying to
integrate this requirement into v2.

-- 
Andrea Bolognani / Red Hat / Virtualization



Re: [libvirt PATCH 26/42] systemd: Switch virtchd to common templates

2023-09-26 Thread Daniel P . Berrangé
On Tue, Sep 26, 2023 at 08:12:43AM -0500, Andrea Bolognani wrote:
> On Tue, Sep 26, 2023 at 01:14:33PM +0100, Daniel P. Berrangé wrote:
> > On Tue, Sep 26, 2023 at 07:02:19AM -0500, Andrea Bolognani wrote:
> > > I don't think it helps much with not storing additional data inside
> > > the build system, unless we want to store the contents of the various
> > > common snippets in separate files? Something like
> > >
> > >   common_service = fs.read('common_service.inc')
> > >   unit_conf = configuration_data({
> > > 'common_service' = common_service,
> > >   })
> > >
> > > We'd have to fake fs.read() because it was introduced in 0.57 though.
> > > And we'd have to run the contents of the common parts through
> > > variable substitution anyway, because they will contain a bunch of
> > > lines like
> > >
> > >   Also=@service@.socket
> > >   Also=@service@-ro.socket
> > >   Also=@service@-admin.socket
> > >
> > > I'm not sure the result would look much better, but I can give it a
> > > try.
> >
> > Don't try to do any of this in meson.  We should just have a standalone
> > python script that can combine the daemon specific unit file contents
> > with the common unit file contents. eg
> >
> >   scripts/merge-unit-file.py \
> >  src/qemu/virtqemud.service.in \
> >  src/rpc/virtd.service.in \
> >  build/src/virtqemud.service
> 
> It feels a bit silly to shell out to Python to perform what is
> ultimately a bunch of variable substitutions, as if that wasn't part
> of Meson's core feature set... But I'll give it a try and see how it
> turns out.

IMHO Meson's job is to control the build process, rather than to
actually be the build process. I think of this as "compiling" the
unit files and the python sript is our compiler, which meson is
to control.

> Can you please take a look at the remaining patches in the meantime,
> and provide feedback on the changes that are made to the various
> services and sockets as part of them? Thanks in advance :)


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [libvirt PATCH 35/42] systemd: Replace Requires with BindTo+After for sockets

2023-09-26 Thread Daniel P . Berrangé
On Tue, Sep 26, 2023 at 04:09:17AM -0500, Andrea Bolognani wrote:
> On Tue, Sep 26, 2023 at 09:44:52AM +0100, Daniel P. Berrangé wrote:
> > On Mon, Sep 25, 2023 at 08:58:33PM +0200, Andrea Bolognani wrote:
> > > This is the strongest relationship that can be declared between
> > > two units, and causes the service to be terminated immediately
> > > if any of its sockets disappear. This is the behavior we want.
> >
> > IIUC, this prevents running the service with /only/ the main
> > socket, and ro/admin sockets disabled. Running without the
> > ro socket in particular was something we wanted to allow to
> > reduce exposure to unprivileged services (there have been
> > a number of CVEs where the read-only socket was the way in)
> 
> This doesn't work today either AFAICT, since the ro/admin sockets are
> marked as Required by the various services.

Doh, yes, I've confirmed. I'm sure it used to work, but we must have
broken it at some point as we tweaked the deps countless times over
to finese the setup.

> If we want to support this configuration, then we need
> 
>   # foo.service
>   [Unit]
>   BindsTo=foo.socket
>   Wants=foo-ro.socket
>   Wants=foo-admin.socket
>   After=foo.socket
> 
> In the default scenario, things will work just the same as they do
> here, but it will also be possible to mask foo{-ro,-admin}.socket to
> obtain the hardened setup you describe.

Or we just decide to keep life simple, and if people want to harden
things they can change permissions on the socket via a system unit
override locally.

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [libvirt PATCH v3 05/10] qemuProcessStartWithMemoryState: add snapshot argument

2023-09-26 Thread Peter Krempa
On Mon, Sep 18, 2023 at 15:29:22 +0200, Pavel Hrdina wrote:
> When called from snapshot code we will need to pass snapshot object in
> order to make internal snapshots work correctly.
> 
> Signed-off-by: Pavel Hrdina 
> ---
>  src/qemu/qemu_process.c   | 9 -
>  src/qemu/qemu_process.h   | 1 +
>  src/qemu/qemu_saveimage.c | 2 +-
>  3 files changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
> index d414a40fd5..c8430bf7b7 100644
> --- a/src/qemu/qemu_process.c
> +++ b/src/qemu/qemu_process.c
> @@ -8164,6 +8164,7 @@ qemuProcessStart(virConnectPtr conn,
>   * @vm: domain object
>   * @fd: FD pointer of memory state file
>   * @path: path to memory state file
> + * @snapshot: internal snapshot to load when starting QEMU process or NULL
>   * @data: data from memory state file
>   * @asyncJob: type of asynchronous job
>   * @start_flags: flags to start QEMU process with
> @@ -8173,6 +8174,11 @@ qemuProcessStart(virConnectPtr conn,
>   * Start VM with existing memory state. Make sure that the stored memory 
> state
>   * is correctly decompressed so it can be loaded by QEMU process.
>   *
> + * When reverting to internal snapshot callers needs to pass @snapshot as 
> well
> + * correctly start QEMU process.

It doesn't mention that callers must not pass the other state-related
parameters as both methods can't be used at once.

> + *
> + * When restoring VM from saved image @snapshot needs to be NULL.

This is only an implication (one way) that says that with a saveimage
@snapshot must be null, but not that it's also the other way around.



Re: [libvirt PATCH 26/42] systemd: Switch virtchd to common templates

2023-09-26 Thread Andrea Bolognani
On Tue, Sep 26, 2023 at 11:23:51AM +0100, Daniel P. Berrangé wrote:
> On Tue, Sep 26, 2023 at 11:09:44AM +0200, Pavel Hrdina wrote:
> > On Mon, Sep 25, 2023 at 08:58:24PM +0200, Andrea Bolognani wrote:
> > > +'service_unit_extra': [
> > > +  'Wants=systemd-machined.service',
> > > +  'After=systemd-machined.service',
> > > +  'After=remote-fs.target',
> > > +],
> > > +'service_service_extra': [
> > > +  'KillMode=process',
> > > +  '# Raise hard limits to match behaviour of systemd >= 240.',
> > > +  '# During startup, daemon will set soft limit to match hard limit',
> > > +  '# per systemd recommendations',
> > > +  'LimitNOFILE=1024:524288',
> > > +  '# The cgroups pids controller can limit the number of tasks 
> > > started by',
> > > +  '# the daemon, which can limit the number of domains for some 
> > > hypervisors.',
> > > +  '# A conservative default of 8 tasks per guest results in a 
> > > TasksMax of',
> > > +  '# 32k to support 4096 guests.',
> > > +  'TasksMax=32768',
> > > +  '# With cgroups v2 there is no devices controller anymore, we have 
> > > to use',
> > > +  '# eBPF to control access to devices.  In order to do that we 
> > > create a eBPF',
> > > +  '# hash MAP which locks memory.  The default map size for 64 
> > > devices together',
> > > +  '# with program takes 12k per guest.  After rounding up we will 
> > > get 64M to',
> > > +  '# support 4096 guests.',
> > > +  'LimitMEMLOCK=64M',
> > > +],
> >
> > This feels wrong to have it in meson.build file. In addition it is the
> > same as for virtlxcd and virtqemud so we are basically duplicating the
> > data and which makes it easy to make inconsistent changes not affecting
> > all places.
> >
> > IMHO it would be better to have additional file that will be included
> > into the template for services where we need it.
> >
> > I'm not sure about the `service_unit_extra` as well if we want to have
> > it in meson.build files as it is not strictly related to the build
> > process and there is more data compared to the old `deps`.
>
> If anything I'd reverse the model.  The 'virtchd.service.in' file
> should be the primary template, the common bits the injected data.
>
> ie
>
>   cat virtchd.service.in
>   [Unit]
>   Description=Virtualization Cloud-Hypervisor daemon
>   ::common-unit::
>   Wants=systemd-machined.service
>   After=remote-fs.target
>   After=systemd-machined.service
>   Documentation=man:virtchd(8)
>
>
>   [Service]
>   ::common-service::
>   KillMode=process
>   # Raise hard limits to match behaviour of systemd >= 240.
>   # During startup, daemon will set soft limit to match hard limit
>   # per systemd recommendations
>   LimitNOFILE=1024:524288
>   # The cgroups pids controller can limit the number of tasks started by
>   # the daemon, which can limit the number of domains for some hypervisors.
>   # A conservative default of 8 tasks per guest results in a TasksMax of
>   # 32k to support 4096 guests.
>   TasksMax=32768
>   # With cgroups v2 there is no devices controller anymore, we have to use
>   # eBPF to control access to devices.  In order to do that we create a eBPF
>   # hash MAP which locks memory.  The default map size for 64 devices together
>   # with program takes 12k per guest.  After rounding up we will get 64M to
>   # support 4096 guests.
>   LimitMEMLOCK=64M
>
>   [Install]
>   ::common-install::

This doesn't address the problem with duplication that Pavel pointed
out.

I don't think it helps much with not storing additional data inside
the build system, unless we want to store the contents of the various
common snippets in separate files? Something like

  common_service = fs.read('common_service.inc')
  unit_conf = configuration_data({
'common_service' = common_service,
  })

We'd have to fake fs.read() because it was introduced in 0.57 though.
And we'd have to run the contents of the common parts through
variable substitution anyway, because they will contain a bunch of
lines like

  Also=@service@.socket
  Also=@service@-ro.socket
  Also=@service@-admin.socket

I'm not sure the result would look much better, but I can give it a
try.

> arguably we don't even need the '::common-XXX::' lines in there. We can
> simply see the headers [Unit], [Service], etc and inject the common
> bits under each header.

I think markers make things both easier to implement and more obvious
(whoever looks at the file can immediately tell that some sort of
post-processing is going to happen and probably even make a fairly
accurate guess as what it will entail), so I'd prefer having them.
But this is a fairly minor detail compared to the above.

-- 
Andrea Bolognani / Red Hat / Virtualization



Re: [libvirt PATCH 26/42] systemd: Switch virtchd to common templates

2023-09-26 Thread Daniel P . Berrangé
On Tue, Sep 26, 2023 at 07:02:19AM -0500, Andrea Bolognani wrote:
> On Tue, Sep 26, 2023 at 11:23:51AM +0100, Daniel P. Berrangé wrote:
> > On Tue, Sep 26, 2023 at 11:09:44AM +0200, Pavel Hrdina wrote:
> > > On Mon, Sep 25, 2023 at 08:58:24PM +0200, Andrea Bolognani wrote:
> > > > +'service_unit_extra': [
> > > > +  'Wants=systemd-machined.service',
> > > > +  'After=systemd-machined.service',
> > > > +  'After=remote-fs.target',
> > > > +],
> > > > +'service_service_extra': [
> > > > +  'KillMode=process',
> > > > +  '# Raise hard limits to match behaviour of systemd >= 240.',
> > > > +  '# During startup, daemon will set soft limit to match hard 
> > > > limit',
> > > > +  '# per systemd recommendations',
> > > > +  'LimitNOFILE=1024:524288',
> > > > +  '# The cgroups pids controller can limit the number of tasks 
> > > > started by',
> > > > +  '# the daemon, which can limit the number of domains for some 
> > > > hypervisors.',
> > > > +  '# A conservative default of 8 tasks per guest results in a 
> > > > TasksMax of',
> > > > +  '# 32k to support 4096 guests.',
> > > > +  'TasksMax=32768',
> > > > +  '# With cgroups v2 there is no devices controller anymore, we 
> > > > have to use',
> > > > +  '# eBPF to control access to devices.  In order to do that we 
> > > > create a eBPF',
> > > > +  '# hash MAP which locks memory.  The default map size for 64 
> > > > devices together',
> > > > +  '# with program takes 12k per guest.  After rounding up we will 
> > > > get 64M to',
> > > > +  '# support 4096 guests.',
> > > > +  'LimitMEMLOCK=64M',
> > > > +],
> > >
> > > This feels wrong to have it in meson.build file. In addition it is the
> > > same as for virtlxcd and virtqemud so we are basically duplicating the
> > > data and which makes it easy to make inconsistent changes not affecting
> > > all places.
> > >
> > > IMHO it would be better to have additional file that will be included
> > > into the template for services where we need it.
> > >
> > > I'm not sure about the `service_unit_extra` as well if we want to have
> > > it in meson.build files as it is not strictly related to the build
> > > process and there is more data compared to the old `deps`.
> >
> > If anything I'd reverse the model.  The 'virtchd.service.in' file
> > should be the primary template, the common bits the injected data.
> >
> > ie
> >
> >   cat virtchd.service.in
> >   [Unit]
> >   Description=Virtualization Cloud-Hypervisor daemon
> >   ::common-unit::
> >   Wants=systemd-machined.service
> >   After=remote-fs.target
> >   After=systemd-machined.service
> >   Documentation=man:virtchd(8)
> >
> >
> >   [Service]
> >   ::common-service::
> >   KillMode=process
> >   # Raise hard limits to match behaviour of systemd >= 240.
> >   # During startup, daemon will set soft limit to match hard limit
> >   # per systemd recommendations
> >   LimitNOFILE=1024:524288
> >   # The cgroups pids controller can limit the number of tasks started by
> >   # the daemon, which can limit the number of domains for some hypervisors.
> >   # A conservative default of 8 tasks per guest results in a TasksMax of
> >   # 32k to support 4096 guests.
> >   TasksMax=32768
> >   # With cgroups v2 there is no devices controller anymore, we have to use
> >   # eBPF to control access to devices.  In order to do that we create a eBPF
> >   # hash MAP which locks memory.  The default map size for 64 devices 
> > together
> >   # with program takes 12k per guest.  After rounding up we will get 64M to
> >   # support 4096 guests.
> >   LimitMEMLOCK=64M
> >
> >   [Install]
> >   ::common-install::
> 
> This doesn't address the problem with duplication that Pavel pointed
> out.
> 
> I don't think it helps much with not storing additional data inside
> the build system, unless we want to store the contents of the various
> common snippets in separate files? Something like
> 
>   common_service = fs.read('common_service.inc')
>   unit_conf = configuration_data({
> 'common_service' = common_service,
>   })
> 
> We'd have to fake fs.read() because it was introduced in 0.57 though.
> And we'd have to run the contents of the common parts through
> variable substitution anyway, because they will contain a bunch of
> lines like
> 
>   Also=@service@.socket
>   Also=@service@-ro.socket
>   Also=@service@-admin.socket
> 
> I'm not sure the result would look much better, but I can give it a
> try.

Don't try to do any of this in meson.  We should just have a standalone
python script that can combine the daemon specific unit file contents
with the common unit file contents. eg

  scripts/merge-unit-file.py \
 src/qemu/virtqemud.service.in \
 src/rpc/virtd.service.in \
 build/src/virtqemud.service

> > arguably we don't even need the '::common-XXX::' lines in there. We can
> > simply see the headers [Unit], [Service], etc and inject the common
> > bits under each header.

[PATCH] test: Fix testNodeGetFreePages

2023-09-26 Thread Martin Kletzander
The function is supposed to return the number of items filled into the
array and not zero.  Also change the initialization of the "randomness"
to be based on the startCell so that the values are different for each
cell even for separate calls.

Signed-off-by: Martin Kletzander 
---
 src/test/test_driver.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/test/test_driver.c b/src/test/test_driver.c
index c962aa74786e..998d102ddc5a 100644
--- a/src/test/test_driver.c
+++ b/src/test/test_driver.c
@@ -4503,7 +4503,7 @@ testNodeGetFreePages(virConnectPtr conn G_GNUC_UNUSED,
  unsigned int flags)
 {
 size_t i = 0, j = 0;
-int x = 6;
+int x = startCell * 6;
 
 virCheckFlags(0, -1);
 
@@ -4514,7 +4514,7 @@ testNodeGetFreePages(virConnectPtr conn G_GNUC_UNUSED,
 }
 }
 
-return 0;
+return cellCount * npages;
 }
 
 static int testDomainCreateWithFlags(virDomainPtr domain, unsigned int flags)
-- 
2.42.0



Re: [RFC PATCH libvirt v1 2/3] Improve `virsh start --console` behavior

2023-09-26 Thread Daniel P . Berrangé
On Tue, Sep 26, 2023 at 02:11:37PM +0200, Marc Hartmayer wrote:
> On Mon, Sep 25, 2023 at 04:15 PM +0100, Daniel P. Berrangé 
>  wrote:
> > On Mon, Sep 25, 2023 at 03:39:09PM +0200, Marc Hartmayer wrote:
> >> When starting a guest via libvirt (`virsh start --console`), early
> >> console output was missed because the guest was started first and then
> >> the console was attached. This patch changes this to the following
> >> sequence:
> >> 
> >> 1. create a paused guest
> >> 2. attach the console
> >> 3. resume the guest
> >> 
> >> Reviewed-by: Boris Fiuczynski 
> >> Signed-off-by: Marc Hartmayer 
> >> ---
> >>  tools/virsh-domain.c | 11 +--
> >>  1 file changed, 9 insertions(+), 2 deletions(-)
> >> 
> >> diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c
> >> index 5c3c6d18aebf..3581161c6f53 100644
> >> --- a/tools/virsh-domain.c
> >> +++ b/tools/virsh-domain.c
> >> @@ -4065,6 +4065,7 @@ cmdStart(vshControl *ctl, const vshCmd *cmd)
> >>  g_autoptr(virshDomain) dom = NULL;
> >>  #ifndef WIN32
> >>  bool console = vshCommandOptBool(cmd, "console");
> >> +bool resume_domain = false;
> >>  #endif
> >>  unsigned int flags = VIR_DOMAIN_NONE;
> >>  int rc;
> >> @@ -4083,8 +4084,14 @@ cmdStart(vshControl *ctl, const vshCmd *cmd)
> >>  if (virshFetchPassFdsList(ctl, cmd, , ) < 0)
> >>  return false;
> >>  
> >> -if (vshCommandOptBool(cmd, "paused"))
> >> +if (vshCommandOptBool(cmd, "paused")) {
> >>  flags |= VIR_DOMAIN_START_PAUSED;
> >> +#ifndef WIN32
> >> +} else if (console) {
> >> +flags |= VIR_DOMAIN_START_PAUSED;
> >> +resume_domain = true;
> >> +#endif
> >
> > Hypervisor drivers are not required to support VIR_DOMAIN_START_PAUSED.
> >
> > So we need to detect the error code, and retry without that flag
> > set as a fallback. Same in next patch.
> 
> Yep, makes sense - will change. Thanks for the feedback.

It'll be VIR_ERR_INVALID_ARG btw for an unsupported flag.

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [RFC PATCH libvirt v1 2/3] Improve `virsh start --console` behavior

2023-09-26 Thread Marc Hartmayer
On Mon, Sep 25, 2023 at 04:15 PM +0100, Daniel P. Berrangé 
 wrote:
> On Mon, Sep 25, 2023 at 03:39:09PM +0200, Marc Hartmayer wrote:
>> When starting a guest via libvirt (`virsh start --console`), early
>> console output was missed because the guest was started first and then
>> the console was attached. This patch changes this to the following
>> sequence:
>> 
>> 1. create a paused guest
>> 2. attach the console
>> 3. resume the guest
>> 
>> Reviewed-by: Boris Fiuczynski 
>> Signed-off-by: Marc Hartmayer 
>> ---
>>  tools/virsh-domain.c | 11 +--
>>  1 file changed, 9 insertions(+), 2 deletions(-)
>> 
>> diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c
>> index 5c3c6d18aebf..3581161c6f53 100644
>> --- a/tools/virsh-domain.c
>> +++ b/tools/virsh-domain.c
>> @@ -4065,6 +4065,7 @@ cmdStart(vshControl *ctl, const vshCmd *cmd)
>>  g_autoptr(virshDomain) dom = NULL;
>>  #ifndef WIN32
>>  bool console = vshCommandOptBool(cmd, "console");
>> +bool resume_domain = false;
>>  #endif
>>  unsigned int flags = VIR_DOMAIN_NONE;
>>  int rc;
>> @@ -4083,8 +4084,14 @@ cmdStart(vshControl *ctl, const vshCmd *cmd)
>>  if (virshFetchPassFdsList(ctl, cmd, , ) < 0)
>>  return false;
>>  
>> -if (vshCommandOptBool(cmd, "paused"))
>> +if (vshCommandOptBool(cmd, "paused")) {
>>  flags |= VIR_DOMAIN_START_PAUSED;
>> +#ifndef WIN32
>> +} else if (console) {
>> +flags |= VIR_DOMAIN_START_PAUSED;
>> +resume_domain = true;
>> +#endif
>
> Hypervisor drivers are not required to support VIR_DOMAIN_START_PAUSED.
>
> So we need to detect the error code, and retry without that flag
> set as a fallback. Same in next patch.

Yep, makes sense - will change. Thanks for the feedback.

[…snip]

-- 
Kind regards / Beste Grüße
   Marc Hartmayer

IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Gregor Pillen
Geschäftsführung: David Faller
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294



Re: [libvirt PATCH 26/42] systemd: Switch virtchd to common templates

2023-09-26 Thread Daniel P . Berrangé
On Tue, Sep 26, 2023 at 11:09:44AM +0200, Pavel Hrdina wrote:
> On Mon, Sep 25, 2023 at 08:58:24PM +0200, Andrea Bolognani wrote:
> > Signed-off-by: Andrea Bolognani 
> > ---
> >  src/ch/meson.build| 27 
> >  src/ch/virtchd.service.in | 44 ---
> >  2 files changed, 23 insertions(+), 48 deletions(-)
> >  delete mode 100644 src/ch/virtchd.service.in
> > 
> > diff --git a/src/ch/meson.build b/src/ch/meson.build
> > index dc08069dcd..f6c443f3c6 100644
> > --- a/src/ch/meson.build
> > +++ b/src/ch/meson.build
> > @@ -57,11 +57,30 @@ if conf.has('WITH_CH')
> >  
> >virt_daemon_units += {
> >  'service': 'virtchd',
> > -'service_in': files('virtchd.service.in'),
> >  'name': 'Libvirt ch',
> > -'socket_in': libvirtd_socket_in,
> > -'socket_ro_in': libvirtd_socket_ro_in,
> > -'socket_admin_in': libvirtd_socket_admin_in,
> > +'service_unit_extra': [
> > +  'Wants=systemd-machined.service',
> > +  'After=systemd-machined.service',
> > +  'After=remote-fs.target',
> > +],
> > +'service_service_extra': [
> > +  'KillMode=process',
> > +  '# Raise hard limits to match behaviour of systemd >= 240.',
> > +  '# During startup, daemon will set soft limit to match hard limit',
> > +  '# per systemd recommendations',
> > +  'LimitNOFILE=1024:524288',
> > +  '# The cgroups pids controller can limit the number of tasks started 
> > by',
> > +  '# the daemon, which can limit the number of domains for some 
> > hypervisors.',
> > +  '# A conservative default of 8 tasks per guest results in a TasksMax 
> > of',
> > +  '# 32k to support 4096 guests.',
> > +  'TasksMax=32768',
> > +  '# With cgroups v2 there is no devices controller anymore, we have 
> > to use',
> > +  '# eBPF to control access to devices.  In order to do that we create 
> > a eBPF',
> > +  '# hash MAP which locks memory.  The default map size for 64 devices 
> > together',
> > +  '# with program takes 12k per guest.  After rounding up we will get 
> > 64M to',
> > +  '# support 4096 guests.',
> > +  'LimitMEMLOCK=64M',
> > +],
> 
> This feels wrong to have it in meson.build file. In addition it is the
> same as for virtlxcd and virtqemud so we are basically duplicating the
> data and which makes it easy to make inconsistent changes not affecting
> all places.
> 
> IMHO it would be better to have additional file that will be included
> into the template for services where we need it.
> 
> I'm not sure about the `service_unit_extra` as well if we want to have
> it in meson.build files as it is not strictly related to the build
> process and there is more data compared to the old `deps`.

If anything I'd reverse the model.  The 'virtchd.service.in' file
should be the primary template, the common bits the injected data.

ie

  cat virtchd.service.in
  [Unit]
  Description=Virtualization Cloud-Hypervisor daemon
  ::common-unit::
  Wants=systemd-machined.service
  After=remote-fs.target
  After=systemd-machined.service
  Documentation=man:virtchd(8)

  
  [Service]
  ::common-service::
  KillMode=process
  # Raise hard limits to match behaviour of systemd >= 240.
  # During startup, daemon will set soft limit to match hard limit
  # per systemd recommendations
  LimitNOFILE=1024:524288
  # The cgroups pids controller can limit the number of tasks started by
  # the daemon, which can limit the number of domains for some hypervisors.
  # A conservative default of 8 tasks per guest results in a TasksMax of
  # 32k to support 4096 guests.
  TasksMax=32768
  # With cgroups v2 there is no devices controller anymore, we have to use
  # eBPF to control access to devices.  In order to do that we create a eBPF
  # hash MAP which locks memory.  The default map size for 64 devices together
  # with program takes 12k per guest.  After rounding up we will get 64M to
  # support 4096 guests.
  LimitMEMLOCK=64M
  
  [Install]
  ::common-install::

arguably we don't even need the '::common-XXX::' lines in there. We can
simply see the headers [Unit], [Service], etc and inject the common
bits under each header.

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Entering freeze for libvirt-9.8.0

2023-09-26 Thread Jiri Denemark
I have just tagged v9.8.0-rc1 in the repository and pushed signed
tarballs and source RPMs to https://download.libvirt.org/

Please give the release candidate some testing and in case you find a
serious issue which should have a fix in the upcoming release, feel
free to reply to this thread to make sure the issue is more visible.

If you have not done so yet, please update NEWS.rst to document any
significant change you made since the last release.

Thanks,

Jirka



Re: [libvirt PATCH 26/42] systemd: Switch virtchd to common templates

2023-09-26 Thread Andrea Bolognani
On Tue, Sep 26, 2023 at 11:09:44AM +0200, Pavel Hrdina wrote:
> On Mon, Sep 25, 2023 at 08:58:24PM +0200, Andrea Bolognani wrote:
> > +'service_unit_extra': [
> > +  'Wants=systemd-machined.service',
> > +  'After=systemd-machined.service',
> > +  'After=remote-fs.target',
> > +],
> > +'service_service_extra': [
> > +  'KillMode=process',
> > +  '# Raise hard limits to match behaviour of systemd >= 240.',
> > +  '# During startup, daemon will set soft limit to match hard limit',
> > +  '# per systemd recommendations',
> > +  'LimitNOFILE=1024:524288',
> > +  '# The cgroups pids controller can limit the number of tasks started 
> > by',
> > +  '# the daemon, which can limit the number of domains for some 
> > hypervisors.',
> > +  '# A conservative default of 8 tasks per guest results in a TasksMax 
> > of',
> > +  '# 32k to support 4096 guests.',
> > +  'TasksMax=32768',
> > +  '# With cgroups v2 there is no devices controller anymore, we have 
> > to use',
> > +  '# eBPF to control access to devices.  In order to do that we create 
> > a eBPF',
> > +  '# hash MAP which locks memory.  The default map size for 64 devices 
> > together',
> > +  '# with program takes 12k per guest.  After rounding up we will get 
> > 64M to',
> > +  '# support 4096 guests.',
> > +  'LimitMEMLOCK=64M',
> > +],
>
> This feels wrong to have it in meson.build file. In addition it is the
> same as for virtlxcd and virtqemud so we are basically duplicating the
> data and which makes it easy to make inconsistent changes not affecting
> all places.

You're right, it would make sense to deduplicate this further.

> IMHO it would be better to have additional file that will be included
> into the template for services where we need it.

Wouldn't a variable be enough?

In order to use a file, I can see two ways. First one is to have a
separate virtd-hypervisor.service.in that contains the same stuff as
virtd.service.in plus these comments, but that means introducing
duplication on a different axis and risking the two files going out
of sync. Second one is to have a virtd-comments.txt or whatever that
gets included conditionally from virtd.service.in, but that means
adding an extra processing step. Neither really feels an outright
improvement over what we have here.

Can you explain what did you have in mind? Maybe I'm just not seeing
it :)

> I'm not sure about the `service_unit_extra` as well if we want to have
> it in meson.build files as it is not strictly related to the build
> process and there is more data compared to the old `deps`.

That's because the various services and sockets have tiny differences
between them. Having a single template is IMO stictly better for
maintenability than carrying around more than a dozen copies of the
same basic information, which is what we have today.

It's true that this is going a bit overboard compared to what we're
using configuration data for elsewhere, but I don't think it's too
much of a stretch or something that feels too out of place.

That said, if you have an idea for an alternative approach to
achieving the same result, please do share it! I'm not married to
this specific implementation :)

-- 
Andrea Bolognani / Red Hat / Virtualization



Re: [libvirt PATCH 35/42] systemd: Replace Requires with BindTo+After for sockets

2023-09-26 Thread Andrea Bolognani
On Tue, Sep 26, 2023 at 09:44:52AM +0100, Daniel P. Berrangé wrote:
> On Mon, Sep 25, 2023 at 08:58:33PM +0200, Andrea Bolognani wrote:
> > This is the strongest relationship that can be declared between
> > two units, and causes the service to be terminated immediately
> > if any of its sockets disappear. This is the behavior we want.
>
> IIUC, this prevents running the service with /only/ the main
> socket, and ro/admin sockets disabled. Running without the
> ro socket in particular was something we wanted to allow to
> reduce exposure to unprivileged services (there have been
> a number of CVEs where the read-only socket was the way in)

This doesn't work today either AFAICT, since the ro/admin sockets are
marked as Required by the various services.

If we want to support this configuration, then we need

  # foo.service
  [Unit]
  BindsTo=foo.socket
  Wants=foo-ro.socket
  Wants=foo-admin.socket
  After=foo.socket

In the default scenario, things will work just the same as they do
here, but it will also be possible to mask foo{-ro,-admin}.socket to
obtain the hardened setup you describe.

-- 
Andrea Bolognani / Red Hat / Virtualization



Re: [libvirt PATCH 26/42] systemd: Switch virtchd to common templates

2023-09-26 Thread Pavel Hrdina
On Mon, Sep 25, 2023 at 08:58:24PM +0200, Andrea Bolognani wrote:
> Signed-off-by: Andrea Bolognani 
> ---
>  src/ch/meson.build| 27 
>  src/ch/virtchd.service.in | 44 ---
>  2 files changed, 23 insertions(+), 48 deletions(-)
>  delete mode 100644 src/ch/virtchd.service.in
> 
> diff --git a/src/ch/meson.build b/src/ch/meson.build
> index dc08069dcd..f6c443f3c6 100644
> --- a/src/ch/meson.build
> +++ b/src/ch/meson.build
> @@ -57,11 +57,30 @@ if conf.has('WITH_CH')
>  
>virt_daemon_units += {
>  'service': 'virtchd',
> -'service_in': files('virtchd.service.in'),
>  'name': 'Libvirt ch',
> -'socket_in': libvirtd_socket_in,
> -'socket_ro_in': libvirtd_socket_ro_in,
> -'socket_admin_in': libvirtd_socket_admin_in,
> +'service_unit_extra': [
> +  'Wants=systemd-machined.service',
> +  'After=systemd-machined.service',
> +  'After=remote-fs.target',
> +],
> +'service_service_extra': [
> +  'KillMode=process',
> +  '# Raise hard limits to match behaviour of systemd >= 240.',
> +  '# During startup, daemon will set soft limit to match hard limit',
> +  '# per systemd recommendations',
> +  'LimitNOFILE=1024:524288',
> +  '# The cgroups pids controller can limit the number of tasks started 
> by',
> +  '# the daemon, which can limit the number of domains for some 
> hypervisors.',
> +  '# A conservative default of 8 tasks per guest results in a TasksMax 
> of',
> +  '# 32k to support 4096 guests.',
> +  'TasksMax=32768',
> +  '# With cgroups v2 there is no devices controller anymore, we have to 
> use',
> +  '# eBPF to control access to devices.  In order to do that we create a 
> eBPF',
> +  '# hash MAP which locks memory.  The default map size for 64 devices 
> together',
> +  '# with program takes 12k per guest.  After rounding up we will get 
> 64M to',
> +  '# support 4096 guests.',
> +  'LimitMEMLOCK=64M',
> +],

This feels wrong to have it in meson.build file. In addition it is the
same as for virtlxcd and virtqemud so we are basically duplicating the
data and which makes it easy to make inconsistent changes not affecting
all places.

IMHO it would be better to have additional file that will be included
into the template for services where we need it.

I'm not sure about the `service_unit_extra` as well if we want to have
it in meson.build files as it is not strictly related to the build
process and there is more data compared to the old `deps`.

Pavel

>}
>  
>virt_install_dirs += [
> diff --git a/src/ch/virtchd.service.in b/src/ch/virtchd.service.in
> deleted file mode 100644
> index 351eee312b..00
> --- a/src/ch/virtchd.service.in
> +++ /dev/null
> @@ -1,44 +0,0 @@
> -[Unit]
> -Description=Virtualization Cloud-Hypervisor daemon
> -Conflicts=libvirtd.service
> -Requires=virtchd.socket
> -Requires=virtchd-ro.socket
> -Requires=virtchd-admin.socket
> -Wants=systemd-machined.service
> -After=network.target
> -After=dbus.service
> -After=apparmor.service
> -After=remote-fs.target
> -After=systemd-machined.service
> -Documentation=man:virtchd(8)
> -Documentation=https://libvirt.org
> -
> -[Service]
> -Type=notify
> -Environment=VIRTCHD_ARGS="--timeout 120"
> -EnvironmentFile=-@initconfdir@/virtchd
> -ExecStart=@sbindir@/virtchd $VIRTCHD_ARGS
> -ExecReload=/bin/kill -HUP $MAINPID
> -KillMode=process
> -Restart=on-failure
> -# Raise hard limits to match behaviour of systemd >= 240.
> -# During startup, daemon will set soft limit to match hard limit
> -# per systemd recommendations
> -LimitNOFILE=1024:524288
> -# The cgroups pids controller can limit the number of tasks started by
> -# the daemon, which can limit the number of domains for some hypervisors.
> -# A conservative default of 8 tasks per guest results in a TasksMax of
> -# 32k to support 4096 guests.
> -TasksMax=32768
> -# With cgroups v2 there is no devices controller anymore, we have to use
> -# eBPF to control access to devices.  In order to do that we create a eBPF
> -# hash MAP which locks memory.  The default map size for 64 devices together
> -# with program takes 12k per guest.  After rounding up we will get 64M to
> -# support 4096 guests.
> -LimitMEMLOCK=64M
> -
> -[Install]
> -WantedBy=multi-user.target
> -Also=virtchd.socket
> -Also=virtchd-ro.socket
> -Also=virtchd-admin.socket
> -- 
> 2.41.0
> 


signature.asc
Description: PGP signature


Re: [libvirt PATCH 12/42] systemd: Make @service_in@ optional

2023-09-26 Thread Daniel P . Berrangé
On Mon, Sep 25, 2023 at 08:58:10PM +0200, Andrea Bolognani wrote:
> It is currently considered required, but we're soon going to
> provide a default that will be suitable for most services.
> 
> Since all services currently provide a value explicitly, we
> can implement a default without breaking anything.
> 
> Signed-off-by: Andrea Bolognani 
> ---
>  src/meson.build | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)

Reviewed-by: Daniel P. Berrangé 


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [libvirt PATCH 35/42] systemd: Replace Requires with BindTo+After for sockets

2023-09-26 Thread Daniel P . Berrangé
On Mon, Sep 25, 2023 at 08:58:33PM +0200, Andrea Bolognani wrote:
> This is the strongest relationship that can be declared between
> two units, and causes the service to be terminated immediately
> if any of its sockets disappear. This is the behavior we want.

IIUC, this prevents running the service with /only/ the main
socket, and ro/admin sockets disabled. Running without the
ro socket in particular was something we wanted to allow to
reduce exposure to unprivileged services (there have been
a number of CVEs where the read-only socket was the way in)

> 
> Signed-off-by: Andrea Bolognani 
> ---
>  src/locking/virtlockd.service.in | 6 --
>  src/logging/virtlogd.service.in  | 6 --
>  src/virtd.service.in | 9 ++---
>  3 files changed, 14 insertions(+), 7 deletions(-)
> 
> diff --git a/src/locking/virtlockd.service.in 
> b/src/locking/virtlockd.service.in
> index 9e91fa3261..a21a2c2c19 100644
> --- a/src/locking/virtlockd.service.in
> +++ b/src/locking/virtlockd.service.in
> @@ -1,7 +1,9 @@
>  [Unit]
>  Description=Virtual machine lock manager
> -Requires=virtlockd.socket
> -Requires=virtlockd-admin.socket
> +BindsTo=virtlockd.socket
> +BindsTo=virtlockd-admin.socket
> +After=virtlockd.socket
> +After=virtlockd-admin.socket
>  Before=libvirtd.service
>  Documentation=man:virtlockd(8)
>  Documentation=https://libvirt.org
> diff --git a/src/logging/virtlogd.service.in b/src/logging/virtlogd.service.in
> index 97c942ffb0..f3bd576301 100644
> --- a/src/logging/virtlogd.service.in
> +++ b/src/logging/virtlogd.service.in
> @@ -1,7 +1,9 @@
>  [Unit]
>  Description=Virtual machine log manager
> -Requires=virtlogd.socket
> -Requires=virtlogd-admin.socket
> +BindsTo=virtlogd.socket
> +BindsTo=virtlogd-admin.socket
> +After=virtlogd.socket
> +After=virtlogd-admin.socket
>  Before=libvirtd.service
>  Documentation=man:virtlogd(8)
>  Documentation=https://libvirt.org
> diff --git a/src/virtd.service.in b/src/virtd.service.in
> index 21391a65b0..b9e6345e8c 100644
> --- a/src/virtd.service.in
> +++ b/src/virtd.service.in
> @@ -1,8 +1,11 @@
>  [Unit]
>  Description=@name@ daemon
> -Requires=@service@.socket
> -Requires=@service@-ro.socket
> -Requires=@service@-admin.socket
> +BindsTo=@service@.socket
> +BindsTo=@service@-ro.socket
> +BindsTo=@service@-admin.socket
> +After=@service@.socket
> +After=@service@-ro.socket
> +After=@service@-admin.socket
>  Conflicts=libvirtd.service
>  After=libvirtd.service
>  After=network.target
> -- 
> 2.41.0
> 

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [libvirt PATCH 09/42] systemd: Drop unnecessary uses of @sockets@

2023-09-26 Thread Daniel P . Berrangé
On Mon, Sep 25, 2023 at 08:58:07PM +0200, Andrea Bolognani wrote:
> For most services, the value provided explicitly matches the
> documented default.
> 
> Signed-off-by: Andrea Bolognani 
> ---
>  src/ch/meson.build  | 1 -
>  src/interface/meson.build   | 1 -
>  src/libxl/meson.build   | 1 -
>  src/lxc/meson.build | 1 -
>  src/network/meson.build | 1 -
>  src/node_device/meson.build | 1 -
>  src/nwfilter/meson.build| 1 -
>  src/qemu/meson.build| 1 -
>  src/secret/meson.build  | 1 -
>  src/storage/meson.build | 1 -
>  src/vbox/meson.build| 1 -
>  src/vz/meson.build  | 1 -
>  12 files changed, 12 deletions(-)

Reviewed-by: Daniel P. Berrangé 


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [libvirt PATCH 10/42] systemd: Make @sockprefix@ optional

2023-09-26 Thread Daniel P . Berrangé
On Mon, Sep 25, 2023 at 08:58:08PM +0200, Andrea Bolognani wrote:
> For most services, the socket paths can be derived trivially from
> the name of the daemon: for virtqemud, for example, they will be
> 
>   /run/libvirt/virtqemud-sock
>   /run/libvirt/virtqemud-sock-ro
>   /run/libvirt/virtqemud-admin-sock
> 
> libvirtd and virtproxyd are the exceptions, since their socket
> paths will be
> 
>   /run/libvirt/libvirt-sock
>   /run/libvirt/libvirt-sock-ro
>   /run/libvirt/libvirt-admin-sock
> 
> So we still need to be able to provide a custom @sockprefix@ in
> those cases, but in the most common scenario we can do away with
> the requirement by introducing a sensible default.
> 
> Signed-off-by: Andrea Bolognani 
> ---
>  src/meson.build | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Daniel P. Berrangé 


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [libvirt PATCH 11/42] systemd: Drop unnecessary uses of @sockprefix@

2023-09-26 Thread Daniel P . Berrangé
On Mon, Sep 25, 2023 at 08:58:09PM +0200, Andrea Bolognani wrote:
> Now that providing the value is optional, we can remove almost
> all uses.
> 
> Signed-off-by: Andrea Bolognani 
> ---
>  src/ch/meson.build  | 1 -
>  src/interface/meson.build   | 1 -
>  src/libxl/meson.build   | 1 -
>  src/locking/meson.build | 1 -
>  src/logging/meson.build | 1 -
>  src/lxc/meson.build | 1 -
>  src/network/meson.build | 1 -
>  src/node_device/meson.build | 1 -
>  src/nwfilter/meson.build| 1 -
>  src/qemu/meson.build| 1 -
>  src/secret/meson.build  | 1 -
>  src/storage/meson.build | 1 -
>  src/vbox/meson.build| 1 -
>  src/vz/meson.build  | 1 -
>  14 files changed, 14 deletions(-)

Reviewed-by: Daniel P. Berrangé 


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [libvirt PATCH 02/42] systemd: Add missing WantedBy for virtlogd/virtlockd

2023-09-26 Thread Daniel P . Berrangé
On Mon, Sep 25, 2023 at 08:58:00PM +0200, Andrea Bolognani wrote:
> This annotation being missing resulted in virtlogd and virtlockd
> being marked as "indirect" services, i.e. services that cannot
> be started directly but have to be socket activated instead.
> 
> While this is our preferred configuration, we shouldn't prevent
> the admin to start them at boot if they want to.
> 
> Signed-off-by: Andrea Bolognani 
> ---
>  src/locking/virtlockd.service.in | 1 +
>  src/logging/virtlogd.service.in  | 1 +
>  2 files changed, 2 insertions(+)

Reviewed-by: Daniel P. Berrangé 


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [libvirt PATCH 04/42] systemd: Set Type=notify for virtlogd/virtlockd

2023-09-26 Thread Daniel P . Berrangé
On Mon, Sep 25, 2023 at 08:58:02PM +0200, Andrea Bolognani wrote:
> This tells systemd that the services in question support the
> native socket activation protocol.
> 
> virtlogd and virtlockd, just like all the other daemons, implement
> the necessary handshake.
> 
> Signed-off-by: Andrea Bolognani 
> ---
>  src/locking/virtlockd.service.in | 1 +
>  src/logging/virtlogd.service.in  | 1 +
>  2 files changed, 2 insertions(+)

Reviewed-by: Daniel P. Berrangé 


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [libvirt PATCH 07/42] systemd: Rename @mode@ -> @sockmode@

2023-09-26 Thread Daniel P . Berrangé
On Mon, Sep 25, 2023 at 08:58:05PM +0200, Andrea Bolognani wrote:
> Signed-off-by: Andrea Bolognani 
> ---
>  src/meson.build   | 6 +++---
>  src/remote/libvirtd.socket.in | 2 +-
>  2 files changed, 4 insertions(+), 4 deletions(-)

Reviewed-by: Daniel P. Berrangé 


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [libvirt PATCH 08/42] systemd: Only set @sockmode@ once

2023-09-26 Thread Daniel P . Berrangé
On Mon, Sep 25, 2023 at 08:58:06PM +0200, Andrea Bolognani wrote:
> The decision is based only on whether Polkit support is enabled,
> so there's no need to go through it again for every single
> service.
> 
> Signed-off-by: Andrea Bolognani 
> ---
>  src/meson.build | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)

Reviewed-by: Daniel P. Berrangé 


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [libvirt PATCH 06/42] systemd: Rename socket_in_def -> socket_in_default

2023-09-26 Thread Daniel P . Berrangé
On Mon, Sep 25, 2023 at 08:58:04PM +0200, Andrea Bolognani wrote:
> The meaning of the _def suffix might not be immediately obvious,
> especially since it's also used to refer to the output of the
> meson-gen-def.py script elsewhere in the same file.
> 
> Signed-off-by: Andrea Bolognani 
> ---
>  src/meson.build | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)

Reviewed-by: Daniel P. Berrangé 


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [libvirt PATCH 05/42] systemd: Set @name@ for virtlogd/virtlockd

2023-09-26 Thread Daniel P . Berrangé
On Mon, Sep 25, 2023 at 08:58:03PM +0200, Andrea Bolognani wrote:
> The information is not used anywhere right now, but the
> documentation for virt_daemon_units claims it's mandatory.
> 
> More importantly, we're going to start actually using it later
> on.
> 
> Signed-off-by: Andrea Bolognani 
> ---
>  src/locking/meson.build | 2 +-
>  src/logging/meson.build | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Daniel P. Berrangé 


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [libvirt PATCH 01/42] systemd: Add missing Also for admin socket

2023-09-26 Thread Daniel P . Berrangé
On Mon, Sep 25, 2023 at 08:57:59PM +0200, Andrea Bolognani wrote:
> When libvirtd, virtlog and virtlockd are enabled, we want their
> admin sockets to be enabled as well.

s/enabled/enabled for socket activation/

because these admin sockets were enabled automatically when the
service eventually starts already.

> 
> Signed-off-by: Andrea Bolognani 
> ---
>  src/locking/virtlockd.service.in | 1 +
>  src/logging/virtlogd.service.in  | 1 +
>  src/remote/libvirtd.service.in   | 1 +
>  3 files changed, 3 insertions(+)

Reviewed-by: Daniel P. Berrangé 

> 
> diff --git a/src/locking/virtlockd.service.in 
> b/src/locking/virtlockd.service.in
> index dd0bbab083..18873f86a6 100644
> --- a/src/locking/virtlockd.service.in
> +++ b/src/locking/virtlockd.service.in
> @@ -22,3 +22,4 @@ LimitNOFILE=1024:524288
>  
>  [Install]
>  Also=virtlockd.socket
> +Also=virtlockd-admin.socket
> diff --git a/src/logging/virtlogd.service.in b/src/logging/virtlogd.service.in
> index 8e245ddb43..14a991f348 100644
> --- a/src/logging/virtlogd.service.in
> +++ b/src/logging/virtlogd.service.in
> @@ -22,3 +22,4 @@ LimitNOFILE=1024:524288
>  
>  [Install]
>  Also=virtlogd.socket
> +Also=virtlogd-admin.socket
> diff --git a/src/remote/libvirtd.service.in b/src/remote/libvirtd.service.in
> index 84f1613adc..8839c00a15 100644
> --- a/src/remote/libvirtd.service.in
> +++ b/src/remote/libvirtd.service.in
> @@ -50,3 +50,4 @@ Also=virtlockd.socket
>  Also=virtlogd.socket
>  Also=libvirtd.socket
>  Also=libvirtd-ro.socket
> +Also=libvirtd-admin.socket
> -- 
> 2.41.0
> 

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [libvirt PATCH 03/42] systemd: Add missing Service for virtlogd/virtlockd

2023-09-26 Thread Daniel P . Berrangé
On Mon, Sep 25, 2023 at 08:58:01PM +0200, Andrea Bolognani wrote:
> While systemd will automatically match foo.socket with foo.service
> based on their names, it's nicer to connect the two explicitly.
> 
> This is what we do for all services, with virtlogd and virtlockd
> being the only exceptions.
> 
> Signed-off-by: Andrea Bolognani 
> ---
>  src/locking/virtlockd.socket.in | 1 +
>  src/logging/virtlogd.socket.in  | 1 +
>  2 files changed, 2 insertions(+)

Reviewed-by: Daniel P. Berrangé 


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|