Re: [PATCH v8] drivers/misc: sysgenid: add system generation id driver

2021-03-23 Thread Catangiu, Adrian Costin
Hi Greg,

After your previous reply on this thread we started considering to provide this 
interface and framework/functionality through a userspace service instead of a 
kernel interface.
The latest iteration on this evolving patch-set doesn’t have strong reasons for 
living in the kernel anymore - the only objectively strong advantage would be 
easier driving of ecosystem integration; but I am not sure that's a good enough 
reason to create a new kernel interface.

I am now looking into adding this through Systemd. Either as a pluggable 
service or maybe even a systemd builtin offering.

What are your thoughts on it?

Thanks,
Adrian.

On 23/03/2021, 14:57, "Greg KH"  wrote:

CAUTION: This email originated from outside of the organization. Do not 
click links or open attachments unless you can confirm the sender and know the 
content is safe.



On Mon, Mar 08, 2021 at 05:03:58PM +0100, Alexander Graf wrote:
>
>
> On 08.03.21 15:36, Greg KH wrote:
> >
> > On Mon, Mar 08, 2021 at 04:18:03PM +0200, Adrian Catangiu wrote:
> > > +static struct miscdevice sysgenid_misc = {
> > > + .minor = MISC_DYNAMIC_MINOR,
> > > + .name = "sysgenid",
> > > + .fops = ,
> > > +};
> >
> > Much cleaner, but:
> >
> > > +static int __init sysgenid_init(void)
> > > +{
> > > + int ret;
> > > +
> > > + sysgenid_data.map_buf = get_zeroed_page(GFP_KERNEL);
> > > + if (!sysgenid_data.map_buf)
> > > + return -ENOMEM;
> > > +
> > > + atomic_set(_data.generation_counter, 0);
> > > + atomic_set(_data.outdated_watchers, 0);
> > > + init_waitqueue_head(_data.read_waitq);
> > > + init_waitqueue_head(_data.outdated_waitq);
> > > + spin_lock_init(_data.lock);
> > > +
> > > + ret = misc_register(_misc);
> > > + if (ret < 0) {
> > > + pr_err("misc_register() failed for sysgenid\n");
> > > + goto err;
> > > + }
> > > +
> > > + return 0;
> > > +
> > > +err:
> > > + free_pages(sysgenid_data.map_buf, 0);
> > > + sysgenid_data.map_buf = 0;
> > > +
> > > + return ret;
> > > +}
> > > +
> > > +static void __exit sysgenid_exit(void)
> > > +{
> > > + misc_deregister(_misc);
> > > + free_pages(sysgenid_data.map_buf, 0);
> > > + sysgenid_data.map_buf = 0;
> > > +}
> > > +
> > > +module_init(sysgenid_init);
> > > +module_exit(sysgenid_exit);
> >
> > So you do this for any bit of hardware that happens to be out there?
> > Will that really work?  You do not have any hwid to trigger off of to
> > know that this is a valid device you can handle?
>
> The interface is already useful in a pure container context where the
> generation change request is triggered by software.
>
> And yes, there are hardware triggers, but Michael was quite unhappy about
> potential races between VMGenID change and SysGenID change and thus wanted
> to ideally separate the interfaces. So we went ahead and isolated the
> SysGenID one, as it's already useful as is.
>
> Hardware drivers to inject change events into SysGenID can then follow
> later, for all different hardware platforms. But SysGenID as in this patch
> is a completely hardware agnostic concept.

Ok, this is going to play havoc with fuzzers and other "automated
testers", should be fun to watch!  :)

Let's queue this up and see what happens...

thanks,

greg k-h




Amazon Development Center (Romania) S.R.L. registered office: 27A Sf. Lazar 
Street, UBC5, floor 2, Iasi, Iasi County, 700045, Romania. Registered in 
Romania. Registration number J22/2621/2005.


Re: [PATCH v7 0/2] System Generation ID driver and VMGENID backend

2021-03-04 Thread Catangiu, Adrian Costin
Hi Michael,

On 24/02/2021, 11:06, "Michael S. Tsirkin"  wrote:

CAUTION: This email originated from outside of the organization. Do not 
click links or open attachments unless you can confirm the sender and know the 
content is safe.



On Wed, Feb 24, 2021 at 10:47:30AM +0200, Adrian Catangiu wrote:
> This feature is aimed at virtualized or containerized environments
> where VM or container snapshotting duplicates memory state, which is a
> challenge for applications that want to generate unique data such as
> request IDs, UUIDs, and cryptographic nonces.
>
> The patch set introduces a mechanism that provides a userspace
> interface for applications and libraries to be made aware of uniqueness
> breaking events such as VM or container snapshotting, and allow them to
> react and adapt to such events.
>
> Solving the uniqueness problem strongly enough for cryptographic
> purposes requires a mechanism which can deterministically reseed
> userspace PRNGs with new entropy at restore time. This mechanism must
> also support the high-throughput and low-latency use-cases that led
> programmers to pick a userspace PRNG in the first place; be usable by
> both application code and libraries; allow transparent retrofitting
> behind existing popular PRNG interfaces without changing application
> code; it must be efficient, especially on snapshot restore; and be
> simple enough for wide adoption.
>
> The first patch in the set implements a device driver which exposes a
> the /dev/sysgenid char device to userspace. Its associated filesystem
> operations operations can be used to build a system level safe workflow
> that guest software can follow to protect itself from negative system
> snapshot effects.
>
> The second patch in the set adds a VmGenId driver which makes use of
> the ACPI vmgenid device to drive SysGenId and to reseed kernel entropy
> following VM snapshots.
>
> **Please note**, SysGenID alone does not guarantee complete snapshot
> safety to applications using it. A certain workflow needs to be
> followed at the system level, in order to make the system
> snapshot-resilient. Please see the "Snapshot Safety Prerequisites"
> section in the included SysGenID documentation.
>
> ---
>
> v6 -> v7:
>   - remove sysgenid uevent

How about we drop mmap too?

There's simply no way I can see to make it safe, and
no implementation is worse than a racy one imho.

Yea there's some decumentation explaining how it is not
supposed to be used but it will *seem* to work for people
and we will be stuck trying to maintain it.

Let's see if userspace using this often enough to make the
system call

As Colm explained in his reply, the mmap is the only option to consume
this within the strict latency constraints of PRNGs and SSL libs, so what if
instead, we remove the IRQ race by removing vmgenid as an in-kernel
sysgenid backend/driver?

We could just drop the vmgenid driver for now and only drive sysgenid
from userspace using the fs interface. Doing so will remove the IRQ race
which comes from vmgenid backend, and will keep the SysGenID kernel
interface safe and consistent, with a race-free mmap().

What do you think?

> v5 -> v6:
>
>   - sysgenid: watcher tracking disabled by default
>   - sysgenid: add SYSGENID_SET_WATCHER_TRACKING ioctl to allow each
> file descriptor to set whether they should be tracked as watchers
>   - rename SYSGENID_FORCE_GEN_UPDATE -> SYSGENID_TRIGGER_GEN_UPDATE
>   - rework all documentation to clearly capture all prerequisites for
> achieving snapshot safety when using the provided mechanism
>   - sysgenid documentation: replace individual filesystem operations
> examples with a higher level example showcasing system-level
> snapshot-safe workflow
>
> v4 -> v5:
>
>   - sysgenid: generation changes are also exported through uevents
>   - remove SYSGENID_GET_OUTDATED_WATCHERS ioctl
>   - document sysgenid ioctl major/minor numbers
>
> v3 -> v4:
>
>   - split functionality in two separate kernel modules:
> 1. drivers/misc/sysgenid.c which provides the generic userspace
>interface and mechanisms
> 2. drivers/virt/vmgenid.c as VMGENID acpi device driver that seeds
>kernel entropy and acts as a driving backend for the generic
>sysgenid
>   - rename /dev/vmgenid -> /dev/sysgenid
>   - rename uapi header file vmgenid.h -> sysgenid.h
>   - rename ioctls VMGENID_* -> SYSGENID_*
>   - add ‘min_gen’ parameter to SYSGENID_FORCE_GEN_UPDATE ioctl
>   - fix races in documentation examples
>
> v2 -> v3:
>
>   - separate the core driver logic and interface, from the ACPI device.
> The ACPI vmgenid device is now one possible 

Re: [PATCH v5 1/2] drivers/misc: sysgenid: add system generation id driver

2021-02-09 Thread Catangiu, Adrian Costin
On 09/02/2021, 16:53, "Michael S. Tsirkin"  wrote:

On Mon, Feb 01, 2021 at 07:24:53PM +0200, Adrian Catangiu wrote:
> - Background and problem
>
> The System Generation ID feature is required in virtualized or
> containerized environments by applications that work with local copies
> or caches of world-unique data such as random values, uuids,
> monotonically increasing counters, etc.
> Such applications can be negatively affected by VM or container
> snapshotting when the VM or container is either cloned or returned to
> an earlier point in time.
>
> Furthermore, simply finding out about a system generation change is
> only the starting point of a process to renew internal states of
> possibly multiple applications across the system. This process could
> benefit from a driver that provides an interface through which
> orchestration can be easily done.
>
> - Solution
>
> The System Generation ID is a simple concept meant to alleviate the
> issue by providing a monotonically increasing u32 counter that changes
> each time the VM or container is restored from a snapshot.
>
> The `sysgenid` driver exposes a monotonic incremental System Generation
> u32 counter via a char-dev FS interface that provides sync and async
> SysGen counter updates notifications. It also provides SysGen counter
> retrieval and confirmation mechanisms.
>
> The counter starts from zero when the driver is initialized and
> monotonically increments every time the system generation changes.
>
> The `sysgenid` driver exports the `void sysgenid_bump_generation()`
> symbol which can be used by backend drivers to drive system generation
> changes based on hardware events.
> System generation changes can also be driven by userspace software
> through a dedicated driver ioctl.
>
> Userspace applications or libraries can then (a)synchronously consume
> the system generation counter through the provided FS interface to make
> any necessary internal adjustments following a system generation
> update.
>
> Signed-off-by: Adrian Catangiu 
> ---
>  Documentation/misc-devices/sysgenid.rst| 236 
>  Documentation/userspace-api/ioctl/ioctl-number.rst |   1 +
>  MAINTAINERS|   8 +
>  drivers/misc/Kconfig   |  16 ++
>  drivers/misc/Makefile  |   1 +
>  drivers/misc/sysgenid.c| 307 
+
>  include/uapi/linux/sysgenid.h  |  17 ++
>  7 files changed, 586 insertions(+)
>  create mode 100644 Documentation/misc-devices/sysgenid.rst
>  create mode 100644 drivers/misc/sysgenid.c
>  create mode 100644 include/uapi/linux/sysgenid.h
>
> diff --git a/Documentation/misc-devices/sysgenid.rst 
b/Documentation/misc-devices/sysgenid.rst
> new file mode 100644
> index 000..4337ca0
> --- /dev/null
> +++ b/Documentation/misc-devices/sysgenid.rst
> @@ -0,0 +1,236 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +
> +SYSGENID
> +
> +
> +The System Generation ID feature is required in virtualized or
> +containerized environments by applications that work with local copies
> +or caches of world-unique data such as random values, UUIDs,
> +monotonically increasing counters, etc.
> +Such applications can be negatively affected by VM or container
> +snapshotting when the VM or container is either cloned or returned to
> +an earlier point in time.
> +
> +The System Generation ID is a simple concept meant to alleviate the
> +issue by providing a monotonically increasing counter that changes
> +each time the VM or container is restored from a snapshot.
> +The driver for it lives at ``drivers/misc/sysgenid.c``.
> +
> +The ``sysgenid`` driver exposes a monotonic incremental System
> +Generation u32 counter via a char-dev FS interface accessible through
> +``/dev/sysgenid`` that provides sync and async SysGen counter update
> +notifications. It also provides SysGen counter retrieval and
> +confirmation mechanisms.
> +
> +The counter starts from zero when the driver is initialized and
> +monotonically increments every time the system generation changes.
> +
> +The ``sysgenid`` driver exports the ``void sysgenid_bump_generation()``
> +symbol which can be used by backend drivers to drive system generation
> +changes based on hardware events.
> +System generation changes can also be driven by userspace software
> +through a dedicated driver ioctl.
> +
> +Userspace applications or libraries can (a)synchronously consume the
> +system generation counter through the provided FS interface, to make
> 

Re: [PATCH v5 1/2] drivers/misc: sysgenid: add system generation id driver

2021-02-09 Thread Catangiu, Adrian Costin
On 02/02/2021, 14:05, "Greg KH"  wrote:

On Mon, Feb 01, 2021 at 07:24:53PM +0200, Adrian Catangiu wrote:
> +EXPORT_SYMBOL(sysgenid_bump_generation);

EXPORT_SYMBOL_GPL()?  I have to ask...

Good catch! Will update.




Amazon Development Center (Romania) S.R.L. registered office: 27A Sf. Lazar 
Street, UBC5, floor 2, Iasi, Iasi County, 700045, Romania. Registered in 
Romania. Registration number J22/2621/2005.


Re: [PATCH v5 1/2] drivers/misc: sysgenid: add system generation id driver

2021-02-09 Thread Catangiu, Adrian Costin
On 02/02/2021, 14:09, "Greg KH"  wrote:

On Mon, Feb 01, 2021 at 07:24:53PM +0200, Adrian Catangiu wrote:
> +static long sysgenid_ioctl(struct file *file,
> +unsigned int cmd, 
unsigned long arg)

Very odd indentation style, checkpatch.pl didn't catch this?

Checkpatch.pl is happy with this, yes.
Will change it to a single tab nonetheless since it does look weird :)




Amazon Development Center (Romania) S.R.L. registered office: 27A Sf. Lazar 
Street, UBC5, floor 2, Iasi, Iasi County, 700045, Romania. Registered in 
Romania. Registration number J22/2621/2005.


Re: [PATCH v3] drivers/virt: vmgenid: add vm generation id driver

2021-01-10 Thread Catangiu, Adrian Costin
+ Eric W. Biederman

Eric's email was filtered by my server for some reason so I can't
directly reply to it, this is the closest thread relative I could answer on.

On 01/12/2020 12:00, Eric W. Biederman wrote:
>
>
> On 27.11.20 19:26, Catangiu, Adrian Costin wrote:
>> - Background
>>
>> The VM Generation ID is a feature defined by Microsoft (paper:
>> http://go.microsoft.com/fwlink/?LinkId=260709) and supported by
>> multiple hypervisor vendors.
>>
>> The feature is required in virtualized environments by apps that work
>> with local copies/caches of world-unique data such as random values,
>> uuids, monotonically increasing counters, etc.
>> Such apps can be negatively affected by VM snapshotting when the VM
>> is either cloned or returned to an earlier point in time.
>>
> How does this differ from /proc/sys/kernel/random/boot_id?
The boot_id only changes at OS boot whereas we need the generation id to
change _while_ the system/guest-os is running - generation changes because
underlyingVM or container goes through a snapshot restore event which is
otherwisetransparent to guest system.
>>
>> The VM Generation ID is a simple concept meant to alleviate the issue
>> by providing a unique ID that changes each time the VM is restored
>> from a snapshot. The hw provided UUID value can be used to
>> differentiate between VMs or different generations of the same VM.
>>
> Does the VM generation ID change in a running that effectively things it
> is running?
Yes, the generation id changes while guest OS is running, the generation
change itself is what lets the guest OS and guest userspace know there's
been a VM or container snapshot restore event.
>>
>> - Problem
>>
>> The VM Generation ID is exposed through an ACPI device by multiple
>> hypervisor vendors but neither the vendors or upstream Linux have no
>> default driver for it leaving users to fend for themselves.
>>
>> Furthermore, simply finding out about a VM generation change is only
>> the starting point of a process to renew internal states of possibly
>> multiple applications across the system. This process could benefit
>> from a driver that provides an interface through which orchestration
>> can be easily done.
>>
>> - Solution
>>
>> This patch is a driver that exposes a monotonic incremental Virtual
>> Machine Generation u32 counter via a char-dev FS interface.
> Earlier it was a UUID now it is 32bit number?
The generation id exposed to userspace is a 32bit monotonic incremental
counter. This counter is internally driven by the acpi vmgenid device. The
128-bit vmgenid-device-provided UUID is only used internally by the driver.

I will make all of this clearer in the next patch version.

>> The FS
>> interface provides sync and async VmGen counter updates notifications.
>> It also provides VmGen counter retrieval and confirmation mechanisms.
>>
>> The generation counter and the interface through which it is exposed
>> are available even when there is no acpi device present.
>>
>> When the device is present, the hw provided UUID is not exposed to
>> userspace, it is internally used by the driver to keep accounting for
>> the exposed VmGen counter. The counter starts from zero when the
>> driver is initialized and monotonically increments every time the hw
>> UUID changes (the VM generation changes).
>> On each hw UUID change, the new hypervisor-provided UUID is also fed
>> to the kernel RNG.
> Should this be a hotplug even rather than a new character device?
>
> Without plugging into udev and the rest of the hotplug infrastructure
> I suspect things will be missed.
That's a good idea, I will look into it.
>>
>> If there is no acpi vmgenid device present, the generation changes are
>> not driven by hw vmgenid events but can be driven by software through
>> a dedicated driver ioctl.
>>
>> This patch builds on top of Or Idgar 's proposal
>> https://lkml.org/lkml/2018/3/1/498
> Eric
>



Amazon Development Center (Romania) S.R.L. registered office: 27A Sf. Lazar 
Street, UBC5, floor 2, Iasi, Iasi County, 700045, Romania. Registered in 
Romania. Registration number J22/2621/2005.


Re: [PATCH v2] drivers/virt: vmgenid: add vm generation id driver

2020-11-27 Thread Catangiu, Adrian Costin
Sorry Jann for missing your original email.

On 27/11/2020 20:22, Jann Horn wrote:
> CAUTION: This email originated from outside of the organization. Do not click 
> links or open attachments unless you can confirm the sender and know the 
> content is safe.
>
>
>
> [resend in the hope that amazon will accept my mail this time instead
> of replying "550 Too many invalid recipients" again]
>
> On Fri, Nov 20, 2020 at 11:29 PM Jann Horn  wrote:
>> On Mon, Nov 16, 2020 at 4:35 PM Catangiu, Adrian Costin
>>  wrote:
>>> This patch is a driver that exposes a monotonic incremental Virtual
>>> Machine Generation u32 counter via a char-dev FS interface that
>>> provides sync and async VmGen counter updates notifications. It also
>>> provides VmGen counter retrieval and confirmation mechanisms.
>>>
>>> The hw provided UUID is not exposed to userspace, it is internally
>>> used by the driver to keep accounting for the exposed VmGen counter.
>>> The counter starts from zero when the driver is initialized and
>>> monotonically increments every time the hw UUID changes (the VM
>>> generation changes).
>>>
>>> On each hw UUID change, the new hypervisor-provided UUID is also fed
>>> to the kernel RNG.
>> As for v1:
>>
>> Is there a reasonable usecase for the "confirmation" mechanism? It
>> doesn't seem very useful to me.

I think it adds value in complex scenarios with multiple users of the
mechanism, potentially at varying layers of the stack, different
processes and/or runtime libraries.

The driver offers a natural place to handle minimal orchestration
support and offer visibility in system-wide status.

A high-level service that trusts all system components to properly use
the confirmation mechanism can actually block and wait patiently for the
system to adjust to the new world. Even if it doesn't trust all
components it can still do a best-effort, timeout block.

>>
>> How do you envision integrating this with libraries that have to work
>> in restrictive seccomp sandboxes? If this was in the vDSO, that would
>> be much easier.

Since this mechanism targets all of userspace stack, the usecase greatly
vary. I doubt we can have a single silver bullet interface.

For example, the mmap interface targets user space RNGs, where as fast
and as race free as possible is key. But there also higher level
applications that don't manage their own memory or don't have access to
low-level primitives so they can't use the mmap or even vDSO interfaces.
That's what the rest of the logic is there for, the read+poll interface
and all of the orchestration logic.

Like you correctly point out, there are also scenarios like tight
seccomp jails where even the FS interfaces is inaccessible. For cases
like this and others, I believe we will have to work incrementally to
build up the interface diversity to cater to all the user scenarios
diversity.


Thanks,

Adrian.





Amazon Development Center (Romania) S.R.L. registered office: 27A Sf. Lazar 
Street, UBC5, floor 2, Iasi, Iasi County, 700045, Romania. Registered in 
Romania. Registration number J22/2621/2005.


[PATCH v3] drivers/virt: vmgenid: add vm generation id driver

2020-11-27 Thread Catangiu, Adrian Costin
- Background

The VM Generation ID is a feature defined by Microsoft (paper:
http://go.microsoft.com/fwlink/?LinkId=260709) and supported by
multiple hypervisor vendors.

The feature is required in virtualized environments by apps that work
with local copies/caches of world-unique data such as random values,
uuids, monotonically increasing counters, etc.
Such apps can be negatively affected by VM snapshotting when the VM
is either cloned or returned to an earlier point in time.

The VM Generation ID is a simple concept meant to alleviate the issue
by providing a unique ID that changes each time the VM is restored
from a snapshot. The hw provided UUID value can be used to
differentiate between VMs or different generations of the same VM.

- Problem

The VM Generation ID is exposed through an ACPI device by multiple
hypervisor vendors but neither the vendors or upstream Linux have no
default driver for it leaving users to fend for themselves.

Furthermore, simply finding out about a VM generation change is only
the starting point of a process to renew internal states of possibly
multiple applications across the system. This process could benefit
from a driver that provides an interface through which orchestration
can be easily done.

- Solution

This patch is a driver that exposes a monotonic incremental Virtual
Machine Generation u32 counter via a char-dev FS interface. The FS
interface provides sync and async VmGen counter updates notifications.
It also provides VmGen counter retrieval and confirmation mechanisms.

The generation counter and the interface through which it is exposed
are available even when there is no acpi device present.

When the device is present, the hw provided UUID is not exposed to
userspace, it is internally used by the driver to keep accounting for
the exposed VmGen counter. The counter starts from zero when the
driver is initialized and monotonically increments every time the hw
UUID changes (the VM generation changes).
On each hw UUID change, the new hypervisor-provided UUID is also fed
to the kernel RNG.

If there is no acpi vmgenid device present, the generation changes are
not driven by hw vmgenid events but can be driven by software through
a dedicated driver ioctl.

This patch builds on top of Or Idgar 's proposal
https://lkml.org/lkml/2018/3/1/498

- Future improvements

Ideally we would want the driver to register itself based on devices'
_CID and not _HID, but unfortunately I couldn't find a way to do that.
The problem is that ACPI device matching is done by
'__acpi_match_device()' which exclusively looks at
'acpi_hardware_id *hwid'.

There is a path for platform devices to match on _CID when _HID is
'PRP0001' - but this is not the case for the Qemu vmgenid device.

Guidance and help here would be greatly appreciated.

Signed-off-by: Adrian Catangiu 

---

v1 -> v2:

  - expose to userspace a monotonically increasing u32 Vm Gen Counter
    instead of the hw VmGen UUID
  - since the hw/hypervisor-provided 128-bit UUID is not public
    anymore, add it to the kernel RNG as device randomness
  - insert driver page containing Vm Gen Counter in the user vma in
    the driver's mmap handler instead of using a fault handler
  - turn driver into a misc device driver to auto-create /dev/vmgenid
  - change ioctl arg to avoid leaking kernel structs to userspace
  - update documentation
  - various nits
  - rebase on top of linus latest

v2 -> v3:

  - separate the core driver logic and interface, from the ACPI device.
    The ACPI vmgenid device is now one possible backend.
  - fix issue when timeout=0 in VMGENID_WAIT_WATCHERS
  - add locking to avoid races between fs ops handlers and hw irq
    driven generation updates
  - change VMGENID_WAIT_WATCHERS ioctl so if the current caller is
    outdated or a generation change happens while waiting (thus making
    current caller outdated), the ioctl returns -EINTR to signal the
    user to handle event and retry. Fixes blocking on oneself.
  - add VMGENID_FORCE_GEN_UPDATE ioctl conditioned by
    CAP_CHECKPOINT_RESTORE capability, through which software can force
    generation bump.
---
 Documentation/virt/vmgenid.rst | 240 +++
 drivers/virt/Kconfig   |  17 ++
 drivers/virt/Makefile  |   1 +
 drivers/virt/vmgenid.c | 435
+
 include/uapi/linux/vmgenid.h   |  14 ++
 5 files changed, 707 insertions(+)
 create mode 100644 Documentation/virt/vmgenid.rst
 create mode 100644 drivers/virt/vmgenid.c
 create mode 100644 include/uapi/linux/vmgenid.h

diff --git a/Documentation/virt/vmgenid.rst b/Documentation/virt/vmgenid.rst
new file mode 100644
index 000..b6a9f8d
--- /dev/null
+++ b/Documentation/virt/vmgenid.rst
@@ -0,0 +1,240 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+
+VMGENID
+
+
+The VM Generation ID is a feature defined by Microsoft (paper:
+http://go.microsoft.com/fwlink/?LinkId=260709) and supported by
+multiple hypervisor 

Re: [PATCH v2] drivers/virt: vmgenid: add vm generation id driver

2020-11-27 Thread Catangiu, Adrian Costin

On 18/11/2020 12:30, Alexander Graf wrote:
>
>
> On 16.11.20 16:34, Catangiu, Adrian Costin wrote:
>> - Future improvements
>>
>> Ideally we would want the driver to register itself based on devices'
>> _CID and not _HID, but unfortunately I couldn't find a way to do that.
>> The problem is that ACPI device matching is done by
>> '__acpi_match_device()' which exclusively looks at
>> 'acpi_hardware_id *hwid'.
>>
>> There is a path for platform devices to match on _CID when _HID is
>> 'PRP0001' - but this is not the case for the Qemu vmgenid device.
>>
>> Guidance and help here would be greatly appreciated.
>
> That one is pretty important IMHO. How about the following (probably
> pretty mangled) patch? That seems to work for me. The ACPI change
> would obviously need to be its own stand alone change and needs proper
> assessment whether it could possibly break any existing systems.
>
> diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
> index 1682f8b454a2..452443d79d87 100644
> --- a/drivers/acpi/bus.c
> +++ b/drivers/acpi/bus.c
> @@ -748,7 +748,7 @@ static bool __acpi_match_device(struct acpi_device
> *device,
>  /* First, check the ACPI/PNP IDs provided by the caller. */
>  if (acpi_ids) {
>  for (id = acpi_ids; id->id[0] || id->cls; id++) {
> -    if (id->id[0] && !strcmp((char *)id->id, hwid->id))
> +    if (id->id[0] && !strncmp((char *)id->id, hwid->id,
> ACPI_ID_LEN - 1))
>  goto out_acpi_match;
>  if (id->cls && __acpi_match_device_cls(id, hwid))
>  goto out_acpi_match;
> diff --git a/drivers/virt/vmgenid.c b/drivers/virt/vmgenid.c
> index 75a787da8aad..0bfa422cf094 100644
> --- a/drivers/virt/vmgenid.c
> +++ b/drivers/virt/vmgenid.c
> @@ -356,7 +356,8 @@ static void vmgenid_acpi_notify(struct acpi_device
> *device, u32 event)
>  }
>
>  static const struct acpi_device_id vmgenid_ids[] = {
> -    {"QEMUVGID", 0},
> +    /* This really is VM_Gen_Counter, but we can only match 8
> characters */
> +    {"VM_GEN_C", 0},
>  {"", 0},
>  };
>

Looks legit. I can propose a patch with it, but how do we validate it
doesn't break any devices?


>> +2) ASYNC simplified example::
>> +
>> +    void handle_io_on_vmgenfd(int vmgenfd)
>> +    {
>> +    unsigned genid;
>> +
>> +    // because of VM generation change, we need to rebuild world
>> +    reseed_app_env();
>> +
>> +    // read new gen ID - we need it to confirm we've handled update
>> +    read(fd, , sizeof(genid));
>
> This is racy in case two consecutive snapshots happen. The read needs
> to go before the reseed.
>
Switched them around like you suggest to avoid confusion.

But I don't see a problem with this race. The idea here is to trigger
reseed_app_env() which doesn't depend on the generation counter value.
Whether it gets incremented once or N times is irrelevant, we're just
interested that we pause execution and reseed before resuming (in
between these, whether N or M generation changes is the same thing).

>> +3) Mapped memory polling simplified example::
>> +
>> +    /*
>> + * app/library function that provides cached secrets
>> + */
>> +    char * safe_cached_secret(app_data_t *app)
>> +    {
>> +    char *secret;
>> +    volatile unsigned *const genid_ptr = get_vmgenid_mapping(app);
>> +    again:
>> +    secret = __cached_secret(app);
>> +
>> +    if (unlikely(*genid_ptr != app->cached_genid)) {
>> +    // rebuild world then confirm the genid update (thru write)
>> +    rebuild_caches(app);
>> +
>> +    app->cached_genid = *genid_ptr;
>
> This is racy again. You need to read the genid before rebuild and set
> it here.
>
I don't see the race. Gen counter is read from volatile mapped mem, on
detected change we rebuild world, confirm the update back to the driver
then restart the loop. Loop will break when no more changes happen.

>> +    ack_vmgenid_update(app);
>> +
>> +    goto again;
>> +    }
>> +
>> +    return secret;
>> +    }
>> +

>> +
>> +static int vmgenid_close(struct inode *inode, struct file *file)
>> +{
>> +    struct file_data *file_data = file->private_data;
>> +    struct dev_data *priv = file_data->dev_data;
>> +
>> +    if (file_data->acked_gen_counter != priv->generation_counter)
>> +    vmgenid_put_outdated_watchers(priv);
>
> Is this racy?

[PATCH v2] drivers/virt: vmgenid: add vm generation id driver

2020-11-16 Thread Catangiu, Adrian Costin
- Background

The VM Generation ID is a feature defined by Microsoft (paper:
http://go.microsoft.com/fwlink/?LinkId=260709) and supported by
multiple hypervisor vendors.

The feature is required in virtualized environments by apps that work
with local copies/caches of world-unique data such as random values,
uuids, monotonically increasing counters, etc.
Such apps can be negatively affected by VM snapshotting when the VM
is either cloned or returned to an earlier point in time.

The VM Generation ID is a simple concept meant to alleviate the issue
by providing a unique ID that changes each time the VM is restored
from a snapshot. The hw provided UUID value can be used to
differentiate between VMs or different generations of the same VM.

- Problem

The VM Generation ID is exposed through an ACPI device by multiple
hypervisor vendors but neither the vendors or upstream Linux have no
default driver for it leaving users to fend for themselves.

Furthermore, simply finding out about a VM generation change is only
the starting point of a process to renew internal states of possibly
multiple applications across the system. This process could benefit
from a driver that provides an interface through which orchestration
can be easily done.

- Solution

This patch is a driver that exposes a monotonic incremental Virtual
Machine Generation u32 counter via a char-dev FS interface that
provides sync and async VmGen counter updates notifications. It also
provides VmGen counter retrieval and confirmation mechanisms.

The hw provided UUID is not exposed to userspace, it is internally
used by the driver to keep accounting for the exposed VmGen counter.
The counter starts from zero when the driver is initialized and
monotonically increments every time the hw UUID changes (the VM
generation changes).

On each hw UUID change, the new hypervisor-provided UUID is also fed
to the kernel RNG.

This patch builds on top of Or Idgar 's proposal
https://lkml.org/lkml/2018/3/1/498

- Future improvements

Ideally we would want the driver to register itself based on devices'
_CID and not _HID, but unfortunately I couldn't find a way to do that.
The problem is that ACPI device matching is done by
'__acpi_match_device()' which exclusively looks at
'acpi_hardware_id *hwid'.

There is a path for platform devices to match on _CID when _HID is
'PRP0001' - but this is not the case for the Qemu vmgenid device.

Guidance and help here would be greatly appreciated.

- v1 -> v2:

  - expose to userspace a monotonically increasing u32 Vm Gen Counter
instead of the hw VmGen UUID
  - since the hw/hypervisor-provided 128-bit UUID is not public
anymore, add it to the kernel RNG as device randomness
  - insert driver page containing Vm Gen Counter in the user vma in
the driver's mmap handler instead of using a fault handler
  - turn driver into a misc device driver to auto-create /dev/vmgenid
  - change ioctl arg to avoid leaking kernel structs to userspace
  - update documentation
  - various nits (license, unnecessary casting, Kconfig, others)
  - rebase on top of linus latest

Signed-off-by: Adrian Catangiu 
---
 Documentation/virt/vmgenid.rst | 228 
 drivers/virt/Kconfig   |  17 ++
 drivers/virt/Makefile  |   1 +
 drivers/virt/vmgenid.c | 390 +
 include/uapi/linux/vmgenid.h   |  13 ++
 5 files changed, 649 insertions(+)
 create mode 100644 Documentation/virt/vmgenid.rst
 create mode 100644 drivers/virt/vmgenid.c
 create mode 100644 include/uapi/linux/vmgenid.h

diff --git a/Documentation/virt/vmgenid.rst b/Documentation/virt/vmgenid.rst
new file mode 100644
index 000..603e8a5
--- /dev/null
+++ b/Documentation/virt/vmgenid.rst
@@ -0,0 +1,228 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+
+VMGENID
+
+
+The VM Generation ID is a feature defined by Microsoft (paper:
+http://go.microsoft.com/fwlink/?LinkId=260709) and supported by
+multiple hypervisor vendors.
+
+The feature is required in virtualized environments by apps that work
+with local copies/caches of world-unique data such as random values,
+uuids, monotonically increasing counters, etc.
+Such apps can be negatively affected by VM snapshotting when the VM
+is either cloned or returned to an earlier point in time.
+
+The VM Generation ID is a simple concept meant to alleviate the issue
+by providing a unique ID that changes each time the VM is restored
+from a snapshot. The hw provided UUID value can be used to
+differentiate between VMs or different generations of the same VM.
+
+The VM Generation ID is exposed through an ACPI device by multiple
+hypervisor vendors. The driver for it lives at
+``drivers/virt/vmgenid.c``
+
+The driver exposes a monotonic incremental Virtual Machine Generation
+u32 counter via a char-dev FS interface that provides sync and async
+VmGen counter updates notifications. It also provides VmGen counter
+retrieval and confirmation mechanisms.
+

Re: [PATCH] drivers/virt: vmgenid: add vm generation id driver

2020-10-20 Thread Catangiu, Adrian Costin
Hi all,

On 17/10/2020, 21:09, "Graf (AWS), Alexander"  wrote:

On 17.10.20 15:24, Jason A. Donenfeld wrote:
> 
> After discussing this offline with Jann a bit, I have a few general
> comments on the design of this.
> 
> First, the UUID communicated by the hypervisor should be consumed by
> the kernel -- added as another input to the rng -- and then userspace

We definitely want a kernel internal notifier as well, yes :).

What would be a generic event trigger mechanism we could use from a kernel
module/driver for triggering rng reseed (possibly adding the uuid to the mix
as well)?

  




Amazon Development Center (Romania) S.R.L. registered office: 27A Sf. Lazar 
Street, UBC5, floor 2, Iasi, Iasi County, 700045, Romania. Registered in 
Romania. Registration number J22/2621/2005.


Re: [PATCH] drivers/virt: vmgenid: add vm generation id driver

2020-10-17 Thread Catangiu, Adrian Costin
After discussing this offline with Jann a bit, I have a few general
comments on the design of this. 
First, the UUID communicated by the hypervisor should be consumed by
the kernel -- added as another input to the rng -- and then userspace
should be notified that it should reseed any userspace RNGs that it
may have, without actually communicating that UUID to userspace. IOW,
I agree with Jann there. Then, it's the functioning of this
notification mechanism to userspace that is interesting to me.

Agreed! The UUID/vmgenid is the glue to the hypervisor to be able to find 
out about VM snapshots/forks. The really interesting (and important) topic
here is finding the right notification mechanism to userspace.

...In retrospect, I should have posted this as RFC instead of PATCH.

So, anyway, here are a few options with some pros and cons for the
kernel notifying userspace that its RNG should reseed.
1. SIGRND - a new signal. Lol.
2. Userspace opens a file descriptor that it can epoll on. Pros are
that many notification mechanisms already use this. Cons is that this
requires syscall and might be more racy than we want. Another con is
that this a new thing for userspace programs to do.
3. We stick an atomic counter in the vDSO, Jann's suggestion. Pros are
that this is extremely fast, and also simple to use and implement.
There are enough sequence points in typical crypto programs that
checking to see whether this counter has changed before doing whatever
operation seems easy enough. Cons are that typically we've been
conservative about adding things to the vDSO, and this is also a new
thing for userspace programs to do.

For each 1, 2, and 3 options, userspace programs _have to do smth new_
anyway, so I wouldn't weigh that as a con.

An atomic counter in the vDSO looks like the most bang for the buck to me.
I'm really curious to hear more opinions on why we shouldn't do it.

4. We already have a mechanism for this kind of thing, because the
same issue comes up when fork()ing. The solution was MADV_WIPEONFORK,
where userspace marks a page to be zeroed when forking, for the
purposes of the RNG being notified when its world gets split in two.
This is basically the same thing as we're discussing here with guest
snapshots, except it's on the system level rather than the process
level, and a system has many processes. But the problem space is still
almost the same, and we could simply reuse that same mechanism. There
are a few implementation strategies for that:

I don't think we can piggy back on MADV_WIPEONFORK. That madvise flag
has a clear contract of only wiping _on fork_. Overloading it with wiping
on VM-fork - while process doesn't fork - might break some of its users.

4a. We mess with the PTEs of all processes' pages that are
MADV_WIPEONFORK, like fork does now, when the hypervisor notifies us
to do so. Then we wind up reusing the already existing logic for
userspace RNGs. Cons might be that this usually requires semaphores,
and we're in irq context, so we'd have to hoist to a workqueue, which
means either more wake up latency, or a larger race window.
4b. We just memzero all processes' pages that are MADV_WIPEONFORK,
when the hypervisor notifies us to do so. Then we wind up reusing the
already existing logic for userspace RNGs.
4c. The guest kernel maintains an array of physical addresses that are
MADV_WIPEONFORK. The hypervisor knows about this array and its
location through whatever protocol, and before resuming a
moved/snapshotted/duplicated VM, it takes the responsibility for
memzeroing this memory. The huge pro here would be that this
eliminates all races, and reduces complexity quite a bit, because the
hypervisor can perfectly synchronize its bringup (and SMP bringup)
with this, and it can even optimize things like on-disk memory
snapshots to simply not write out those pages to disk.

I've previously proposed a path similar (in concept at least) with a combination
of 4 a,b and c - 
https://lwn.net/ml/linux-mm/b7793b7a-3660-4769-9b9a-ffcf25072...@amazon.com/
without reusing MADV_WIPEONFORK, but by adding a dedicated
MADV_WIPEONSUSPEND.

That proposal was clunky however with many people raising concerns around
how the interface is too subtle and hard to work with.

A vmgenid driver offering a clean FS interface seemed cleaner, although, like
some of you observed, it still allows a window of time between actual VM fork
and userspace handling of the event.

One other direction that I would like to explore and I feel it’s similar to 
your 4c
proposal is to do smth like:
"mm: extend memfd with ability to create 'secret' memory"
https://patchwork.kernel.org/project/linux-mm/patch/20200130162340.GA14232@rapoport-lnx/

Maybe we can combine ideas from the two patches in smth like: instead of libs
using anon 

Re: [PATCH] drivers/virt: vmgenid: add vm generation id driver

2020-10-16 Thread Catangiu, Adrian Costin
Sorry, I forgot to add a few people interested in this and the KVM ML to CC. 
Added them.

On 16/10/2020, 17:33, "Catangiu, Adrian Costin"  wrote:

- Background

The VM Generation ID is a feature defined by Microsoft (paper:
http://go.microsoft.com/fwlink/?LinkId=260709) and supported by
multiple hypervisor vendors.

The feature is required in virtualized environments by apps that work
with local copies/caches of world-unique data such as random values,
uuids, monotonically increasing counters, etc.
Such apps can be negatively affected by VM snapshotting when the VM
is either cloned or returned to an earlier point in time.

The VM Generation ID is a simple concept meant to alleviate the issue
by providing a unique ID that changes each time the VM is restored
from a snapshot. The hw provided UUID value can be used to
differentiate between VMs or different generations of the same VM.

- Problem

The VM Generation ID is exposed through an ACPI device by multiple
hypervisor vendors but neither the vendors or upstream Linux have no
default driver for it leaving users to fend for themselves.

Furthermore, simply finding out about a VM generation change is only
the starting point of a process to renew internal states of possibly
multiple applications across the system. This process could benefit
from a driver that provides an interface through which orchestration
can be easily done.

- Solution

This patch is a driver which exposes the Virtual Machine Generation ID
via a char-dev FS interface that provides ID update sync and async
notification, retrieval and confirmation mechanisms:

When the device is 'open()'ed a copy of the current vm UUID is
associated with the file handle. 'read()' operations block until the
associated UUID is no longer up to date - until HW vm gen id changes -
at which point the new UUID is provided/returned. Nonblocking 'read()'
uses EWOULDBLOCK to signal that there is no _new_ UUID available.

'poll()' is implemented to allow polling for UUID updates. Such
updates result in 'EPOLLIN' events.

Subsequent read()s following a UUID update no longer block, but return
the updated UUID. The application needs to acknowledge the UUID update
by confirming it through a 'write()'.
Only on writing back to the driver the right/latest UUID, will the
driver mark this "watcher" as up to date and remove EPOLLIN status.

'mmap()' support allows mapping a single read-only shared page which
will always contain the latest UUID value at offset 0.

The driver also adds support for tracking count of open file handles
that haven't acknowledged an UUID update. This is exposed through
two IOCTLs:
 * VMGENID_GET_OUTDATED_WATCHERS: immediately returns the number of
   _outdated_ watchers - number of file handles that were open during
   a VM generation change, and which have not yet confirmed the new
   Vm-Gen-Id.
 * VMGENID_WAIT_WATCHERS: blocks until there are no more _outdated_
   watchers, or if a 'timeout' argument is provided, until the timeout
   expires.

This patch builds on top of Or Idgar 's proposal
https://lkml.org/lkml/2018/3/1/498

- Future improvements

Ideally we would want the driver to register itself based on devices'
_CID and not _HID, but unfortunately I couldn't find a way to do that.
The problem is that ACPI device matching is done by
'__acpi_match_device()' which exclusively looks at
'acpi_hardware_id *hwid'.

There is a path for platform devices to match on _CID when _HID is
'PRP0001' - which is not the case for the Qemu vmgenid device.

Guidance and help here would be greatly appreciated.

Signed-off-by: Adrian Catangiu 
---
 Documentation/virt/vmgenid.rst | 211 +
 drivers/virt/Kconfig   |  13 ++
 drivers/virt/Makefile  |   1 +
 drivers/virt/vmgenid.c | 419 
+
 include/uapi/linux/vmgenid.h   |  22 +++
 5 files changed, 666 insertions(+)
 create mode 100644 Documentation/virt/vmgenid.rst
 create mode 100644 drivers/virt/vmgenid.c
 create mode 100644 include/uapi/linux/vmgenid.h

diff --git a/Documentation/virt/vmgenid.rst b/Documentation/virt/vmgenid.rst
new file mode 100644
index 000..5224415
--- /dev/null
+++ b/Documentation/virt/vmgenid.rst
@@ -0,0 +1,211 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+
+VMGENID
+
+
+The VM Generation ID is a feature defined by Microsoft (paper:
+http://go.microsoft.com/fwlink/?LinkId=260709) and supported by
+multiple hypervisor vendors.
+
+The feature is required in virtualized environmen

[PATCH] drivers/virt: vmgenid: add vm generation id driver

2020-10-16 Thread Catangiu, Adrian Costin
- Background

The VM Generation ID is a feature defined by Microsoft (paper:
http://go.microsoft.com/fwlink/?LinkId=260709) and supported by
multiple hypervisor vendors.

The feature is required in virtualized environments by apps that work
with local copies/caches of world-unique data such as random values,
uuids, monotonically increasing counters, etc.
Such apps can be negatively affected by VM snapshotting when the VM
is either cloned or returned to an earlier point in time.

The VM Generation ID is a simple concept meant to alleviate the issue
by providing a unique ID that changes each time the VM is restored
from a snapshot. The hw provided UUID value can be used to
differentiate between VMs or different generations of the same VM.

- Problem

The VM Generation ID is exposed through an ACPI device by multiple
hypervisor vendors but neither the vendors or upstream Linux have no
default driver for it leaving users to fend for themselves.

Furthermore, simply finding out about a VM generation change is only
the starting point of a process to renew internal states of possibly
multiple applications across the system. This process could benefit
from a driver that provides an interface through which orchestration
can be easily done.

- Solution

This patch is a driver which exposes the Virtual Machine Generation ID
via a char-dev FS interface that provides ID update sync and async
notification, retrieval and confirmation mechanisms:

When the device is 'open()'ed a copy of the current vm UUID is
associated with the file handle. 'read()' operations block until the
associated UUID is no longer up to date - until HW vm gen id changes -
at which point the new UUID is provided/returned. Nonblocking 'read()'
uses EWOULDBLOCK to signal that there is no _new_ UUID available.

'poll()' is implemented to allow polling for UUID updates. Such
updates result in 'EPOLLIN' events.

Subsequent read()s following a UUID update no longer block, but return
the updated UUID. The application needs to acknowledge the UUID update
by confirming it through a 'write()'.
Only on writing back to the driver the right/latest UUID, will the
driver mark this "watcher" as up to date and remove EPOLLIN status.

'mmap()' support allows mapping a single read-only shared page which
will always contain the latest UUID value at offset 0.

The driver also adds support for tracking count of open file handles
that haven't acknowledged an UUID update. This is exposed through
two IOCTLs:
 * VMGENID_GET_OUTDATED_WATCHERS: immediately returns the number of
   _outdated_ watchers - number of file handles that were open during
   a VM generation change, and which have not yet confirmed the new
   Vm-Gen-Id.
 * VMGENID_WAIT_WATCHERS: blocks until there are no more _outdated_
   watchers, or if a 'timeout' argument is provided, until the timeout
   expires.

This patch builds on top of Or Idgar 's proposal
https://lkml.org/lkml/2018/3/1/498

- Future improvements

Ideally we would want the driver to register itself based on devices'
_CID and not _HID, but unfortunately I couldn't find a way to do that.
The problem is that ACPI device matching is done by
'__acpi_match_device()' which exclusively looks at
'acpi_hardware_id *hwid'.

There is a path for platform devices to match on _CID when _HID is
'PRP0001' - which is not the case for the Qemu vmgenid device.

Guidance and help here would be greatly appreciated.

Signed-off-by: Adrian Catangiu 
---
 Documentation/virt/vmgenid.rst | 211 +
 drivers/virt/Kconfig   |  13 ++
 drivers/virt/Makefile  |   1 +
 drivers/virt/vmgenid.c | 419 +
 include/uapi/linux/vmgenid.h   |  22 +++
 5 files changed, 666 insertions(+)
 create mode 100644 Documentation/virt/vmgenid.rst
 create mode 100644 drivers/virt/vmgenid.c
 create mode 100644 include/uapi/linux/vmgenid.h

diff --git a/Documentation/virt/vmgenid.rst b/Documentation/virt/vmgenid.rst
new file mode 100644
index 000..5224415
--- /dev/null
+++ b/Documentation/virt/vmgenid.rst
@@ -0,0 +1,211 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+
+VMGENID
+
+
+The VM Generation ID is a feature defined by Microsoft (paper:
+http://go.microsoft.com/fwlink/?LinkId=260709) and supported by
+multiple hypervisor vendors.
+
+The feature is required in virtualized environments by apps that work
+with local copies/caches of world-unique data such as random values,
+uuids, monotonically increasing counters, etc.
+Such apps can be negatively affected by VM snapshotting when the VM
+is either cloned or returned to an earlier point in time.
+
+The VM Generation ID is a simple concept meant to alleviate the issue
+by providing a unique ID that changes each time the VM is restored
+from a snapshot. The hw provided UUID value can be used to
+differentiate between VMs or different generations of the same VM.
+
+The VM Generation ID is exposed through an ACPI device by multiple
+hypervisor