Re: [PATCH] drivers/virt: vmgenid: add vm generation id driver

2020-10-20 Thread Christian Borntraeger



On 17.10.20 20:09, Alexander Graf wrote:
> Hi Jason,
> 
> On 17.10.20 15:24, Jason A. Donenfeld wrote:
>>
>> After discussing this offline with Jann a bit, I have a few general
>> comments on the design of this.
>>
>> First, the UUID communicated by the hypervisor should be consumed by
>> the kernel -- added as another input to the rng -- and then userspace
> 
> We definitely want a kernel internal notifier as well, yes :).
> 
>> should be notified that it should reseed any userspace RNGs that it
>> may have, without actually communicating that UUID to userspace. IOW,
> 
> I also tend to agree that it makes sense to disconnect the actual UUID we 
> receive from the notification to user space. This would allow us to create a 
> generic mechanism for VM save/restore cycles across different hypervisors. 
> Let me add PPC and s390x people to the CC list to see whether they have 
> anything remotely similar to the VmGenID mechanism. For x86 and aarch64, the 
> ACPI and memory based VmGenID implemented here is the most obvious option to 
> implement IMHO. It's also already implemented in all major hypervisors.

Hmm, what we do have is configuration data (e.g. stfle bits), and we do have a
notification mechanism via sclp that notifies guests when things change.
As of today neither KVM nor Linux implements the sclp change notification
mechanism, but I do see value in such a thing.

> 
>> I agree with Jann there. Then, it's the functioning of this
>> notification mechanism to userspace that is interesting to me.
> 
> Absolutely! Please have a look at the previous discussion here:
> 
> 
> https://lore.kernel.org/linux-pm/b7793b7a-3660-4769-9b9a-ffcf25072...@amazon.com/
> 
> The user space interface is absolutely what this is about.

Yes. Passing a notification to userspace is essential. Where I do not see a
solution yet is the race between the notification and applications that are
already running with the old knowledge.
> 
>> There are a few design goals of notifying userspace: it should be
>> fast, because people who are using userspace RNGs are usually doing so
>> in the first place to completely avoid syscall overhead for whatever
>> high performance application they have - e.g. I recall conversations
>> with Colm about his TLS implementation needing to make random IVs
>> _really_ fast. It should also happen as early as possible, with no
>> race or as minimal as possible race window, so that userspace doesn't
>> begin using old randomness and then switch over after the damage is
>> already done.
> 
> There are multiple facets and different types of consumers here. For a user 
> space RNG, I agree that fast and as race free as possible is key. That's what 
> the mmap interface is there for.
> 
> There are applications way beyond that though. What do you do with 
> applications that already consumed randomness? For example a cached pool of 
> SSL keys. Or a higher level language primitive that consumes randomness and 
> caches its seed somewhere in an internal data structure. Or even worse: your 
> system's host ssh key.
> 
> For those types of events, an mmap (or vDSO) interface does not work. We need 
> to actively allow user space applications to readjust to the new environment 
> - either internally (the language primitive case) or through a system event, 
> maybe even as systemd trigger (the ssh host key case).
> 
> To give everyone enough time before we consider a system as "updated to the 
> new environment", we have the callback logic with the "Orchestrator" that can 
> check whether all listeners to system wide updates confirms they adjusted 
> themselves.
> 
> That's what the rest of the logic is there for: A read+poll interface and all 
> of the orchestration logic. It's not for the user space RNG case, it's for 
> all of its downstream users.
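
To make that flow concrete, here is a minimal sketch of what such a downstream
listener could look like on top of the read/poll/write interface described in
the patch (the /dev/vmgenid device path and the regenerate_cached_secrets()
helper are illustrative assumptions, not something the patch itself specifies):

#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

static void regenerate_cached_secrets(void)
{
        /* application specific: drop cached SSL keys, reseed pools, ... */
}

int watch_vmgenid(void)
{
        unsigned char uuid[16];
        int fd = open("/dev/vmgenid", O_RDWR);

        if (fd < 0)
                return -1;

        for (;;) {
                struct pollfd pfd = { .fd = fd, .events = POLLIN };

                if (poll(&pfd, 1, -1) < 0)
                        break;
                /* after a generation change, read() returns the new UUID */
                if (read(fd, uuid, sizeof(uuid)) != sizeof(uuid))
                        break;
                regenerate_cached_secrets();
                /* ack by writing the same UUID back, so the orchestrator's
                 * outdated-watcher count can drop */
                if (write(fd, uuid, sizeof(uuid)) != sizeof(uuid))
                        break;
        }
        close(fd);
        return -1;
}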
> 
>> I'm also not wedded to using Microsoft's proprietary hypervisor design
>> for this. If we come up with a better interface, I don't think it's
>> asking too much to implement that and reasonably expect for Microsoft
>> to catch up. Maybe someone here will find that controversial, but
>> whatever -- discussing ideal designs does not seem out of place or
>> inappropriate for how we usually approach things in the kernel, and a
>> closed source hypervisor coming along shouldn't disrupt that.
> 
> The main bonus point on this interface is that Hyper-V, VMware and QEMU 
> implement it already. It would be a very natural fit into the ecosystem. I 
> agree though that we shouldn't have our user space interface necessarily 
> dictated by it: Other hypervisors may implement different ways such as a 
> simple edge IRQ that gets triggered whenever the VM gets resumed.
> 
>> So, anyway, here are a few options with some pros and cons for the
>> kernel notifying userspace that its RNG should reseed.
> 
> I can only stress again that we should not be laser focused on the RNG case. 
> In a lot of cases, data has already been generated by the RNG before the 
> snapshot and needs to be reinitialized after the snapshot is restored.

Re: [PATCH] drivers/virt: vmgenid: add vm generation id driver

2020-10-19 Thread Mathieu Desnoyers
- On Oct 17, 2020, at 2:10 PM, Andy Lutomirski l...@kernel.org wrote:

> On Fri, Oct 16, 2020 at 6:40 PM Jann Horn  wrote:
>>
>> [adding some more people who are interested in RNG stuff: Andy, Jason,
>> Theodore, Willy Tarreau, Eric Biggers. also linux-api@, because this
>> concerns some pretty fundamental API stuff related to RNG usage]
>>
>> On Fri, Oct 16, 2020 at 4:33 PM Catangiu, Adrian Costin
>>  wrote:
>> > - Background
>> >
>> > The VM Generation ID is a feature defined by Microsoft (paper:
>> > http://go.microsoft.com/fwlink/?LinkId=260709) and supported by
>> > multiple hypervisor vendors.
>> >
>> > The feature is required in virtualized environments by apps that work
>> > with local copies/caches of world-unique data such as random values,
>> > uuids, monotonically increasing counters, etc.
>> > Such apps can be negatively affected by VM snapshotting when the VM
>> > is either cloned or returned to an earlier point in time.
>> >
>> > The VM Generation ID is a simple concept meant to alleviate the issue
>> > by providing a unique ID that changes each time the VM is restored
>> > from a snapshot. The hw provided UUID value can be used to
>> > differentiate between VMs or different generations of the same VM.
>> >
>> > - Problem
>> >
>> > The VM Generation ID is exposed through an ACPI device by multiple
>> > hypervisor vendors but neither the vendors nor upstream Linux have a
>> > default driver for it, leaving users to fend for themselves.
>> >
>> > Furthermore, simply finding out about a VM generation change is only
>> > the starting point of a process to renew internal states of possibly
>> > multiple applications across the system. This process could benefit
>> > from a driver that provides an interface through which orchestration
>> > can be easily done.
>> >
>> > - Solution
>> >
>> > This patch is a driver which exposes the Virtual Machine Generation ID
>> > via a char-dev FS interface that provides ID update sync and async
>> > notification, retrieval and confirmation mechanisms:
>> >
>> > When the device is 'open()'ed a copy of the current vm UUID is
>> > associated with the file handle. 'read()' operations block until the
>> > associated UUID is no longer up to date - until HW vm gen id changes -
>> > at which point the new UUID is provided/returned. Nonblocking 'read()'
>> > uses EWOULDBLOCK to signal that there is no _new_ UUID available.
>> >
>> > 'poll()' is implemented to allow polling for UUID updates. Such
>> > updates result in 'EPOLLIN' events.
>> >
>> > Subsequent read()s following a UUID update no longer block, but return
>> > the updated UUID. The application needs to acknowledge the UUID update
>> > by confirming it through a 'write()'.
>> > Only when the right/latest UUID is written back to the driver will the
>> > driver mark this "watcher" as up to date and clear the EPOLLIN status.
>> >
>> > 'mmap()' support allows mapping a single read-only shared page which
>> > will always contain the latest UUID value at offset 0.
>>
>> It would be nicer if that page just contained an incrementing counter,
>> instead of a UUID. It's not like the application cares *what* the UUID
>> changed to, just that it *did* change and all RNGs state now needs to
>> be reseeded from the kernel, right? And an application can't reliably
>> read the entire UUID from the memory mapping anyway, because the VM
>> might be forked in the middle.
>>
>> So I think your kernel driver should detect UUID changes and then turn
>> those into a monotonically incrementing counter. (Probably 64 bits
>> wide?) (That's probably also a little bit faster than comparing an
>> entire UUID.)
>>
>> An option might be to put that counter into the vDSO, instead of a
>> separate VMA; but I don't know how the other folks feel about that.
>> Andy, do you have opinions on this? That way, normal userspace code
>> that uses this infrastructure wouldn't have to mess around with a
>> special device at all. And it'd be usable in seccomp sandboxes and so
>> on without needing special plumbing. And libraries wouldn't have to
>> call open() and mess with file descriptor numbers.
> 
> The vDSO might be annoyingly slow for this.  Something like the rseq
> page might make sense.  It could be a generic indication of "system
> went through some form of suspend".

This might indeed fit nicely as an extension of my KTLS prototype
(extensible rseq):

https://lore.kernel.org/lkml/20200925181518.4141-1-mathieu.desnoy...@efficios.com/

There are a few ways we could wire things up. One might be to add the
UUID field into the extended KTLS structure (so it's always updated after it
changes on next return to user-space). For this I assume that the Linux
scheduler within the guest VM always preempts all threads before a VM is
suspended (is that indeed true ?).

This leads to one important question though: how is the UUID check vs commit
operation made atomic with respect to suspend ? Unless we use rseq critical
sections in assembly, where the kernel 

Re: [PATCH] drivers/virt: vmgenid: add vm generation id driver

2020-10-19 Thread Michael S. Tsirkin
On Sun, Oct 18, 2020 at 09:14:00AM -0700, Andy Lutomirski wrote:
> On Sun, Oct 18, 2020 at 8:59 AM Michael S. Tsirkin  wrote:
> >
> > On Sun, Oct 18, 2020 at 08:54:36AM -0700, Andy Lutomirski wrote:
> > > On Sun, Oct 18, 2020 at 8:52 AM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Sat, Oct 17, 2020 at 03:24:08PM +0200, Jason A. Donenfeld wrote:
> > > > > 4c. The guest kernel maintains an array of physical addresses that are
> > > > > MADV_WIPEONFORK. The hypervisor knows about this array and its
> > > > > location through whatever protocol, and before resuming a
> > > > > moved/snapshotted/duplicated VM, it takes the responsibility for
> > > > > memzeroing this memory. The huge pro here would be that this
> > > > > eliminates all races, and reduces complexity quite a bit, because the
> > > > > hypervisor can perfectly synchronize its bringup (and SMP bringup)
> > > > > with this, and it can even optimize things like on-disk memory
> > > > > snapshots to simply not write out those pages to disk.
> > > > >
> > > > > A 4c-like approach seems like it'd be a lot of bang for the buck -- we
> > > > > reuse the existing mechanism (MADV_WIPEONFORK), so there's no new
> > > > > userspace API to deal with, and it'd be race free, and eliminate a lot
> > > > > of kernel complexity.
> > > >
> > > > Clearly this has a chance to break applications, right?
> > > > If there's an app that uses this as a non-system-calls way
> > > > to find out whether there was a fork, it will break
> > > > when wipe triggers without a fork ...
> > > > For example, imagine:
> > > >
> > > > MADV_WIPEONFORK
> > > > copy secret data to MADV_DONTFORK
> > > > fork
> > > >
> > > >
> > > > used to work, with this change it gets 0s instead of the secret data.
> > > >
> > > >
> > > > I am also not sure it's wise to expose each guest process
> > > > to the hypervisor like this. E.g. each process needs a
> > > > guest physical address of its own then. This is a finite resource.
> > > >
> > > >
> > > > The mmap interface proposed here is somewhat baroque, but it is
> > > > certainly simple to implement ...
> > >
> > > Wipe of fork/vmgenid/whatever could end up being much more problematic
> > > than it naively appears -- it could be wiped in the middle of a read.
> > > Either the API needs to handle this cleanly, or we need something more
> > > aggressive like signal-on-fork.
> > >
> > > --Andy
> >
> >
> > Right, it's not on fork, it's actually when process is snapshotted.
> >
> > If we assume it's CRIU we care about, then I
> > wonder what's wrong with something like
> > MADV_CHANGEONPTRACE_SEIZE
> > and basically say it's X bytes which change the value...
> 
> I feel like we may be approaching this from the wrong end.  Rather
> than saying "what data structure can the kernel expose that might
> plausibly be useful", how about we try identifying some specific
> userspace needs and see what a good solution could look like.  I can
> identify two major cryptographic use cases:

Well, I'm aware of a non-cryptographic use-case:
https://bugzilla.redhat.com/show_bug.cgi?id=1118834

this seems to just ask for the guest to have a way to detect that
a VM cloning event was triggered.


-- 
MST



Re: [PATCH] drivers/virt: vmgenid: add vm generation id driver

2020-10-18 Thread Andy Lutomirski
On Sun, Oct 18, 2020 at 8:59 AM Michael S. Tsirkin  wrote:
>
> On Sun, Oct 18, 2020 at 08:54:36AM -0700, Andy Lutomirski wrote:
> > On Sun, Oct 18, 2020 at 8:52 AM Michael S. Tsirkin  wrote:
> > >
> > > On Sat, Oct 17, 2020 at 03:24:08PM +0200, Jason A. Donenfeld wrote:
> > > > 4c. The guest kernel maintains an array of physical addresses that are
> > > > MADV_WIPEONFORK. The hypervisor knows about this array and its
> > > > location through whatever protocol, and before resuming a
> > > > moved/snapshotted/duplicated VM, it takes the responsibility for
> > > > memzeroing this memory. The huge pro here would be that this
> > > > eliminates all races, and reduces complexity quite a bit, because the
> > > > hypervisor can perfectly synchronize its bringup (and SMP bringup)
> > > > with this, and it can even optimize things like on-disk memory
> > > > snapshots to simply not write out those pages to disk.
> > > >
> > > > A 4c-like approach seems like it'd be a lot of bang for the buck -- we
> > > > reuse the existing mechanism (MADV_WIPEONFORK), so there's no new
> > > > userspace API to deal with, and it'd be race free, and eliminate a lot
> > > > of kernel complexity.
> > >
> > > Clearly this has a chance to break applications, right?
> > > If there's an app that uses this as a non-system-calls way
> > > to find out whether there was a fork, it will break
> > > when wipe triggers without a fork ...
> > > For example, imagine:
> > >
> > > MADV_WIPEONFORK
> > > copy secret data to MADV_DONTFORK
> > > fork
> > >
> > >
> > > used to work, with this change it gets 0s instead of the secret data.
> > >
> > >
> > > I am also not sure it's wise to expose each guest process
> > > to the hypervisor like this. E.g. each process needs a
> > > guest physical address of its own then. This is a finite resource.
> > >
> > >
> > > The mmap interface proposed here is somewhat baroque, but it is
> > > certainly simple to implement ...
> >
> > Wipe of fork/vmgenid/whatever could end up being much more problematic
> > than it naively appears -- it could be wiped in the middle of a read.
> > Either the API needs to handle this cleanly, or we need something more
> > aggressive like signal-on-fork.
> >
> > --Andy
>
>
> Right, it's not on fork, it's actually when process is snapshotted.
>
> If we assume it's CRIU we care about, then I
> wonder what's wrong with something like
> MADV_CHANGEONPTRACE_SEIZE
> and basically say it's X bytes which change the value...

I feel like we may be approaching this from the wrong end.  Rather
than saying "what data structure can the kernel expose that might
plausibly be useful", how about we try identifying some specific
userspace needs and see what a good solution could look like.  I can
identify two major cryptographic use cases:

1. A userspace RNG.  The API exposed by the userspace end is a
function that generates random numbers.  The userspace code in turn
wants to know some things from the kernel: it wants some
best-quality-available random seed data from the kernel (and possibly
an indication of how good it is) as well as an indication of whether
the userspace memory may have been cloned or rolled back, or, failing
that, an indication of whether a reseed is needed.  Userspace could
implement a wide variety of algorithms on top depending on its goals
and compliance requirements, but the end goal is for the userspace
part to be very, very fast.

2. A userspace crypto stack that wants to avoid shooting itself in the
foot due to inadvertently doing the same thing twice.  For example, an
AES-GCM stack does not want to reuse an IV, *especially* if there is
even the slightest chance that it might reuse the IV for different
data.  This use case doesn't necessarily involve random numbers, but,
if anything, it needs to be even faster than #1.

The threats here are not really the same.  For #1, a userspace RNG
should be able to recover from a scenario in which an adversary clones
the entire process *and gets to own the clone*.  For example, in
Android, an adversary can often gain complete control of a fork of the
zygote -- this shouldn't adversely affect the security properties of
other forks.  Similarly, a server farm could operate by having one
booted server that is cloned to create more workers.  Those clones
could be provisioned with secrets and permissions post-clone, and an
attacker gaining control of a fresh clone could be considered
acceptable.  For #2, in contrast, if an adversary gains control of a
clone of an AES-GCM session, they learn the key outright -- the
relevant attack scenario is that the adversary gets to interact with
two clones without compromising either clone per se.

It's worth noting that, in both cases, there could possibly be more
than one instance of an RNG or an AES-GCM session in the same process.
This means that using signals is awkward but not necessarily
impossible.  (This is an area in which Linux, and POSIX in general, is
much weaker than Windows.)

Re: [PATCH] drivers/virt: vmgenid: add vm generation id driver

2020-10-18 Thread Michael S. Tsirkin
On Sun, Oct 18, 2020 at 08:54:36AM -0700, Andy Lutomirski wrote:
> On Sun, Oct 18, 2020 at 8:52 AM Michael S. Tsirkin  wrote:
> >
> > On Sat, Oct 17, 2020 at 03:24:08PM +0200, Jason A. Donenfeld wrote:
> > > 4c. The guest kernel maintains an array of physical addresses that are
> > > MADV_WIPEONFORK. The hypervisor knows about this array and its
> > > location through whatever protocol, and before resuming a
> > > moved/snapshotted/duplicated VM, it takes the responsibility for
> > > memzeroing this memory. The huge pro here would be that this
> > > eliminates all races, and reduces complexity quite a bit, because the
> > > hypervisor can perfectly synchronize its bringup (and SMP bringup)
> > > with this, and it can even optimize things like on-disk memory
> > > snapshots to simply not write out those pages to disk.
> > >
> > > A 4c-like approach seems like it'd be a lot of bang for the buck -- we
> > > reuse the existing mechanism (MADV_WIPEONFORK), so there's no new
> > > userspace API to deal with, and it'd be race free, and eliminate a lot
> > > of kernel complexity.
> >
> > Clearly this has a chance to break applications, right?
> > If there's an app that uses this as a non-system-calls way
> > to find out whether there was a fork, it will break
> > when wipe triggers without a fork ...
> > For example, imagine:
> >
> > MADV_WIPEONFORK
> > copy secret data to MADV_DONTFORK
> > fork
> >
> >
> > used to work, with this change it gets 0s instead of the secret data.
> >
> >
> > I am also not sure it's wise to expose each guest process
> > to the hypervisor like this. E.g. each process needs a
> > guest physical address of its own then. This is a finite resource.
> >
> >
> > The mmap interface proposed here is somewhat baroque, but it is
> > certainly simple to implement ...
> 
> Wipe of fork/vmgenid/whatever could end up being much more problematic
> than it naively appears -- it could be wiped in the middle of a read.
> Either the API needs to handle this cleanly, or we need something more
> aggressive like signal-on-fork.
> 
> --Andy


Right, it's not on fork, it's actually when process is snapshotted.

If we assume it's CRIU we care about, then I
wonder what's wrong with something like
MADV_CHANGEONPTRACE_SEIZE
and basically say it's X bytes which change the value...


-- 
MST



Re: [PATCH] drivers/virt: vmgenid: add vm generation id driver

2020-10-18 Thread Andy Lutomirski
On Sun, Oct 18, 2020 at 8:52 AM Michael S. Tsirkin  wrote:
>
> On Sat, Oct 17, 2020 at 03:24:08PM +0200, Jason A. Donenfeld wrote:
> > 4c. The guest kernel maintains an array of physical addresses that are
> > MADV_WIPEONFORK. The hypervisor knows about this array and its
> > location through whatever protocol, and before resuming a
> > moved/snapshotted/duplicated VM, it takes the responsibility for
> > memzeroing this memory. The huge pro here would be that this
> > eliminates all races, and reduces complexity quite a bit, because the
> > hypervisor can perfectly synchronize its bringup (and SMP bringup)
> > with this, and it can even optimize things like on-disk memory
> > snapshots to simply not write out those pages to disk.
> >
> > A 4c-like approach seems like it'd be a lot of bang for the buck -- we
> > reuse the existing mechanism (MADV_WIPEONFORK), so there's no new
> > userspace API to deal with, and it'd be race free, and eliminate a lot
> > of kernel complexity.
>
> Clearly this has a chance to break applications, right?
> If there's an app that uses this as a non-system-calls way
> to find out whether there was a fork, it will break
> when wipe triggers without a fork ...
> For example, imagine:
>
> MADV_WIPEONFORK
> copy secret data to MADV_DONTFORK
> fork
>
>
> used to work, with this change it gets 0s instead of the secret data.
>
>
> I am also not sure it's wise to expose each guest process
> to the hypervisor like this. E.g. each process needs a
> guest physical address of its own then. This is a finite resource.
>
>
> The mmap interface proposed here is somewhat baroque, but it is
> certainly simple to implement ...

Wipe of fork/vmgenid/whatever could end up being much more problematic
than it naively appears -- it could be wiped in the middle of a read.
Either the API needs to handle this cleanly, or we need something more
aggressive like signal-on-fork.

--Andy


Re: [PATCH] drivers/virt: vmgenid: add vm generation id driver

2020-10-18 Thread Michael S. Tsirkin
On Sat, Oct 17, 2020 at 03:24:08PM +0200, Jason A. Donenfeld wrote:
> 4c. The guest kernel maintains an array of physical addresses that are
> MADV_WIPEONFORK. The hypervisor knows about this array and its
> location through whatever protocol, and before resuming a
> moved/snapshotted/duplicated VM, it takes the responsibility for
> memzeroing this memory. The huge pro here would be that this
> eliminates all races, and reduces complexity quite a bit, because the
> hypervisor can perfectly synchronize its bringup (and SMP bringup)
> with this, and it can even optimize things like on-disk memory
> snapshots to simply not write out those pages to disk.
> 
> A 4c-like approach seems like it'd be a lot of bang for the buck -- we
> reuse the existing mechanism (MADV_WIPEONFORK), so there's no new
> userspace API to deal with, and it'd be race free, and eliminate a lot
> of kernel complexity.

Clearly this has a chance to break applications, right?
If there's an app that uses this as a non-system-calls way
to find out whether there was a fork, it will break
when wipe triggers without a fork ...
For example, imagine:

MADV_WIPEONFORK
copy secret data to MADV_DONTFORK
fork


used to work, with this change it gets 0s instead of the secret data.
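
For context, the pattern that would break is roughly the following: an
application that uses a wipe-on-fork page purely as a fork detector (a sketch,
not taken from any particular program) would suddenly see its marker cleared
on VM resume even though no fork() happened:

#include <stddef.h>
#include <sys/mman.h>

static unsigned char *fork_marker;   /* one MADV_WIPEONFORK page */

static int init_fork_marker(void)
{
        fork_marker = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (fork_marker == MAP_FAILED)
                return -1;
        if (madvise(fork_marker, 4096, MADV_WIPEONFORK))
                return -1;
        fork_marker[0] = 1;   /* nonzero means "still the original process" */
        return 0;
}

static int have_forked(void)
{
        /* in a child the kernel has zero-filled the page; with the 4c scheme
         * the hypervisor would zero it on resume as well, so this would also
         * report "forked" after a snapshot restore */
        return fork_marker[0] == 0;
}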


I am also not sure it's wise to expose each guest process
to the hypervisor like this. E.g. each process needs a
guest physical address of its own then. This is a finite resource.


The mmap interface proposed here is somewhat baroque, but it is
certainly simple to implement ...

-- 
MST



Re: [PATCH] drivers/virt: vmgenid: add vm generation id driver

2020-10-17 Thread Jann Horn via Virtualization
On Sat, Oct 17, 2020 at 8:09 PM Alexander Graf  wrote:
> There are applications way beyond that though. What do you do with
> applications that already consumed randomness? For example a cached pool
> of SSL keys. Or a higher level language primitive that consumes
> randomness and caches its seed somewhere in an internal data structure.

For deterministic protection, those would also have to poll some
memory location that tells them whether the VmGenID changed:

1. between reading entropy from their RNG pool and using it
2. between collecting data from external sources (user input, clock,
...) and encrypting it

and synchronously shoot down the connection if a change happened. If
e.g. an application inside the VM has an AES-GCM-encrypted TLS
connection and, directly after the VM is restored, triggers an
application-level timeout that sends some fixed message across the
connection, then the TLS library must guarantee that either the VM was
already committed to sending exactly that message before the VM was
forked or the message will be blocked. If we don't do that, an
attacker who captures both a single packet from the forked VM and
traffic from the old VM can decrypt the next message from the old VM
after the fork (because AES-GCM is like AES-CTR plus an authenticator,
and CTR means that when keystream reuse occurs and one of the
plaintexts is known, the attacker can simply recover the other
plaintext using XOR).
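
A hedged sketch of that discipline, assuming a kernel-maintained generation
word mapped read-only into the process (this patch maps a UUID; a 64-bit
counter is assumed here just to keep the comparison cheap):

#include <errno.h>
#include <stdint.h>

/* set up elsewhere by mmap()ing the (assumed) generation counter page */
static volatile const uint64_t *vm_generation;

/* Run `op` and only report success if no VM generation change raced with it;
 * the caller is expected to tear the connection down on -ECONNRESET instead
 * of transmitting anything that may reuse keystream. */
static int run_unless_cloned(int (*op)(void *), void *arg)
{
        uint64_t before = *vm_generation;
        int ret = op(arg);

        if (ret == 0 && *vm_generation != before)
                return -ECONNRESET;
        return ret;
}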

(Or maybe, in disaster failover environments, TLS 1.3 servers could
get away with rekeying the connection instead of shooting it down? Ask
your resident friendly cryptographer whether that would be secure, I
am not one.)

I don't think a mechanism based around asynchronously telling the
application and waiting for it to confirm the rotation at a later
point is going to cut it; we should have some hard semantics on when
an application needs to poll this value.

> Or even worse: your system's host ssh key.

Mmmh... I think I normally would not want a VM to reset its host ssh
key after merely restoring a snapshot though? And more importantly,
Microsoft's docs say that they also change the VmGenID on disaster
failover. I think you very much wouldn't want your server to lose its
host key every time disaster failover happens. On the other hand,
after importing a public VM image, it might be a good idea.

I guess you could push that responsibility on the user, by adding an
option to the sshd_config that tells OpenSSH whether the host key
should be rotated on an ID change or not... but that still would not
be particularly pretty.

Ideally we would have the host tell us what type of events happened to
the VM, or something like that... or maybe even get the host VM
management software to ask the user whether they're importing a public
image... I really feel like with Microsoft's current protocol, we
don't get enough information to figure out what we should do about
private long-term authentication keys.


Re: [PATCH] drivers/virt: vmgenid: add vm generation id driver

2020-10-17 Thread Andy Lutomirski
On Fri, Oct 16, 2020 at 6:40 PM Jann Horn  wrote:
>
> [adding some more people who are interested in RNG stuff: Andy, Jason,
> Theodore, Willy Tarreau, Eric Biggers. also linux-api@, because this
> concerns some pretty fundamental API stuff related to RNG usage]
>
> On Fri, Oct 16, 2020 at 4:33 PM Catangiu, Adrian Costin
>  wrote:
> > - Background
> >
> > The VM Generation ID is a feature defined by Microsoft (paper:
> > http://go.microsoft.com/fwlink/?LinkId=260709) and supported by
> > multiple hypervisor vendors.
> >
> > The feature is required in virtualized environments by apps that work
> > with local copies/caches of world-unique data such as random values,
> > uuids, monotonically increasing counters, etc.
> > Such apps can be negatively affected by VM snapshotting when the VM
> > is either cloned or returned to an earlier point in time.
> >
> > The VM Generation ID is a simple concept meant to alleviate the issue
> > by providing a unique ID that changes each time the VM is restored
> > from a snapshot. The hw provided UUID value can be used to
> > differentiate between VMs or different generations of the same VM.
> >
> > - Problem
> >
> > The VM Generation ID is exposed through an ACPI device by multiple
> > hypervisor vendors but neither the vendors nor upstream Linux have a
> > default driver for it, leaving users to fend for themselves.
> >
> > Furthermore, simply finding out about a VM generation change is only
> > the starting point of a process to renew internal states of possibly
> > multiple applications across the system. This process could benefit
> > from a driver that provides an interface through which orchestration
> > can be easily done.
> >
> > - Solution
> >
> > This patch is a driver which exposes the Virtual Machine Generation ID
> > via a char-dev FS interface that provides ID update sync and async
> > notification, retrieval and confirmation mechanisms:
> >
> > When the device is 'open()'ed a copy of the current vm UUID is
> > associated with the file handle. 'read()' operations block until the
> > associated UUID is no longer up to date - until HW vm gen id changes -
> > at which point the new UUID is provided/returned. Nonblocking 'read()'
> > uses EWOULDBLOCK to signal that there is no _new_ UUID available.
> >
> > 'poll()' is implemented to allow polling for UUID updates. Such
> > updates result in 'EPOLLIN' events.
> >
> > Subsequent read()s following a UUID update no longer block, but return
> > the updated UUID. The application needs to acknowledge the UUID update
> > by confirming it through a 'write()'.
> > Only when the right/latest UUID is written back to the driver will the
> > driver mark this "watcher" as up to date and clear the EPOLLIN status.
> >
> > 'mmap()' support allows mapping a single read-only shared page which
> > will always contain the latest UUID value at offset 0.
>
> It would be nicer if that page just contained an incrementing counter,
> instead of a UUID. It's not like the application cares *what* the UUID
> changed to, just that it *did* change and all RNGs state now needs to
> be reseeded from the kernel, right? And an application can't reliably
> read the entire UUID from the memory mapping anyway, because the VM
> might be forked in the middle.
>
> So I think your kernel driver should detect UUID changes and then turn
> those into a monotonically incrementing counter. (Probably 64 bits
> wide?) (That's probably also a little bit faster than comparing an
> entire UUID.)
>
> An option might be to put that counter into the vDSO, instead of a
> separate VMA; but I don't know how the other folks feel about that.
> Andy, do you have opinions on this? That way, normal userspace code
> that uses this infrastructure wouldn't have to mess around with a
> special device at all. And it'd be usable in seccomp sandboxes and so
> on without needing special plumbing. And libraries wouldn't have to
> call open() and mess with file descriptor numbers.

The vDSO might be annoyingly slow for this.  Something like the rseq
page might make sense.  It could be a generic indication of "system
went through some form of suspend".


Re: [PATCH] drivers/virt: vmgenid: add vm generation id driver

2020-10-17 Thread Jann Horn via Virtualization
On Sat, Oct 17, 2020 at 8:44 AM Willy Tarreau  wrote:
> On Sat, Oct 17, 2020 at 07:52:48AM +0200, Jann Horn wrote:
> > On Sat, Oct 17, 2020 at 7:37 AM Willy Tarreau  wrote:
> > > On Sat, Oct 17, 2020 at 07:01:31AM +0200, Jann Horn wrote:
> > > > Microsoft's documentation
> > > > (http://go.microsoft.com/fwlink/?LinkId=260709) says that the VM
> > > > Generation ID that we get after a fork "is a 128-bit,
> > > > cryptographically random integer value". If multiple people use the
> > > > same image, it guarantees that each use of the image gets its own,
> > > > fresh ID:
> > >
> > > No. It cannot be more unique than the source that feeds that cryptographic
> > > transformation. All it guarantees is that the entropy source is protected
> > > from being guessed based on the output. Applying cryptography on a simple
> > > counter provides apparently random numbers that will be unique for a long
> > > period for the same source, but as soon as you duplicate that code between
> > > users and they start from the same counter they'll get the same IDs.
> > >
> > > This is why I think that using a counter is better if you really need
> > > something unique. Randoms only reduce predictability which helps
> > > avoiding collisions.
> >
> > Microsoft's spec tells us that they're giving us cryptographically
> > random numbers. Where they're getting those from is not our problem.
> > (And if even the hypervisor is not able to collect enough entropy to
> > securely generate random numbers, worrying about RNG reseeding in the
> > guest would be kinda pointless, we'd be fairly screwed anyway.)
>
> Sorry if I sound annoying, but it's a matter of terminology and needs.
>
> Cryptographically random means safe for use with cryptography in that it
> is unguessable enough so that you can use it for encryption keys that
> nobody will be able to guess. It in no way guarantees uniqueness, just
> like you don't really care if the symmetric crypto key of you VPN has
> already been used once somewhere else as long as there's no way to know.
> However with the good enough distribution that a CSPRNG provides,
> collisions within a *same* generator are bound to a very low, predictable
> rate which is generally considered acceptable for all use cases.

Yes.

> Something random (cryptographically or not) *cannot* be unique by
> definition, otherwise it's not random anymore, since each draw has an
> influence on the remaining list of possible draws, which is contrary to
> randomness. And conversely something unique cannot be completely random
> because if you know it's unique, you can already rule out all other known
> values from the candidates, thus it's more predictable than random.

Yes.

> With this in mind, picking randoms from a same RNG is often highly
> sufficient to consider they're highly likely unique within a long
> period. But it's not a guarantee. And it's even less one between two
> RNGs (e.g. if uniqueness is required between multiple hypervisors in
> case VMs are migrated or centrally managed, which I don't know).
>
> If what is sought here is a strong guarantee of uniqueness, using a
> counter as you first suggested is better.

My suggestion is to use a counter *in the UAPI*, not in the hypervisor
protocol. (And as long as that counter can only miss increments in a
cryptographically negligible fraction of cases, everything's fine.)

> If what is sought is pure
> randomness (in the sense that it's unpredictable, which I don't think
> is needed here), then randoms are better.

And this is what *the hypervisor protocol* gives us (which could be
very useful for reseeding the kernel RNG).

> If both are required, just
> concatenate a counter and a random. And if you need them to be spatially
> unique, just include a node identifier.
>
> Now the initial needs in the forwarded message are not entirely clear
> to me but I wanted to rule out the apparent mismatch between the expressed
> needs for uniqueness and the proposed solutions solely based on randomness.

Sure, from a theoretical standpoint, it would be a little bit nicer if
the hypervisor protocol included a generation number along with the
128-bit random value. But AFAIU it doesn't, so if we want this to just
work under Microsoft's existing hypervisor, we'll have to make do with
checking whether the random value changed. :P


Re: [PATCH] drivers/virt: vmgenid: add vm generation id driver

2020-10-16 Thread Jann Horn via Virtualization
On Sat, Oct 17, 2020 at 7:37 AM Willy Tarreau  wrote:
> On Sat, Oct 17, 2020 at 07:01:31AM +0200, Jann Horn wrote:
> > Microsoft's documentation
> > (http://go.microsoft.com/fwlink/?LinkId=260709) says that the VM
> > Generation ID that we get after a fork "is a 128-bit,
> > cryptographically random integer value". If multiple people use the
> > same image, it guarantees that each use of the image gets its own,
> > fresh ID:
>
> No. It cannot be more unique than the source that feeds that cryptographic
> transformation. All it guarantees is that the entropy source is protected
> from being guessed based on the output. Applying cryptography on a simple
> counter provides apparently random numbers that will be unique for a long
> period for the same source, but as soon as you duplicate that code between
> users and they start from the same counter they'll get the same IDs.
>
> This is why I think that using a counter is better if you really need
> something unique. Randoms only reduce predictability which helps avoiding
> collisions.

Microsoft's spec tells us that they're giving us cryptographically
random numbers. Where they're getting those from is not our problem.
(And if even the hypervisor is not able to collect enough entropy to
securely generate random numbers, worrying about RNG reseeding in the
guest would be kinda pointless, we'd be fairly screwed anyway.)

Also note that we don't actually need to *always* reinitialize RNG
state on forks for functional correctness; it is fine if that fails
with a probability of 2^-128, because functionally everything will be
fine, and an attacker who is that lucky could also just guess an AES
key (which has the same probability of being successful). (And also
2^-128 is such a tiny number that it doesn't matter anyway.)

> And I'm saying this as someone who had on his external gateway the same SSH
> host key as 89 other hosts on the net, each of them using randoms to provide
> a universally unique one...

If your SSH host key was shared with 89 other hosts, it evidently
wasn't generated from cryptographically random numbers. :P Either
because the key generator was not properly hooked up to the system's
entropy pool (if you're talking about the Debian fiasco), or because
the system simply did not have enough entropy available. (Or because
the key generator is broken, but I don't think that ever happened with
OpenSSH?)


Re: [PATCH] drivers/virt: vmgenid: add vm generation id driver

2020-10-16 Thread Jann Horn via Virtualization
On Sat, Oct 17, 2020 at 6:34 AM Colm MacCarthaigh  wrote:
> On 16 Oct 2020, at 21:02, Jann Horn wrote:
> > On Sat, Oct 17, 2020 at 5:36 AM Willy Tarreau  wrote:
> > But in userspace, we just need a simple counter. There's no need for
> > us to worry about anything else, like timestamps or whatever. If we
> > repeatedly fork a paused VM, the forked VMs will see the same counter
> > value, but that's totally fine, because the only thing that matters to
> > userspace is that the counter changes when the VM is forked.
>
> For user-space, even a single bit would do. We added MADV_WIPEONFORK
> so that userspace libraries can detect fork()/clone() robustly, for the
> same reasons. It just wipes a page as the indicator, which is
> effectively a single-bit signal, and it works well. On the user-space
> side of this, I’m keen to find a solution like that that we can use
> fairly easily inside of portable libraries and applications. The “have
> I forked” checks do end up in hot paths, so it’s nice if they can be
> CPU cache friendly. Comparing a whole 128-bit value wouldn’t be my
> favorite.

I'm pretty sure a single bit is not enough if you want to have a
single page, shared across the entire system, that stores the VM
forking state; you need a counter for that.

> > And actually, since the value is a cryptographically random 128-bit
> > value, I think that we should definitely use it to help reseed the
> > kernel's RNG, and keep it secret from userspace. That way, even if the
> > VM image is public, we can ensure that going forward, the kernel RNG
> > will return securely random data.
>
> If the image is public, you need some extra new raw entropy from
> somewhere. The gen-id could be mixed in, that can’t do any harm as
> long as rigorous cryptographic mixing with the prior state is used, but
> if that’s all you do then the final state is still deterministic and
> non-secret.

Microsoft's documentation
(http://go.microsoft.com/fwlink/?LinkId=260709) says that the VM
Generation ID that we get after a fork "is a 128-bit,
cryptographically random integer value". If multiple people use the
same image, it guarantees that each use of the image gets its own,
fresh ID: The table in section "How to implement virtual machine
generation ID support in a virtualization platform" says that (among
other things) "Virtual machine is imported, copied, or cloned"
generates a new generation ID.

So the RNG state after mixing in the new VM Generation ID would
contain 128 bits of secret entropy not known to anyone else, including
people with access to the VM image.

Now, 128 bits of cryptographically random data aren't _optimal_; I
think something on the order of 256 bits would be nicer from a
theoretical standpoint. But in practice I think we'll be good with the
128 bits we're getting (since the number of users who fork a VM image
is probably not going to be so large that worst-case collision
probabilities matter).
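
For the kernel-internal side, the mixing step could be as small as the
following sketch (not part of the posted patch; the function name is made up,
dev_data/priv reuses the driver's naming, and add_device_randomness() mixes
the bytes into the input pool without crediting any entropy):

#include <linux/random.h>
#include <linux/uuid.h>

/* hypothetical helper called from the driver's ACPI notify path when the
 * hardware-provided UUID is observed to have changed */
static void vmgenid_mix_new_uuid(struct dev_data *priv, const uuid_t *fresh)
{
        add_device_randomness(fresh, sizeof(*fresh));
        priv->uuid = *fresh;
}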

> The kernel would need to use the change as a trigger to
> measure some entropy (e.g. interrupts and RDRAND, or whatever). Or just
> define the machine contract as "this has to be unique random data and
> if it's not unique, or if it's public, you're toast".

As far as I can tell from Microsoft's spec, that is a guarantee we're
already getting.

Re: [PATCH] drivers/virt: vmgenid: add vm generation id driver

2020-10-16 Thread Jann Horn via Virtualization
On Sat, Oct 17, 2020 at 5:36 AM Willy Tarreau  wrote:
> On Sat, Oct 17, 2020 at 03:40:08AM +0200, Jann Horn wrote:
> > [adding some more people who are interested in RNG stuff: Andy, Jason,
> > Theodore, Willy Tarreau, Eric Biggers. also linux-api@, because this
> > concerns some pretty fundamental API stuff related to RNG usage]
> >
> > On Fri, Oct 16, 2020 at 4:33 PM Catangiu, Adrian Costin
> >  wrote:
> > > This patch is a driver which exposes the Virtual Machine Generation ID
> > > via a char-dev FS interface that provides ID update sync and async
> > > notification, retrieval and confirmation mechanisms:
> > >
> > > When the device is 'open()'ed a copy of the current vm UUID is
> > > associated with the file handle. 'read()' operations block until the
> > > associated UUID is no longer up to date - until HW vm gen id changes -
> > > at which point the new UUID is provided/returned. Nonblocking 'read()'
> > > uses EWOULDBLOCK to signal that there is no _new_ UUID available.
> > >
> > > 'poll()' is implemented to allow polling for UUID updates. Such
> > > updates result in 'EPOLLIN' events.
> > >
> > > Subsequent read()s following a UUID update no longer block, but return
> > > the updated UUID. The application needs to acknowledge the UUID update
> > > by confirming it through a 'write()'.
> > > Only when the right/latest UUID is written back to the driver will the
> > > driver mark this "watcher" as up to date and clear the EPOLLIN status.
> > >
> > > 'mmap()' support allows mapping a single read-only shared page which
> > > will always contain the latest UUID value at offset 0.
> >
> > It would be nicer if that page just contained an incrementing counter,
> > instead of a UUID. It's not like the application cares *what* the UUID
> > changed to, just that it *did* change and all RNGs state now needs to
> > be reseeded from the kernel, right? And an application can't reliably
> > read the entire UUID from the memory mapping anyway, because the VM
> > might be forked in the middle.
> >
> > So I think your kernel driver should detect UUID changes and then turn
> > those into a monotonically incrementing counter. (Probably 64 bits
> > wide?) (That's probably also a little bit faster than comparing an
> > entire UUID.)
>
> I agree with this. Further, I'm observing there is a very common
> confusion between "universally unique" and "random". Randoms are
> needed when seeking unpredictability. A random number generator
> *must* be able to return the same value multiple times in a row
> (though this is rare), otherwise it's not random.
[...]
> If the UUIDs used there are real UUIDs, it could be as simple as
> updating them according to their format, i.e. updating the timestamp,
> and if the timestamp is already the same, just increase the seq counter.
> Doing this doesn't require entropy, doesn't need to block and doesn't
> needlessly leak randoms that sometimes make people feel nervous.

Those UUIDs are supplied by existing hypervisor code; in that regard,
this is almost like a driver for a hardware device. It is written
against a fixed API provided by the underlying machine. Making sure
that the sequence of UUIDs, as seen from inside the machine, never
changes back to a previous one is the responsibility of the hypervisor
and out of scope for this driver.

Microsoft's spec document (which is a .docx file for reasons I don't
understand) actually promises us that it is a cryptographically random
128-bit integer value, which means that if you fork a VM 2^64 times,
the probability that any two of those VMs have the same counter is
2^-64. That should be good enough.

But in userspace, we just need a simple counter. There's no need for
us to worry about anything else, like timestamps or whatever. If we
repeatedly fork a paused VM, the forked VMs will see the same counter
value, but that's totally fine, because the only thing that matters to
userspace is that the counter changes when the VM is forked.

And actually, since the value is a cryptographically random 128-bit
value, I think that we should definitely use it to help reseed the
kernel's RNG, and keep it secret from userspace. That way, even if the
VM image is public, we can ensure that going forward, the kernel RNG
will return securely random data.


Re: [PATCH] drivers/virt: vmgenid: add vm generation id driver

2020-10-16 Thread Jann Horn via Virtualization
[adding some more people who are interested in RNG stuff: Andy, Jason,
Theodore, Willy Tarreau, Eric Biggers. also linux-api@, because this
concerns some pretty fundamental API stuff related to RNG usage]

On Fri, Oct 16, 2020 at 4:33 PM Catangiu, Adrian Costin
 wrote:
> - Background
>
> The VM Generation ID is a feature defined by Microsoft (paper:
> http://go.microsoft.com/fwlink/?LinkId=260709) and supported by
> multiple hypervisor vendors.
>
> The feature is required in virtualized environments by apps that work
> with local copies/caches of world-unique data such as random values,
> uuids, monotonically increasing counters, etc.
> Such apps can be negatively affected by VM snapshotting when the VM
> is either cloned or returned to an earlier point in time.
>
> The VM Generation ID is a simple concept meant to alleviate the issue
> by providing a unique ID that changes each time the VM is restored
> from a snapshot. The hw provided UUID value can be used to
> differentiate between VMs or different generations of the same VM.
>
> - Problem
>
> The VM Generation ID is exposed through an ACPI device by multiple
> hypervisor vendors but neither the vendors nor upstream Linux have a
> default driver for it, leaving users to fend for themselves.
>
> Furthermore, simply finding out about a VM generation change is only
> the starting point of a process to renew internal states of possibly
> multiple applications across the system. This process could benefit
> from a driver that provides an interface through which orchestration
> can be easily done.
>
> - Solution
>
> This patch is a driver which exposes the Virtual Machine Generation ID
> via a char-dev FS interface that provides ID update sync and async
> notification, retrieval and confirmation mechanisms:
>
> When the device is 'open()'ed a copy of the current vm UUID is
> associated with the file handle. 'read()' operations block until the
> associated UUID is no longer up to date - until HW vm gen id changes -
> at which point the new UUID is provided/returned. Nonblocking 'read()'
> uses EWOULDBLOCK to signal that there is no _new_ UUID available.
>
> 'poll()' is implemented to allow polling for UUID updates. Such
> updates result in 'EPOLLIN' events.
>
> Subsequent read()s following a UUID update no longer block, but return
> the updated UUID. The application needs to acknowledge the UUID update
> by confirming it through a 'write()'.
> Only when the right/latest UUID is written back to the driver will the
> driver mark this "watcher" as up to date and clear the EPOLLIN status.
>
> 'mmap()' support allows mapping a single read-only shared page which
> will always contain the latest UUID value at offset 0.

It would be nicer if that page just contained an incrementing counter,
instead of a UUID. It's not like the application cares *what* the UUID
changed to, just that it *did* change and all RNGs state now needs to
be reseeded from the kernel, right? And an application can't reliably
read the entire UUID from the memory mapping anyway, because the VM
might be forked in the middle.

So I think your kernel driver should detect UUID changes and then turn
those into a monotonically incrementing counter. (Probably 64 bits
wide?) (That's probably also a little bit faster than comparing an
entire UUID.)
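
A sketch of what that could look like inside the driver (again not the posted
code: the helper name is invented, map_buf/uuid/read_wait are the fields from
the posted struct dev_data, and the counter is assumed to live at offset 0 of
the page userspace maps):

#include <linux/compiler.h>
#include <linux/string.h>
#include <linux/types.h>
#include <linux/uuid.h>
#include <linux/wait.h>

static void vmgenid_bump_generation(struct dev_data *priv, const uuid_t *fresh)
{
        u64 *gen = (u64 *)priv->map_buf;

        if (!memcmp(fresh, &priv->uuid, sizeof(*fresh)))
                return;                          /* no actual change */
        priv->uuid = *fresh;
        /* expose a monotonic 64-bit counter instead of the raw UUID */
        WRITE_ONCE(*gen, READ_ONCE(*gen) + 1);
        wake_up_interruptible(&priv->read_wait);
}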

An option might be to put that counter into the vDSO, instead of a
separate VMA; but I don't know how the other folks feel about that.
Andy, do you have opinions on this? That way, normal userspace code
that uses this infrastructure wouldn't have to mess around with a
special device at all. And it'd be usable in seccomp sandboxes and so
on without needing special plumbing. And libraries wouldn't have to
call open() and mess with file descriptor numbers.
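
And the userspace side of the same idea, as a hedged sketch (the location of
the shared counter -- vDSO, rseq area, or a mapped device page -- is exactly
what is being debated here, so it is just an opaque pointer below):

#include <stdint.h>
#include <string.h>
#include <sys/random.h>

struct user_rng {
        uint64_t      seen_generation;
        unsigned char key[32];        /* stand-in for the real DRBG state */
};

static volatile const uint64_t *vm_generation;  /* assumed shared counter */

static int rng_read(struct user_rng *rng, unsigned char *out, size_t len)
{
        uint64_t gen = *vm_generation;

        if (gen != rng->seen_generation) {
                /* VM may have been cloned or restored: discard the old state
                 * and pull a fresh seed from the kernel */
                if (getrandom(rng->key, sizeof(rng->key), 0) != sizeof(rng->key))
                        return -1;
                rng->seen_generation = gen;
        }
        /* expand rng->key into `out` with a real DRBG here; memset is only a
         * placeholder so the sketch stays short */
        memset(out, 0, len);
        return 0;
}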

> The driver also adds support for tracking the count of open file handles
> that haven't acknowledged a UUID update. This is exposed through
> two IOCTLs:
>  * VMGENID_GET_OUTDATED_WATCHERS: immediately returns the number of
>    _outdated_ watchers - number of file handles that were open during
>    a VM generation change, and which have not yet confirmed the new
>    Vm-Gen-Id.
>  * VMGENID_WAIT_WATCHERS: blocks until there are no more _outdated_
>    watchers, or if a 'timeout' argument is provided, until the timeout
>    expires.
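
Purely for illustration, the orchestrator side implied by these two ioctls
might look like the sketch below; the request encoding and the device path are
assumptions, since only the ioctl names appear in the quoted description:

#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

#ifndef VMGENID_WAIT_WATCHERS
#define VMGENID_WAIT_WATCHERS _IO('v', 1)   /* hypothetical request code */
#endif

/* block until every open watcher has acknowledged the new UUID, i.e. all
 * interested services confirmed they regenerated their cached secrets */
int wait_for_system_reseed(void)
{
        int fd = open("/dev/vmgenid", O_RDONLY);
        int ret;

        if (fd < 0)
                return -1;
        ret = ioctl(fd, VMGENID_WAIT_WATCHERS, 0);
        close(fd);
        return ret;
}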

Does this mean that code that uses the memory mapping to detect
changes is still supposed to confirm generation IDs? What about
multithreaded processes, especially ones that use libraries - if a
library opens a single file descriptor that is used from multiple
threads, is the library required to synchronize all its threads before
confirming the change? That seems like it could get messy. And it
again means that this interface can't easily be used from inside
seccomp sandboxes.

[...]
> diff --git a/Documentation/virt/vmgenid.rst b/Documentation/virt/vmgenid.rst
[...]
> +``close()``:
> +  Removes the file handle as a Vm-Gen-Id watcher.

(Linux doesn't have "file handles". Technically 

Re: [PATCH] drivers/virt: vmgenid: add vm generation id driver

2020-10-16 Thread gre...@linuxfoundation.org
On Fri, Oct 16, 2020 at 02:33:15PM +, Catangiu, Adrian Costin wrote:
> +config VMGENID
> + tristate "Virtual Machine Generation ID driver"
> + depends on ACPI
> + default M

Unless this is required to boot a machine, this should be removed.

> + help
> +   This is a Virtual Machine Generation ID driver which provides
> +   a virtual machine unique identifier. The provided UUID can be
> +   watched through the FS interface exposed by this driver, and
> +   thus can provide notifications for VM snapshot or cloning events.

Where is the uuid exposed in the filesystem?

> +   This enables applications and libraries that store or cache
> +   sensitive information, to know that they need to regenerate it
> +   after process memory has been exposed to potential copying.

Module name?

> +
>  config FSL_HV_MANAGER
>   tristate "Freescale hypervisor management driver"
>   depends on FSL_SOC
> diff --git a/drivers/virt/Makefile b/drivers/virt/Makefile
> index fd33124..a1f8dcc 100644
> --- a/drivers/virt/Makefile
> +++ b/drivers/virt/Makefile
> @@ -4,4 +4,5 @@
>  #
>  
>  obj-$(CONFIG_FSL_HV_MANAGER) += fsl_hypervisor.o
> +obj-$(CONFIG_VMGENID)+= vmgenid.o
>  obj-y+= vboxguest/
> diff --git a/drivers/virt/vmgenid.c b/drivers/virt/vmgenid.c
> new file mode 100644
> index 000..d314c72
> --- /dev/null
> +++ b/drivers/virt/vmgenid.c
> @@ -0,0 +1,419 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Virtual Machine Generation ID driver
> + *
> + * Copyright (C) 2018 Red Hat Inc, Copyright (C) 2020 Amazon.com Inc

That's not how you write copyright lines, please fix up.

> + * All rights reserved.
> + *   Authors:
> + * Adrian Catangiu 
> + * Or Idgar 
> + * Gal Hammer 
> + *
> + */
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define DEV_NAME "vmgenid"
> +ACPI_MODULE_NAME(DEV_NAME);
> +
> +struct dev_data {
> + struct cdev   cdev;
> + dev_t dev_id;
> + unsigned long map_buf;
> +
> + void  *uuid_iomap;
> + uuid_tuuid;
> + wait_queue_head_t read_wait;
> +
> + atomic_t  watchers;
> + atomic_t  outdated_watchers;
> + wait_queue_head_t outdated_wait;
> +};
> +
> +struct file_data {
> + struct dev_data  *dev_data;
> + uuid_t   acked_uuid;
> +};
> +
> +static bool vmgenid_uuid_matches(struct dev_data *priv, uuid_t *uuid)
> +{
> + return !memcmp(uuid, &priv->uuid, sizeof(uuid_t));
> +}
> +
> +static void vmgenid_put_outdated_watchers(struct dev_data *priv)
> +{
> + if (atomic_dec_and_test(&priv->outdated_watchers))
> + wake_up_interruptible(&priv->outdated_wait);
> +}
> +
> +static int vmgenid_open(struct inode *inode, struct file *file)
> +{
> + struct dev_data *priv =
> + container_of(inode->i_cdev, struct dev_data, cdev);
> + struct file_data *file_data =
> + kzalloc(sizeof(struct file_data), GFP_KERNEL);
> +
> + if (!file_data)
> + return -ENOMEM;
> +
> + file_data->acked_uuid = priv->uuid;
> + file_data->dev_data = priv;
> +
> + file->private_data = file_data;
> + atomic_inc(&priv->watchers);
> +
> + return 0;
> +}
> +
> +static int vmgenid_close(struct inode *inode, struct file *file)
> +{
> + struct file_data *file_data = (struct file_data *) file->private_data;
> + struct dev_data *priv = file_data->dev_data;
> +
> + if (!vmgenid_uuid_matches(priv, &file_data->acked_uuid))
> + vmgenid_put_outdated_watchers(priv);
> + atomic_dec(&priv->watchers);
> + kfree(file->private_data);
> +
> + return 0;
> +}
> +
> +static ssize_t
> +vmgenid_read(struct file *file, char __user *ubuf, size_t nbytes, loff_t *ppos)
> +{
> + struct file_data *file_data =
> + (struct file_data *) file->private_data;
> + struct dev_data *priv = file_data->dev_data;
> + ssize_t ret;
> +
> + if (nbytes == 0)
> + return 0;
> + /* disallow partial UUID reads */
> + if (nbytes < sizeof(uuid_t))
> + return -EINVAL;
> + nbytes = sizeof(uuid_t);
> +
> + if (vmgenid_uuid_matches(priv, &file_data->acked_uuid)) {
> + if (file->f_flags & O_NONBLOCK)
> + return -EAGAIN;
> + ret = wait_event_interruptible(
> + priv->read_wait,
> + !vmgenid_uuid_matches(priv, &file_data->acked_uuid)
> + );
> + if (ret)
> + return ret;
> + }
> +
> + ret = copy_to_user(ubuf, &priv->uuid, nbytes);
> + if (ret)
> + return -EFAULT;
> +
> + return nbytes;
> +}
> +
> +static ssize_t vmgenid_write(struct file *file, const char __user *ubuf,
> + size_t count, loff_t *ppos)
> +{
> + struct file_data *file_data = (struct file_data *) file->private_data;
> + struct