Re: [PATCH] x86: fix q35 kernel measurements broken due to rng seeding

2023-02-02 Thread H. Peter Anvin
On February 2, 2023 7:17:01 AM PST, James Bottomley  wrote:
>On Thu, 2023-02-02 at 07:03 -0800, H. Peter Anvin wrote:
>[...]
>> NAK. We need to fix the actual problem of the kernel stomping on
>> memory it shouldn't, not paper around it.
>
>This is a first boot situation, not kexec (I just updated kexec because
>it should use any new mechanism we propose).  Unlike kexec, for first
>boot we're very constrained by the amount of extra space QEMU has to do
>this.  The boot_params are the first page of the kernel load, but the
>kernel proper begins directly after it, so we can't expand it.  The two
>schemes tried: loading after the kernel and loading after the command
>line both tamper with integrity protected files, so we shouldn't use
>this mechanism.  This is the essence of the problem: If we add this
>area at boot, it has to go in an existing memory location; we can't
>steal random guest areas.  All current config parameters are passed
>through as fw_config files, so we can only use that mechanism *if* we
>know where the area ends up in the loaded kernel *and* the file isn't
>integrity protected (this latter is expanding over time).
>
>If we could wind back time, I'd have added the 32 byte random seed to
>boot_params properly not coded it as a setup_data addition, but now
>we're stuck with coping with existing behaviour, which is why I thought
>the retro fit to boot_params would be the better path forward, but if
>you have any alternatives, I'm sure we could look at them.
>
>James
>

One option that you do have that should be backwards compatible, even, is to 
mark that memory as reserved in the memory map, basically doing the job that 
the kernel decompressor should have done in the first place. The downside is 
that existing kennels will never reclaim that memory, but since it is such a 
small amount, it shouldn't really matter.

We could even reserve a memory type code for it; that way a newer kernel could 
know to reclaim that memory at the very end of the boot process, when it would 
deallocate other setup_data entries. Existing kernels will, for obvious 
reasons, treat unknown memory types as equivalent to type 2 – permanent keep 
out.




Re: [PATCH] x86: fix q35 kernel measurements broken due to rng seeding

2023-02-02 Thread H. Peter Anvin
On February 2, 2023 7:17:01 AM PST, James Bottomley  wrote:
>On Thu, 2023-02-02 at 07:03 -0800, H. Peter Anvin wrote:
>[...]
>> NAK. We need to fix the actual problem of the kernel stomping on
>> memory it shouldn't, not paper around it.
>
>This is a first boot situation, not kexec (I just updated kexec because
>it should use any new mechanism we propose).  Unlike kexec, for first
>boot we're very constrained by the amount of extra space QEMU has to do
>this.  The boot_params are the first page of the kernel load, but the
>kernel proper begins directly after it, so we can't expand it.  The two
>schemes tried: loading after the kernel and loading after the command
>line both tamper with integrity protected files, so we shouldn't use
>this mechanism.  This is the essence of the problem: If we add this
>area at boot, it has to go in an existing memory location; we can't
>steal random guest areas.  All current config parameters are passed
>through as fw_config files, so we can only use that mechanism *if* we
>know where the area ends up in the loaded kernel *and* the file isn't
>integrity protected (this latter is expanding over time).
>
>If we could wind back time, I'd have added the 32 byte random seed to
>boot_params properly not coded it as a setup_data addition, but now
>we're stuck with coping with existing behaviour, which is why I thought
>the retro fit to boot_params would be the better path forward, but if
>you have any alternatives, I'm sure we could look at them.
>
>James
>

Somewhat aside...

There are other issues with passing the entropy seed in memory, of course; in 
order to avoid accidental or malicious reuse it would be far better if the 
entropy was actively pulled at use time via virtual I/O, a hypercall, or 
RDRAND/RDSEED emulation, but I do realize that that is not always an option.





Re: [PATCH] x86: fix q35 kernel measurements broken due to rng seeding

2023-02-02 Thread H. Peter Anvin
On February 2, 2023 7:17:01 AM PST, James Bottomley  wrote:
>On Thu, 2023-02-02 at 07:03 -0800, H. Peter Anvin wrote:
>[...]
>> NAK. We need to fix the actual problem of the kernel stomping on
>> memory it shouldn't, not paper around it.
>
>This is a first boot situation, not kexec (I just updated kexec because
>it should use any new mechanism we propose).  Unlike kexec, for first
>boot we're very constrained by the amount of extra space QEMU has to do
>this.  The boot_params are the first page of the kernel load, but the
>kernel proper begins directly after it, so we can't expand it.  The two
>schemes tried: loading after the kernel and loading after the command
>line both tamper with integrity protected files, so we shouldn't use
>this mechanism.  This is the essence of the problem: If we add this
>area at boot, it has to go in an existing memory location; we can't
>steal random guest areas.  All current config parameters are passed
>through as fw_config files, so we can only use that mechanism *if* we
>know where the area ends up in the loaded kernel *and* the file isn't
>integrity protected (this latter is expanding over time).
>
>If we could wind back time, I'd have added the 32 byte random seed to
>boot_params properly not coded it as a setup_data addition, but now
>we're stuck with coping with existing behaviour, which is why I thought
>the retro fit to boot_params would be the better path forward, but if
>you have any alternatives, I'm sure we could look at them.
>
>James
>

The right thing to do is to fix the kernel so that it doesn't stomp on this 
memory, just as it cannot stomp on boot_params, initrd, or the command line. 
The kernel boot protocol defines a keep-out area, but physical kaslr violates 
it and so the kaslr code in the decompressor is responsible for keeping track 
of the keepout areas, and apparently noone every did.

Adding it to boot_params and bumping the version number is a hack that doesn't 
solve the backwards compatibility problem, so we should just fix the bug 
instead. Adding it to boot_params and adding a setup_data pointer MAY be 
backwards compatible, but it also enables an absolutely catastrophic failure 
mode: an unaware loader may end up relocating boot_params without knowing that 
that data has a secondary pointer into it, with the result that we now have a 
bogus pointer in a linked list. Not good.

Fixing the bug properly is also the only way forward for future setup_data 
users, and we are running low on space in the fixed-size structures.





Re: [PATCH] x86: fix q35 kernel measurements broken due to rng seeding

2023-02-02 Thread H. Peter Anvin
On February 2, 2023 6:38:12 AM PST, James Bottomley  wrote:
>On Wed, 2023-02-01 at 15:48 -0500, Jason A. Donenfeld wrote:
>[...]
>> But it sounds like you might now have a concrete suggestion on
>> something even better. I'm CCing hpa, as this is his wheelhouse, and
>> maybe you two can divise the next step while I'm away. Maybe the pad9
>> thing you mentioned is the super nice solution we've been searching
>> for this whole time. When I'm home in 10 days and have internet
>> again, I'll take a look at where thing's are out and try to figure
>> out how I can be productive again with it.
>
>OK, so just FYI HPA, this is the patch I'm thinking of sending to
>linux-kernel to reserve space in struct boot_params for this.  If you
>could take a look and advise on the location before I send the final
>patch, I'd be grateful.  I took space in _pad9 because that's the
>standard method (add on to end), but it does strike me we could also
>use all of _pad8 for the (the addition is only 48 bytes) or even _pad3
>+ hd0_info + hd1_info.
>
>James
>
>---
>
>diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
>index 9338c68e7413..0120ab77dac9 100644
>--- a/arch/x86/boot/header.S
>+++ b/arch/x86/boot/header.S
>@@ -308,7 +308,7 @@ _start:
>   # Part 2 of the header, from the old setup.S
> 
>   .ascii  "HdrS"  # header signature
>-  .word   0x020f  # header version number (>= 0x0105)
>+  .word   0x0210  # header version number (>= 0x0105)
>   # or else old loadlin-1.5 will fail)
>   .globl realmode_swtch
> realmode_swtch:   .word   0, 0# default_switch, SETUPSEG
>diff --git a/arch/x86/include/uapi/asm/bootparam.h 
>b/arch/x86/include/uapi/asm/bootparam.h
>index 01d19fc22346..c614ff0755f2 100644
>--- a/arch/x86/include/uapi/asm/bootparam.h
>+++ b/arch/x86/include/uapi/asm/bootparam.h
>@@ -181,6 +181,19 @@ struct ima_setup_data {
>   __u64 size;
> } __attribute__((packed));
> 
>+/*
>+ * Define a boot_param area for the RNG seed which can be used via the
>+ * setup_data mechanism (so must have a setup_data header) but which
>+ * is embedded in boot_params because qemu has been unable to find
>+ * a safe data space for it.  The value RNG_SEED_LENGTH must not
>+ * change (pad length dependent on it) and must match the value in QEMU
>+ */
>+#define RNG_SEED_LENGTH   32
>+struct random_seed_data {
>+  struct  setup_data s;
>+  __u8data[RNG_SEED_LENGTH];
>+} __attribute__((packed));
>+
> /* The so-called "zeropage" */
> struct boot_params {
>   struct screen_info screen_info; /* 0x000 */
>@@ -228,7 +241,8 @@ struct boot_params {
>   struct boot_e820_entry e820_table[E820_MAX_ENTRIES_ZEROPAGE]; /* 0x2d0 
> */
>   __u8  _pad8[48];/* 0xcd0 */
>   struct edd_info eddbuf[EDDMAXNR];   /* 0xd00 */
>-  __u8  _pad9[276];   /* 0xeec */
>+  struct random_seed_data random_seed;/* 0xeec */
>+  __u8  _pad9[228];   /* 0xf1c */
> } __attribute__((packed));
> 
> /**
>diff --git a/arch/x86/kernel/kexec-bzimage64.c 
>b/arch/x86/kernel/kexec-bzimage64.c
>index 6b58610a1552..fb719682579d 100644
>--- a/arch/x86/kernel/kexec-bzimage64.c
>+++ b/arch/x86/kernel/kexec-bzimage64.c
>@@ -110,13 +110,10 @@ static int setup_e820_entries(struct boot_params *params)
>   return 0;
> }
> 
>-enum { RNG_SEED_LENGTH = 32 };
>-
> static void
>-setup_rng_seed(struct boot_params *params, unsigned long params_load_addr,
>- unsigned int rng_seed_setup_data_offset)
>+setup_rng_seed(struct boot_params *params, unsigned long params_load_addr)
> {
>-  struct setup_data *sd = (void *)params + rng_seed_setup_data_offset;
>+  struct setup_data *sd = >random_seed.s;
>   unsigned long setup_data_phys;
> 
>   if (!rng_is_initialized())
>@@ -125,7 +122,8 @@ setup_rng_seed(struct boot_params *params, unsigned long 
>params_load_addr,
>   sd->type = SETUP_RNG_SEED;
>   sd->len = RNG_SEED_LENGTH;
>   get_random_bytes(sd->data, RNG_SEED_LENGTH);
>-  setup_data_phys = params_load_addr + rng_seed_setup_data_offset;
>+  setup_data_phys = params_load_addr + offsetof(struct boot_params,
>+random_seed);
>   sd->next = params->hdr.setup_data;
>   params->hdr.setup_data = setup_data_phys;
> }
>@@ -306,7 +304,7 @@ setup_boot_parameters(struct kimage *image, struct 
>boot_params *params,
>   }
> 
>   /* Setup RNG seed */
>-  setup_rng_seed(params, params_load_addr, setup_data_offset);
>+  setup_rng_seed(params, params_load_addr);
> 
>   /* Setup EDD info */
>   memcpy(params->eddbuf, boot_params.eddbuf,
>

NAK. We need to fix the actual problem of the kernel stomping on memory it 
shouldn't, not paper around it.




Re: [PULL 10/56] x86: don't let decompressed kernel image clobber setup_data

2023-01-31 Thread H. Peter Anvin
On January 31, 2023 1:22:43 PM PST, "Jason A. Donenfeld"  
wrote:
>On Tue, Jan 31, 2023, 15:55 H. Peter Anvin  wrote:
>
>> On January 30, 2023 12:19:14 PM PST, "Michael S. Tsirkin" 
>> wrote:
>> >From: "Jason A. Donenfeld" 
>> >
>> >The setup_data links are appended to the compressed kernel image. Since
>> >the kernel image is typically loaded at 0x10, setup_data lives at
>> >`0x10 + compressed_size`, which does not get relocated during the
>> >kernel's boot process.
>> >
>> >The kernel typically decompresses the image starting at address
>> >0x100 (note: there's one more zero there than the compressed image
>> >above). This usually is fine for most kernels.
>> >
>> >However, if the compressed image is actually quite large, then
>> >setup_data will live at a `0x10 + compressed_size` that extends into
>> >the decompressed zone at 0x100. In other words, if compressed_size
>> >is larger than `0x100 - 0x10`, then the decompression step will
>> >clobber setup_data, resulting in crashes.
>> >
>> >Visually, what happens now is that QEMU appends setup_data to the kernel
>> >image:
>> >
>> >  kernel imagesetup_data
>> >   |--|||
>> >0x10  0x10+l1 0x10+l1+l2
>> >
>> >The problem is that this decompresses to 0x100 (one more zero). So
>> >if l1 is > (0x100-0x10), then this winds up looking like:
>> >
>> >  kernel imagesetup_data
>> >   |--|||
>> >0x10  0x10+l1 0x10+l1+l2
>> >
>> > d e c o m p r e s s e d   k e r n e l
>> >
>>  |-|
>> >0x100
>>  0x100+l3
>> >
>> >The decompressed kernel seemingly overwriting the compressed kernel
>> >image isn't a problem, because that gets relocated to a higher address
>> >early on in the boot process, at the end of startup_64. setup_data,
>> >however, stays in the same place, since those links are self referential
>> >and nothing fixes them up.  So the decompressed kernel clobbers it.
>> >
>> >Fix this by appending setup_data to the cmdline blob rather than the
>> >kernel image blob, which remains at a lower address that won't get
>> >clobbered.
>> >
>> >This could have been done by overwriting the initrd blob instead, but
>> >that poses big difficulties, such as no longer being able to use memory
>> >mapped files for initrd, hurting performance, and, more importantly, the
>> >initrd address calculation is hard coded in qboot, and it always grows
>> >down rather than up, which means lots of brittle semantics would have to
>> >be changed around, incurring more complexity. In contrast, using cmdline
>> >is simple and doesn't interfere with anything.
>> >
>> >The microvm machine has a gross hack where it fiddles with fw_cfg data
>> >after the fact. So this hack is updated to account for this appending,
>> >by reserving some bytes.
>> >
>> >Fixup-by: Michael S. Tsirkin 
>> >Cc: x...@kernel.org
>> >Cc: Philippe Mathieu-Daudé 
>> >Cc: H. Peter Anvin 
>> >Cc: Borislav Petkov 
>> >Cc: Eric Biggers 
>> >Signed-off-by: Jason A. Donenfeld 
>> >Message-Id: <20221230220725.618763-1-ja...@zx2c4.com>
>> >Message-ID: <20230128061015-mutt-send-email-...@kernel.org>
>> >Reviewed-by: Michael S. Tsirkin 
>> >Signed-off-by: Michael S. Tsirkin 
>> >Tested-by: Eric Biggers 
>> >Tested-by: Mathias Krause 
>> >---
>> > include/hw/i386/microvm.h |  5 ++--
>> > include/hw/nvram/fw_cfg.h |  9 +++
>> > hw/i386/microvm.c | 15 +++
>> > hw/i386/x86.c | 52 +--
>> > hw/nvram/fw_cfg.c |  9 +++
>> > 5 files changed, 59 insertions(+), 31 deletions(-)
>> >
>> >diff --git a/include/hw/i386/microvm.h b/include/hw/i386/microvm.h
>> >index fad97a891d..e8af61f194 100644
>> >--- a/include/hw/i386/microvm.h
>> >+++ b/include/hw/i386/microvm.h
>> >@@ -50,8 +50,9 @@
>> >  */
>> >
>> > /* Platform virtio definitions */
>> >-#define VIRTIO_MMIO_BASE  0xfeb0

Re: [PULL 10/56] x86: don't let decompressed kernel image clobber setup_data

2023-01-31 Thread H. Peter Anvin
On January 30, 2023 12:19:14 PM PST, "Michael S. Tsirkin"  
wrote:
>From: "Jason A. Donenfeld" 
>
>The setup_data links are appended to the compressed kernel image. Since
>the kernel image is typically loaded at 0x10, setup_data lives at
>`0x10 + compressed_size`, which does not get relocated during the
>kernel's boot process.
>
>The kernel typically decompresses the image starting at address
>0x100 (note: there's one more zero there than the compressed image
>above). This usually is fine for most kernels.
>
>However, if the compressed image is actually quite large, then
>setup_data will live at a `0x10 + compressed_size` that extends into
>the decompressed zone at 0x100. In other words, if compressed_size
>is larger than `0x100 - 0x10`, then the decompression step will
>clobber setup_data, resulting in crashes.
>
>Visually, what happens now is that QEMU appends setup_data to the kernel
>image:
>
>  kernel imagesetup_data
>   |--|||
>0x10  0x10+l1 0x10+l1+l2
>
>The problem is that this decompresses to 0x100 (one more zero). So
>if l1 is > (0x100-0x10), then this winds up looking like:
>
>  kernel imagesetup_data
>   |--|||
>0x10  0x10+l1 0x10+l1+l2
>
> d e c o m p r e s s e d   k e r n e l
> 
> |-|
>0x100 
> 0x100+l3
>
>The decompressed kernel seemingly overwriting the compressed kernel
>image isn't a problem, because that gets relocated to a higher address
>early on in the boot process, at the end of startup_64. setup_data,
>however, stays in the same place, since those links are self referential
>and nothing fixes them up.  So the decompressed kernel clobbers it.
>
>Fix this by appending setup_data to the cmdline blob rather than the
>kernel image blob, which remains at a lower address that won't get
>clobbered.
>
>This could have been done by overwriting the initrd blob instead, but
>that poses big difficulties, such as no longer being able to use memory
>mapped files for initrd, hurting performance, and, more importantly, the
>initrd address calculation is hard coded in qboot, and it always grows
>down rather than up, which means lots of brittle semantics would have to
>be changed around, incurring more complexity. In contrast, using cmdline
>is simple and doesn't interfere with anything.
>
>The microvm machine has a gross hack where it fiddles with fw_cfg data
>after the fact. So this hack is updated to account for this appending,
>by reserving some bytes.
>
>Fixup-by: Michael S. Tsirkin 
>Cc: x...@kernel.org
>Cc: Philippe Mathieu-Daudé 
>Cc: H. Peter Anvin 
>Cc: Borislav Petkov 
>Cc: Eric Biggers 
>Signed-off-by: Jason A. Donenfeld 
>Message-Id: <20221230220725.618763-1-ja...@zx2c4.com>
>Message-ID: <20230128061015-mutt-send-email-...@kernel.org>
>Reviewed-by: Michael S. Tsirkin 
>Signed-off-by: Michael S. Tsirkin 
>Tested-by: Eric Biggers 
>Tested-by: Mathias Krause 
>---
> include/hw/i386/microvm.h |  5 ++--
> include/hw/nvram/fw_cfg.h |  9 +++
> hw/i386/microvm.c | 15 +++
> hw/i386/x86.c | 52 +--
> hw/nvram/fw_cfg.c |  9 +++
> 5 files changed, 59 insertions(+), 31 deletions(-)
>
>diff --git a/include/hw/i386/microvm.h b/include/hw/i386/microvm.h
>index fad97a891d..e8af61f194 100644
>--- a/include/hw/i386/microvm.h
>+++ b/include/hw/i386/microvm.h
>@@ -50,8 +50,9 @@
>  */
> 
> /* Platform virtio definitions */
>-#define VIRTIO_MMIO_BASE  0xfeb0
>-#define VIRTIO_CMDLINE_MAXLEN 64
>+#define VIRTIO_MMIO_BASE0xfeb0
>+#define VIRTIO_CMDLINE_MAXLEN   64
>+#define VIRTIO_CMDLINE_TOTAL_MAX_LEN((VIRTIO_CMDLINE_MAXLEN + 1) * 16)
> 
> #define GED_MMIO_BASE 0xfea0
> #define GED_MMIO_BASE_MEMHP   (GED_MMIO_BASE + 0x100)
>diff --git a/include/hw/nvram/fw_cfg.h b/include/hw/nvram/fw_cfg.h
>index 2e503904dc..990dcdbb2e 100644
>--- a/include/hw/nvram/fw_cfg.h
>+++ b/include/hw/nvram/fw_cfg.h
>@@ -139,6 +139,15 @@ void fw_cfg_add_bytes_callback(FWCfgState *s, uint16_t 
>key,
>void *data, size_t len,
>bool read_only);
> 
>+/**
>+ * fw_cfg_read_bytes_ptr:
>+ * @s: fw_cfg device being modified
>+ * @key: selector key value for new fw_cfg item
>+ *
>+ * Reads 

Re: [PATCH qemu] x86: don't let decompressed kernel image clobber setup_data

2022-12-31 Thread H. Peter Anvin

On 12/31/22 20:55, Mika Penttilä wrote:


If decompression does clobber the data, then we *also* need to figure 
out why that is. There are basically three possibilities:


1. If physical KASLR is NOT used:

 a. The boot loader doesn't honor the kernel safe area properly;
 b. Somewhere in the process a bug in the calculation of the
    kernel safe area has crept in.

2. If physical KASLR IS used:

 The decompressor doesn't correctly keep track of nor relocate
 all the keep-out zones before picking a target address.


Seems setup_data is not included in those mem_avoid regions.



[facepalm]




One is a bootloader bug, two is a kernel bugs. My guess is (2) is the 
culprit, but (1b) should be checked, too.




Correction: two are kernel bugs, i.e. (1b) and (2) are both kernel bugs.

-hpa



Re: [PATCH qemu] x86: don't let decompressed kernel image clobber setup_data

2022-12-31 Thread H. Peter Anvin




On 12/31/22 10:22, Jason A. Donenfeld wrote:

On Sat, Dec 31, 2022 at 03:24:32PM +0100, Borislav Petkov wrote:

On Sat, Dec 31, 2022 at 02:51:28PM +0100, Jason A. Donenfeld wrote:

That failure is unrelated to the ident mapping issue Peter and
I discussed. The original failure is described in the commit message:
decompression clobbers the data, so sd->next points to garbage.


Right


So with that understanding confirmed, I'm confused at your surprise that
hpa's unrelated fix to the different issue didn't fix this issue.



If decompression does clobber the data, then we *also* need to figure 
out why that is. There are basically three possibilities:


1. If physical KASLR is NOT used:

a. The boot loader doesn't honor the kernel safe area properly;
b. Somewhere in the process a bug in the calculation of the
   kernel safe area has crept in.

2. If physical KASLR IS used:

The decompressor doesn't correctly keep track of nor relocate
all the keep-out zones before picking a target address.

One is a bootloader bug, two is a kernel bug. My guess is (2) is the 
culprit, but (1b) should be checked, too.


-hpa




Re: [PATCH qemu] x86: don't let decompressed kernel image clobber setup_data

2022-12-31 Thread H. Peter Anvin

On 12/31/22 19:21, H. Peter Anvin wrote:



Alternatively, setup_data could be relocated, the boot param protocol
could be bumped, and then QEMU could conditionalized it's use of
setup_data based on that protocol version. That'd work, but seems a bit
more involved.


I think this is at least worth considering because the kernel overwriting
setup_data after having been told where that setup_data is located is 
not really

nice.

Well not explicitly at least - it gets the pointer to the first 
element and
something needs to traverse that list to know which addresses are 
live. But

still, that info is there so perhaps we need to take setup_data into
consideration too before decompressing...



It would probably be a good idea to add a "maximum physical address for 
initrd/setup_data/cmdline" field to struct kernel_info, though. It 
appears right now that those fields are being identity-mapped in the 
decompressor, and that means that if 48-bit addressing is used, physical 
memory may extend past the addressable range.


-hpa




Re: [PATCH qemu] x86: don't let decompressed kernel image clobber setup_data

2022-12-31 Thread H. Peter Anvin




On 12/31/22 11:00, Borislav Petkov wrote:

On Sat, Dec 31, 2022 at 07:22:47PM +0100, Jason A. Donenfeld wrote:

So with that understanding confirmed, I'm confused at your surprise that
hpa's unrelated fix to the different issue didn't fix this issue.


No surprise there - I used a qemu variant without your patch to prevent the
setup_data clobbering so hpa's fix can't help there.


But since the kernel doesn't do this now, and the 62MiB bug also seems
to apply to existing kernels, for the purposes of QEMU for now, I think
the v3 patch is probably best, since it'll handle existing kernels.


Right, we can't fix all those guests which are out there.


Alternatively, setup_data could be relocated, the boot param protocol
could be bumped, and then QEMU could conditionalized it's use of
setup_data based on that protocol version. That'd work, but seems a bit
more involved.


I think this is at least worth considering because the kernel overwriting
setup_data after having been told where that setup_data is located is not really
nice.

Well not explicitly at least - it gets the pointer to the first element and
something needs to traverse that list to know which addresses are live. But
still, that info is there so perhaps we need to take setup_data into
consideration too before decompressing...



As far as the decompression itself goes, it should only a problem if we 
are using physical KASLR since otherwise the kernel has a guaranteed 
safe zone already allocated by the boot loader. However, if physical 
KASLR is in use, then the decompressor needs to know everything there is 
to know about the memory map.


However, there also seems to be some kind of interaction with AMD SEV-SNP.


The bug appears to have been introduced by:

b57feed2cc2622ae14b2fa62f19e973e5e0a60cf
x86/compressed/64: Add identity mappings for setup_data entries
https://lore.kernel.org/r/tycpr01mb694815cd815e98945f63c99183...@tycpr01mb6948.jpnprd01.prod.outlook.com

... which was included in version 5.19, so it is relatively recent.

For a small amount of setup_data, the solution of just putting it next 
to the command line makes a lot of sense, and should be safe indefinitely.


-hpa



Re: [PATCH qemu] x86: don't let decompressed kernel image clobber setup_data

2022-12-30 Thread H. Peter Anvin

On 12/30/22 17:06, H. Peter Anvin wrote


TThe 62 MB limit mentioned in boot.rst is unrelated, and only applies to 
very, very old kernels that used INT 15h, AH=88h to probe memory.




I am 88% sure this was fixed long before setup_data was created, as it 
was created originally to carry e820 info for more than 128(!) memory 
segments. However, as we see here, it is never certain that bugs didn't 
creep in in the meantime...


-hpa




Re: [PATCH qemu] x86: don't let decompressed kernel image clobber setup_data

2022-12-30 Thread H. Peter Anvin




On 12/30/22 14:10, Jason A. Donenfeld wrote:

On Fri, Dec 30, 2022 at 01:58:39PM -0800, H. Peter Anvin wrote:

See the other thread fork. They have identified the problem already.


Not sure I follow. Is there another thread where somebody worked out why
this 62meg limit was happening?

Note that I sent v2/v3, to fix the original problem in a different way,
and if that looks good to the QEMU maintainers, then we can all be happy
with that. But I *haven't* addressed and still don't fully understand
why the 62meg limit applied to my v1 in the way it does. Did you find a
bug there to fix? If so, please do CC me.



Yes, you yourself posted the problem:


Then build qemu. Run it with `-kernel bzImage`, based on the kernel
built with the .config I attached.

You'll see that the CPU triple faults when hitting this line:

sd = (struct setup_data *)boot_params->hdr.setup_data;
while (sd) {
unsigned long sd_addr = (unsigned long)sd;

kernel_add_identity_map(sd_addr, sd_addr + sizeof(*sd) + sd->len); 
 <
sd = (struct setup_data *)sd->next;
}

, because it dereferences *sd. This does not happen if the decompressed
size of the kernel is < 62 megs.

So that's the "big and pretty serious" bug that might be worthy of
investigation.


This needs to be something like:

kernel_add_identity_map(sd_addr, sd_addr + sizeof(*sd));
kernel_add_identity_map(sd_addr + sizeof(*sd),
sd_addr + sizeof(*sd) + sd->len);


TThe 62 MB limit mentioned in boot.rst is unrelated, and only applies to 
very, very old kernels that used INT 15h, AH=88h to probe memory.


-hpa



Re: [PATCH qemu] x86: don't let decompressed kernel image clobber setup_data

2022-12-30 Thread H. Peter Anvin
On December 30, 2022 11:54:11 AM PST, Borislav Petkov  wrote:
>On Fri, Dec 30, 2022 at 06:07:24PM +0100, Jason A. Donenfeld wrote:
>> Look closer at the boot process. The compressed image is initially at
>> 0x10, but it gets relocated to a safer area at the end of
>> startup_64:
>
>That is the address we're executing here from, rip here looks like 0x100xxx.
>
>> /*
>>  * Copy the compressed kernel to the end of our buffer
>>  * where decompression in place becomes safe.
>>  */
>> pushq   %rsi
>> leaq(_bss-8)(%rip), %rsi
>> leaqrva(_bss-8)(%rbx), %rdi
>
>when you get to here, it looks something like this:
>
>leaq(_bss-8)(%rip), %rsi   # 0x9e7ff8
>leaqrva(_bss-8)(%rbx), %rdi# 0xc6eeff8
>
>so the source address is that _bss thing and we copy...
>
>> movl$(_bss - startup_32), %ecx
>> shrl$3, %ecx
>> std
>
>... backwards since DF=1.
>
>Up to:
>
># rsi = 0x8
># rdi = 0xbe06ff8
>
>Ok, so the source address is 0x10. Good.
>
>> HOWEVER, qemu currently appends setup_data to the end of the
>> compressed kernel image,
>
>Yeah, you mean the kernel which starts executing at 0x10, i.e., that part
>which is compressed/head_64.S and which does the above and the relocation etc.
>
>> and this part isn't moved, and setup_data links aren't walked/relocated. So
>> that means the original address remains, of 0x10.
>
>See above: when it starts copying the kernel image backwards to a higher
>address, that last byte is at 0x9e7ff8 so I'm guessing qemu has put setup_data
>*after* that address. And that doesn't get copied ofc.
>
>So far, so good.
>
>Now later, we extract the compressed kernel created with the mkpiggy magic:
>
>input_data:
>.incbin "arch/x86/boot/compressed/vmlinux.bin.gz"
>input_data_end:
>
>by doing
>
>/*
> * Do the extraction, and jump to the new kernel..
> */
>
>pushq   %rsi/* Save the real mode argument */  
> 0x13d00
>movq%rsi, %rdi  /* real mode address */
> 0x13d00
>leaqboot_heap(%rip), %rsi   /* malloc area for uncompression */
> 0xc6ef000
>leaqinput_data(%rip), %rdx  /* input_data */   
> 0xbe073a8
>movlinput_len(%rip), %ecx   /* input_len */
> 0x8cfe13
>movq%rbp, %r8   /* output target address */
> 0x100
>movloutput_len(%rip), %r9d  /* decompressed length, end of relocs 
> */
>callextract_kernel  /* returns kernel location in %rax */
>popq%rsi
>
>(actual addresses at the end.)
>
>Now, when you say you triplefault somewhere in initialize_identity_maps() when
>trying to access setup_data, then if you look a couple of lines before that 
>call
>we do
>
>   call load_stage2_idt
>
>which sets up a boottime #PF handler do_boot_page_fault() and it actually does
>call kernel_add_identity_map() so *actually* it should map any unmapped
>setup_data addresses.
>
>So why doesn't it do that and why do you triplefault?
>
>Hmmm.
>

See the other thread fork. They have identified the problem already.



Re: [PATCH qemu] x86: don't let decompressed kernel image clobber setup_data

2022-12-30 Thread H. Peter Anvin
On December 30, 2022 7:59:30 AM PST, "Jason A. Donenfeld"  
wrote:
>Hi,
>
>On Wed, Dec 28, 2022 at 11:31:34PM -0800, H. Peter Anvin wrote:
>> On December 28, 2022 6:31:07 PM PST, "Jason A. Donenfeld"  
>> wrote:
>> >Hi,
>> >
>> >Read this message in a fixed width text editor with a lot of columns.
>> >
>> >On Wed, Dec 28, 2022 at 03:58:12PM -0800, H. Peter Anvin wrote:
>> >> Glad you asked.
>> >> 
>> >> So the kernel load addresses are parameterized in the kernel image
>> >> setup header. One of the things that are so parameterized are the size
>> >> and possible realignment of the kernel image in memory.
>> >> 
>> >> I'm very confused where you are getting the 64 MB number from. There
>> >> should not be any such limitation.
>> >
>> >Currently, QEMU appends it to the kernel image, not to the initramfs as
>> >you suggest below. So, that winds up looking, currently, like:
>> >
>> >  kernel imagesetup_data
>> >   |--|||
>> >0x10  0x10+l1 0x10+l1+l2
>> >
>> >The problem is that this decompresses to 0x100 (one more zero). So
>> >if l1 is > (0x100-0x10), then this winds up looking like:
>> >
>> >  kernel imagesetup_data
>> >   |--|||
>> >0x10  0x10+l1 0x10+l1+l2
>> >
>> > d e c o m p r e s s e d   k e r n e l
>> > 
>> > |-|
>> >0x100   
>> >   0x100+l3 
>> >
>> >The decompressed kernel seemingly overwriting the compressed kernel
>> >image isn't a problem, because that gets relocated to a higher address
>> >early on in the boot process. setup_data, however, stays in the same
>> >place, since those links are self referential and nothing fixes them up.
>> >So the decompressed kernel clobbers it.
>> >
>> >The solution in this commit adds a bunch of padding between the kernel
>> >image and setup_data to avoid this. That looks like this:
>> >
>> >  kernel imagepadding   
>> > setup_data
>> >   
>> > |--||---|||
>> >0x10  0x10+l1   
>> >  0x100+l3  0x100+l3+l2
>> >
>> > d e c o m p r e s s e d   k e r n e l
>> > 
>> > |-|
>> >0x100   
>> >   0x100+l3 
>> >
>> >This way, the decompressed kernel doesn't clobber setup_data.
>> >
>> >The problem is that if 0x100+l3-0x10 is around 62 megabytes,
>> >then the bootloader crashes when trying to dereference setup_data's
>> >->len param at the end of initialize_identity_maps() in ident_map_64.c.
>> >I don't know why it does this. If I could remove the 62 megabyte
>> >restriction, then I could keep with this technique and all would be
>> >well.
>> >
>> >> In general, setup_data should be able to go anywhere the initrd can
>> >> go, and so is subject to the same address cap (896 MB for old kernels,
>> >> 4 GB on newer ones; this address too is enumerated in the header.)
>> >
>> >It would be theoretically possible to attach it to the initrd image
>> >instead of to the kernel image. As a last resort, I guess I can look
>> >into doing that. However, that's going to require some serious rework
>> >and plumbing of a lot of different components. So if I can make it work
>> >as is, that'd be ideal. However, I need to figure out this weird 62 meg
>> >limitation.
>> >
>> >Any ideas on that?
>> >
>> >Jason
>> 
>> As far as a crash... that sounds like a big and a pretty serious one at that.
>> 
>> Could you let me know what kernel you are using and how *exactly* you are 
>> booting it?
>
>I'll attach a .config file. Apply the patch at the top of this thread to
>qemu, except make one modificat

Re: [PATCH qemu] x86: don't let decompressed kernel image clobber setup_data

2022-12-28 Thread H. Peter Anvin
On December 28, 2022 6:31:07 PM PST, "Jason A. Donenfeld"  
wrote:
>Hi,
>
>Read this message in a fixed width text editor with a lot of columns.
>
>On Wed, Dec 28, 2022 at 03:58:12PM -0800, H. Peter Anvin wrote:
>> Glad you asked.
>> 
>> So the kernel load addresses are parameterized in the kernel image
>> setup header. One of the things that are so parameterized are the size
>> and possible realignment of the kernel image in memory.
>> 
>> I'm very confused where you are getting the 64 MB number from. There
>> should not be any such limitation.
>
>Currently, QEMU appends it to the kernel image, not to the initramfs as
>you suggest below. So, that winds up looking, currently, like:
>
>  kernel imagesetup_data
>   |--|||
>0x10  0x10+l1 0x10+l1+l2
>
>The problem is that this decompresses to 0x100 (one more zero). So
>if l1 is > (0x100-0x10), then this winds up looking like:
>
>  kernel imagesetup_data
>   |--|||
>0x10  0x10+l1 0x10+l1+l2
>
> d e c o m p r e s s e d   k e r n e l
>
> |-|
>0x100 
> 0x100+l3 
>
>The decompressed kernel seemingly overwriting the compressed kernel
>image isn't a problem, because that gets relocated to a higher address
>early on in the boot process. setup_data, however, stays in the same
>place, since those links are self referential and nothing fixes them up.
>So the decompressed kernel clobbers it.
>
>The solution in this commit adds a bunch of padding between the kernel
>image and setup_data to avoid this. That looks like this:
>
>  kernel imagepadding  
>  setup_data
>   
> |--||---|||
>0x10  0x10+l1 
>0x100+l3  0x100+l3+l2
>
> d e c o m p r e s s e d   k e r n e l
>
> |-|
>0x100 
> 0x100+l3 
>
>This way, the decompressed kernel doesn't clobber setup_data.
>
>The problem is that if 0x100+l3-0x10 is around 62 megabytes,
>then the bootloader crashes when trying to dereference setup_data's
>->len param at the end of initialize_identity_maps() in ident_map_64.c.
>I don't know why it does this. If I could remove the 62 megabyte
>restriction, then I could keep with this technique and all would be
>well.
>
>> In general, setup_data should be able to go anywhere the initrd can
>> go, and so is subject to the same address cap (896 MB for old kernels,
>> 4 GB on newer ones; this address too is enumerated in the header.)
>
>It would be theoretically possible to attach it to the initrd image
>instead of to the kernel image. As a last resort, I guess I can look
>into doing that. However, that's going to require some serious rework
>and plumbing of a lot of different components. So if I can make it work
>as is, that'd be ideal. However, I need to figure out this weird 62 meg
>limitation.
>
>Any ideas on that?
>
>Jason

As far as a crash... that sounds like a big and a pretty serious one at that.

Could you let me know what kernel you are using and how *exactly* you are 
booting it?



Re: [PATCH qemu] x86: don't let decompressed kernel image clobber setup_data

2022-12-28 Thread H. Peter Anvin
On December 28, 2022 6:31:07 PM PST, "Jason A. Donenfeld"  
wrote:
>Hi,
>
>Read this message in a fixed width text editor with a lot of columns.
>
>On Wed, Dec 28, 2022 at 03:58:12PM -0800, H. Peter Anvin wrote:
>> Glad you asked.
>> 
>> So the kernel load addresses are parameterized in the kernel image
>> setup header. One of the things that are so parameterized are the size
>> and possible realignment of the kernel image in memory.
>> 
>> I'm very confused where you are getting the 64 MB number from. There
>> should not be any such limitation.
>
>Currently, QEMU appends it to the kernel image, not to the initramfs as
>you suggest below. So, that winds up looking, currently, like:
>
>  kernel imagesetup_data
>   |--|||
>0x10  0x10+l1 0x10+l1+l2
>
>The problem is that this decompresses to 0x100 (one more zero). So
>if l1 is > (0x100-0x10), then this winds up looking like:
>
>  kernel imagesetup_data
>   |--|||
>0x10  0x10+l1 0x10+l1+l2
>
> d e c o m p r e s s e d   k e r n e l
>
> |-|
>0x100 
> 0x100+l3 
>
>The decompressed kernel seemingly overwriting the compressed kernel
>image isn't a problem, because that gets relocated to a higher address
>early on in the boot process. setup_data, however, stays in the same
>place, since those links are self referential and nothing fixes them up.
>So the decompressed kernel clobbers it.
>
>The solution in this commit adds a bunch of padding between the kernel
>image and setup_data to avoid this. That looks like this:
>
>  kernel imagepadding  
>  setup_data
>   
> |--||---|||
>0x10  0x10+l1 
>0x100+l3  0x100+l3+l2
>
> d e c o m p r e s s e d   k e r n e l
>
> |-|
>0x100 
> 0x100+l3 
>
>This way, the decompressed kernel doesn't clobber setup_data.
>
>The problem is that if 0x100+l3-0x10 is around 62 megabytes,
>then the bootloader crashes when trying to dereference setup_data's
>->len param at the end of initialize_identity_maps() in ident_map_64.c.
>I don't know why it does this. If I could remove the 62 megabyte
>restriction, then I could keep with this technique and all would be
>well.
>
>> In general, setup_data should be able to go anywhere the initrd can
>> go, and so is subject to the same address cap (896 MB for old kernels,
>> 4 GB on newer ones; this address too is enumerated in the header.)
>
>It would be theoretically possible to attach it to the initrd image
>instead of to the kernel image. As a last resort, I guess I can look
>into doing that. However, that's going to require some serious rework
>and plumbing of a lot of different components. So if I can make it work
>as is, that'd be ideal. However, I need to figure out this weird 62 meg
>limitation.
>
>Any ideas on that?
>
>Jason

Ok, the code I sent will figure out the minimum amount of padding that you need 
(min_initrd_addr) as well.



Re: [PATCH qemu] x86: don't let decompressed kernel image clobber setup_data

2022-12-28 Thread H. Peter Anvin

On 12/28/22 15:58, H. Peter Anvin wrote:

On December 28, 2022 8:57:54 AM PST, "Jason A. Donenfeld"  
wrote:

HELLO H. PETER ANVIN,
E
L
L
O

On Wed, Dec 28, 2022 at 05:30:30PM +0100, Jason A. Donenfeld wrote:

Fix looks good, glad you figured out the problem.


I mean, kind of. The solution here sucks, especially given that in the
worst case, setup_data just gets dropped. I'm half inclined to consider
this a kernel bug instead, and add some code to relocate setup_data
prior to decompression, and then fix up all the links. It seems like
this would be a lot more robust.

I just wish the people who wrote this stuff would chime in. I've had
x...@kernel.org CC'd but so far, no input from them.


Apparently you are the x86 boot guru. What do you want to happen here?
Your input would be very instrumental.

Jason


Hi!

Glad you asked.

So the kernel load addresses are parameterized in the kernel image
setup header. One of the things that are so parameterized are the
size and possible realignment of the kernel image in memory.

I'm very confused where you are getting the 64 MB number from. There
should not be any such limitation.

In general, setup_data should be able to go anywhere the initrd can
go, and so is subject to the same address cap (896 MB for old
kernels, 4 GB on newer ones; this address too is enumerated in the
header.)

If you want to put setup_data above 4 GB, it *should* be ok if and
only if the kernel supports loading the initrd high, too (again,
enumerated in the header.

TL;DR: put setup_data where you put the initrd (before or after
doesn't matter.)

To be maximally conservative, link the setup_data list in order from
lowest to highest address; currently there is no such item of
relevance, but in the future there may be setup_data items needed by
the BIOS part of the bootstrap in which case they would have to be <
1 MB and precede any items > 1 MB for obvious reasons. That being
said, with BIOS dying it is not all that likely that such entries
will ever be needed.



So let me try for an algorithm. Attached as a text file to avoid line 
break damage.


-hpaHere is an attempted description with pseudo-C code:

First of all, take a 4K page of memory and *initialize it to zero*.
{
#include   /* From the uapi kernel sources */

/* Allocated somewhere in your code... */
extern unsigned char *kernel_image; /* Kernel file */
extern struct boot_params *boot_params; /* 4K buffer */
extern uint32_t kernel_image_size;  /* Size of kernel file */

/* Callbacks into your code */
extern bool is_bios_boot(void);
extern uint32_t end_of_low_memory(void); /* For BIOS boot */
/*
 * This MUST return an alignment address between start_address
 * and max_address...
 */
extern uint64_t maybe_relocate_kernel(uint64_t start_address,
  uint64_t max_address, uint32_t alignment);

/*
 * Convenience pointer into the kernel image; modifications
 * done here should be reflected in the loaded kernel image
 */
struct setup_header * const kernel_setup_header =
(struct setup_header *)(kernel_image + 0x1f1);

/* Initialize boot_params to zero!!! */
memset(boot_params, 0, sizeof *boot_params);
}

Copy the setup header starting at file offset 0x1f1 to offset 0x1f1
into that page:
{
int setup_length =
kernel_setup_header->header == 0x53726448
? (kernel_setup_header->jump >> 8) + 17 : 15;

memcpy(_params->hdr, kernel_setup_header, setup_length);
}

Now you can compute values including ones are omitted by older kernels:
{
/*
 * Split between the part of the kernel to be loaded into
 * low memory (for 16-bit boot, otherwise it can be safely
 * omitted) and the part to be loaded into high memory.
 */
if (!boot_params->hdr.setup_sects)
boot_param->hdr.setup_sects = 4;

int high_kernel_start = (boot_param->hdr.setup_sects+1) << 9;

/*
 * Highest permitted address for the high part of the kernel image,
 * initrd, command line (*except for 16-bit boot*), and setup_data
 *
 * max_initrd_addr here is exclusive
 */
uint64_t max_initrd_addr = (uint64_t)boot_params->hdr.initrd_addr_max + 1;
if (boot_params->hdr.version < 0x0200)
max_initrd_addr = 0;/* No initrd supported */
else if (boot_params->hdr.version < 0x0203)
max_initrd_addr = 0x3800;
else if (boot_params->hdr.version >= 0x020c &&
   (boot_params->hdr.xloadflags & XLF_CAN_BE_LOADED_ABOVE_4G))
max_initrd_addr = (uint64_t)1 << 52; /* Architecture-imposed limit */

/*
 * Maximum command line size *including terminating null*
 */
unsigned int cmdline_size;
if (boot_params->hdr.version < 0x0200)
cmdline_size = 0;   /* No command line supported */
else if (boot_params->hdr.version < 0x0

Re: [PATCH qemu] x86: don't let decompressed kernel image clobber setup_data

2022-12-28 Thread H. Peter Anvin
On December 28, 2022 8:57:54 AM PST, "Jason A. Donenfeld"  
wrote:
>HELLO H. PETER ANVIN,
>E
>L
>L
>O
>
>On Wed, Dec 28, 2022 at 05:30:30PM +0100, Jason A. Donenfeld wrote:
>> > Fix looks good, glad you figured out the problem.
>> 
>> I mean, kind of. The solution here sucks, especially given that in the
>> worst case, setup_data just gets dropped. I'm half inclined to consider
>> this a kernel bug instead, and add some code to relocate setup_data
>> prior to decompression, and then fix up all the links. It seems like
>> this would be a lot more robust.
>> 
>> I just wish the people who wrote this stuff would chime in. I've had
>> x...@kernel.org CC'd but so far, no input from them.
>
>Apparently you are the x86 boot guru. What do you want to happen here?
>Your input would be very instrumental.
>
>Jason

Hi!

Glad you asked.

So the kernel load addresses are parameterized in the kernel image setup 
header. One of the things that are so parameterized are the size and possible 
realignment of the kernel image in memory.

I'm very confused where you are getting the 64 MB number from. There should not 
be any such limitation.

In general, setup_data should be able to go anywhere the initrd can go, and so 
is subject to the same address cap (896 MB for old kernels, 4 GB on newer ones; 
this address too is enumerated in the header.)

If you want to put setup_data above 4 GB, it *should* be ok if and only if the 
kernel supports loading the initrd high, too (again, enumerated in the header.

TL;DR: put setup_data where you put the initrd (before or after doesn't matter.)

To be maximally conservative, link the setup_data list in order from lowest to 
highest address; currently there is no such item of relevance, but in the 
future there may be setup_data items needed by the BIOS part of the bootstrap 
in which case they would have to be < 1 MB and precede any items > 1 MB for 
obvious reasons. That being said, with BIOS dying it is not all that likely 
that such entries will ever be needed.




Re: [Qemu-devel] Qemu baseline requirements/portability?

2019-06-05 Thread H. Peter Anvin
On 6/5/19 12:55 PM, H. Peter Anvin wrote:
> Hi,
> 
> I am writing some code I'm hoping will be able to make it into Qemu, but I
> can't seem to find what the baseline portability requirements are.  I'm
> specifically wondering about newer POSIX features like openat(), which seems
> to be used in the 9p filesystem and nowhere else, and what version of glib one
> can rely on?
> 

Specifically, I'm trying to satisfy a 10-year-old request by me and others to
support composite initrd during Linux boot.

-hpa




[Qemu-devel] Qemu baseline requirements/portability?

2019-06-05 Thread H. Peter Anvin
Hi,

I am writing some code I'm hoping will be able to make it into Qemu, but I
can't seem to find what the baseline portability requirements are.  I'm
specifically wondering about newer POSIX features like openat(), which seems
to be used in the 9p filesystem and nowhere else, and what version of glib one
can rely on?

Thanks,

-hpa



Re: [Qemu-devel] [RFC/PoC PATCH 1/3] i386: set initrd_max to 4G - 1 to allow up to 4G initrd

2018-11-12 Thread H. Peter Anvin
On 11/11/18 10:19 PM, Ingo Molnar wrote:
> 
>> In part as a result of this exchange I have spent some time thinking 
>> about the boot protocol and its dependencies, and there is, in fact, a 
>> much more serious problem that needs to be addressed: it is not 
>> currently possible in a forward-compatible way to map all data areas 
>> that may be occupied by bootloader-provided data. The kernel proper has 
>> an advantage here, in that the kernel will by definition always be the 
>> "owner of the protocol" (anything the kernel doesn't know how to map 
>> won't be used by the kernel anyway), but it really isn't a good 
>> situation. So I'm currently trying to think up a way to make that 
>> possible.
> 
> I might be a bit dense early in the morning, but could you elaborate? 
> What do you mean by mapping all data areas?

Alright, awake now...

As it sits right now, the protocol contains a number of data structures with
pointers, pointing to a variety of memory areas that can be set up by the
bootloader. Now, consider something like KASLR or a secondary boot loader
where we need to allocate memory in between the primary bootloader and the
kernel to be run. With the kernel proper, in the absence of KASLR, we have
solved this by marking out exactly how much memory the kernel may need before
it has its own memory manager up and running, but KASLR needs to move it
outside this range, and a secondary boot loader shim of some sort may need to
allocate additional data structures. In the particular case of an UEFI system
where we do the right thing (which Grub2 doesn't, by default) and enter via
the kernel UEFI stub we are okay, but for other boot scenarios we are in
trouble: even if we know where all the pointers are and how to determine the
size of various data structures, once the protocol is updated with new
information then that is no longer valid.

The setup_data linked list solves that under certain circumstances, but in
others it has turned out to not be adequate.

There are a couple of options:

a) Not allow any new pointers to memory areas in what is considered system
   RAM. Such data structures *must* have a setup_data linked list header.
   Pointers into E820 table reserved areas are still acceptable.

b) Create a new E820 table memory type for "boot data", similar to what UEFI
   already has, and encourage boot loaders to mark any allocated memory
   structures that way.  The main problem with that is that the poor quality
   of boot loaders may mean that that fails to happen, and because it wouldn't
   "fail hard" it is likely that they will get it wrong.

   The difference from the RESERVED memory type is that the kernel can reclaim
   that memory after the data has been recovered.

c) This might be the preferred option:

   1. Just like (a), do not allow new pointers to memory areas in system RAM
  in struct boot_params.
   2. Create a subrange of struct setup_data (e.g. bit 30 = 1) explicitly
  containing pointers to other data structures, including sizes, in a
  way that can be parsed by generic code.
   3. Encourage boot loaders to make sure the setup_data list is in order of
  ascending address (and WARN if it is not.)
   4. Add (b) as an option, for responsible boot loaders ;) to provide an
  extra level of protection.



Re: [Qemu-devel] [RFC/PoC PATCH 1/3] i386: set initrd_max to 4G - 1 to allow up to 4G initrd

2018-11-11 Thread H. Peter Anvin
On 11/11/18 10:19 PM, Ingo Molnar wrote:
> 
> I might be a bit dense early in the morning, but could you elaborate? 
> What do you mean by mapping all data areas?
> 

Heh. I need to pack for LPC and get some sleep before my flight lest I'll be
denser than depleted uranium; I'll write an explanation tomorrow.

-hpa




Re: [Qemu-devel] [RFC/PoC PATCH 1/3] i386: set initrd_max to 4G - 1 to allow up to 4G initrd

2018-11-11 Thread H. Peter Anvin
On 11/11/18 8:56 PM, Ingo Molnar wrote:
> 
>> Also note that the ext_ramdisk_image and ext_ramdisk_size are part of
>> struct boot_params as opposed to struct setup_header, which means that
>> they are not supported when entering via the 16-bit BIOS entry point,
>> and I am willing to bet that there will be, ahem, "strangeness" if
>> entered via the 32-bit entry point if at least the command line is
>> loaded above the 4 GB mark; the initrd should be fine, though.
>>
>> This is obviosly not an issue in EFI environments, where we enter
>> through the EFI handover entry point.
>>
>> The main reason these were not added to struct setup_header is that
>> there are only 24 bytes left in that header and so space is highly
>> precious. One way to deal with that if we really, really need to would
>> be to add an initrd/initramfs type of setup_data.
> 
> Is there no way to extend that header by making an extended header part 
> of the payload?
> 
> IIRC that header is small and fixed size to be part of a single sector at 
> the very beginning of boot images, but accessing any extended header bits 
> from the payload section shouldn't really be an issue for a modern 
> bootloader to handle, right?
> 
> Such an extended header could use a more modern (self-extending) ABI as 
> well.
> 

Yes, although I don't really think it is as much of an issue as it seems at
this point.

The limit comes from having used a one-byte jump instruction at the beginning;
however, these days that limit is functionally walled.

It is of course possible to address this if it should become necessary,
however, the current protocol has lasted for 23 years so far and we haven't
run out yet, even with occasional missteps. As such, I don't think we are in a
huge hurry to address this particular aspect.

In part as a result of this exchange I have spent some time thinking about the
boot protocol and its dependencies, and there is, in fact, a much more serious
problem that needs to be addressed: it is not currently possible in a
forward-compatible way to map all data areas that may be occupied by
bootloader-provided data. The kernel proper has an advantage here, in that the
kernel will by definition always be the "owner of the protocol" (anything the
kernel doesn't know how to map won't be used by the kernel anyway), but it
really isn't a good situation. So I'm currently trying to think up a way to
make that possible.

-hpa



Re: [Qemu-devel] [RFC/PoC PATCH 1/3] i386: set initrd_max to 4G - 1 to allow up to 4G initrd

2018-11-09 Thread H. Peter Anvin
On 11/9/18 5:40 AM, Li Zhijian wrote:
> Just noticed that there is a field xloadflags at recent protocol
>   60 Protocol 2.12:  (Kernel 3.8) Added the xloadflags field and
> extension fields
>   61 to struct boot_params for loading bzImage and ramdisk
>   62 above 4G in 64bit.
> [snip]
>  617 Field name: xloadflags
>  618 Type:   read
>  619 Offset/size:    0x236/2
>  620 Protocol:   2.12+
>  621
>  622   This field is a bitmask.
>  623
>  624   Bit 0 (read): XLF_KERNEL_64
>  625 - If 1, this kernel has the legacy 64-bit entry point at
> 0x200.
>  626
>  627   Bit 1 (read): XLF_CAN_BE_LOADED_ABOVE_4G
>  628 - If 1, kernel/boot_params/cmdline/ramdisk can be above 4G.
>  629
> 
> maybe we can reuse this field and append a new Bit 5
> XLF_INITRD_MAX_SIZE_4G and increase header version.
> For the old protocol version 2.12+, if  XLF_CAN_BE_LOADED_ABOVE_4G is
> set, we can also realize ~4GB initrd is allowed.
> 
> bootloader side:
> if protocol >= 2.15
>    if XLF_INITRD_LOAD_BELOW_4G
>   support ~4G initrd
>    fi
> else if protocol >=2.12
>    if XLF_CAN_BE_LOADED_ABOVE_4G
>     support ~4G initrd
>    fi
> fi
> 

The two are equivalent.  Obviously you have to load above 4 GB if you
have more than 4 GB of initrd.  If XLF_CAN_BE_LOADED_ABOVE_4G is not
set, then you most likely are on a 32-bit kernel and there are more
fundamental limits (even if you were to load it above the 2 GB mark, you
would be limited by the size of kernel memory.)

So, in case you are wondering: the bootloader that broke when setting
the initrd_max field above 2 GB was, of course, Grub.

So just use XLF_CAN_BE_LOADED_ABOVE_4G. There is no need for a new flag
or new field.

Also note that the ext_ramdisk_image and ext_ramdisk_size are part of
struct boot_params as opposed to struct setup_header, which means that
they are not supported when entering via the 16-bit BIOS entry point,
and I am willing to bet that there will be, ahem, "strangeness" if
entered via the 32-bit entry point if at least the command line is
loaded above the 4 GB mark; the initrd should be fine, though.

This is obviosly not an issue in EFI environments, where we enter
through the EFI handover entry point.

The main reason these were not added to struct setup_header is that
there are only 24 bytes left in that header and so space is highly
precious. One way to deal with that if we really, really need to would
be to add an initrd/initramfs type of setup_data.

-hpa



Re: [Qemu-devel] [PATCH] slirp: support dynamic block size for TFTP transfers

2016-11-21 Thread H. Peter Anvin
On 11/21/16 11:45, Hervé Poussineau wrote:
> The blocksize option is defined in RFC 1783.
> We now support block sizes between 1 and 1432 bytes, instead of 512 only.

It ought to be 1476: Ethernet MTU = 1500, minus a minimum of 20 bytes
for an IPv4 header and 4 for a TFTP header.

Some bootloaders including Syslinux sandbag this value to avoid creating
fragmented packets in case there is a VPN or something similar in the
network path.

-hpa





Re: [Qemu-devel] RFC: virtio-rng and /dev/urandom

2016-04-18 Thread H. Peter Anvin
On April 18, 2016 4:26:24 AM PDT, "Daniel P. Berrange" <berra...@redhat.com> 
wrote:
>On Mon, Apr 18, 2016 at 01:07:40PM +0200, Hubert Kario wrote:
>> On Monday 18 April 2016 02:46:19 H. Peter Anvin wrote:
>> > Another thing that really needs to be addressed, but is a separate
>> > issue: invalidating and reseeding the entropy pool after a snapshot
>> > event.
>> 
>> definitely agreed
>> 
>> though just reseeding would be sufficient - the goal is to make the 
>> output unpredictable and unique between multiple machines starting
>from 
>> the same snapshot, feeding enough random data to make the entropy
>pool 
>> unique again is sufficient to achieve that
>
>If you're spawning multiple machines from the same base snapshot,
>the seeding of RNG is just one of many many things that need
>dealing with. eg new /etc/machine-id, new ssh host keys, changing
>MAC address of NICs with corresponding guest config file changes,
>many other application specific identifiers / keys intended to
>be unique per machine.  As such, libvirt explicitly tries to
>prevent you spawning multiple machines from the same snapshot.
>
>That all said, Microsoft HyperV has defined a concept of a
>"Virtual Machine Generation ID" and specified various hypervisor
>operations which should result in this value changing[1]. For example
>restoring from a snapshot should always change the genid, as would
>restoring from backup, or cloned from another image, or failed over
>during disaster recovery.
>
>This vm genid is exposed to the guest via ACPI and there's an
>notification whenever it changes.
>
>There are patches for QEMU[2] to support this feature in a manner that
>is compatible with the hyperv spec, but they are sadly still not
>merged :-(
>
>So it would be possible for the Linux kernel to re-initialize its
>RNG after snapshot by hooking into the vm-genid ACPI notification.
>
>
>Regards,
>Daniel
>
>[1]
>https://lists.nongnu.org/archive/html/qemu-devel/2014-10/msg00489.html
>[2] https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg05599.html

There are multiple machines, and there are snapshots restored.
-- 
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.



Re: [Qemu-devel] RFC: virtio-rng and /dev/urandom

2016-04-18 Thread H. Peter Anvin
On April 18, 2016 2:28:42 AM PDT, "Daniel P. Berrange" <berra...@redhat.com> 
wrote:
>On Fri, Apr 15, 2016 at 08:56:59AM -0700, H. Peter Anvin wrote:
>> On April 15, 2016 3:41:34 AM PDT, Cole Robinson <crobi...@redhat.com>
>wrote:
>> >Libvirt currently rejects using host /dev/urandom as an input source
>> >for a
>> >virtio-rng device. The only accepted sources are /dev/random and
>> >/dev/hwrng.
>> >This is the result of discussions on qemu-devel around when the
>feature
>> >was
>> >first added (2013). Examples:
>> >
>> >http://lists.gnu.org/archive/html/qemu-devel/2012-09/msg02387.html
>>
>>https://lists.gnu.org/archive/html/qemu-devel/2013-03/threads.html#00023
>> >
>> >libvirt's rejection of /dev/urandom has generated some complaints
>from
>> >users:
>> >
>> >https://bugzilla.redhat.com/show_bug.cgi?id=1074464
>> >* cited: http://www.2uo.de/myths-about-urandom/
>> >http://www.redhat.com/archives/libvir-list/2016-March/msg01062.html
>> >http://www.redhat.com/archives/libvir-list/2016-April/msg00186.html
>> >
>> >I think it's worth having another discussion about this, at least
>with
>> >a
>> >recent argument in one place so we can put it to bed. I'm CCing a
>bunch
>> >of
>> >people. I think the questions are:
>> >
>> >1) is the original recommendation to never use
>virtio-rng+/dev/urandom
>> >correct?
>> >
>> >2) regardless of #1, should we continue to reject that config in
>> >libvirt?
>> >
>> >Thanks,
>> >Cole
>> 
>> Using /dev/urandom for virtio-rng, *except* perhaps for a small seed,
>> it a complete waste of cycles.  There is absolutely no reason to have
>> one prng feed another.
>
>Regardless of the performance aspect, the key question we need the
>answer to is whether it *cryptographically safe* to use /dev/urandom
>on the host to feed virtio-rng. The original discussion said it was
>/unsafe/ to use /dev/urandom, hence why we do not allow it.  If the
>only downside is wasted performance, then it is reasonable to allow
>the user to use /dev/urandom if they so wish.
>
>Regards,
>Daniel

Perhaps.  What we do know is that it at least used to be a fairly common 
misconfiguration; up there with people who would feed /dev/urandom to rngd!

Probably there ought to be a backend which knows to use urandom for a seed and 
then fall back to random, rather than simply relying on a file name.

Another thing that really needs to be addressed, but is a separate issue: 
invalidating and reseeding the entropy pool after a snapshot event.

-- 
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.



Re: [Qemu-devel] RFC: virtio-rng and /dev/urandom

2016-04-17 Thread H. Peter Anvin
On 04/16/16 01:31, Paolo Bonzini wrote:
> 
> Right, but there's always the point about people that use heterogeneous
> hosts and cannot pass rdrand/rdseed to the guest.  For these, we should
> add a QEMU driver that uses rdrand/rdseed, and thus decouples virtio-rng
> from the host /dev/* completely.
> 
> From the libvirt POV there are various possibilities:
> 
> - Libvirt can have a libvirt.conf parameter that says "ignore whatever is
> specified in the guest XML if rdrand/rdseed is available, and instead use
> rdrand/rdseed".
> 
> - Libvirt can allow specifying rdrand/rdseed _and_ an additional backend,
> like this:
> 
> 
> /dev/random
> 
> and fallback to the second if rdrand/rdseed are not available.
> 

The other thing, and this is one area where there is some legitimacy to
the /dev/urandom argument: on a fresh boot, it would be highly desirable
to get a seed value from virtio-rng even if that is "entropyless".  The
backwards-compatible way would be to provide, say, 64 bytes of
/dev/urandom before switching to /dev/random, but it might be desirable
to give the guest OS some way to cause that to reset, explicitly
requesting a new seed after an in-VM guest reboot, kexec et al.

This also ties into the proposed MSR to support kASLR in the guest in
the absence of rdrand/rdseed.  Using virtio in that phase of bootup is
generally not feasible.

-hpa







Re: [Qemu-devel] RFC: virtio-rng and /dev/urandom

2016-04-17 Thread H. Peter Anvin
On 04/16/16 01:31, Paolo Bonzini wrote:
> 
> Right, but there's always the point about people that use heterogeneous
> hosts and cannot pass rdrand/rdseed to the guest.  For these, we should
> add a QEMU driver that uses rdrand/rdseed, and thus decouples virtio-rng
> from the host /dev/* completely.
> 

You should be able to crib the code from rngd (rng-tools) pretty much
verbatim.

-hpa





Re: [Qemu-devel] RFC: virtio-rng and /dev/urandom

2016-04-15 Thread H. Peter Anvin
On April 15, 2016 9:10:44 AM PDT, Hubert Kario  wrote:
>On Friday 15 April 2016 09:47:51 Eric Blake wrote:
>> On 04/15/2016 04:41 AM, Cole Robinson wrote:
>> > Libvirt currently rejects using host /dev/urandom as an input
>source
>> > for a virtio-rng device. The only accepted sources are /dev/random
>> > and /dev/hwrng. This is the result of discussions on qemu-devel
>> > around when the feature was first added (2013). Examples:
>> > 
>> > http://lists.gnu.org/archive/html/qemu-devel/2012-09/msg02387.html
>> >
>https://lists.gnu.org/archive/html/qemu-devel/2013-03/threads.html#0
>> > 0023
>> > 
>> > libvirt's rejection of /dev/urandom has generated some complaints
>> > from users:
>> > 
>> > https://bugzilla.redhat.com/show_bug.cgi?id=1074464
>> > * cited: http://www.2uo.de/myths-about-urandom/
>> > http://www.redhat.com/archives/libvir-list/2016-March/msg01062.html
>> > http://www.redhat.com/archives/libvir-list/2016-April/msg00186.html
>> > 
>> > I think it's worth having another discussion about this, at least
>> > with a recent argument in one place so we can put it to bed. I'm
>> > CCing a bunch of people. I think the questions are:
>> > 
>> > 1) is the original recommendation to never use
>> > virtio-rng+/dev/urandom correct?
>> That I'm not sure about - and the answer may be context-dependent
>(for
>> example a FIPS user may care more than an ordinary user)
>
>/dev/urandom use is FIPS compliant, no FIPS-validated protocol or 
>cryptographic primitive requires the "fresh" entropy provided by 
>/dev/random. All primitives are designed to work with weaker entropy 
>guarantees than what /dev/urandom provides.

So: using urandom for a seed makes sense, but "unplugging the drain" is a huge 
waste of resources and provides absolutely zero value.

Also, I do not believe /dev/urandom is FIPS compliant.  Finally, the refill 
policy is different, so it is not really true the algorithm is the same.

All in all, other than a seed value it really doesn't make any sense.  Of 
course, none of this matters on newer Intel hardware ;)

-- 
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.



Re: [Qemu-devel] RFC: virtio-rng and /dev/urandom

2016-04-15 Thread H. Peter Anvin
On April 15, 2016 9:10:44 AM PDT, Hubert Kario  wrote:
>On Friday 15 April 2016 09:47:51 Eric Blake wrote:
>> On 04/15/2016 04:41 AM, Cole Robinson wrote:
>> > Libvirt currently rejects using host /dev/urandom as an input
>source
>> > for a virtio-rng device. The only accepted sources are /dev/random
>> > and /dev/hwrng. This is the result of discussions on qemu-devel
>> > around when the feature was first added (2013). Examples:
>> > 
>> > http://lists.gnu.org/archive/html/qemu-devel/2012-09/msg02387.html
>> >
>https://lists.gnu.org/archive/html/qemu-devel/2013-03/threads.html#0
>> > 0023
>> > 
>> > libvirt's rejection of /dev/urandom has generated some complaints
>> > from users:
>> > 
>> > https://bugzilla.redhat.com/show_bug.cgi?id=1074464
>> > * cited: http://www.2uo.de/myths-about-urandom/
>> > http://www.redhat.com/archives/libvir-list/2016-March/msg01062.html
>> > http://www.redhat.com/archives/libvir-list/2016-April/msg00186.html
>> > 
>> > I think it's worth having another discussion about this, at least
>> > with a recent argument in one place so we can put it to bed. I'm
>> > CCing a bunch of people. I think the questions are:
>> > 
>> > 1) is the original recommendation to never use
>> > virtio-rng+/dev/urandom correct?
>> That I'm not sure about - and the answer may be context-dependent
>(for
>> example a FIPS user may care more than an ordinary user)
>
>/dev/urandom use is FIPS compliant, no FIPS-validated protocol or 
>cryptographic primitive requires the "fresh" entropy provided by 
>/dev/random. All primitives are designed to work with weaker entropy 
>guarantees than what /dev/urandom provides.

That's not the point.  The point is that it is a complete waste of resources 
when the PRNG can simply be run in the guest
-- 
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.



Re: [Qemu-devel] RFC: virtio-rng and /dev/urandom

2016-04-15 Thread H. Peter Anvin
On April 15, 2016 3:41:34 AM PDT, Cole Robinson  wrote:
>Libvirt currently rejects using host /dev/urandom as an input source
>for a
>virtio-rng device. The only accepted sources are /dev/random and
>/dev/hwrng.
>This is the result of discussions on qemu-devel around when the feature
>was
>first added (2013). Examples:
>
>http://lists.gnu.org/archive/html/qemu-devel/2012-09/msg02387.html
>https://lists.gnu.org/archive/html/qemu-devel/2013-03/threads.html#00023
>
>libvirt's rejection of /dev/urandom has generated some complaints from
>users:
>
>https://bugzilla.redhat.com/show_bug.cgi?id=1074464
>* cited: http://www.2uo.de/myths-about-urandom/
>http://www.redhat.com/archives/libvir-list/2016-March/msg01062.html
>http://www.redhat.com/archives/libvir-list/2016-April/msg00186.html
>
>I think it's worth having another discussion about this, at least with
>a
>recent argument in one place so we can put it to bed. I'm CCing a bunch
>of
>people. I think the questions are:
>
>1) is the original recommendation to never use virtio-rng+/dev/urandom
>correct?
>
>2) regardless of #1, should we continue to reject that config in
>libvirt?
>
>Thanks,
>Cole

Using /dev/urandom for virtio-rng, *except* perhaps for a small seed, it a 
complete waste of cycles.  There is absolutely no reason to have one prng feed 
another.
-- 
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.



Re: [Qemu-devel] [PATCH v3 1/4] KVM/X86: Fix xsave cpuid exposing bug

2014-01-22 Thread H. Peter Anvin
On 01/22/2014 02:21 AM, Paolo Bonzini wrote:
 Il 21/01/2014 19:59, Liu, Jinsong ha scritto:
 From 3155a190ce6ebb213e6c724240f4e6620ba67a9d Mon Sep 17 00:00:00 2001
 From: Liu Jinsong jinsong@intel.com
 Date: Fri, 13 Dec 2013 02:32:03 +0800
 Subject: [PATCH v3 1/4] KVM/X86: Fix xsave cpuid exposing bug

 EBX of cpuid(0xD, 0) is dynamic per XCR0 features enable/disable.
 Bit 63 of XCR0 is reserved for future expansion.

 Signed-off-by: Liu Jinsong jinsong@intel.com
 
 Peter, can I have your acked-by on this?
 

Yes.

Acked-by: H. Peter Anvin h...@linux.intel.com




Re: [Qemu-devel] [RFC PATCH 1/3] target-i386: Add 486sx, old486, and old486sx CPU models

2014-01-06 Thread H. Peter Anvin
On 03/25/2013 01:56 PM, Eduardo Habkost wrote:
 

 It needs to be possible to fix bugs
 
 It is possible to fix them today: just write a compat function or add a
 global variable that is handled by cpu_x86_init(), and call it from the
 pc-1.3 machine-type init function. See disable_kvm_pv_eoi() and
 enable_compat_apic_id_mode(), for example.
 
 The difference is that this will be much easier once we introduce the
 static properties: then you just need to add the compat property values
 to the compat_props field on the machine-type struct.
 

Hi guys,

Any movement on this in the past year?

-hpa




Re: [Qemu-devel] [PATCH 2/3] X86, mpx: Intel MPX definition

2013-12-06 Thread H. Peter Anvin
No... we always ask for cpufeature.h patches separately because they sometimes 
cause conflicts between branches.

Borislav Petkov b...@alien8.de wrote:
On Sat, Dec 07, 2013 at 02:52:55AM +0800, Qiaowei Ren wrote:
 
 Signed-off-by: Qiaowei Ren qiaowei@intel.com
 Signed-off-by: Xudong Hao xudong@intel.com
 Signed-off-by: Liu Jinsong jinsong@intel.com
 ---
  arch/x86/include/asm/cpufeature.h |2 ++
  1 files changed, 2 insertions(+), 0 deletions(-)

This patch should probably be merged with the next one...

 
 diff --git a/arch/x86/include/asm/cpufeature.h
b/arch/x86/include/asm/cpufeature.h
 index d3f5c63..6c2738d 100644
 --- a/arch/x86/include/asm/cpufeature.h
 +++ b/arch/x86/include/asm/cpufeature.h
 @@ -216,6 +216,7 @@
  #define X86_FEATURE_ERMS(9*32+ 9) /* Enhanced REP MOVSB/STOSB */
  #define X86_FEATURE_INVPCID (9*32+10) /* Invalidate Processor
Context ID */
  #define X86_FEATURE_RTM (9*32+11) /* Restricted Transactional
Memory */
 +#define X86_FEATURE_MPX (9*32+14) /* Memory Protection 
 Extension */
  #define X86_FEATURE_RDSEED  (9*32+18) /* The RDSEED instruction */
  #define X86_FEATURE_ADX (9*32+19) /* The ADCX and ADOX 
 instructions
*/
  #define X86_FEATURE_SMAP(9*32+20) /* Supervisor Mode Access
Prevention */
 @@ -330,6 +331,7 @@ extern const char * const x86_power_flags[32];
  #define cpu_has_perfctr_l2  boot_cpu_has(X86_FEATURE_PERFCTR_L2)
  #define cpu_has_cx8 boot_cpu_has(X86_FEATURE_CX8)
  #define cpu_has_cx16boot_cpu_has(X86_FEATURE_CX16)
 +#define cpu_has_mpx boot_cpu_has(X86_FEATURE_MPX)

... and we're trying to not have more of those macros so people should
be simply
using boot_cpu_has(X86_FEATURE_YYY).

-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.



Re: [Qemu-devel] [PATCH v3 0/2] Intel MPX feature support at Qemu

2013-12-06 Thread H. Peter Anvin
On 12/06/2013 08:27 AM, Liu, Jinsong wrote:
 Eric Blake wrote:
 On 12/06/2013 07:06 AM, Liu, Jinsong wrote:
 Intel has released Memory Protection Extensions (MPX) recently.
 Please refer to
 http://download-software.intel.com/sites/default/files/319433-015.pdf

 These 2 patches are version2 to support Intel MPX at qemu side.

 You still aren't threading correctly, which makes it hard to track
 your series.  Please review
 http://wiki.qemu.org/Contribute/SubmitAPatch and make sure your 'git
 send-email' settings allow for proper threading; a good way to test
 that is to first send the patch series to yourself to ensure your
 environment is set up correctly. 
 
 Thanks Blake! will take care and learn using git send-email when I send 
 patches later (i.e. kvm mpx patches).
 

Not to mention that Linux kernel patches should be Cc:'d to
linux-ker...@vger.kernel.org.

-hpa





Re: [Qemu-devel] [PATCH 3/3] X86, mpx: Intel MPX xstate feature definition

2013-12-06 Thread H. Peter Anvin
On 12/06/2013 05:46 AM, Borislav Petkov wrote:
 
 I'm guessing this and the struct lwp_struct above is being added so that
 you can have the LWP XSAVE area size? If so, you don't need it: LWP
 XSAVE area is 128 bytes at offset 832 according to my manuals so I'd
 guess having a u8 lwp_area[128] should be fine.
 

Sure, but any reason to *not* document the internal structure?

 
 +struct bndregs_struct bndregs;
 +struct bndcsr_struct bndcsr;
  /* new processor state extensions will go here */
  } __attribute__ ((packed, aligned (64)));
  
 diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h
 index 0415cda..5cd9de3 100644
 --- a/arch/x86/include/asm/xsave.h
 +++ b/arch/x86/include/asm/xsave.h
 @@ -9,6 +9,8 @@
  #define XSTATE_FP   0x1
  #define XSTATE_SSE  0x2
  #define XSTATE_YMM  0x4
 +#define XSTATE_BNDREGS  0x8
 +#define XSTATE_BNDCSR   0x10
  
  #define XSTATE_FPSSE(XSTATE_FP | XSTATE_SSE)
  
 @@ -20,10 +22,12 @@
  #define XSAVE_YMM_SIZE  256
  #define XSAVE_YMM_OFFSET(XSAVE_HDR_SIZE + XSAVE_HDR_OFFSET)
  
 +#define XSTATE_FLEXIBLE (XSTATE_FP | XSTATE_SSE | XSTATE_YMM)
 
 What's the use of that macro if it is used only once?

Documentation seems good enough.  Explicitly separating out the features
which MUST be eagerly saved seems like a good thing.

 +#define XSTATE_EAGER(XSTATE_BNDREGS | XSTATE_BNDCSR)
  /*
   * These are the features that the OS can handle currently.
   */
 -#define XCNTXT_MASK (XSTATE_FP | XSTATE_SSE | XSTATE_YMM)
 +#define XCNTXT_MASK (XSTATE_FLEXIBLE | XSTATE_EAGER)
  

-hpa





Re: [Qemu-devel] [PATCH v2 3/3] X86, mpx: Intel MPX xstate feature definition

2013-12-06 Thread H. Peter Anvin
On 12/06/2013 09:35 AM, Paolo Bonzini wrote:
 
 Sorry for the back-and-forth, but I think this and the removal of
 XSTATE_FLEXIBLE (perhaps XSTATE_LAZY?) makes your v2 worse than v1.
 
 Since Peter already said the same, please undo these changes.
 
 Also, how is XSTATE_EAGER used?  Should MPX be disabled when xsaveopt is
 disabled on the kernel command line?  (Liu, how would this affect the
 KVM patches, too?)
 

There are two options: we could disable MPX etc. or we could force eager
saving (using xsave) even if xsaveopt is disabled.  It is a hard call to
make, but I guess I'm leaning towards the latter; we could add an
lazyxsave option to explicitly disable all eager features if there is
use for that.

-hpa





Re: [Qemu-devel] [PATCH v2 3/3] X86, mpx: Intel MPX xstate feature definition

2013-12-06 Thread H. Peter Anvin
On 12/06/2013 12:05 PM, Liu, Jinsong wrote:

 Since Peter already said the same, please undo these changes.

 Also, how is XSTATE_EAGER used?  Should MPX be disabled when xsaveopt
 is disabled on the kernel command line?  (Liu, how would this affect
 the KVM patches, too?)

 Paolo
 
 Currently seems no, and if needed we can add a new patch at kvm side 
 accordingly when native mpx patches checked in.
 

We need to either disable these features in lazy mode, or we need to
force eager mode if these features are to be supported.  The problem
with the latter is that it means forcing eager mode regardless of if
anything actually *uses* these features.

A third option would be to require applications to use a prctl() or
similar to enable eager-save features.

Thoughts?

-hpa




Re: [Qemu-devel] [PATCH v2 3/3] X86, mpx: Intel MPX xstate feature definition

2013-12-06 Thread H. Peter Anvin
On 12/06/2013 05:16 PM, Ren, Qiaowei wrote:
 Jinsong think that both kvm and host depend on these feature definition 
 header file, so we firstly submit these files depended on.

Yes, but we can't turn on the feature without proper protection.  Either
way, they are now in tip:x86/cpufeature.

-hpa





Re: [Qemu-devel] [PATCH 1/4] X86: Intel MPX definiation

2013-12-05 Thread H. Peter Anvin
On 12/05/2013 08:08 AM, Paolo Bonzini wrote:
 Il 02/12/2013 17:43, Liu, Jinsong ha scritto:
 From fbfa537f690eca139a96c6b2636ab5130bf57716 Mon Sep 17 00:00:00 2001
 From: Liu Jinsong jinsong@intel.com
 Date: Fri, 29 Nov 2013 01:27:00 +0800
 Subject: [PATCH 1/4] X86: Intel MPX definiation

 Signed-off-by: Xudong Hao xudong@intel.com
 Signed-off-by: Liu Jinsong jinsong@intel.com
 ---
  arch/x86/include/asm/cpufeature.h |2 ++
  arch/x86/include/asm/xsave.h  |5 -
  2 files changed, 6 insertions(+), 1 deletions(-)

 
 hpa/Ingo/Thomas, can you give your Acked-by for this patch?
 
 I'm not sure of the consequences of changing XCNTXT_MASK.  This series
 (which was submitted with the wrong threading) wants it so that KVM can
 use fpu_save_init and fpu_restore_checking to save and restore the MPX
 state of the guest.
 

Hi, I'm currently reviewing internally another set of patches for MPX
support which would at least in part conflict with these.  I don't see
the rest of the series -- where was it posted?

Either way:

1. asm/cpufeatures.h patches should always be separate, as we put those
into a special branch into the -tip tree since they touch so many other
things.

2. Enabling MPX is only safe with XSTATE_EAGER, which Qiaowei's patchset
has done correctly.

-hpa




Re: [Qemu-devel] Would virtio support 64 bit address for vring virtqueue?

2013-08-28 Thread H. Peter Anvin
On 08/28/2013 10:35 AM, Anthony Liguori wrote:
 Yes, it was originally designed with the 16TB limit in mind.
 
 PCI doesn't support 64-bit PIO operations so it would have required a
 high/low register and additional magic.
 

s/PCI/x86/

Additional magic is needed only if atomic changes are necessary, but
in this case this is stuff that is set up during early configuration and
so isn't an issue.  The additional magic needed in that case is latch
logic - the write to the low part doesn't actually take effect until the
high part is written.

-hpa




Re: [Qemu-devel] [RFC] sanitize memory on system reset

2013-06-14 Thread H. Peter Anvin
Only on a real 286.  At least since 486 the legacy switch has been INIT, not 
RESET.

Alexander Graf ag...@suse.de wrote:



Am 14.06.2013 um 08:56 schrieb Christian Borntraeger
borntrae...@de.ibm.com:

 On 13/06/13 13:56, Anthony Liguori wrote:
 Markus Armbruster arm...@redhat.com writes:
 
 Peter Lieven p...@kamp.de writes:
 
 On 13.06.2013 10:40, Stefan Hajnoczi wrote:
 On Thu, Jun 13, 2013 at 08:09:09AM +0200, Peter Lieven wrote:
 I was thinking if it would be a good idea to zeroize all memory
 resources on system reset and
 madvise dontneed them afterwards. This would avoid system reset
 attacks in case the attacker
 has only access to the console of a vServer but not on the
physical
 host and it would shrink
 RSS size of the vServer siginificantly.
 I wonder if you'll hit weird OS installers or PXE clients that
rely on
 stashing stuff in memory across reset.
 One point:
 Wouldn't a memory test which some systems do at startup break
these as well?
 
 Systems that distinguish between warm and cold boot (such as PCs)
 generally run POST only on cold boot.
 
 I'm not saying triggering warm reboot and expecting memory contents
to
 survive is a good idea, but it has been done.
 
 Doesn't kexec do a warm reboot stashing the new kernel somewhere in
 memory?
 
 It does something like that on s390.
 There is a diagnose instruction to the machine, that resets all
 subsystems and cpus in a defined state, but lets the memory
untouched.
 So we want to be able to define the components of the system which we
are 
 going to reset and have two cases:
 1. reset everything and clear the memory
 2. just reset the cpus and devices, but leave the memory untouched
 
 For case 2 we basically want to avoid memory clearing AND bios
reloading

Legacy 286 protected mode to real mode switching also happens through
the CPU reset PIN, so there certainly is a need to distinguish.

Alex

 
 
 

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.



Re: [Qemu-devel] [Patch] fix /proc/self/maps output

2013-03-29 Thread H. Peter Anvin
On 03/29/2013 09:01 AM, Christophe Lyon wrote:
 Add a space at end of line when there is no filename to print, to
 conform to linux kernel format.
 
 Signed-off-by: Christophe Lyon christophe.l...@linaro.org
 ---
  linux-user/syscall.c |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/linux-user/syscall.c b/linux-user/syscall.c
 index a148d9f..3b0ca86 100644
 --- a/linux-user/syscall.c
 +++ b/linux-user/syscall.c
 @@ -5016,7 +5016,7 @@ static int open_self_maps(void *cpu_env, int fd)
   %c%c%c%c %08 PRIx64  %02x:%02x %d%s%s\n,
  h2g(min), h2g(max), flag_r, flag_w,
  flag_x, flag_p, offset, dev_maj, dev_min, inode,
 -path[0] ?: , path);
 +path[0] ?:  , path);
  }
  }
 

Please move the fixed space into the format.  Perhaps even use a %-123s
type format even...

-hpa





Re: [Qemu-devel] [RFC PATCH 1/3] target-i386: Add 486sx, old486, and old486sx CPU models

2013-03-28 Thread H. Peter Anvin
On 03/28/2013 12:15 PM, Aurelien Jarno wrote:
 
 This really looks like Linux kernel specific. I haven't been able to
 test on a real machine, but the documentation I have found suggest that
 without and x87 FPU, the FPU instructions are simply ignored. The common
 way to detect an FPU is therefore to initialize registers to a given
 value, run fnstsw and fnstcw instructions with the register in arguments
 and see if they have been modified.
 
 The Linux kernel indeed set the initial value of these registers to
 0x, but I am not sure all codes are doing the same.
 
 For me it looks like better to skip such instructions directly in
 translate.c. As a bonus it seems easy to do that for all FPU
 instructions.
 

At least *some* real-life CPUs returned 0x, at the very least the
machines with external buses did (e.g. 386 without 387).  I don't have
access to a live 486SX so I can test if 486SX behaved differenly.

-hpa





Re: [Qemu-devel] [RFC PATCH 1/3] target-i386: Add 486sx, old486, and old486sx CPU models

2013-03-28 Thread H. Peter Anvin
On 03/28/2013 12:15 PM, Aurelien Jarno wrote:
 
 This really looks like Linux kernel specific. I haven't been able to
 test on a real machine, but the documentation I have found suggest that
 without and x87 FPU, the FPU instructions are simply ignored. The common
 way to detect an FPU is therefore to initialize registers to a given
 value, run fnstsw and fnstcw instructions with the register in arguments
 and see if they have been modified.
 
 The Linux kernel indeed set the initial value of these registers to
 0x, but I am not sure all codes are doing the same.
 
 For me it looks like better to skip such instructions directly in
 translate.c. As a bonus it seems easy to do that for all FPU
 instructions.
 

It might have been (and this is from memory, so don't take it for
anything) that the register form receives 0x, but the memory form is
ignored.

-hpa





Re: [Qemu-devel] [RFC PATCH 1/3] target-i386: Add 486sx, old486, and old486sx CPU models

2013-03-28 Thread H. Peter Anvin
Qemu is absolutely horrid at modeling corner cases.

Rob Landley r...@landley.net wrote:

On 03/28/2013 03:12:11 PM, H. Peter Anvin wrote:
 On 03/28/2013 12:15 PM, Aurelien Jarno wrote:
 
  This really looks like Linux kernel specific. I haven't been able
to
  test on a real machine, but the documentation I have found suggest 

 that
  without and x87 FPU, the FPU instructions are simply ignored. The  
 common
  way to detect an FPU is therefore to initialize registers to a
given
  value, run fnstsw and fnstcw instructions with the register in  
 arguments
  and see if they have been modified.
 
  The Linux kernel indeed set the initial value of these registers to
  0x, but I am not sure all codes are doing the same.
 
  For me it looks like better to skip such instructions directly in
  translate.c. As a bonus it seems easy to do that for all FPU
  instructions.
 
 
 It might have been (and this is from memory, so don't take it for
 anything) that the register form receives 0x, but the memory form
 
 is
 ignored.

Speaking of which, Solar Designer recently found a bug where pentium 3 

silently ignores the 66 prefix that later became SSE2, and thus the  
code ran but produced the wrong result:

https://twitter.com/solardiz/status/316204216962142209
https://twitter.com/solardiz/status/316207184134410240

But this isn't what QEMU does:

https://twitter.com/solardiz/status/316944417871245313

Rob

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.



Re: [Qemu-devel] [RFC PATCH 3/3] mc146818rtc: export the timezone information

2013-03-25 Thread H. Peter Anvin
On 03/25/2013 02:05 AM, Paolo Bonzini wrote:
 
 Interesting, do you have SeaBIOS and/or OVMF patches for this?
 

Not at this point.

-hpa





Re: [Qemu-devel] [RFC PATCH 1/3] target-i386: Add 486sx, old486, and old486sx CPU models

2013-03-25 Thread H. Peter Anvin
On 03/25/2013 08:15 AM, Igor Mammedov wrote:

 Such changes have been rejected in the past (e.g., n270 Atom).
 I personally wouldn't object to 486 changes, but I guess it should
 rather be handled via Igor's CPU static properties that I have in my
 review queue: The .model value would be set to 8 but the PC machine
 would be changed alongside to set model = 0 for pc-1.4 and earlier.
 It doesn't relates to property refactoring nor to slim CPU sub-classes
 conversion either. So it could go in independently.
 
 But is this change safe from migration POV?
 

Well, given that the CPU model presented is actually closer to a model 8
than a model 0 it probably is... but the real question is what would
cause someone to do migration of a 486 CPU model.

The n270 issue is problematic, because right now n270 can't actually
run software compiled for N270...

-hpa





Re: [Qemu-devel] [RFC PATCH 1/3] target-i386: Add 486sx, old486, and old486sx CPU models

2013-03-25 Thread H. Peter Anvin
On 03/25/2013 12:05 PM, Eduardo Habkost wrote:
 On Mon, Mar 25, 2013 at 11:44:30AM -0700, H. Peter Anvin wrote:
 On 03/25/2013 08:15 AM, Igor Mammedov wrote:

 Such changes have been rejected in the past (e.g., n270 Atom).
 I personally wouldn't object to 486 changes, but I guess it should
 rather be handled via Igor's CPU static properties that I have in my
 review queue: The .model value would be set to 8 but the PC machine
 would be changed alongside to set model = 0 for pc-1.4 and earlier.
 It doesn't relates to property refactoring nor to slim CPU sub-classes
 conversion either. So it could go in independently.

 But is this change safe from migration POV?


 Well, given that the CPU model presented is actually closer to a model 8
 than a model 0 it probably is... but the real question is what would
 cause someone to do migration of a 486 CPU model.

 The n270 issue is problematic, because right now n270 can't actually
 run software compiled for N270...
 
 FWIW, I wouldn't mind too much if the maintainers decide to document 486
 and n270 as migration-unsafe and then knowingly break live-migration
 of those CPU models between qemu = 1.3 and qemu = 1.4. It's up to the
 maintainers to choose which way to go.
 

The right thing, of course (and I believe that's where things are going)
is to unwind these descriptions at the time the VM is created; the
migration should implement the machine as it was launched.

If that isn't practical, then the right thing to do is probably to have
some kind of machine description conversion (so, say, 486 can be
converted to 486-1.3 containing the legacy description), but telling
people that -cpu n270 is something other than a real N270 that can't run
N270 software is user-hostile in the extreme.

It needs to be possible to fix bugs

-hpa




Re: [Qemu-devel] [RFC PATCH 1/3] target-i386: Add 486sx, old486, and old486sx CPU models

2013-03-25 Thread H. Peter Anvin
Now, all this said... if the only objection is the change of model
number for 486 then I suggest just dropping that.

-hpa




Re: [Qemu-devel] [RFC PATCH 1/3] target-i386: Add 486sx, old486, and old486sx CPU models

2013-03-25 Thread H. Peter Anvin
On 03/25/2013 01:56 PM, Eduardo Habkost wrote:
 

 It needs to be possible to fix bugs
 
 It is possible to fix them today: just write a compat function or add a
 global variable that is handled by cpu_x86_init(), and call it from the
 pc-1.3 machine-type init function. See disable_kvm_pv_eoi() and
 enable_compat_apic_id_mode(), for example.
 
 The difference is that this will be much easier once we introduce the
 static properties: then you just need to add the compat property values
 to the compat_props field on the machine-type struct.
 

It sounds like this is underway, and since the priority for the 486 bit
is very low it is better for that bit to just wait.

-hpa




Re: [Qemu-devel] [RFC PATCH 1/3] target-i386: Add 486sx, old486, and old486sx CPU models

2013-03-23 Thread H. Peter Anvin
Low priority ping on this patchset...?

-hpa




Re: [Qemu-devel] virtio-rng and fd passing

2013-03-04 Thread H. Peter Anvin
On 03/02/2013 04:23 AM, Paolo Bonzini wrote:
 Il 02/03/2013 04:13, Anthony Liguori ha scritto:
 There is no valid use-case of rng-random other than using /dev/random.
 In fact, it was probably a mistake to even allow a filename to be
 specified because it lets people do silly things (like /dev/urandom).

 If you want anything other than /dev/random, you should use rng-egd.
 
 /dev/hwrng makes sense too.

Only if the host isn't using it, which it almost certainly should if
there is something there.  On the other hand, yes, it is
cryptographically sound (since it presents itself as /dev/hwrng in the
guest!) and it does make sense for a very thin host.

-hpa


-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.




Re: [Qemu-devel] virtio-rng and fd passing

2013-03-01 Thread H. Peter Anvin
On 02/28/2013 04:36 PM, Eric Blake wrote:
 Stefan Berger and I discovered on IRC that virtio-rng is unable to 
 support fd passing.  We attempted:
 
 qemu-system-x86_64 ... -add-fd
 set=4,fd=34,opaque=RDONLY:/dev/urandom

 -object rng-random,id=rng0,filename=/dev/fdset/4 -device 
 virtio-rng-pci,rng=rng0,bus=pci.0,addr=0x6
 

Unrelated, but you really, really, really don't want to pass
/dev/urandom there, use /dev/random.

-hpa





Re: [Qemu-devel] 9pfs segfaults on chmod(special)

2013-03-01 Thread H. Peter Anvin
On 02/28/2013 04:24 AM, M. Mohan Kumar wrote:
 
 By default 9p.u is used, you can override by that
  mount -t 9p -otrans=virtio,version=9p2000.L tag /mnt
 

Shouldn't we change that default?

-hpa





Re: [Qemu-devel] virtio-rng and fd passing

2013-03-01 Thread H. Peter Anvin
The guest kernel already provides the PRNG itself.  We have been over this...

Stefan Berger stef...@linux.vnet.ibm.com wrote:

On 03/01/2013 02:37 PM, H. Peter Anvin wrote:
 On 02/28/2013 04:36 PM, Eric Blake wrote:
 Stefan Berger and I discovered on IRC that virtio-rng is unable to
 support fd passing.  We attempted:

 qemu-system-x86_64 ... -add-fd
 set=4,fd=34,opaque=RDONLY:/dev/urandom
  
 -object rng-random,id=rng0,filename=/dev/fdset/4 -device
 virtio-rng-pci,rng=rng0,bus=pci.0,addr=0x6

 Unrelated, but you really, really, really don't want to pass
 /dev/urandom there, use /dev/random.

From what I am reading about /dev/random is that it will start blocking

once not enough entropy is available anymore. Sounds like this could be

abused if multiple VMs were using this device and one drains the 
entropy.. An alternative may be to pick go through a crypto library
that 
seeds itself with entropy and implements random number generators 
following NIST 800-90 for example. Freebl would offer at least one such

implementation:

http://dxr.mozilla.org/mozilla-central/security/nss/lib/freebl/drbg.c.html

-   search for 'NIST' there

 Stefan

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.



Re: [Qemu-devel] [SeaBIOS] [RFC PATCH] Distinguish between reset types

2013-02-28 Thread H. Peter Anvin
On 02/20/2013 07:37 AM, David Woodhouse wrote:
 On Wed, 2013-02-20 at 16:34 +0100, Paolo Bonzini wrote:
 Il 20/02/2013 16:18, Laszlo Ersek ha scritto:
 I'm beginning to wish I'd just ignored the fact that
 we need a properly working soft reset to get back from 286 protected
 mode to real mode, and wired up the damn PAM reset unconditionally. I'm
 not convinced that the protected-real mode transition will work for
 anyone anyway.
 I believe currently we must be somewhere between soft reset  hard
 reset. I estimate getting closer to a well-emulated hard reset is more
 important than getting closer to a soft one. If you were to extend the
 i440FX reset handler so that it unconditionally resets the PAMs, I'd
 give my Rb. (Take it for what it's worth :))

 It would actually make sense.  The right way to do soft reset has
 nothing to do with qemu_system_reset_request().
 
 I've posted that version of the patch, with a suitable comment.
 

Right... the soft reset described above is really INIT, which isn't
even a reset in modern CPUs (it couldn't be, it has to preserve caches)
but more of a special interrupt.  It is also used during multiprocessor
init.

-hpa


-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.




[Qemu-devel] [RFC PATCH 2/3] target-i386: Raise #UD on accessing non-existent control registers

2013-02-27 Thread H. Peter Anvin
From: H. Peter Anvin h...@zytor.com

If we touch control registers that don't exist, either read or write,
raise the #UD exception (undefined opcode).

This is useful for testing booting on old CPUs.

CR4 is assumed to exist if and only if there are CPU features other
than the FPU defined (typically at least VME).

Signed-off-by: H. Peter Anvin h...@zytor.com
---
 target-i386/misc_helper.c | 25 ++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/target-i386/misc_helper.c b/target-i386/misc_helper.c
index 1ff25d1..6da3f32 100644
--- a/target-i386/misc_helper.c
+++ b/target-i386/misc_helper.c
@@ -154,9 +154,18 @@ target_ulong helper_read_crN(CPUX86State *env, int reg)
 
 cpu_svm_check_intercept_param(env, SVM_EXIT_READ_CR0 + reg, 0);
 switch (reg) {
-default:
+case 0:
+case 2:
+case 3:
 val = env-cr[reg];
 break;
+case 4:
+if (env-cpuid_features = CPUID_FP87) {
+raise_exception_err(env, EXCP06_ILLOP, 0);
+} else {
+val = env-cr[reg];
+}
+break;
 case 8:
 if (!(env-hflags2  HF2_VINTR_MASK)) {
 val = cpu_get_apic_tpr(env-apic_state);
@@ -164,6 +173,9 @@ target_ulong helper_read_crN(CPUX86State *env, int reg)
 val = env-v_tpr;
 }
 break;
+default:
+raise_exception_err(env, EXCP06_ILLOP, 0);
+break;
 }
 return val;
 }
@@ -175,11 +187,18 @@ void helper_write_crN(CPUX86State *env, int reg, 
target_ulong t0)
 case 0:
 cpu_x86_update_cr0(env, t0);
 break;
+case 2:
+env-cr[reg] = t0;
+break;
 case 3:
 cpu_x86_update_cr3(env, t0);
 break;
 case 4:
-cpu_x86_update_cr4(env, t0);
+if (env-cpuid_features = CPUID_FP87) {
+raise_exception_err(env, EXCP06_ILLOP, 0);
+} else {
+cpu_x86_update_cr4(env, t0);
+}
 break;
 case 8:
 if (!(env-hflags2  HF2_VINTR_MASK)) {
@@ -188,7 +207,7 @@ void helper_write_crN(CPUX86State *env, int reg, 
target_ulong t0)
 env-v_tpr = t0  0x0f;
 break;
 default:
-env-cr[reg] = t0;
+raise_exception_err(env, EXCP06_ILLOP, 0);
 break;
 }
 }
-- 
1.7.11.7




[Qemu-devel] [RFC PATCH 3/3] mc146818rtc: export the timezone information

2013-02-27 Thread H. Peter Anvin
From: H. Peter Anvin h...@zytor.com

There is no standard method for storing timezone information
associated with the classic PC/AT RTC, however, there are standard
methods in ACPI (Time and Alarm Device) and EFI (GetTime/SetTime) for
getting this information.

Since these are abstract methods, it is qreally firmware-specific how
it is stored, however, since Qemu initializes the RTC in the virtual
environment that information needs to come from Qemu in the first
place.

Non-PC platforms that use the MC146181 RTC may have their own
firmware-specific methods as well.

The most logical place to stash this information is in the RTC CMOS;
not only is it logically co-located with the relevant information, but
it is also very easy to access from ACPI bytecode.  Thus, save the
timezone information in two bytes in CMOS that have no known standard
definition, but are yet within the 64 bytes that even the most basic
RTC CMOS implementations including the original MC146181 support.

Note: all timezones currently in use in the world are on 15-minutes
boundaries, which would allow this information to be stored in a
single signed byte.  However, both EFI and ACPI use a minute-granular
interface (specified as -1440 to +1440 with 2047 used to mean
unknown, this requires a minimum of 12 bits to represent); this
follows that model.

Signed-off-by: H. Peter Anvin h...@zytor.com
Cc: Kevin O'Connor ke...@koconnor.net
Cc: David Woodhouse dw...@infradead.org
---
 hw/mc146818rtc.c  | 6 ++
 hw/mc146818rtc_regs.h | 2 ++
 2 files changed, 8 insertions(+)

diff --git a/hw/mc146818rtc.c b/hw/mc146818rtc.c
index 2fb11f6..72541dd 100644
--- a/hw/mc146818rtc.c
+++ b/hw/mc146818rtc.c
@@ -681,6 +681,7 @@ static void rtc_set_date_from_host(ISADevice *dev)
 {
 RTCState *s = DO_UPCAST(RTCState, dev, dev);
 struct tm tm;
+int minuteseast;
 
 qemu_get_timedate(tm, 0);
 
@@ -690,6 +691,11 @@ static void rtc_set_date_from_host(ISADevice *dev)
 
 /* set the CMOS date */
 rtc_set_cmos(s, tm);
+
+/* Set the timezone information as a signed 16-bit number of minutes */
+minuteseast = ((int64_t)s-base_rtc - (int64_t)mktime(tm)) / 60;
+s-cmos_data[RTC_TIMEZONE_L] = (uint8_t)(minuteseast);
+s-cmos_data[RTC_TIMEZONE_H] = (uint8_t)(minuteseast  8);
 }
 
 static int rtc_post_load(void *opaque, int version_id)
diff --git a/hw/mc146818rtc_regs.h b/hw/mc146818rtc_regs.h
index ccdee42..7dd5e0d 100644
--- a/hw/mc146818rtc_regs.h
+++ b/hw/mc146818rtc_regs.h
@@ -47,6 +47,8 @@
 /* PC cmos mappings */
 #define RTC_CENTURY  0x32
 #define RTC_IBM_PS2_CENTURY_BYTE 0x37
+#define RTC_TIMEZONE_L   0x3e
+#define RTC_TIMEZONE_H   0x3f
 
 #define REG_A_UIP 0x80
 
-- 
1.7.11.7




[Qemu-devel] [RFC PATCH 1/3] target-i386: Add 486sx, old486, and old486sx CPU models

2013-02-27 Thread H. Peter Anvin
From: H. Peter Anvin h...@zytor.com

Add models for 486SX, and pre-CPUID versions of the 486 (DX  SX).
Change the model number for the standard 486DX to a model which
actually had CPUID.

Note: these models are fairly vestigial, for example most of the FPU
operations still work; only F*ST[CS]W have been modified to appear as
through there is no FPU.

This also changes the classic 486 model number to 8 (DX4) which
matches the feature set presented.

Signed-off-by: H. Peter Anvin h...@zytor.com
---
 target-i386/cpu.c | 39 ++---
 target-i386/fpu_helper.c  | 12 +++--
 target-i386/misc_helper.c | 15 
 target-i386/translate.c   | 62 +++
 4 files changed, 75 insertions(+), 53 deletions(-)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index aab35c7..a5aad19 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -365,8 +365,11 @@ typedef struct x86_def_t {
 uint32_t cpuid_7_0_ebx_features;
 } x86_def_t;
 
-#define I486_FEATURES (CPUID_FP87 | CPUID_VME | CPUID_PSE)
-#define PENTIUM_FEATURES (I486_FEATURES | CPUID_DE | CPUID_TSC | \
+#define OLD_I486SX_FEATURES 0
+#define OLD_I486_FEATURES CPUID_FP87
+#define I486SX_FEATURES CPUID_VME /* SX2+ */
+#define I486_FEATURES (CPUID_FP87 | CPUID_VME) /* DX4 and some DX2 */
+#define PENTIUM_FEATURES (I486_FEATURES | CPUID_PSE | CPUID_DE | CPUID_TSC | \
   CPUID_MSR | CPUID_MCE | CPUID_CX8 | CPUID_MMX | CPUID_APIC)
 #define PENTIUM2_FEATURES (PENTIUM_FEATURES | CPUID_PAE | CPUID_SEP | \
   CPUID_MTRR | CPUID_PGE | CPUID_MCA | CPUID_CMOV | CPUID_PAT | \
@@ -535,16 +538,46 @@ static x86_def_t builtin_x86_defs[] = {
 .model_id = Genuine Intel(R) CPU   T2600  @ 2.16GHz,
 },
 {
+.name = old486,
+.level = 0,
+.vendor = CPUID_VENDOR_INTEL,
+.family = 4,
+.model = 1,
+.stepping = 0,
+.features = OLD_I486_FEATURES,
+.xlevel = 0,
+},
+{
+.name = old486sx,
+.level = 0,
+.vendor = CPUID_VENDOR_INTEL,
+.family = 4,
+.model = 2,
+.stepping = 0,
+.features = OLD_I486SX_FEATURES,
+.xlevel = 0,
+},
+{
 .name = 486,
 .level = 1,
 .vendor = CPUID_VENDOR_INTEL,
 .family = 4,
-.model = 0,
+.model = 8,
 .stepping = 0,
 .features = I486_FEATURES,
 .xlevel = 0,
 },
 {
+.name = 486sx,
+.level = 1,
+.vendor = CPUID_VENDOR_INTEL,
+.family = 4,
+.model = 5,
+.stepping = 0,
+.features = I486SX_FEATURES,
+.xlevel = 0,
+},
+{
 .name = pentium,
 .level = 1,
 .vendor = CPUID_VENDOR_INTEL,
diff --git a/target-i386/fpu_helper.c b/target-i386/fpu_helper.c
index 44f3d27..c4c2724 100644
--- a/target-i386/fpu_helper.c
+++ b/target-i386/fpu_helper.c
@@ -530,12 +530,20 @@ void helper_fldz_FT0(CPUX86State *env)
 
 uint32_t helper_fnstsw(CPUX86State *env)
 {
-return (env-fpus  ~0x3800) | (env-fpstt  0x7)  11;
+if (!(env-cpuid_features  CPUID_FP87)) {
+return 0x;
+} else {
+return (env-fpus  ~0x3800) | (env-fpstt  0x7)  11;
+}
 }
 
 uint32_t helper_fnstcw(CPUX86State *env)
 {
-return env-fpuc;
+if (!(env-cpuid_features  CPUID_FP87)) {
+return 0x;
+} else {
+return env-fpuc;
+}
 }
 
 static void update_fp_status(CPUX86State *env)
diff --git a/target-i386/misc_helper.c b/target-i386/misc_helper.c
index b6d5740..1ff25d1 100644
--- a/target-i386/misc_helper.c
+++ b/target-i386/misc_helper.c
@@ -122,11 +122,16 @@ void helper_cpuid(CPUX86State *env)
 
 cpu_svm_check_intercept_param(env, SVM_EXIT_CPUID, 0);
 
-cpu_x86_cpuid(env, (uint32_t)EAX, (uint32_t)ECX, eax, ebx, ecx, edx);
-EAX = eax;
-EBX = ebx;
-ECX = ecx;
-EDX = edx;
+if (!env-cpuid_level) {
+raise_exception_err(env, EXCP06_ILLOP, 0);
+} else {
+cpu_x86_cpuid(env, (uint32_t)EAX, (uint32_t)ECX,
+  eax, ebx, ecx, edx);
+EAX = eax;
+EBX = ebx;
+ECX = ecx;
+EDX = edx;
+}
 }
 
 #if defined(CONFIG_USER_ONLY)
diff --git a/target-i386/translate.c b/target-i386/translate.c
index 112c310..6d8abff 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -103,6 +103,8 @@ typedef struct DisasContext {
 struct TranslationBlock *tb;
 int popl_esp_hack; /* for correct popl with esp base handling */
 int rip_offset; /* only used in x86_64, but left for simplicity */
+int cpuid_family;
+int cpuid_level;
 int cpuid_features;
 int cpuid_ext_features;
 int cpuid_ext2_features;
@@ -6513,52 +6515,24 @@ static target_ulong disas_insn(CPUX86State *env, 
DisasContext *s,
 if (s-vm86  s-iopl != 3) {
 gen_exception(s, EXCP0D_GPF, pc_start - s-cs_base);
 } else {
+uint32_t mask

Re: [Qemu-devel] [edk2] (PAM stuff) reset doesn't work on OVMF + SeaBIOS CSM

2013-02-14 Thread H. Peter Anvin
On 02/14/2013 12:41 PM, Laszlo Ersek wrote:
 
 ). cpu_reset() [target-i386/helper.c] sets CS:IP to f000:fff0, which is
 the exact address of... reset_vector() in SeaBIOS.
 

This would be a bug, but it isn't quite true.

If you look at x86_cpu_reset() you will note that it sets the code
segment base to 0x, not 0xf as one could expect from the
above.  This is also true of a physical x86.

As such, the *real* reset vector is at 0xfff0 as opposed to the
SeaBIOS vector at 0x0 -- this is a backwards compatibility vector
which typically just issues a real reset.

Now, if Qemu doesn't handle the distinction here correctly, that is a bug.

-hpa




Re: [Qemu-devel] [SeaBIOS] [edk2] (PAM stuff) reset doesn't work on OVMF + SeaBIOS CSM

2013-02-14 Thread H. Peter Anvin

On 02/14/2013 01:27 PM, David Woodhouse wrote:


So it *is* jumping to 0xfff0 but the memory at that location isn't
what we expect? Do the PAM registers affect *that* too, or only the
region from 0xc-0xf? Surely the contents at 4GiB-δ should be
unchanged by *anything* we do with the PAM registers?

Or maybe not... after also downloading the i440fx data sheet, I'm even
more confused. There's some aliasing with... not the region at 1MiB-δ
but the region at 16MiB-δ:

(From §4.1 System Address Map):

2. High BIOS Area (FFE0_h−− _h)
   The top 2 Mbytes of the Extended Memory Region is reserved for System
   BIOS (High BIOS), extended BIOS for PCI devices, and the A20 alias of
   the system BIOS. The CPU begins execution from the High BIOS after
   reset. This region is mapped to the PCI so that the upper subset of
   this region is aliased to 16 Mbytes minus 256-Kbyte range.



That is presumably a 286 compatibility hack -- the 286 had 24 address 
lines.  I doubt anyone gives a hoot about it, and neither EDK2 nor 
SeaBIOS should care.


-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.




Re: [Qemu-devel] [PATCH] target-i386: n270 can MOVBE

2013-02-08 Thread H. Peter Anvin
This is problematic since the n270 qemu model won't boot a real kernel compiled 
for that box.

Eduardo Habkost ehabk...@redhat.com wrote:

On Fri, Feb 08, 2013 at 10:30:02AM +0100, Borislav Petkov wrote:
 From: Borislav Petkov b...@suse.de
 
 The Atom core (cpu name n270 in QEMU speak) supports MOVBE. This is
 needed when booting 3.8 and later linux kernels built with the MATOM
 target because we require MOVBE in order to boot properly now.
 
 Cc: H. Peter Anvin h...@zytor.com
 Cc: Richard Henderson r...@twiddle.net
 Signed-off-by: Borislav Petkov b...@suse.de
 ---
 
 Right, so I was playing with booting MATOM kernels in QEMU.
 As it turns out, QEMU's n270 model doesn't advertize MOVBE
 although the real hardware supports it. Quick search pointed me to
 http://lists.nongnu.org/archive/html/qemu-devel/2013-01/msg04317.html
 which adds that support, among other things. I've merged Richard's
 patchset with qemu's current master and after applying this patch
below,
 I can report success booting an MATOM kernel with QEMU. The same
kernel
 boots on the real n270 hardware, btw.
 
  target-i386/cpu.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)
 
 diff --git a/target-i386/cpu.c b/target-i386/cpu.c
 index 9f38e4435e53..83816edd8410 100644
 --- a/target-i386/cpu.c
 +++ b/target-i386/cpu.c
 @@ -610,7 +610,8 @@ static x86_def_t builtin_x86_defs[] = {
  CPUID_ACPI | CPUID_SS | CPUID_HT | CPUID_TM | CPUID_PBE,
  /* Some CPUs got no CPUID_SEP */
  .ext_features = CPUID_EXT_SSE3 | CPUID_EXT_MONITOR |
CPUID_EXT_SSSE3 |
 -CPUID_EXT_DSCPL | CPUID_EXT_EST | CPUID_EXT_TM2 |
CPUID_EXT_XTPR,
 +CPUID_EXT_DSCPL | CPUID_EXT_EST | CPUID_EXT_TM2 |
CPUID_EXT_XTPR |
 +CPUID_EXT_MOVBE,
  .ext2_features = (PPRO_FEATURES  CPUID_EXT2_AMD_ALIASES) |


-machine pc-1.3 -cpu n270 (and older machine-types) needs to keep
MOVBE disabled, or you will break live migration.

Personally I wouldn't mind declaring n270 as a non-migratable CPU
model.
But libvirt supports n270, so it already expects n270 to expose a
stable
ABI on each machine-type.


  CPUID_EXT2_NX,
  .ext3_features = CPUID_EXT3_LAHF_LM,
 -- 
 1.8.1.2.422.g08c0e7f
 
 

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.



Re: [Qemu-devel] [PATCH 0/6] add paravirtualization hwrng support

2012-10-31 Thread H. Peter Anvin

On 10/31/2012 12:29 AM, Paolo Bonzini wrote:


Related to this, rdrand's entropy content in the worst case will be
only 1/255th of the data it produces: Intel documents that one 256-bit
seed will result in up to 1022 64-bit random numbers.  Yet, it is good
enough to drive rngd.  Would it make sense for QEMU to implement the
same kind of stretching of /dev/random data, to avoid depleting the
host's entropy pool too fast?



Absolutely not; in fact, we have to do data reduction in rngd for 
exactly this reason (and a Qemu backend would have to do the same). 
There is a new RDSEED instruction in newer CPUs to correct this.


-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.




Re: [Qemu-devel] [PATCH 0/6] add paravirtualization hwrng support

2012-10-30 Thread H. Peter Anvin
On 10/30/2012 02:05 AM, Paolo Bonzini wrote:
 Either you're not reading what I wrote, or you're confusing me with
 someone else.

My apologies, you are indeed correct.  I misinterpreted your emails,
probably because I got you confused with someone else.

 I *never* mentioned passing /dev/urandom, and in fact I explained to
 Anthony that it is wrong.  Please take a look at
 http://permalink.gmane.org/gmane.comp.emulators.qemu/178123
 
 What I said that passing /dev/hwrng or rdrand would:
 
 - not make /dev/random with virtio-rng-pci worse than without

It wouldn't, but it would make virtio-rng-pci a potential noop on a
system where it could genuinely do better.

 - make migration working
 
 - avoiding denial of service for the host's /dev/random

However, it means that if there is an rngd-readable source on the host
(e.g. TPM, DRNG) then the guest cannot take advantage of it; if it
accesses /dev/random then it would be able to.  This is particularly
toxic if you turn off DRNG to the host in the name of migration; the
DRNG is a very high bandwidth source which is processed directly by rngd
since there is no point in doing a detour via /dev/hwrng in the kernel.
 As such, with your proposed version you would take one of the best
possible situations and turn it into the worst possible situation.

Furthermore, you are in many ways still causing a DoS on the host, since
you are eating up entropy that would otherwise be fed into /dev/random.
 So there are cases where the situation is much worse with /dev/hwrng
than with /dev/random.

-hpa






Re: [Qemu-devel] [PATCH 0/8] add paravirtualization hwrng support (v2)

2012-10-30 Thread H. Peter Anvin

On 10/30/2012 04:02 PM, Anthony Liguori wrote:


My take away from all of the various discussions on what the Right Way to
use virtio-rng is:

  1) /dev/random should always be used as the entropy source (I've left it
 configurable though)

  2) I think the Right Way to configure virtio-rng is to figure out what the
 available entropy is on the host, and then decide how to allocate that
 to each guest.  As such, I've implemented rate limiting.

 I think QEMU is the right place to do this because this is a property of
 specific virtual machines.  I can imagine a cloud provider wanting to
 guarantee a certain level of entropy for different classes of VMs.  Even
 if rngd could be used to do this, configuring it differently for different
 guests would be cumbersome.



rngd is not where this should happen, it should be in the /dev/random 
implementation in the (host) kernel.  That way it is applicable to all 
types of clients, not just Qemu.



  3) `qemu -device virtio-rng-pci` will Just Work but risks exhausting host
 entropy.  This means we can't make it the default for machines.  But for
 most command line users, I think this is the behavior they want.


It's a bit unfortunate, but I'm not going to push on that point.

Given the migration issue I'll write up an implementation of a DRNG 
(RDRAND/RDSEED) backend once this is upstream.  If RDRAND is disabled in 
the guest, but available in the host, this would be the one to use.  If 
RDRAND is available in the guest it should be used directly if rngd is 
new enough, but since virtio-rng has been in the kernel since 2008 there 
still might be some guests which could use such an implementation 
without having been RDRAND-enabled.


-hpa


--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.




Re: [Qemu-devel] [PATCH 0/6] add paravirtualization hwrng support

2012-10-29 Thread H. Peter Anvin

On 10/28/2012 11:23 PM, Amit Shah wrote:

One solution could be to feed host's /dev/urandom to readers of
guests' /dev/urandom.  We could then pass the rare true entropy bits
from host's /dev/hwrng or /dev/random to the guest via
virtio-rng-pci's /dev/hwrng interface in the guest.

If this is a valid idea (host /dev/urandom goes directly to guest's
/dev/urandom), we would need some guest-side surgery, but it shouldn't
be huge work, and would remove several bottlenecks.

Is this a very crazy idea?


It's not crazy, it's just pointless.  You're doing a completely 
unnecessary hypercall to run the PRNG in host space.


-hpa


--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.




Re: [Qemu-devel] [PATCH 0/6] add paravirtualization hwrng support

2012-10-29 Thread H. Peter Anvin

On 10/29/2012 01:45 AM, Paolo Bonzini wrote:

First, hwrng is only one of the sources used by rngd.  It can also
(currently) use RDRAND or TPM; additional sources are likely to be added
in the future.

Second, the harvesting of environmental noise -- timings -- is not as
good in a VM as on plain hardware, so for the no-hwrng case it is better
for this to be done in the host than in the VM.


Neither of these make /dev/random with virtio-rng-pci worse than without
(as would be the case if you fed /dev/urandom).  And migration works.



This, and avoiding denial of service for the host's /dev/random, is all
I care about at this time.

There is always time to change defaults to something better.


Your logic are roughly on the level with the people who caused the 
Debian bug.  You are proposing something utterly reckless.  Sorry, 
please stop.


-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.




Re: [Qemu-devel] [PATCH 0/6] add paravirtualization hwrng support

2012-10-29 Thread H. Peter Anvin

On 10/29/2012 01:45 AM, Paolo Bonzini wrote:

Il 26/10/2012 22:29, H. Peter Anvin ha scritto:

This is surreal.  Output from /dev/hwrng turns into output for /dev/random... 
it us guaranteed worse; period, end of story.

Isn't that exactly what happens in bare-metal?  hwrng - rngd - random.  
Instead here
we'd have, host hwrng - virtio-rng-pci - guest hwrng - guest rngd - guest 
random.

The only difference is that you paravirtualize access to the host hwrng to a) 
distribute
entropy to multiple guests; b) support migration across hosts with different 
CPUs and
hardware.

First, hwrng is only one of the sources used by rngd.  It can also
(currently) use RDRAND or TPM; additional sources are likely to be added
in the future.

Second, the harvesting of environmental noise -- timings -- is not as
good in a VM as on plain hardware, so for the no-hwrng case it is better
for this to be done in the host than in the VM.


Neither of these make /dev/random with virtio-rng-pci worse than without
(as would be the case if you fed /dev/urandom).  And migration works.
This, and avoiding denial of service for the host's /dev/random, is all
I care about at this time.

There is always time to change defaults to something better.



Let me be more specific.

First of all, feeding /dev/urandom to the guest is dangerous -- you are 
feeding it PRNG contents but telling it that it is real entropy.  This 
is a security hole.


Second of all, you're doing something pointless: you are still 
exhausting the entropy pool on the host at the same rate, and all you 
end up with is something that isn't what you want.  You still have the 
same DoS on the host /dev/random that you're worried about.


Third, you're doing something inefficient: you're running a PRNG in the 
host which could be run more efficiently in guest space.


From an Intel perspective I guess I should be happy, as it functionally 
would mean that unless you have RDRAND in the host you're insecure, but 
I'd much rather see the Right Thing done.


-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.




Re: [Qemu-devel] [PATCH 0/6] add paravirtualization hwrng support

2012-10-26 Thread H. Peter Anvin

On 10/26/2012 08:42 AM, Anthony Liguori wrote:


Is /dev/random even appropriate to feed rngd?

rngd needs _a lot_ of entropy to even start working.  Its randomness
test works in groups of 2 bits. On a system without an hardware
RNG, /dev/random can hardly produce 4000 bits/minute.  This means a
guest will not get any entropy boost for 5 minutes after it's started,
even if we allow it to exhaust the parent's entropy.


I don't know, but rng-random is a non-blocking backend so it can handle
/dev/random, /dev/urandom, or /dev/hwrng.



/dev/urandom is just plain *wrong*... it is feeding a PRNG into a PRNG 
which can best be described as masturbation and at worst as a 
cryptographic usage violation.


/dev/hwrng is reasonable, in some ways; after all, the guest itself is 
expected to use rngd.  There are, however, at least two problems:


a) it means that the guest *has* to run rngd or a similar engine; if you 
have control over the guests it might be more efficient to run rngd in 
skip-test mode (I don't think that is currently implemented but it 
could/should be) and centralize all testing to the host.


A skip-test mode would also allow rngd to forward-feed shorter blocks 
than 2500 bytes.


b) if the host has no physical hwrng, /dev/hwrng will output nothing at 
all, which is worse than /dev/random in that situation.



Stefan Berger suggested a backend that uses a PRNG in FreeBL.  That's
probably the best default since it punts to a userspace library to deal
with ensuring there's adequate whitening/entropy to start with.


We SHOULD NOT expose a PRNG here!  It is the same fail as using 
/dev/urandom (but worse)  The whole point is to inject actual entropy... 
a PRNG can (and typically will) just run in guest space.



Maybe rdrand, but that's just a chardev---so why isn't this enough:

   -chardev file,source=on,path=/dev/hwrng,id=chr0  -device 
virtio-rng-pci,file=chr0
   -chardev rdrand,id=chr0  -device 
virtio-rng-pci,file=chr0
   -chardev socket,host=localhost,port=1024,id=chr0 -device 
virtio-rng-pci,rng=chr0,egd=on

(which I suggested in my reply to Amit)?


If you have rdrand you might just use it in the guest directly, unless 
you have a strong reason (migration?) not to do that.  Either way, for 
rdrand you need whitening similar to what rngd is doing (for *rdseed* 
you do not, but rdseed is not shipping yet.)


The startup issue is an interesting problem.  If you have full control 
over the guest it might be best to simply inject some entropy into the 
guest on startup via the initramfs or a disk image; that has its own 
awkwardness too, of course.  The one bit that could potentially be 
solved in Qemu would be an option to don't start the guest until X 
bytes of entropy have been gathered.


Overall, I want to emphasize that we don't want to try solve generic 
problems in virtualization space... resource constraints on /dev/random 
is a generic host OS issue for example.


-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.




Re: [Qemu-devel] [PATCH 0/6] add paravirtualization hwrng support

2012-10-26 Thread H. Peter Anvin
PRNGs don't create entropy.  Period.

The guest will run its own PRNG.

Anthony Liguori aligu...@us.ibm.com wrote:

H. Peter Anvin h...@zytor.com writes:

 On 10/26/2012 08:42 AM, Anthony Liguori wrote:

 Is /dev/random even appropriate to feed rngd?

 rngd needs _a lot_ of entropy to even start working.  Its
randomness
 test works in groups of 2 bits. On a system without an hardware
 RNG, /dev/random can hardly produce 4000 bits/minute.  This means a
 guest will not get any entropy boost for 5 minutes after it's
started,
 even if we allow it to exhaust the parent's entropy.

 I don't know, but rng-random is a non-blocking backend so it can
handle
 /dev/random, /dev/urandom, or /dev/hwrng.


 /dev/urandom is just plain *wrong*... it is feeding a PRNG into a
PRNG 
 which can best be described as masturbation and at worst as a 
 cryptographic usage violation.

I don't understand your logic here.

From the discussions I've had, the quality of the randomness from a
*well seeded* PRNG ought to be good enough to act as an entropy source
within the guest.

What qualifies as well seeded is a bit difficult to pin down with more
specificity than kilobytes of data.

I stayed away from /dev/urandom primarily because it's impossible to
determine if it's well seeded or not making urandom dangerous to use.

But using a PRNG makes sense to me when dealing with multiple guests.
If you have a finite source of entropy in the host, using a PRNG to
create unique entropy for each guest is certainly better than
duplicating entropy.

Adding Ted T'so and a few others to CC in hopes that they can chime in
here too.

FWIW, none of this should affect this series being merged as it can use
a variety of different inputs.  But I would like to have a strong
recommendation for what people should use (and make that default) so
I'd
really like to get a clear answer here.

Regards,

Anthony Liguori


 /dev/hwrng is reasonable, in some ways; after all, the guest itself
is 
 expected to use rngd.  There are, however, at least two problems:

 a) it means that the guest *has* to run rngd or a similar engine; if
you 
 have control over the guests it might be more efficient to run rngd
in 
 skip-test mode (I don't think that is currently implemented but it 
 could/should be) and centralize all testing to the host.

 A skip-test mode would also allow rngd to forward-feed shorter blocks

 than 2500 bytes.

 b) if the host has no physical hwrng, /dev/hwrng will output nothing
at 
 all, which is worse than /dev/random in that situation.

 Stefan Berger suggested a backend that uses a PRNG in FreeBL. 
That's
 probably the best default since it punts to a userspace library to
deal
 with ensuring there's adequate whitening/entropy to start with.

 We SHOULD NOT expose a PRNG here!  It is the same fail as using 
 /dev/urandom (but worse)  The whole point is to inject actual
entropy... 
 a PRNG can (and typically will) just run in guest space.

 Maybe rdrand, but that's just a chardev---so why isn't this enough:

-chardev file,source=on,path=/dev/hwrng,id=chr0  -device
virtio-rng-pci,file=chr0
-chardev rdrand,id=chr0  -device
virtio-rng-pci,file=chr0
-chardev socket,host=localhost,port=1024,id=chr0 -device
virtio-rng-pci,rng=chr0,egd=on

 (which I suggested in my reply to Amit)?

 If you have rdrand you might just use it in the guest directly,
unless 
 you have a strong reason (migration?) not to do that.  Either way,
for 
 rdrand you need whitening similar to what rngd is doing (for *rdseed*

 you do not, but rdseed is not shipping yet.)

 The startup issue is an interesting problem.  If you have full
control 
 over the guest it might be best to simply inject some entropy into
the 
 guest on startup via the initramfs or a disk image; that has its own 
 awkwardness too, of course.  The one bit that could potentially be 
 solved in Qemu would be an option to don't start the guest until X 
 bytes of entropy have been gathered.

 Overall, I want to emphasize that we don't want to try solve generic 
 problems in virtualization space... resource constraints on
/dev/random 
 is a generic host OS issue for example.

  -hpa

 -- 
 H. Peter Anvin, Intel Open Source Technology Center
 I work for Intel.  I don't speak on their behalf.

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.



Re: [Qemu-devel] [PATCH 0/6] add paravirtualization hwrng support

2012-10-26 Thread H. Peter Anvin
This is surreal.  Output from /dev/hwrng turns into output for /dev/random... 
it us guaranteed worse; period, end of story.

I don't know who the agreement is with, but it is ridiculous in this case.

As far as the need whitening issue, I discussed that with Ted at Kernel Summit 
and we came up with an outline for what we can do to improve the current 
situation.  It is challenging to deal with the patanoia crowd at the same time.

Paolo Bonzini pbonz...@redhat.com wrote:

Il 26/10/2012 18:09, H. Peter Anvin ha scritto:
 a) it means that the guest *has* to run rngd or a similar engine; if
you
 have control over the guests it might be more efficient to run rngd
in
 skip-test mode (I don't think that is currently implemented but it
 could/should be) and centralize all testing to the host.
 
 A skip-test mode would also allow rngd to forward-feed shorter blocks
 than 2500 bytes.

That would be something we can communicate via virtio-rng-pci feature
bits.  Perhaps /dev/hwrng could also expose a need-whitening property
in
sysfs.

 b) if the host has no physical hwrng, /dev/hwrng will output nothing
at
 all, which is worse than /dev/random in that situation.

Not really, it would just mean that the guest takes more time to gather
entropy, just like if you had no virtio-rng-pci at all.

 If you have rdrand you might just use it in the guest directly,
unless
 you have a strong reason (migration?) not to do that.  Either way,
for
 rdrand you need whitening similar to what rngd is doing (for *rdseed*
 you do not, but rdseed is not shipping yet.)

Yes, migration is a good reason.

 The startup issue is an interesting problem.  If you have full
control
 over the guest it might be best to simply inject some entropy into
the
 guest on startup via the initramfs or a disk image; that has its own
 awkwardness too, of course.  The one bit that could potentially be
 solved in Qemu would be an option to don't start the guest until X
 bytes of entropy have been gathered.
 
 Overall, I want to emphasize that we don't want to try solve generic
 problems in virtualization space... resource constraints on
/dev/random
 is a generic host OS issue for example.

Yes, but the agreed solution right now is that programs should not ask
for more than 32 bytes or so from /dev/random.  Which means /dev/random
is not a suitable backend for virtio-rng-pci as things stand now.

Paolo

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.



Re: [Qemu-devel] [PATCH 0/6] add paravirtualization hwrng support

2012-10-26 Thread H. Peter Anvin
That statement is pretty toxic... I wonder where it came from.  It is at best 
horribly misleading and actively encourages dangerous behaviours even for the 
cases where it isn't actively wrong.

Paolo Bonzini pbonz...@redhat.com wrote:

Il 26/10/2012 21:07, H. Peter Anvin ha scritto:
 This is surreal.  Output from /dev/hwrng turns into output for
/dev/random... it us guaranteed worse; period, end of story.

Isn't that exactly what happens in bare-metal?  hwrng - rngd -
random.  Instead here
we'd have, host hwrng - virtio-rng-pci - guest hwrng - guest rngd -
guest random.

The only difference is that you paravirtualize access to the host hwrng
to a) distribute
entropy to multiple guests; b) support migration across hosts with
different CPUs and
hardware.

 I don't know who the agreement is with, but it is ridiculous in
this case.

man 4 random:

While some safety margin above that minimum is reasonable, as a guard
against
flaws  in the CPRNG algorithm, no cryptographic primitive available
today can
hope to promise more than 256 bits of security, so if any program reads
 more
than  256  bits (32 bytes) from the kernel random pool per invocation,
or per
reasonable reseed interval (not less than one minute), that should  be 
taken
   as a sign that its cryptography is not skilfully implemented.

Paolo

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.



Re: [Qemu-devel] [PATCH 0/6] add paravirtualization hwrng support

2012-10-26 Thread H. Peter Anvin
On 10/26/2012 12:51 PM, Paolo Bonzini wrote:
 Il 26/10/2012 21:07, H. Peter Anvin ha scritto:
 This is surreal.  Output from /dev/hwrng turns into output for 
 /dev/random... it us guaranteed worse; period, end of story.
 
 Isn't that exactly what happens in bare-metal?  hwrng - rngd - random.  
 Instead here
 we'd have, host hwrng - virtio-rng-pci - guest hwrng - guest rngd - guest 
 random.
 
 The only difference is that you paravirtualize access to the host hwrng to a) 
 distribute
 entropy to multiple guests; b) support migration across hosts with different 
 CPUs and
 hardware.

First, hwrng is only one of the sources used by rngd.  It can also
(currently) use RDRAND or TPM; additional sources are likely to be added
in the future.

Second, the harvesting of environmental noise -- timings -- is not as
good in a VM as on plain hardware, so for the no-hwrng case it is better
for this to be done in the host than in the VM.

-hpa



[Qemu-devel] [PATCH] x86: Implement SMEP and SMAP

2012-09-26 Thread H. Peter Anvin
From: H. Peter Anvin h...@linux.intel.com

This patch implements Supervisor Mode Execution Prevention (SMEP) and
Supervisor Mode Access Prevention (SMAP) for x86.  The purpose of the
patch, obviously, is to help kernel developers debug the support for
those features.

A fair bit of the code relates to the handling of CPUID features.  The
CPUID code probably would get greatly simplified if all the feature
bit words were unified into a single vector object, but in the
interest of producing a minimal patch for SMEP/SMAP, and because I had
very limited time for this project, I followed the existing style.

Signed-off-by: H. Peter Anvin h...@linux.intel.com
---
 target-i386/cc_helper.c | 10 ++
 target-i386/cpu.c   | 33 -
 target-i386/cpu.h   | 33 +
 target-i386/helper.c| 95 +
 target-i386/helper.h|  2 ++
 target-i386/translate.c | 28 +++
 6 files changed, 164 insertions(+), 37 deletions(-)

diff --git a/target-i386/cc_helper.c b/target-i386/cc_helper.c
index 07892f9..9422003 100644
--- a/target-i386/cc_helper.c
+++ b/target-i386/cc_helper.c
@@ -353,6 +353,16 @@ void helper_sti(CPUX86State *env)
 env-eflags |= IF_MASK;
 }
 
+void helper_clac(CPUX86State *env)
+{
+env-eflags = ~AC_MASK;
+}
+
+void helper_stac(CPUX86State *env)
+{
+env-eflags |= AC_MASK;
+}
+
 #if 0
 /* vm86plus instructions */
 void helper_cli_vm(CPUX86State *env)
diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 423e009..56d0977 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -100,6 +100,13 @@ static const char *svm_feature_name[] = {
 NULL, NULL, NULL, NULL,
 };
 
+static const char *cpuid_7_0_ebx_feature_name[] = {
+NULL, NULL, NULL, NULL, NULL, NULL, NULL, smep,
+NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL, smap, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+};
+
 /* collects per-function cpuid data
  */
 typedef struct model_features_t {
@@ -215,14 +222,16 @@ static void add_flagname_to_bitmaps(const char *flagname, 
uint32_t *features,
 uint32_t *ext2_features,
 uint32_t *ext3_features,
 uint32_t *kvm_features,
-uint32_t *svm_features)
+uint32_t *svm_features,
+uint32_t *cpuid_7_0_ebx_features)
 {
 if (!lookup_feature(features, flagname, NULL, feature_name) 
 !lookup_feature(ext_features, flagname, NULL, ext_feature_name) 
 !lookup_feature(ext2_features, flagname, NULL, ext2_feature_name) 
 !lookup_feature(ext3_features, flagname, NULL, ext3_feature_name) 
 !lookup_feature(kvm_features, flagname, NULL, kvm_feature_name) 
-!lookup_feature(svm_features, flagname, NULL, svm_feature_name))
+!lookup_feature(svm_features, flagname, NULL, svm_feature_name) 
+!lookup_feature(cpuid_7_0_ebx_features, flagname, NULL, 
cpuid_7_0_ebx_feature_name))
 fprintf(stderr, CPU feature %s not found\n, flagname);
 }
 
@@ -285,6 +294,7 @@ typedef struct x86_def_t {
 #define TCG_EXT3_FEATURES (CPUID_EXT3_LAHF_LM | CPUID_EXT3_SVM | \
   CPUID_EXT3_CR8LEG | CPUID_EXT3_ABM | CPUID_EXT3_SSE4A)
 #define TCG_SVM_FEATURES 0
+#define TCG_7_0_EBX_FEATURES (CPUID_7_0_EBX_SMEP | CPUID_7_0_EBX_SMAP)
 
 /* maintains list of cpu model definitions
  */
@@ -295,7 +305,7 @@ static x86_def_t *x86_defs = {NULL};
 static x86_def_t builtin_x86_defs[] = {
 {
 .name = qemu64,
-.level = 4,
+.level = 7,
 .vendor1 = CPUID_VENDOR_AMD_1,
 .vendor2 = CPUID_VENDOR_AMD_2,
 .vendor3 = CPUID_VENDOR_AMD_3,
@@ -310,6 +320,7 @@ static x86_def_t builtin_x86_defs[] = {
 CPUID_EXT2_LM | CPUID_EXT2_SYSCALL | CPUID_EXT2_NX,
 .ext3_features = CPUID_EXT3_LAHF_LM | CPUID_EXT3_SVM |
 CPUID_EXT3_ABM | CPUID_EXT3_SSE4A,
+.cpuid_7_0_ebx_features = CPUID_7_0_EBX_SMEP | CPUID_7_0_EBX_SMAP,
 .xlevel = 0x800A,
 },
 {
@@ -873,10 +884,12 @@ static int cpu_x86_find_by_name(x86_def_t *x86_cpu_def, 
const char *cpu_model)
 uint32_t plus_features = 0, plus_ext_features = 0;
 uint32_t plus_ext2_features = 0, plus_ext3_features = 0;
 uint32_t plus_kvm_features = 0, plus_svm_features = 0;
+uint32_t plus_7_0_ebx_features = 0;
 /* Features to be removed */
 uint32_t minus_features = 0, minus_ext_features = 0;
 uint32_t minus_ext2_features = 0, minus_ext3_features = 0;
 uint32_t minus_kvm_features = 0, minus_svm_features = 0;
+uint32_t minus_7_0_ebx_features = 0;
 uint32_t numvalue;
 
 for (def = x86_defs; def; def = def-next)
@@ -903,8 +916,8 @@ static int cpu_x86_find_by_name(x86_def_t *x86_cpu_def, 
const char *cpu_model)
 #endif
 
 add_flagname_to_bitmaps(hypervisor

Re: [Qemu-devel] [PATCH] x86: Implement SMEP and SMAP

2012-09-26 Thread H. Peter Anvin
On 09/26/2012 12:50 PM, Anthony Liguori wrote:
 
 The patch looks good except for these two chunks.  This would break live
 migration from a new QEMU to an old one because CPUs are currently not
 versioned.
 
 If you just remove these two chunks, the patch can be applied and you
 can still test with:
 
 qemu-system-x86_64 -cpu qemu64,+smep,+smap
 
 We're working vcpu versioning and will hopefully have it in place for
 1.3.  If we get there, we can enable these features unconditionally and
 add the appropriate backwards compatibility code.
 

OK.  I had to add a chunk to up the minimum CPUID level to 7 if any of
the level 7 features were enabled, however.
 
 scripts/checkpatch.pl will complain about lack of curly braces with ifs.
 
 I know this file does not use them consistently.  Since you have to
 respin anyway, please run checkpatch and fixup the errors.
 


Actually, several of the files are very consistent about *not* using
them.  I'll respin the patch, but it ain't pretty.

-hpa



[Qemu-devel] [PATCH v2] x86: Implement SMEP and SMAP

2012-09-26 Thread H. Peter Anvin
From: H. Peter Anvin h...@linux.intel.com

This patch implements Supervisor Mode Execution Prevention (SMEP) and
Supervisor Mode Access Prevention (SMAP) for x86.  The purpose of the
patch, obviously, is to help kernel developers debug the support for
those features.

A fair bit of the code relates to the handling of CPUID features.  The
CPUID code probably would get greatly simplified if all the feature
bit words were unified into a single vector object, but in the
interest of producing a minimal patch for SMEP/SMAP, and because I had
very limited time for this project, I followed the existing style.

[ v2: don't change the definition of the qemu64 CPU shorthand, since
  that breaks loading old snapshots.  Per Anthony Liguori this can be
  fixed once the CPU feature set is snapshot.

  Change the coding style slightly to conform to checkpatch.pl. ]

Signed-off-by: H. Peter Anvin h...@linux.intel.com
---
 target-i386/cc_helper.c |   10 +++
 target-i386/cpu.c   |   34 ---
 target-i386/cpu.h   |   33 --
 target-i386/helper.c|  150 ++-
 target-i386/helper.h|2 +
 target-i386/translate.c |   27 +++--
 6 files changed, 207 insertions(+), 49 deletions(-)

diff --git a/target-i386/cc_helper.c b/target-i386/cc_helper.c
index 07892f9..9422003 100644
--- a/target-i386/cc_helper.c
+++ b/target-i386/cc_helper.c
@@ -353,6 +353,16 @@ void helper_sti(CPUX86State *env)
 env-eflags |= IF_MASK;
 }
 
+void helper_clac(CPUX86State *env)
+{
+env-eflags = ~AC_MASK;
+}
+
+void helper_stac(CPUX86State *env)
+{
+env-eflags |= AC_MASK;
+}
+
 #if 0
 /* vm86plus instructions */
 void helper_cli_vm(CPUX86State *env)
diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index fd4fe28..f186439 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -100,6 +100,13 @@ static const char *svm_feature_name[] = {
 NULL, NULL, NULL, NULL,
 };
 
+static const char *cpuid_7_0_ebx_feature_name[] = {
+NULL, NULL, NULL, NULL, NULL, NULL, NULL, smep,
+NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL, smap, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+};
+
 /* collects per-function cpuid data
  */
 typedef struct model_features_t {
@@ -215,14 +222,17 @@ static void add_flagname_to_bitmaps(const char *flagname, 
uint32_t *features,
 uint32_t *ext2_features,
 uint32_t *ext3_features,
 uint32_t *kvm_features,
-uint32_t *svm_features)
+uint32_t *svm_features,
+uint32_t *cpuid_7_0_ebx_features)
 {
 if (!lookup_feature(features, flagname, NULL, feature_name) 
 !lookup_feature(ext_features, flagname, NULL, ext_feature_name) 
 !lookup_feature(ext2_features, flagname, NULL, ext2_feature_name) 
 !lookup_feature(ext3_features, flagname, NULL, ext3_feature_name) 
 !lookup_feature(kvm_features, flagname, NULL, kvm_feature_name) 
-!lookup_feature(svm_features, flagname, NULL, svm_feature_name))
+!lookup_feature(svm_features, flagname, NULL, svm_feature_name) 
+!lookup_feature(cpuid_7_0_ebx_features, flagname, NULL,
+cpuid_7_0_ebx_feature_name))
 fprintf(stderr, CPU feature %s not found\n, flagname);
 }
 
@@ -284,6 +294,7 @@ typedef struct x86_def_t {
 #define TCG_EXT3_FEATURES (CPUID_EXT3_LAHF_LM | CPUID_EXT3_SVM | \
   CPUID_EXT3_CR8LEG | CPUID_EXT3_ABM | CPUID_EXT3_SSE4A)
 #define TCG_SVM_FEATURES 0
+#define TCG_7_0_EBX_FEATURES (CPUID_7_0_EBX_SMEP | CPUID_7_0_EBX_SMAP)
 
 /* maintains list of cpu model definitions
  */
@@ -1091,10 +1102,12 @@ static int cpu_x86_find_by_name(x86_def_t *x86_cpu_def, 
const char *cpu_model)
 uint32_t plus_features = 0, plus_ext_features = 0;
 uint32_t plus_ext2_features = 0, plus_ext3_features = 0;
 uint32_t plus_kvm_features = 0, plus_svm_features = 0;
+uint32_t plus_7_0_ebx_features = 0;
 /* Features to be removed */
 uint32_t minus_features = 0, minus_ext_features = 0;
 uint32_t minus_ext2_features = 0, minus_ext3_features = 0;
 uint32_t minus_kvm_features = 0, minus_svm_features = 0;
+uint32_t minus_7_0_ebx_features = 0;
 uint32_t numvalue;
 
 for (def = x86_defs; def; def = def-next)
@@ -1121,8 +1134,8 @@ static int cpu_x86_find_by_name(x86_def_t *x86_cpu_def, 
const char *cpu_model)
 #endif
 
 add_flagname_to_bitmaps(hypervisor, plus_features,
-plus_ext_features, plus_ext2_features, plus_ext3_features,
-plus_kvm_features, plus_svm_features);
+plus_ext_features, plus_ext2_features, plus_ext3_features,
+plus_kvm_features, plus_svm_features,  plus_7_0_ebx_features);
 
 featurestr = strtok(NULL, ,);
 
@@ -1132,12 +1145,12 @@ static int cpu_x86_find_by_name(x86_def_t

Re: [Qemu-devel] [PATCH 07/22] target-i386: convert cpuid features into properties

2012-09-26 Thread H. Peter Anvin
On 09/26/2012 01:32 PM, Igor Mammedov wrote:
 +
 +static void x86_cpuid_get_feature(Object *obj, Visitor *v, void *opaque,
 + const char *name, Error **errp)
 +{
 +X86CPU *cpu = X86_CPU(obj);
 +CPUX86State *env = cpu-env;
 +bool value = true;
 +
 +if (!is_feature_set(name, env-cpuid_features, feature_name) 
 +   !is_feature_set(name, env-cpuid_ext_features, ext_feature_name) 
 +   !is_feature_set(name, env-cpuid_ext2_features, ext2_feature_name) 
 +   !is_feature_set(name, env-cpuid_ext3_features, ext3_feature_name) 
 +   !is_feature_set(name, env-cpuid_kvm_features, kvm_feature_name) 
 +   !is_feature_set(name, env-cpuid_svm_features, svm_feature_name)) {
 +value = false;
 +}
 +

If you're going to do a full-blown restructuring of the CPUID handling,
how about actually turning this into a composite object, specifically an
array, instead of this insane collection of arguments (which needs to be
augmented)?

-hpa




Re: [Qemu-devel] [PATCH v3 0/1] virtio-rng: hardware random number generator

2012-09-16 Thread H. Peter Anvin

On 06/19/2012 11:59 PM, Amit Shah wrote:

Hello,

Here's the 3rd iteration of the virtio-rng device.  This update just
rebases the patch on top of current master.

Details on the patch in the commit message.



Hi everyone...

I just stumbled on this patchset after realizing that the virtio-rng 
support still is not even in the Qemu git tree even though the kernel 
side has been there for four years(!)


It seems this support has been stuck in overengineering hell for 
years.  I have to admit to having a bit of a confusion about what is so 
hard about reading from a file descriptor, which may return partial 
reads.  I understand the fairness argument, but I would argue that if it 
is an actual problem should be solved in the kernel and therefore 
benefit all types of applications rather than at the application level 
(which Qemu, effectively, is.)


However, one key error I spotted was using /dev/urandom.  Using 
/dev/urandom is a very serious cryptographic error, as well as 
completely pointless.


The whole point of this is to provided distilled entropy to the guest, 
so that it can seed true entropy into *its* entropy pool.  As such, 
using /dev/urandom is:


a) needlessly slow.  It will churn the host kernel PRNG needlessly,
   and not provide any backpressure when the host pool is already
   drained.  Using /dev/random instead will indicate that the host
   pool is drained, and not pass a bunch of PRNG noise across an
   expensive channel when the PRNG in the *guest* could do the PRNG
   expansion just as well.

b) cryptographically incorrect.  This will tell the guest that it
   is being provided with entropy that it doesn't actually have, which
   will mean the guest severely overestimates the entropy that it has
   available to it.  To put it differently: /dev/random in the guest
   will behave like (a very expensive version of) /dev/urandom from
   a cryptographic point of view.

The right interface to the host kernel, therefore, is /dev/random.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.




Re: [Qemu-devel] [PATCH v3 0/1] virtio-rng: hardware random number generator

2012-09-16 Thread H. Peter Anvin

On 09/16/2012 04:23 PM, Anthony Liguori wrote:

This is *exactly* what the problem is.

If using /dev/urandom is pointless--and so far, many people have made
compelling arguments that it is--then using /dev/random is seemingly
impossible to do fairly.


It is not merely pointless, it is a security hole.


The virtio-rng interface doesn't have any notion of guarantee of entropy
availability.  The guest just keeps requesting entropy with no
indication to the hypervisor of what it should and shouldn't give.


With the latest fixes to rngd this is no longer true.  This *was* a bug 
in rngd  4; it would always claim to be entropy-starved (and so a guest 
running rngd  4 will have this pathology, but no distro is known to run 
rngd  4 by default due to a number of other problems with these 
versions of rngd).  This has been fixed.  If that backpressure isn't 
propagated through the virtio-rng driver then that does need to be 
fixed, but from at least a cursory look I think it does.



Since /dev/random is a finite pool, it's quite possible that one guest
could quickly exhaust /dev/random for all other guests.  How is this not
a clear denial of service?


Well, by that definition the fact that the service hasn't been provided 
at all until now is a bigger denial of service...



This is why supporting EGD is so important IMHO.  Something else needs
to deal with handing out entropy in a responsible way.


No, you're missing the key point.  Fixing this in applications (and from 
the host kernel's perspective, this is an application) is broken, 
because it is not just Qemu that has that property.  If this is an 
actual problem (and it's not clear to me that it is in anything but 
theory, because although the kernel doesn't do round robin it will at 
least provide amount of stochastic fairness) then it is in the kernel 
that the fix belongs.


Throwing an extra daemon -- one which doesn't even claim to be designed 
for this purpose -- into the middle is just silly.  Either which way, it 
can easily be handled via a sane fairness daemon which doesn't need a 
complicated protocol.



Anyway, this is on my list for 1.3.  The series just needs some light
dusting before resubmission.


-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.




Re: [Qemu-devel] [PATCH v3 0/1] virtio-rng: hardware random number generator

2012-09-16 Thread H. Peter Anvin
Right, obviously.  However, under no circumstances should /dev/urandom be used!

Amit Shah amit.s...@redhat.com wrote:

On (Sun) 16 Sep 2012 [13:42:46], H. Peter Anvin wrote:
 On 06/19/2012 11:59 PM, Amit Shah wrote:
 Hello,
 
 Here's the 3rd iteration of the virtio-rng device.  This update just
 rebases the patch on top of current master.
 
 Details on the patch in the commit message.
 
 
 Hi everyone...
 
 I just stumbled on this patchset after realizing that the virtio-rng
 support still is not even in the Qemu git tree even though the
 kernel side has been there for four years(!)
 
 It seems this support has been stuck in overengineering hell for
 years.  I have to admit to having a bit of a confusion about what is
 so hard about reading from a file descriptor, which may return
 partial reads.  I understand the fairness argument, but I would
 argue that if it is an actual problem should be solved in the kernel
 and therefore benefit all types of applications rather than at the
 application level (which Qemu, effectively, is.)
 
 However, one key error I spotted was using /dev/urandom.  Using
 /dev/urandom is a very serious cryptographic error, as well as
 completely pointless.
 
 The whole point of this is to provided distilled entropy to the
 guest, so that it can seed true entropy into *its* entropy pool.  As
 such, using /dev/urandom is:
 
 a) needlessly slow.  It will churn the host kernel PRNG needlessly,
and not provide any backpressure when the host pool is already
drained.  Using /dev/random instead will indicate that the host
pool is drained, and not pass a bunch of PRNG noise across an
expensive channel when the PRNG in the *guest* could do the PRNG
expansion just as well.
 
 b) cryptographically incorrect.  This will tell the guest that it
is being provided with entropy that it doesn't actually have,
which
will mean the guest severely overestimates the entropy that it has
available to it.  To put it differently: /dev/random in the guest
will behave like (a very expensive version of) /dev/urandom from
a cryptographic point of view.
 
 The right interface to the host kernel, therefore, is /dev/random.

Agreed with your points -- /dev/random on the host itself could be fed
in via a real hwrng.  Also, for the fairness arguments, several guests
doing IO also contribute to the random pool.

The ideal interface, though, in qemu should be sourcing entropy from
a file descriptor, and the admin chooses what he wants to source
entropy from - /dev/random, /dev/urandom, or even /dev/hwrng.

   Amit

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.



Re: [Qemu-devel] x86, nops settings result in kernel crash

2012-08-16 Thread H. Peter Anvin
On 08/16/2012 11:53 AM, Alan Cox wrote:

 Yes, if I remove the break statement (introduced by this commit), it works 
 fine.
 
 What version of qemu is this - do we have qemu bug here I wonder.
 

Also, is it 32 or 64 bits?

-hpa






[Qemu-devel] Re: [PATCH 5/6] Add support for a USB audio device model

2010-10-28 Thread H. Peter Anvin

On 10/28/2010 1:01 AM, Gerd Hoffmann wrote:

On 10/27/10 19:13, H. Peter Anvin wrote:

On 10/27/2010 9:04 AM, Gerd Hoffmann wrote:



This brings a usb audio device to qemu. Output only, fixed at
16bit stereo @ 48 Hz. Based on a patch from
H. Peter Anvinh...@linux.intel.com



Please don't apply this. This patch is outdated and has several glaring
bugs. I can send a better patch later (I'm in an all-day meeting with
insanely bad networking.)


Oh, I thought you've abandoned the project ...

Note that I've update the patch too and fixed at least some of those
bugs. Nevertheless I'll be happy to see your latest bits and try to
merge the best of both versions.



I have a git tree on kernel.org with the latest version.  However, the 
rate-matching code needs to be ripped out since it just plain doesn't 
work, and tends to produce worse results than no rate matching:


http://git.kernel.org/?p=virt/qemu/hpa/qemu-usb-audio-wip.git;a=shortlog;h=refs/heads/usb-audio

-hpa



[Qemu-devel] Re: [PATCH 5/6] Add support for a USB audio device model

2010-10-28 Thread H. Peter Anvin

On 10/28/2010 1:01 AM, Gerd Hoffmann wrote:

On 10/27/10 19:13, H. Peter Anvin wrote:

On 10/27/2010 9:04 AM, Gerd Hoffmann wrote:



This brings a usb audio device to qemu. Output only, fixed at
16bit stereo @ 48 Hz. Based on a patch from
H. Peter Anvinh...@linux.intel.com



Please don't apply this. This patch is outdated and has several glaring
bugs. I can send a better patch later (I'm in an all-day meeting with
insanely bad networking.)


Oh, I thought you've abandoned the project ...

Note that I've update the patch too and fixed at least some of those
bugs. Nevertheless I'll be happy to see your latest bits and try to
merge the best of both versions.

cheers,
Gerd


Grmf ... looks like I have multiple pieces of code in multiple places... 
I need to reconcile the best version I have by hand first.

-hpa




[Qemu-devel] Re: [PATCH 5/6] Add support for a USB audio device model

2010-10-28 Thread H. Peter Anvin

On 10/28/2010 1:01 AM, Gerd Hoffmann wrote:

On 10/27/10 19:13, H. Peter Anvin wrote:

On 10/27/2010 9:04 AM, Gerd Hoffmann wrote:



This brings a usb audio device to qemu. Output only, fixed at
16bit stereo @ 48 Hz. Based on a patch from
H. Peter Anvinh...@linux.intel.com



Please don't apply this. This patch is outdated and has several glaring
bugs. I can send a better patch later (I'm in an all-day meeting with
insanely bad networking.)


Oh, I thought you've abandoned the project ...

Note that I've update the patch too and fixed at least some of those
bugs. Nevertheless I'll be happy to see your latest bits and try to
merge the best of both versions.


I'm going to have to look at this later... I'm in a F2F meeting most of 
the day today.


-hpa




Re: [Qemu-devel] Hitting 29 NIC limit

2010-10-18 Thread H. Peter Anvin
On 10/14/2010 05:57 AM, Anthony Liguori wrote:
 
 I've always been sceptical of this.  When physical systems have a large 
 number of NICs, it's via multiple functions, not a bunch of PCI bridges.
 

Actually a lot of multiport PCI cards are in fact single or dual NICs
behind PCI bridges.

-hpa



[Qemu-devel] Re: [PATCH] [RFC] Add support for a USB audio device model

2010-10-14 Thread H. Peter Anvin
On 10/14/2010 06:51 AM, Mike Snitzer wrote:
 
 Was just wondering if you've been able to put some time to the
 rate-matching issues?
 
 Has this usb-audio patch evolved and I'm just missing it?
 
 Thanks for doing this work!
 Mike
 

The sad result really is: it doesn't work, and it probably will never work.

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.




Re: [SeaBIOS] [Qemu-devel] [RFC] Passing boot order from qemu to seabios

2010-10-13 Thread H. Peter Anvin
On 10/12/2010 12:06 PM, Gleb Natapov wrote:

 This is true to some extent -- there is some standard content, and some
 further can be described via ACPI tables.  However, my point was mostly
 that it is an existing model for nonvolatile storage which also works on
 hardware (and is vastly simpler albeit smaller in size than ESCD).

 And my point is why would we want nonvolatile storage for BIOS settings
 in qemu. It doesn't provide anything that can't be done through command
 line and configured nicely by virt-manager and it introduces one more
 file to carry around with your VM. And if the idea is to create it on
 the fly then it is no longer nonvolatile and is not better then fw_cfg.
 If we want nonvolatile storage for some reason I would agree that CMOS
 is good candidate for that except of its size. How much can you fit into
 128 byte? Less then one tweet.

128 bytes isn't a hard limit; the original PC/AT actually had only 64
bytes, but the standard interface allows 128 bytes.  However, there is
a semi-standard (common extension) interface using ports 72/73 which
allows 256 bytes, and quite a few chipsets have added further extensions.

The ACPI specification recognizes three interfaces as standard: PC/AT
(64 bytes, even though 128 bytes is available on a lot of platforms),
PIIX4 (256 bytes), and Dallas Semiconductor (256 bytes or more).  The
interface for the latter isn't well cited in the ACPI spec, but I'm
guessing this is referring to the DS17885 series of chips, which can
have up to 8K CMOS using a bank-switched scheme which presents 128 bytes
at a time (thus accessible via only the standard 70/71 ports.)

-hpa



Re: [SeaBIOS] [Qemu-devel] [RFC] Passing boot order from qemu to seabios

2010-10-13 Thread H. Peter Anvin
On 10/13/2010 12:17 PM, H. Peter Anvin wrote:
 
 The ACPI specification recognizes three interfaces as standard: PC/AT
 (64 bytes, even though 128 bytes is available on a lot of platforms),
 PIIX4 (256 bytes), and Dallas Semiconductor (256 bytes or more).  The
 interface for the latter isn't well cited in the ACPI spec, but I'm
 guessing this is referring to the DS17885 series of chips, which can
 have up to 8K CMOS using a bank-switched scheme which presents 128 bytes
 at a time (thus accessible via only the standard 70/71 ports.)
 

FWIW, the DS17885 scheme actually allows addressing up to 64K; 8K is the
maximum that DS produced with this particular interface as far as I
know, but there are 16 address bits available.

-hpa



Re: [SeaBIOS] [Qemu-devel] [RFC] Passing boot order from qemu to seabios

2010-10-13 Thread H. Peter Anvin
On 10/13/2010 01:00 PM, H. Peter Anvin wrote:
 On 10/13/2010 12:17 PM, H. Peter Anvin wrote:

 The ACPI specification recognizes three interfaces as standard: PC/AT
 (64 bytes, even though 128 bytes is available on a lot of platforms),
 PIIX4 (256 bytes), and Dallas Semiconductor (256 bytes or more).  The
 interface for the latter isn't well cited in the ACPI spec, but I'm
 guessing this is referring to the DS17885 series of chips, which can
 have up to 8K CMOS using a bank-switched scheme which presents 128 bytes
 at a time (thus accessible via only the standard 70/71 ports.)

 
 FWIW, the DS17885 scheme actually allows addressing up to 64K; 8K is the
 maximum that DS produced with this particular interface as far as I
 know, but there are 16 address bits available.
 
   -hpa

I think this is the relevant application note:

http://www.maxim-ic.com/app-notes/index.mvp/id/77

-hpa



Re: [SeaBIOS] [Qemu-devel] [RFC] Passing boot order from qemu to seabios

2010-10-12 Thread H. Peter Anvin
On 10/12/2010 01:01 AM, Gleb Natapov wrote:
 On Mon, Oct 11, 2010 at 02:15:26PM -0700, H. Peter Anvin wrote:
 I don't disagree.

 I think the best thing to do is to let SeaBIOS create a boot order table 
 that contains descriptive information and then advertise that to QEMU.

 QEMU can then try to associate the list of bootable devices with it's 
 own set of devices and select a preferred order that it can then give 
 back to SeaBIOS.  SeaBIOS can then present that list to the user for 
 additional refinement.

 Really, this kind of comes down to having a data structure that anything
 (Qemu, SeaBIOS and if needed the guest OS) can read and modify as needed.

 But then QEMU and seabios will have to have shared storage they can
 both write too. And this shared storage is part of VM now so you need
 to carry it around when you move your VM elsewhere.
  

Yes, and it's part of real hardware, too.  It's usually called the
CMOS, short for CMOS RAM.

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.




Re: [SeaBIOS] [Qemu-devel] [RFC] Passing boot order from qemu to seabios

2010-10-12 Thread H. Peter Anvin
On real hardware it is shared between BIOS and the OS, actually.

Gleb Natapov g...@redhat.com wrote:

On Tue, Oct 12, 2010 at 09:33:16AM -0700, H. Peter Anvin wrote:
 On 10/12/2010 01:01 AM, Gleb Natapov wrote:
  On Mon, Oct 11, 2010 at 02:15:26PM -0700, H. Peter Anvin wrote:
  I don't disagree.
 
  I think the best thing to do is to let SeaBIOS create a boot order table 
  that contains descriptive information and then advertise that to QEMU.
 
  QEMU can then try to associate the list of bootable devices with it's 
  own set of devices and select a preferred order that it can then give 
  back to SeaBIOS.  SeaBIOS can then present that list to the user for 
  additional refinement.
 
  Really, this kind of comes down to having a data structure that anything
  (Qemu, SeaBIOS and if needed the guest OS) can read and modify as needed.
 
  But then QEMU and seabios will have to have shared storage they can
  both write too. And this shared storage is part of VM now so you need
  to carry it around when you move your VM elsewhere.
   
 
 Yes, and it's part of real hardware, too.  It's usually called the
 CMOS, short for CMOS RAM.
 
On real hardware it is not shared between HW and bios. It is
written/read only by BIOS. In qemu it is not persistent and generated
for each qemu invocation. Previously it was used to pass config params
from qemu to a bios (and some legacy params are still passed that way),
but we moved to better interface for that (firmware config).

--
   Gleb.

-- 
Sent from my mobile phone.  Please pardon any lack of formatting.



Re: [SeaBIOS] [Qemu-devel] [RFC] Passing boot order from qemu to seabios

2010-10-12 Thread H. Peter Anvin
On 10/12/2010 10:41 AM, Gleb Natapov wrote:
 On Tue, Oct 12, 2010 at 10:35:51AM -0700, H. Peter Anvin wrote:
 On real hardware it is shared between BIOS and the OS, actually.

 Guest OS can write in qemu CMOS too. But what is it useful for? Most of
 its content is not standard AFAIK.
 

This is true to some extent -- there is some standard content, and some
further can be described via ACPI tables.  However, my point was mostly
that it is an existing model for nonvolatile storage which also works on
hardware (and is vastly simpler albeit smaller in size than ESCD).

-hpa



Re: [SeaBIOS] [Qemu-devel] [RFC] Passing boot order from qemu to seabios

2010-10-11 Thread H. Peter Anvin
On 10/11/2010 12:51 PM, Anthony Liguori wrote:
 
 -kernel hijacks int19 so it cannot participate in any kind of boot 
 order.  It's either present (and therefore the bootable disk) or not 
 present.
 

That's a misdesign, though: it should be able to participate in BBS as a
BEV.

-hpa



Re: [SeaBIOS] [Qemu-devel] [RFC] Passing boot order from qemu to seabios

2010-10-11 Thread H. Peter Anvin
On 10/11/2010 01:30 PM, Anthony Liguori wrote:
 On 10/11/2010 02:59 PM, Gleb Natapov wrote:
 No boot rom should do that. extboot wreaks havoc when it is used.
 And since virtio is now supported by bios there is no reason to use it.
 
 You don't really have a choice.  You could be doing hardware passthrough 
 and the ROM on the card may hijack int19.

The BBS standard actually documents how to deal with that -- it pretty
much works out to let the card initialize, then see if it mucked with
int19, and then put int19 back... if we want to run that card, then we
invoke the int19 that the card set up.

 Whoever needs scsi boot should add it to seabios too.

 
 I don't disagree.
 
 I think the best thing to do is to let SeaBIOS create a boot order table 
 that contains descriptive information and then advertise that to QEMU.
 
 QEMU can then try to associate the list of bootable devices with it's 
 own set of devices and select a preferred order that it can then give 
 back to SeaBIOS.  SeaBIOS can then present that list to the user for 
 additional refinement.

Really, this kind of comes down to having a data structure that anything
(Qemu, SeaBIOS and if needed the guest OS) can read and modify as needed.

-hpa



Re: [SeaBIOS] [Qemu-devel] [RFC] Passing boot order from qemu to seabios

2010-10-11 Thread H. Peter Anvin
On 10/11/2010 02:41 PM, Sebastian Herbszt wrote:
 H. Peter Anvin wrote:
 On 10/11/2010 01:30 PM, Anthony Liguori wrote:
 On 10/11/2010 02:59 PM, Gleb Natapov wrote:
 No boot rom should do that. extboot wreaks havoc when it is used.
 And since virtio is now supported by bios there is no reason to use it.

 You don't really have a choice.  You could be doing hardware passthrough 
 and the ROM on the card may hijack int19.

 The BBS standard actually documents how to deal with that -- it pretty
 much works out to let the card initialize, then see if it mucked with
 int19, and then put int19 back... if we want to run that card, then we
 invoke the int19 that the card set up.
 
 The BIOS Boot Specification, Version 1.01 from January 11, 1996 seems not to
 recommend this:
 
 3.4 Legacy IPL Devices
 
 Legacy IPL devices will be allowed to take control of the system (via hooking
 interrupts) in both Legacy and PnP systems. The Plug and Play BIOS 
 specification
 recommends that Legacy devices that hook a bootstrap interrupt such as INT 
 19h, 18h,
 or 13h have the interrupt re-captured by the BIOS. This is not done because 
 grabbing
 an interrupt vector back after a device has hooked it can produce 
 unpredictable results.
 Further, by allowing the card to take control, the behavior of these Legacy 
 cards will
 be the same on both PnP and Legacy machines.
 
 6.8 Notes on the POST Process
 
 The Plug and Play BIOS Specification says that if a Legacy IPL device's 
 option
 ROM captures INT 18h or INT 19h, the BIOS should save this vector and then
 restore the original one put there by the BIOS. The BIOS Boot Specification
 deviates from this in that these vectors are not recaptured after each Legacy 
 option
 ROM returns from initialization. That would be considered unsafe.
 

Sorry, you're right -- I confused the PNPBIOS spec with the BBS spec
(and compounded the error by correctly remembering that BBS overrides
PNPBIOS).

-hpa



Re: [Qemu-devel] [PATCH] add 40-48 bit RAM range to seabios

2010-09-17 Thread H. Peter Anvin
On 09/17/2010 06:11 AM, Kevin O'Connor wrote:
 On Fri, Sep 17, 2010 at 07:53:12AM -0500, Anthony Liguori wrote:
 On 09/16/2010 08:47 PM, Kevin O'Connor wrote:
 On Wed, Sep 15, 2010 at 07:15:28PM +0200, Andrea Arcangeli wrote:
 This uses a new cmos port at 0x5e that shall read zero to be backwards
 compatible.
 It looks okay to me.  Can you provide an Acked-by from one of the qemu
 or kvm maintainers?

 Is CMOS the best place to communicate this (as opposed to fw_cfg)?

 I know we currently expose memory size via CMOS but perhaps it's
 better to switch to a 64-bit fw_cfg value.
 
 I'd personally prefer fw_cfg.
 
 Also, another possibility would be to stop sending an absolute max and
 to instead send a map of memory.  The latter would allow for
 non-contiguous memory.
 

That would be highly useful for testing purposes.

-hpa




[Qemu-devel] Re: [PATCH] [RFC] Add support for a USB audio device model

2010-09-13 Thread H. Peter Anvin
On 09/13/2010 01:53 PM, Amos Kong wrote:
 
 # patch -p1  /tmp/usb-audio.patch
 # ./configure
 ...
 ...
 preadv supportyes
 fdatasync yes
 uuid support  no
 vhost-net support no
 Trace backend nop
 Trace output file trace-pid
 ./configure: 2276: Bad substitution
 

What shell is your /bin/sh?

-hpa



[Qemu-devel] Re: [PATCH] [RFC] Add support for a USB audio device model

2010-09-13 Thread H. Peter Anvin
On 09/13/2010 01:53 PM, Amos Kong wrote:
 
 # patch -p1  /tmp/usb-audio.patch
 # ./configure
 ...
 ...
 preadv supportyes
 fdatasync yes
 uuid support  no
 vhost-net support no
 Trace backend nop
 Trace output file trace-pid
 ./configure: 2276: Bad substitution
 
 
 diff --git a/create_config b/create_config
 index 0098e68..1caa25b 100755
 --- a/create_config
 +++ b/create_config
 @@ -25,7 +25,7 @@ case $line in
   CONFIG_AUDIO_DRIVERS=*)
  drivers=${line#*=}
  echo #define CONFIG_AUDIO_DRIVERS \\
 -for drv in $drivers; do
 +for drv in ${drivers//-/_}; do
echo ${drv}_audio_driver,\\
  done
  echo 
 @@ -39,10 +39,12 @@ case $line in
  ;;
   CONFIG_*=y) # configuration
  name=${line%=*}
 +name=${name//-/_}
  echo #define $name 1
  ;;
   CONFIG_*=*) # configuration
  name=${line%=*}
 +name=${name//-/_}
  value=${line#*=}
  echo #define $name $value
  ;;

Looks like ${.../...} is a bashism.

One can replace it with:

name=`echo $name | tr '-' '_'`

and

for drv in `echo $drivers | tr '-' '_'`; do

-hpa



  1   2   >