Re: [Xen-devel] [PATCH] x86/efi: Do not write relocations in efi_arch_relocate_image() first pass

2017-08-02 Thread David Woodhouse
On Wed, 2017-08-02 at 09:16 -0600, Jan Beulich wrote:
> 
> Well, I've seen breakage in all sorts of places I wouldn't have expected
> anyone to fine a need to fiddle with.

This is the nature of 'value subtract', I suppose. 

How about something like this? Somewhat complicated by the fact that
COFF section names get truncated, so maybe there's a nicer place to put
it (or maybe we explicitly include .init.da into the .init.data output
section, in the efi.lds linker script, or something?)

--- a/xen/arch/x86/efi/mkreloc.c
+++ b/xen/arch/x86/efi/mkreloc.c
@@ -346,6 +346,15 @@ int main(int argc, char *argv[])
  memcmp(sec1[i].name, ".lockpro", sizeof(sec1[i].name)) == 0 )
 continue;
 
+   /* For sections with relocations, force them to be writeable */
+   if (memcmp(sec1[i].name, ".init.da", sizeof(sec1[i].name)) == 0)
+   printf(".pushsection .init.data, \"awx\"\n");
+   else if (memcmp(sec1[i].name, ".init.te", sizeof(sec1[i].name) ) == 0)
+   printf(".pushsection .init.text, \"awx\"\n");
+   else
+   printf(".pushsection %.*s, \"awx\"\n", 
(int)sizeof(sec1[i].name), sec1[i].name);
+   printf(".popsection\n");
+
 if ( !sec1[i].rva )
 {
 fprintf(stderr, "Can't handle section %u with zero RVA\n", i);

smime.p7s
Description: S/MIME cryptographic signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] x86/efi: Do not write relocations in efi_arch_relocate_image() first pass

2017-08-02 Thread David Woodhouse
On Wed, 2017-08-02 at 07:58 -0600, Jan Beulich wrote:
> > > > David Woodhouse <dw...@infradead.org> 08/02/17 2:11 PM >>>
> > This change is sufficient (we believe) to make EFI builds of Xen
> > actually boot again on current EDK2, is it not?
>
> It is safe / sufficient only with the specific behavior you've observed, i.e.
> when permission restrictions are being removed during ExitBootServices().
> I don't recall having seen any statement to that effect in the spec, and
> even if there was such a statement surely we'd find a firmware vendor
> who doesn't play by it.

I'd be relatively surprised if a vendor were to make changes to the
underlying TianoCore/EDK2 implementation in this respect. It doesn't
seem like one of the areas in which they would want to apply the "value
subtract" that they so desperately cling to.

Perhaps we should push to have the spec amended to mandate the current
behaviour?

> > 
> > What is the "other half" of which you speak? You mentioned marking the
> > section RWX — but for that to be a long-term solution to the issue,
> > we'd basically have to ensure that we always mark *all* sections which
> > might have relocations (even .rodata) as writeable, in case UEFI
> > decides to honour their read-only status at some point in the future.
>
> The other half is to preferably remove all (assembly) contributions from
> sections ending up R or RX. In particular, just like the compiler does,
> such .rodata pieces ought to be moved to .data.rel.ro; .init.text ones
> would need to move to .init.data or .init.data.rel.ro. And ideally we'd have
> link time verification that no relocations exist for R or RX sections ...

It wouldn't be that hard to add build-time verification in mkreloc.c —
it's already processing one section at a time, and can see the flags.
So it shouldn't be hard to make it bail out if a section without the W
flag contains any relocations.

But we might do better just to mark all the COFF sections as writeable,
even if it's done by post-processing the EFI executable. The
*important* part is marking them read-only in our own page tables after
we're running, and grouping them into sections to make that possible is
most useful. UEFI marking them read-only in the brief period while
we're starting up is just kind of pointless from our point of view.

Alternatively, if we really don't trust the UEFI implementation to let
us write to read-only sections after ExitBootServices, we could ensure
that we switch to our own page tables *before* doing the final pass of
relocations. Currently efi_arch_post_exit_boot() will do them just
*before* the switch. 

That's slightly less trivial than it sounds, as it would need to be
position-independent. But doesn't it basically already need to be, or
it would end up relocating itself while it's running?

(Hm, ick... the more I look at this, the more I wish my initial
response had been "la la la it was already broken it wasn't me..." :)


smime.p7s
Description: S/MIME cryptographic signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] x86/efi: Do not write relocations in efi_arch_relocate_image() first pass

2017-08-02 Thread David Woodhouse
On Wed, 2017-08-02 at 05:56 -0600, Jan Beulich wrote:
> > 
> > > 
> > > > 
> > > > David Woodhouse <dw...@infradead.org> 08/02/17 1:30 PM >>>
> > --- a/xen/arch/x86/efi/efi-boot.h
> > +++ b/xen/arch/x86/efi/efi-boot.h
> > @@ -87,7 +87,8 @@ static void __init efi_arch_relocate_image(unsigned long 
> > delta)
> > case PE_BASE_RELOC_DIR64:
> > if ( in_page_tables(addr) )
> > blexit(L"Unexpected relocation type");
> > -*(u64 *)addr += delta;
> > +if ( delta )
> > +*(u64 *)addr += delta;
> > break;
> > default:
> > blexit(L"Unsupported relocation type");
>
> While of course this and the other half of the necessary changes are
> independent (i.e. this patch could be taken as is), I really had intended
> to deal with both sides of this issue at once, and hence I'm not entirely
> happy for this partial change to go in on its own. Nevertheless if need
> be it can have my ack.

I am, evidently, still being dense.

This change is sufficient (we believe) to make EFI builds of Xen
actually boot again on current EDK2, is it not?

What is the "other half" of which you speak? You mentioned marking the
section RWX — but for that to be a long-term solution to the issue,
we'd basically have to ensure that we always mark *all* sections which
might have relocations (even .rodata) as writeable, in case UEFI
decides to honour their read-only status at some point in the future.

smime.p7s
Description: S/MIME cryptographic signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] xen/link: Move .data.rel.ro sections into .rodata for final link

2017-08-02 Thread David Woodhouse
On Wed, 2017-08-02 at 03:17 -0600, Jan Beulich wrote:
> 
> >... but then again, if the whole function is really doing nothing at
> >all when invoked with delta==0 then perhaps it would just be easier to
> >remove the first pass altogether. I feel sure I'm missing something,
> 
> The reason is even visible in context above: We can't call blexit() after
> having called ExitBootServices(). Hence we need a "dry run" done early,
> to be certain we won't encounter unhandlable relocations later on.

Ah, thanks. I'd glossed over that as a "can never happen" condition.
And the 'default:' case really is that — the build system is broken if
we ever see that. But the DIR64 / in_page_tables() one is less provably
impossible.

> >Either way, this is still broken before my patch though, right?
> 
> As long as there are .rodata entries needing relocation (which I
> understand you've found example of), yes.

And more to the point, .init.text entries needing relocation (since
UEFI isn't marking read-only *data* sections read-only yet for some
reason; only R+X sections).

But still that basically means that new versions of UEFI are going to
stop booting all existing EFI Xen images, doesn't it? Perhaps we should
look for a mitigation on the UEFI side.

Jeff, Jiewen, has this actually been shipped in an EDK2 release yet?

I confess I haven't actually built a current OVMF and *tested* the
hypothesis that it breaks; it just seems "obvious" :)

Adding Mark. Background: we think
https://github.com/tianocore/edk2/commit/d0e92aad46 will break existing
Xen EFI binaries because they write to a read-only code section
(.init.te) before calling ExitBootServices().



smime.p7s
Description: S/MIME cryptographic signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH] x86/efi: Do not write relocations in efi_arch_relocate_image() first pass

2017-08-02 Thread David Woodhouse
The function is invoked with delta=0 before ExitBootServices() is called,
as a dummy run purely to validate that all the relocations can be handled.
This allows us to exit gracefully with an error message.

However, we have relocations in read-only sections such as .rodata and
.init.te(xt). Recent versions of UEFI will actually make those sections
read-only, which will cause a fault. This functionaity was added in
EDK2 commit d0e92aad4 ("MdeModulePkg/DxeCore: Add UEFI image protection.")

It's OK to actually make the changes in the later pass because UEFI will
tear down the protection when ExitBootServices() is called, because it
knows we're going to need to do this kind of thing.

Signed-off-by: David Woodhouse <d...@amazon.co.uk>
---
This basically means that new versions of UEFI are going to break
(all?) existing EFI Xen versions? Or at least any that have relocations
in .init.text.

 xen/arch/x86/efi/efi-boot.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/efi/efi-boot.h b/xen/arch/x86/efi/efi-boot.h
index bedac5c..8d295ff 100644
--- a/xen/arch/x86/efi/efi-boot.h
+++ b/xen/arch/x86/efi/efi-boot.h
@@ -87,7 +87,8 @@ static void __init efi_arch_relocate_image(unsigned long 
delta)
 case PE_BASE_RELOC_DIR64:
 if ( in_page_tables(addr) )
 blexit(L"Unexpected relocation type");
-*(u64 *)addr += delta;
+if ( delta )
+*(u64 *)addr += delta;
 break;
 default:
 blexit(L"Unsupported relocation type");
-- 
2.7.4


smime.p7s
Description: S/MIME cryptographic signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] xen/link: Move .data.rel.ro sections into .rodata for final link

2017-07-31 Thread David Woodhouse
On Mon, 2017-07-31 at 07:15 -0600, Jan Beulich wrote:
> > > > David Woodhouse <dw...@infradead.org> 07/31/17 1:02 PM >>>
> > On Sun, 2017-07-30 at 00:16 -0600, Jan Beulich wrote:
> > > > > > David Woodhouse <dw...@infradead.org> 07/20/17 5:22 PM >>>
> > > > This includes stuff lke the hypercall tables which we really want
> > > > to be read-only. And they were going into .data.read-mostly.
> > >
> > > Yes, we'd like them to be read-only, but what if EFI properly assigned r/o
> > > permissions to the .rodata section when loading xen.efi? We'd then be
> > > unable to apply relocations when switching from 1:1 to virtual mappings
> > > (see efi_arch_relocate_image()).
> > 
> > FWIW it does look like TianoCore has gained the ability to mark
> > sections as read-only, in January of this year:
> > https://github.com/tianocore/edk2/commit/d0e92aad46
> > 
> > It doesn't actually seem to be complete — even with subsequent fixes
> > since that commit, it doesn't look like it catches the case of data
> > sections without EFI_IMAGE_SCN_MEM_WRITE, such as .rodata. 
> > 
> > And even if/when that gets fixed you'll note that the protection is
> > deliberately torn down in ExitBootServices(), specifically for the case
> > you're concerned about below — because you'll need to do the
> > relocations.
>
> As said in an earlier reply, a first pass over relocations is being done
> long before the call to ExitBootServices(). A minimal adjustment to
> efi_arch_relocate_image() will be needed anyway, afaict.

Ah, right. I think more "implied" than "said" but I understand now. :)

At least, I understand what you're saying... I have no bloody clue
what's actually going on though.

There is a first call to efi_arch_relocate_image(0) before the
ExitBootServices() call. However I'm missing something because I can't
see how that call achieves anything *other* than to trigger the fault
we're concerned about.

There are three types of relocations — PE_BASE_RELOC_ABS,
PE_BASE_RELOC_HIGHLOW, and PE_BASE_RELOC_DIR64.

The first (ABS) doesn't seem to do anything, ever. And is never emitted
by mkrelocs.c. 

The second (HIGHLOW) does nothing if (!delta).

The third (DIR64) simply adds 'delta' to the target address. We could
potentially stop it faulting on that pointless '*addr += 0' by doing
this...

--- a/xen/arch/x86/efi/efi-boot.h
+++ b/xen/arch/x86/efi/efi-boot.h
@@ -87,7 +87,8 @@ static void __init efi_arch_relocate_image(unsigned long 
delta)
 case PE_BASE_RELOC_DIR64:
 if ( in_page_tables(addr) )
 blexit(L"Unexpected relocation type");
-*(u64 *)addr += delta;
+if ( delta )
+*(u64 *)addr += delta;
 break;
 default:
 blexit(L"Unsupported relocation type");


... but then again, if the whole function is really doing nothing at
all when invoked with delta==0 then perhaps it would just be easier to
remove the first pass altogether. I feel sure I'm missing something,
but what? Is it still supposed to be adding xen_phys_start in the
PE_BASE_RELOC_HIGHLOW case even when delta==0? Because it isn't...

Either way, this is still broken before my patch though, right?
Especially if UEFI learns to do it for non-executable sections, but
AFAICT even before that.

These are the sections I see the PE section headers of a local build:

 Name     Characteristics   Relocations
.text    0xe0d00020 (WRX)    ✓
.rodata  0x40600040 ( R )    ✓
.buildid 0x40300040 ( R )
.init.te 0x60500020 ( RX)    ✓
.init.da 0xc0d00040 (WR )    ✓
.data.re 0xc0800040 (WR )    ✓
.data    0xc0d00040 (WR )    ✓
.bss     0xc180 (WR )
.reloc   0x42300040 ( R )
.pad     0xc0300080 (WR )

So there are (again, before my patch) relocations in .init.da(ta) and
.rodata sections which UEFI *might* start marking read-only, and also
in .init.te(xt) which is R+X and could be marked read-only today.

And the .init.te(xt) relocations include PE_BASE_RELOC_DIR64
relocations, which *would* cause a fault in the !delta case.

(All the relocations in .rodata both before and after my patch are also
PE_BASE_RELOC_DIR64, FWIW)




smime.p7s
Description: S/MIME cryptographic signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] xen/link: Move .data.rel.ro sections into .rodata for final link

2017-07-31 Thread David Woodhouse
On Sun, 2017-07-30 at 00:16 -0600, Jan Beulich wrote:
> > 
> > > 
> > > > 
> > > > David Woodhouse <dw...@infradead.org> 07/20/17 5:22 PM >>>
> > This includes stuff lke the hypercall tables which we really want
> > to be read-only. And they were going into .data.read-mostly.
> Yes, we'd like them to be read-only, but what if EFI properly assigned r/o
> permissions to the .rodata section when loading xen.efi? We'd then be
> unable to apply relocations when switching from 1:1 to virtual mappings
> (see efi_arch_relocate_image()).


FWIW it does look like TianoCore has gained the ability to mark
sections as read-only, in January of this year:
https://github.com/tianocore/edk2/commit/d0e92aad46

It doesn't actually seem to be complete — even with subsequent fixes
since that commit, it doesn't look like it catches the case of data
sections without EFI_IMAGE_SCN_MEM_WRITE, such as .rodata. 

And even if/when that gets fixed you'll note that the protection is
deliberately torn down in ExitBootServices(), specifically for the case
you're concerned about below — because you'll need to do the
relocations.

So I don't think there should be a problem here.

smime.p7s
Description: S/MIME cryptographic signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] xen/link: Move .data.rel.ro sections into .rodata for final link

2017-07-31 Thread David Woodhouse
On Mon, 2017-07-31 at 03:57 -0600, Jan Beulich wrote:
> Are there any such relocations? The compiler shouldn't emit data needing
> relocation to .rodata, so if at all such might live in assembly code. But yes,
> if there are any, things would have been latently broken even before.


 $ git diff 33a0b4fe90f1ef1a104dd454c931bb46d417ffca^
diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
index 93ead6e..aa25dd9 100644
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -194,7 +194,7 @@ $(TARGET).efi: prelink-efi.o $(note_file) efi.lds 
efi/relocs-dummy.o $(BASEDIR)/
if $(guard) false; then rm -f $@; echo 'EFI support disabled'; \
else $(NM) -pa --format=sysv $(@D)/$(@F) \
| $(BASEDIR)/tools/symbols --xensyms --sysv --sort 
>$(@D)/$(@F).map; fi
-   rm -f $(@D)/.$(@F).[0-9]*
+   #rm -f $(@D)/.$(@F).[0-9]*
 
 efi/boot.init.o efi/runtime.o efi/compat.o efi/buildid.o: 
$(BASEDIR)/arch/x86/efi/built_in.o
 efi/boot.init.o efi/runtime.o efi/compat.o efi/buildid.o: ;
diff --git a/xen/arch/x86/efi/mkreloc.c b/xen/arch/x86/efi/mkreloc.c
index bddcce0..55d14a7 100644
--- a/xen/arch/x86/efi/mkreloc.c
+++ b/xen/arch/x86/efi/mkreloc.c
@@ -346,6 +346,7 @@ int main(int argc, char *argv[])
  memcmp(sec1[i].name, ".lockpro", sizeof(sec1[i].name)) == 0 )
 continue;
 
+   printf("# section %.*s\n", (int)sizeof(sec1[i].name), sec1[i].name);
 if ( !sec1[i].rva )
 {
 fprintf(stderr, "Can't handle section %u with zero RVA\n", i);
 $ grep -A36 rodata .xen.efi.0r.S
# section .rodata
.equ rva_0020_relocs, 0x0c
.balign 4
.long 0x41b000, rva_0041b000_relocs
.word (10 << 12) | 0x920
.word (10 << 12) | 0x928
.word (10 << 12) | 0x930
.word (10 << 12) | 0x938
.word (10 << 12) | 0x940
.word (10 << 12) | 0x948
.word (10 << 12) | 0x950
.word (10 << 12) | 0x958
.word (10 << 12) | 0x960
.word (10 << 12) | 0x968
.word (10 << 12) | 0x970
.word (10 << 12) | 0x978
.word (10 << 12) | 0x980
.word (10 << 12) | 0x988
.word (10 << 12) | 0x990
.word (10 << 12) | 0x998
.word (10 << 12) | 0x9a0
.word (10 << 12) | 0x9a8
.word (10 << 12) | 0x9b0
.word (10 << 12) | 0x9b8
.word (10 << 12) | 0x9c0
.word (10 << 12) | 0x9c8
.word (10 << 12) | 0x9d0
.word (10 << 12) | 0x9d8
.word (10 << 12) | 0x9e0
.word (10 << 12) | 0x9e8
.word (10 << 12) | 0x9f0
.word (10 << 12) | 0x9f8
.word (10 << 12) | 0xa00
.word (10 << 12) | 0xa08
.word (10 << 12) | 0xa10
.word (10 << 12) | 0xa18
# section .init.te


smime.p7s
Description: S/MIME cryptographic signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] xen/link: Move .data.rel.ro sections into .rodata for final link

2017-07-30 Thread David Woodhouse
On Sun, 2017-07-30 at 13:50 +0100, Andrew Cooper wrote:
> On 30/07/17 07:16, Jan Beulich wrote:
> > 
> > > 
> > > > 
> > > > > 
> > > > > David Woodhouse <dw...@infradead.org> 07/20/17 5:22 PM >>>
> > > This includes stuff lke the hypercall tables which we really want
> > > to be read-only. And they were going into .data.read-mostly.
> > Yes, we'd like them to be read-only, but what if EFI properly assigned r/o
> > permissions to the .rodata section when loading xen.efi? We'd then be
> > unable to apply relocations when switching from 1:1 to virtual mappings
> > (see efi_arch_relocate_image()).
> Ah yes.  I'd overlooked that point when considering the ramifications of
> this change.
> 
> efi_arch_relocate_image() should probably do the same as what we do with
> livepatching, and temporarily clear CR0.WP for the duration of the patching.

Hm, efi/mkreloc.c was already emitting relocations in the .rodata
section before this change. Are you saying that was already broken?

smime.p7s
Description: S/MIME cryptographic signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2] xen/link: Move .data.rel.ro sections into .rodata for final link

2017-07-25 Thread David Woodhouse
From: David Woodhouse <d...@amazon.co.uk>

This includes stuff like the hypercall tables which we really kind of want
to be read-only. And they were going into .data.read-mostly.

Signed-off-by: David Woodhouse <d...@amazon.co.uk>
Reviewed-by: Wei Liu <wei.l...@citrix.com>
Acked-by: Julien Grall <julien.gr...@arm.com>
Acked-by: Andrew Cooper <andrew.coop...@citrix.com>
---
Fixed typo, collected acks, and this time sent from a properly
maintained distribution instead of Ubuntu, so spaces shouldn't get
turned into  by the mailer.

 xen/arch/arm/xen.lds.S | 4 ++--
 xen/arch/x86/xen.lds.S | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S
index 44bd3bf..2d54f22 100644
--- a/xen/arch/arm/xen.lds.S
+++ b/xen/arch/arm/xen.lds.S
@@ -52,6 +52,8 @@ SECTIONS
__stop_bug_frames_2 = .;
*(.rodata)
*(.rodata.*)
+   *(.data.rel.ro)
+   *(.data.rel.ro.*)
 
 #ifdef CONFIG_LOCK_PROFILE
. = ALIGN(POINTER_ALIGN);
@@ -97,8 +99,6 @@ SECTIONS
__stop___pre_ex_table = .;
 
*(.data.read_mostly)
-   *(.data.rel.ro)
-   *(.data.rel.ro.*)
   } :text
 
   . = ALIGN(8);
diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
index 8289a1b..ff08bbe 100644
--- a/xen/arch/x86/xen.lds.S
+++ b/xen/arch/x86/xen.lds.S
@@ -90,6 +90,8 @@ SECTIONS
 
*(.rodata)
*(.rodata.*)
+   *(.data.rel.ro)
+   *(.data.rel.ro.*)
 
 #if defined(BUILD_ID) && defined(EFI) && !defined(BUILD_ID_EFI)
 /*
@@ -224,8 +226,6 @@ SECTIONS
__start_schedulers_array = .;
*(.data.schedulers)
__end_schedulers_array = .;
-   *(.data.rel.ro)
-   *(.data.rel.ro.*)
   } :text
 
   .data : {/* Data */
-- 
2.7.4

-- 
dwmw2

smime.p7s
Description: S/MIME cryptographic signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] xen/link: Move .data.rel.ro sections into .rodata for final link

2017-07-20 Thread David Woodhouse
From: David Woodhouse <d...@amazon.co.uk>

This includes stuff lke the hypercall tables which we really want
to be read-only. And they were going into .data.read-mostly.

Signed-off-by: David Woodhouse <d...@amazon.co.uk>
---
Build tested on x86_64 (you really don't want to know about what I
*actually* tested it with), not at all tested on ARM.

 xen/arch/arm/xen.lds.S | 4 ++--
 xen/arch/x86/xen.lds.S | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S
index 44bd3bf..2d54f22 100644
--- a/xen/arch/arm/xen.lds.S
+++ b/xen/arch/arm/xen.lds.S
@@ -52,6 +52,8 @@ SECTIONS
__stop_bug_frames_2 = .;
*(.rodata)
*(.rodata.*)
+   *(.data.rel.ro)
+   *(.data.rel.ro.*)
 
 #ifdef CONFIG_LOCK_PROFILE
. = ALIGN(POINTER_ALIGN);
@@ -97,8 +99,6 @@ SECTIONS
__stop___pre_ex_table = .;
 
*(.data.read_mostly)
-   *(.data.rel.ro)
-   *(.data.rel.ro.*)
   } :text
 
   . = ALIGN(8);
diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
index 8289a1b..ff08bbe 100644
--- a/xen/arch/x86/xen.lds.S
+++ b/xen/arch/x86/xen.lds.S
@@ -90,6 +90,8 @@ SECTIONS
 
*(.rodata)
*(.rodata.*)
+   *(.data.rel.ro)
+   *(.data.rel.ro.*)
 
 #if defined(BUILD_ID) && defined(EFI) && !defined(BUILD_ID_EFI)
 /*
@@ -224,8 +226,6 @@ SECTIONS
__start_schedulers_array = .;
*(.data.schedulers)
__end_schedulers_array = .;
-   *(.data.rel.ro)
-   *(.data.rel.ro.*)
   } :text
 
   .data : {/* Data */
-- 
2.7.4


smime.p7s
Description: S/MIME cryptographic signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] xen: do not re-use pirq number cached in pci device msi msg data

2017-05-03 Thread David Woodhouse
On Wed, 2017-02-22 at 10:14 -0500, Boris Ostrovsky wrote:
> On 02/22/2017 09:28 AM, Bjorn Helgaas wrote:
> > 
> > On Tue, Feb 21, 2017 at 10:58:39AM -0500, Boris Ostrovsky wrote:
> > > 
> > > On 02/21/2017 10:45 AM, Juergen Gross wrote:
> > > > 
> > > > On 21/02/17 16:31, Dan Streetman wrote:
> > > > > 
> > > > > On Fri, Jan 13, 2017 at 5:30 PM, Konrad Rzeszutek Wilk
> > > > >  wrote:
> > > > > > 
> > > > > > On Fri, Jan 13, 2017 at 03:07:51PM -0500, Dan Streetman wrote:
> > > > > > > 
> > > > > > > Revert the main part of commit:
> > > > > > > af42b8d12f8a ("xen: fix MSI setup and teardown for PV on HVM 
> > > > > > > guests")
> > > > > > > 
> > > > > > > That commit introduced reading the pci device's msi message data 
> > > > > > > to see
> > > > > > > if a pirq was previously configured for the device's msi/msix, 
> > > > > > > and re-use
> > > > > > > that pirq.  At the time, that was the correct behavior.  However, 
> > > > > > > a
> > > > > > > later change to Qemu caused it to call into the Xen hypervisor to 
> > > > > > > unmap
> > > > > > > all pirqs for a pci device, when the pci device disables its 
> > > > > > > MSI/MSIX
> > > > > > > vectors; specifically the Qemu commit:
> > > > > > > c976437c7dba9c7444fb41df45468968aaa326ad
> > > > > > > ("qemu-xen: free all the pirqs for msi/msix when driver unload")
> > > > > > > 
> > > > > > > Once Qemu added this pirq unmapping, it was no longer correct for 
> > > > > > > the
> > > > > > > kernel to re-use the pirq number cached in the pci device msi 
> > > > > > > message
> > > > > > > data.  All Qemu releases since 2.1.0 contain the patch that 
> > > > > > > unmaps the
> > > > > > > pirqs when the pci device disables its MSI/MSIX vectors.
> > > > > > > 
> > > > > > > This bug is causing failures to initialize multiple NVMe 
> > > > > > > controllers
> > > > > > > under Xen, because the NVMe driver sets up a single MSIX vector 
> > > > > > > for
> > > > > > > each controller (concurrently), and then after using that to talk 
> > > > > > > to
> > > > > > > the controller for some configuration data, it disables the 
> > > > > > > single MSIX
> > > > > > > vector and re-configures all the MSIX vectors it needs.  So the 
> > > > > > > MSIX
> > > > > > > setup code tries to re-use the cached pirq from the first vector
> > > > > > > for each controller, but the hypervisor has already given away 
> > > > > > > that
> > > > > > > pirq to another controller, and its initialization fails.
> > > > > > > 
> > > > > > > This is discussed in more detail at:
> > > > > > > https://lists.xen.org/archives/html/xen-devel/2017-01/msg00447.html
> > > > > > > 
> > > > > > > Fixes: af42b8d12f8a ("xen: fix MSI setup and teardown for PV on 
> > > > > > > HVM guests")
> > > > > > > Signed-off-by: Dan Streetman 
> > > > > > Acked-by: Konrad Rzeszutek Wilk 
> > > > > This doesn't seem to be applied yet, is it still waiting on another
> > > > > ack?  Or maybe I'm looking at the wrong git tree...
> > > > Am I wrong or shouldn't this go through the PCI tree? Konrad?
> > > Konrad is away this week but since pull request for Xen tree just went
> > > out we should probably wait until rc1 anyway (unless something big comes
> > > up before that).
> > I assume this should go via the Xen or x86 tree, since that's how most
> > arch/x86/pci/xen.c patches have been handled, including af42b8d12f8a.
> > If you think otherwise, let me know.
>
> OK, I applied it to Xen tree's for-linus-4.11.

Hm, we want this (c74fd80f2f4) in stable too, don't we?


smime.p7s
Description: S/MIME cryptographic signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] xen/mm: Alter is_iomem_page() to use mfn_t

2017-02-09 Thread David Woodhouse
On Thu, 2017-02-09 at 11:54 +0800, Chao Gao wrote:
> 
> Jan is right. When the assertion fails, the value of gfn is 0xfee00.
> 
> Do you mean that is_iomem_page() is equal to direct_mmio except some
> corner cases such as APIC access MFN and INVALID_MFN( any others? ) ?

That was the hypothesis, and it seems reasonable.

But I also note you're only testing that hypothesis in the
direct_mmio==1 case. You don't test the hypothesis that if
direct_mmio==0, then so is is_iomem_page().

smime.p7s
Description: S/MIME cryptographic signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] xen/mm: Alter is_iomem_page() to use mfn_t

2017-02-09 Thread David Woodhouse
On Thu, 2017-02-09 at 01:56 -0700, Jan Beulich wrote:
> 
> > diff --git a/xen/arch/x86/hvm/mtrr.c b/xen/arch/x86/hvm/mtrr.c
> > index 86c71be..3d9386a 100644
> > --- a/xen/arch/x86/hvm/mtrr.c
> > +++ b/xen/arch/x86/hvm/mtrr.c
> > @@ -784,6 +784,8 @@ int epte_get_entry_emt(struct domain *d, unsigned long 
> > gfn, 
> >  
> >  if ( direct_mmio )
> >  {
> > +    ASSERT(direct_mmio == is_iomem_page(mfn)); 
> > +
> >  if ( (mfn_x(mfn) ^ d->arch.hvm_domain.vmx.apic_access_mfn) >> 
> >order )
> >  return MTRR_TYPE_UNCACHABLE;
> >  if ( order )
> > 
> > But when I try to create a hvm domain, it failed. logs are below.
> 
> Considering the context of the change above, I'm not surprised:
> You've very likely hit the APIC access MFN.

You'll probably hit it for INVALID_MFN too on an unmap. And *do* make
it print the offending address when it triggers. :)

smime.p7s
Description: S/MIME cryptographic signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3] x86/ept: Allow write-combining on !mfn_valid() MMIO mappings again

2017-02-08 Thread David Woodhouse
On Thu, 2017-01-26 at 14:50 +, David Woodhouse wrote:
> 
> Finally, add a comment clarifying how that 'return -1' works — it isn't
> returning an error and causing the mapping to fail; it relies on…

Btw, something odd happened when committing this. That U+2014 emdash in
the first (cited) line above got turned into a U+0097 'END OF GUARDED
AREA' character. I've no idea why — just using 'git am' should have
worked perfectly normally.

smime.p7s
Description: S/MIME cryptographic signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] xen/mm: Alter is_iomem_page() to use mfn_t

2017-02-06 Thread David Woodhouse
On Mon, 2017-02-06 at 07:26 -0700, Jan Beulich wrote:
> > 
> > > 
> > > > 
> > > > On 06.02.17 at 14:55,  wrote:
> > Switch its return type to bool to match its use, and simplify the
> > ARM
> > implementation slightly.
> > 
> > No functional change.
> > 
> > Signed-off-by: Andrew Cooper 
> Reviewed-by: Jan Beulich 
> 
> And perhaps that's what should be used in epte_get_entry_emt()
> instead of !mfn_valid() in David's patch. David, would you be okay
> with your patch changed to that effect upon commit?

I don't think that works, at least not literally
s/!mfn_valid()/is_iomem_page()/

In my patch, mfn_valid() is checked *after* we've processed the
'direct_mmio' case that all MMIO should hit. In a sane world I think
it's *only* actually catching INVALID_MFN, and probably should never
match on any other value of mfn.

I don't quite understand why we pass 'direct_mmio' in as a separate
argument. Perhaps there's scope for doing a sanity check that
'direct_mmio == is_iomem_page(mfn)' — because when would that *not* be
true?

But sanity checks are of dubious utility because it must be noted that
we have no way to return failure for order==0 anyway; that 'return -1'
idiom is dangerous in epte_get_entry_emt(), and that's why I gave it an
extra comment.

So if we can use is_iomem_page() do we ditch the direct_mmio argument
to the function entirely?

-- 
dwmw2


smime.p7s
Description: S/MIME cryptographic signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3] x86/ept: Allow write-combining on !mfn_valid() MMIO mappings again

2017-02-06 Thread David Woodhouse
On Fri, 2017-01-27 at 10:36 -0500, Konrad Rzeszutek Wilk wrote:
> . snip ..
> > 
> > > 
> > > Signed-off-by: David Woodhouse <d...@amazon.com>
> > Reviewed-by: Jan Beulich <jbeul...@suse.com>
> > 
> > But before committing I'd prefer to have a VMX maintainer ack
> > too.
> Who is gone until Feb yth (Spring festival in China).

Not sure what number that 'y' was supposed to be, but I see Kevin
posting today... :)

Would also be very good to have a definitive answer about the safety of
mixing WC and UC on MMIO mappings on Intel hardware.

smime.p7s
Description: S/MIME cryptographic signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Xen Security Advisory 154 (CVE-2016-2270) - x86: inconsistent cachability flags on guest mappings

2017-02-01 Thread David Woodhouse
On Wed, 2017-01-25 at 07:21 -0700, Jan Beulich wrote:
> 
> Well, in the context of this XSA we've asked both of them, and iirc
> we've got a vague reply from Intel and none from AMD. In fact we
> did defer the XSA for quite a bit waiting for any useful feedback.
> To AMD's advantage I'd like to add though that iirc they're a little
> more clear in their PM about the specific question of UC and WC
> you raise: They group the various cacheabilities into two groups
> (cacheable and uncacheable) and require there to only not be
> any mixture between groups. Iirc Intel's somewhat vague reply
> allowed us to conclude we're likely safe that way on their side too.

It would be good to get a definitive answer from Intel, to match AMD's.
That's basically why I added hpa to CC, in fact.

Peter, is there any possibility of a clarification here, please?

Thanks.


smime.p7s
Description: S/MIME cryptographic signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v3] x86/ept: Allow write-combining on !mfn_valid() MMIO mappings again

2017-01-26 Thread David Woodhouse
From: David Woodhouse <d...@amazon.com>

For some MMIO regions, such as those high above RAM, mfn_valid() will
return false.

Since the fix for XSA-154 in commit c61a6f74f80e ("x86: enforce
consistent cachability of MMIO mappings"), guests have no longer been
able to use PAT to obtain write-combining on such regions because the
'ignore PAT' bit is set in EPT.

We probably want to err on the side of caution and preserve that
behaviour for addresses in mmio_ro_ranges, but not for normal MMIO
mappings. That necessitates a slight refactoring to check mfn_valid()
later, and let the MMIO case get through to the right code path.

Since we're not bailing out for !mfn_valid() immediately, the range
checks need to be adjusted to cope — simply by masking in the low bits
to account for 'order' instead of adding, to avoid overflow when the mfn
is INVALID_MFN (which happens on unmap, since we carefully call this
function to fill in the EMT even though the PTE won't be valid).

The range checks are also slightly refactored to put only one of them in
the fast path in the common case. If it doesn't overlap, then it
*definitely* isn't contained, so we don't need both checks. And if it
overlaps and is only one page, then it definitely *is* contained.

Finally, add a comment clarifying how that 'return -1' works — it isn't
returning an error and causing the mapping to fail; it relies on
resolve_misconfig() being able to split the mapping later. So it's
*only* sane to do it where order>0 and the 'problem' will be solved by
splitting the large page. Not for blindly returning 'error', which I was
tempted to do in my first attempt.

Signed-off-by: David Woodhouse <d...@amazon.com>
---
 xen/arch/x86/hvm/mtrr.c | 26 +-
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/hvm/mtrr.c b/xen/arch/x86/hvm/mtrr.c
index 709759c..8fef756 100644
--- a/xen/arch/x86/hvm/mtrr.c
+++ b/xen/arch/x86/hvm/mtrr.c
@@ -773,17 +773,19 @@ int epte_get_entry_emt(struct domain *d, unsigned long 
gfn, mfn_t mfn,
 if ( v->domain != d )
 v = d->vcpu ? d->vcpu[0] : NULL;
 
-if ( !mfn_valid(mfn_x(mfn)) ||
- rangeset_contains_range(mmio_ro_ranges, mfn_x(mfn),
- mfn_x(mfn) + (1UL << order) - 1) )
-{
-*ipat = 1;
-return MTRR_TYPE_UNCACHABLE;
-}
-
+/* Mask, not add, for order so it works with INVALID_MFN on unmapping */
 if ( rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
- mfn_x(mfn) + (1UL << order) - 1) )
+ mfn_x(mfn) | ((1UL << order) - 1)) )
+{
+if ( !order || rangeset_contains_range(mmio_ro_ranges, mfn_x(mfn),
+   mfn_x(mfn) | ((1UL << order) - 
1)) )
+{
+*ipat = 1;
+return MTRR_TYPE_UNCACHABLE;
+}
+/* Force invalid memory type so resolve_misconfig() will split it */
 return -1;
+}
 
 if ( direct_mmio )
 {
@@ -795,6 +797,12 @@ int epte_get_entry_emt(struct domain *d, unsigned long 
gfn, mfn_t mfn,
 return MTRR_TYPE_WRBACK;
 }
 
+if ( !mfn_valid(mfn_x(mfn)) )
+{
+*ipat = 1;
+return MTRR_TYPE_UNCACHABLE;
+}
+
 if ( !need_iommu(d) && !cache_flush_permitted(d) )
 {
 *ipat = 1;
-- 
2.7.4


smime.p7s
Description: S/MIME cryptographic signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2] x86/ept: Allow write-combining on !mfn_valid() MMIO mappings again

2017-01-26 Thread David Woodhouse
On Thu, 2017-01-26 at 07:35 -0700, Jan Beulich wrote:
> 
> Hmm, didn't you say you'd take care of the hard tabs?

Er... yes. But it seems I neglected to specify whether I would also
*save* the resulting file, 'git commit --amend', and actually send that
version. Sorry.

smime.p7s
Description: S/MIME cryptographic signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2] x86/ept: Allow write-combining on !mfn_valid() MMIO mappings again

2017-01-26 Thread David Woodhouse
From: David Woodhouse <d...@amazon.com>

For some MMIO regions, such as those high above RAM, mfn_valid() will
return false.

Since the fix for XSA-154 in commit c61a6f74f80e ("x86: enforce
consistent cachability of MMIO mappings"), guests have no longer been
able to use PAT to obtain write-combining on such regions because the
'ignore PAT' bit is set in EPT.

We probably want to err on the side of caution and preserve that
behaviour for addresses in mmio_ro_ranges, but not for normal MMIO
mappings. That necessitates a slight refactoring to check mfn_valid()
later, and let the MMIO case get through to the right code path.

Since we're not bailing out for !mfn_valid() immediately, the range
checks need to be adjusted to cope — simply by masking in the low bits
to account for 'order' instead of adding, to avoid overflow when the mfn
is INVALID_MFN (which happens on unmap, since we carefully call this
function to fill in the EMT even though the PTE won't be valid).

The range checks are also slightly refactored to put only one of them in
the fast path in the common case. If it doesn't overlap, then it
*definitely* isn't contained, so we don't need both checks. And if it
overlaps and is only one page, then it definitely *is* contained.

Finally, add a comment clarifying how that 'return -1' works — it isn't
returning an error and causing the mapping to fail; it relies on
resolve_misconfig() being able to split the mapping later. So it's
*only* sane to do it where order>0 and the 'problem' will be solved by
splitting the large page. Not for blindly returning 'error', which I was
tempted to do in my first attempt.

Signed-off-by: David Woodhouse <d...@amazon.com>
---
 xen/arch/x86/hvm/mtrr.c | 26 +-
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/hvm/mtrr.c b/xen/arch/x86/hvm/mtrr.c
index 709759c..41ae8b4 100644
--- a/xen/arch/x86/hvm/mtrr.c
+++ b/xen/arch/x86/hvm/mtrr.c
@@ -773,18 +773,20 @@ int epte_get_entry_emt(struct domain *d, unsigned long 
gfn, mfn_t mfn,
 if ( v->domain != d )
 v = d->vcpu ? d->vcpu[0] : NULL;
 
-if ( !mfn_valid(mfn_x(mfn)) ||
- rangeset_contains_range(mmio_ro_ranges, mfn_x(mfn),
- mfn_x(mfn) + (1UL << order) - 1) )
+/* Mask, not add, for order so it works with INVALID_MFN on unmapping */
+if ( rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
+    mfn_x(mfn) | ((1UL << order) - 1)) )
 {
-*ipat = 1;
-return MTRR_TYPE_UNCACHABLE;
+   if ( !order || rangeset_contains_range(mmio_ro_ranges, mfn_x(mfn),
+      mfn_x(mfn) | ((1UL << order) - 
1)) )
+   {
+   *ipat = 1;
+   return MTRR_TYPE_UNCACHABLE;
+   }
+   /* Force invalid memory type so resolve_misconfig() will split it */
+   return -1;
 }
 
-if ( rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
- mfn_x(mfn) + (1UL << order) - 1) )
-return -1;
-
 if ( direct_mmio )
 {
 if ( (mfn_x(mfn) ^ d->arch.hvm_domain.vmx.apic_access_mfn) >> order )
@@ -795,6 +797,12 @@ int epte_get_entry_emt(struct domain *d, unsigned long 
gfn, mfn_t mfn,
 return MTRR_TYPE_WRBACK;
 }
 
+if ( !mfn_valid(mfn_x(mfn)) )
+{
+   *ipat = 1;
+   return MTRR_TYPE_UNCACHABLE;
+}
+
 if ( !need_iommu(d) && !cache_flush_permitted(d) )
 {
 *ipat = 1;
-- 
2.7.4


smime.p7s
Description: S/MIME cryptographic signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] x86: Allow write-combining on MMIO mappings again

2017-01-26 Thread David Woodhouse
On Thu, 2017-01-26 at 03:45 -0700, Jan Beulich wrote:
> 
> This is too broad a statement: MMIO pages for which mfn_valid()
> is true would still allow WC use.

... where mfn_valid() would be true for MMIO regions in the low memory
hole, but not for regions above all physical memory?

> The only other issue with the patch that I can see is that you need
> to eliminate all the hard tabs you're introducing.

Bad emacs; no biscuit. I shall teach it that we're not in Kansas any
more, and repost. Thanks.



smime.p7s
Description: S/MIME cryptographic signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH] x86: Allow write-combining on MMIO mappings again

2017-01-26 Thread David Woodhouse
From: David Woodhouse <d...@amazon.com>

Since the fix for XSA-154 in commit c61a6f74f80e ("x86: enforce
consistent cachability of MMIO mappings"), HVM guests have no longer
been able to use PAT to obtain write-combining on MMIO because the
'ignore PAT' bit is set in EPT.

For MMIO we shouldn't be setting the 'ignore PAT' bit, although we
probably want to err on the side of caution and still do so for
addresses in mmio_ro_ranges. That necessitates a slight refactoring
to let the MMIO (for which mfn_vaiid() can be false) get through
to the right code path.

Since we're not bailing out for !mfn_valid() immediately, the range
checks need to be adjusted to cope — simply by masking in the low bits
to account for 'order' instead of adding, to avoid overflow when the
mfn is INVALID_MFN (which happens on unmap, since we carefully call
this function to fill in the EMT even though the PTE won't be valid).

The range checks are also slightly refactored to put only one of them
in the fast path in the common case. If it doesn't overlap, then it
*definitely* isn't contained, so we don't need both checks. And if it
overlaps and is only one page, then it definitely *is* contained.

Finally, add a comment clarifying how that 'return -1' works — it isn't
returning an error and causing the mapping to fail; it relies on
resolve_misconfig() being able to split the mapping later. So it's
*only* sane to do it where order>0 and the 'problem' will be solved
by splitting the large page. Not for blindly returning 'error', which
I was tempted to do in my first attempt.

Signed-off-by: David Woodhouse <d...@amazon.com>
---
 xen/arch/x86/hvm/mtrr.c | 26 +-
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/hvm/mtrr.c b/xen/arch/x86/hvm/mtrr.c
index 709759c..41ae8b4 100644
--- a/xen/arch/x86/hvm/mtrr.c
+++ b/xen/arch/x86/hvm/mtrr.c
@@ -773,18 +773,20 @@ int epte_get_entry_emt(struct domain *d, unsigned long 
gfn, mfn_t mfn,
 if ( v->domain != d )
 v = d->vcpu ? d->vcpu[0] : NULL;
 
-if ( !mfn_valid(mfn_x(mfn)) ||
- rangeset_contains_range(mmio_ro_ranges, mfn_x(mfn),
- mfn_x(mfn) + (1UL << order) - 1) )
+/* Mask, not add, for order so it works with INVALID_MFN on unmapping */
+if ( rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
+    mfn_x(mfn) | ((1UL << order) - 1)) )
 {
-*ipat = 1;
-return MTRR_TYPE_UNCACHABLE;
+   if ( !order || rangeset_contains_range(mmio_ro_ranges, mfn_x(mfn),
+      mfn_x(mfn) | ((1UL << order) - 
1)) )
+   {
+   *ipat = 1;
+   return MTRR_TYPE_UNCACHABLE;
+   }
+   /* Force invalid memory type so resolve_misconfig() will split it */
+   return -1;
 }
 
-if ( rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
- mfn_x(mfn) + (1UL << order) - 1) )
-return -1;
-
 if ( direct_mmio )
 {
 if ( (mfn_x(mfn) ^ d->arch.hvm_domain.vmx.apic_access_mfn) >> order )
@@ -795,6 +797,12 @@ int epte_get_entry_emt(struct domain *d, unsigned long 
gfn, mfn_t mfn,
 return MTRR_TYPE_WRBACK;
 }
 
+if ( !mfn_valid(mfn_x(mfn)) )
+{
+   *ipat = 1;
+   return MTRR_TYPE_UNCACHABLE;
+}
+
 if ( !need_iommu(d) && !cache_flush_permitted(d) )
 {
 *ipat = 1;
-- 
2.7.4


smime.p7s
Description: S/MIME cryptographic signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Xen Security Advisory 154 (CVE-2016-2270) - x86: inconsistent cachability flags on guest mappings

2017-01-25 Thread David Woodhouse
On Wed, 2017-01-25 at 07:21 -0700, Jan Beulich wrote:
> 
> If there wasn't INVALID_MFN to be taken care of, the !mfn_valid()
> check could simply move down, and be combined with the
> direct_mmio one.

As discussed on IRC, once we fix the overflow with order > 0, I think
INVALID_MFN is OK. Not that it should ever really happen, AFAICT. 

This seems to do the right thing for my MMIO WC test. I haven't
actually combined the !mfn_valid() check with the direct_mmio one.
Under what circumstances does that make sense anyway? For now in the
patch below I've left it *forcing* UC, unlike the direct_mmio path
which lets the guest use WC. But really, shouldn't the '!direct_mmio &&
!mfn_valid()' case just return an error?

Note that as well as using a mask for 'order' I've attempted to
consolidate the unlikely rangeset_overlaps_range() and
rangeset_contains_range() calls, on the assumption that the former will
*always* be true if the latter is, so we only need one of them in the
fast path through the function.

diff --git a/xen/arch/x86/hvm/mtrr.c b/xen/arch/x86/hvm/mtrr.c
index 709759c..09c2f5c 100644
--- a/xen/arch/x86/hvm/mtrr.c
+++ b/xen/arch/x86/hvm/mtrr.c
@@ -773,20 +773,24 @@ int epte_get_entry_emt(struct domain *d, unsigned long 
gfn, mfn_t mfn,
 if ( v->domain != d )
 v = d->vcpu ? d->vcpu[0] : NULL;
 
-if ( !mfn_valid(mfn_x(mfn)) ||
- rangeset_contains_range(mmio_ro_ranges, mfn_x(mfn),
- mfn_x(mfn) + (1UL << order) - 1) )
+/* INVALID_MFN should never happen here, but if it does then this
+ * should do the right thing instead of exploding */
+if ( unlikely(rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
+     mfn_x(mfn) | ((1UL << order) - 1))) )
 {
-*ipat = 1;
-return MTRR_TYPE_UNCACHABLE;
+   if ( rangeset_contains_range(mmio_ro_ranges, mfn_x(mfn),
+    mfn_x(mfn) | ((1UL << order) - 1)) )
+   {
+   *ipat = 1;
+   return MTRR_TYPE_UNCACHABLE;
+   }
+   /* Overlaps mmio_ro_ranges but is not entirely contained. No. */
+   return -1;
 }
 
-if ( rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
- mfn_x(mfn) + (1UL << order) - 1) )
-return -1;
-
 if ( direct_mmio )
 {
+   /* Again, INVALID_MFN should do the right thing here. */
 if ( (mfn_x(mfn) ^ d->arch.hvm_domain.vmx.apic_access_mfn) >> order )
 return MTRR_TYPE_UNCACHABLE;
 if ( order )
@@ -795,6 +799,12 @@ int epte_get_entry_emt(struct domain *d, unsigned long 
gfn, mfn_t mfn,
 return MTRR_TYPE_WRBACK;
 }
 
+if ( unlikely(!mfn_valid(mfn_x(mfn))) )
+{
+   *ipat = 1;
+   return MTRR_TYPE_UNCACHABLE;
+}
+
 if ( !need_iommu(d) && !cache_flush_permitted(d) )
 {
 *ipat = 1;

smime.p7s
Description: S/MIME cryptographic signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Xen Security Advisory 154 (CVE-2016-2270) - x86: inconsistent cachability flags on guest mappings

2017-01-25 Thread David Woodhouse
On Wed, 2017-01-25 at 07:21 -0700, Jan Beulich wrote:
> Note the difference between "contains" and "overlaps".

Oh, that makes more sense; thanks.

> > And then there's a 'if (direct_mmio) return UC;' further down which
> > looks like it'd do the right thing for the use case I'm actually
> > testing. I may see if I can construct a straw man patch, but I'm kind
> > of unfamiliar with this code so it should be taken with a large pinch
> > of salt...
>
> If there wasn't INVALID_MFN to be taken care of, the !mfn_valid()
> check could simply move down, and be combined with the
> direct_mmio one.

I'll see what I can come up with.

> > WC vs. UC. But it would be good to have a definitive answer from Intel
> > and AMD about whether it's safe.
>
> Well, in the context of this XSA we've asked both of them, and iirc
> we've got a vague reply from Intel and none from AMD. In fact we
> did defer the XSA for quite a bit waiting for any useful feedback.
> To AMD's advantage I'd like to add though that iirc they're a little
> more clear in their PM about the specific question of UC and WC
> you raise: They group the various cacheabilities into two groups
> (cacheable and uncacheable) and require there to only not be
> any mixture between groups. Iirc Intel's somewhat vague reply
> allowed us to conclude we're likely safe that way on their side too.

That's useful information; thanks. Coupled with the empirical testing
and the fact that we're already "violating" the recommendation of the
Intel SDM by allowing a mixture of memory types on RAM pages, I think
that's probably sufficient to conclude that it's OK to permit this.

smime.p7s
Description: S/MIME cryptographic signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] Xen Security Advisory 154 (CVE-2016-2270) - x86: inconsistent cachability flags on guest mappings

2017-01-25 Thread David Woodhouse
> x86: enforce consistent cachability of MMIO mappings
> 
> We've been told by Intel that inconsistent cachability between
> multiple mappings of the same page can affect system stability only
> when the affected page is an MMIO one. Since the stale data issue is
> of no relevance to the hypervisor (since all guest memory accesses go
> through proper accessors and validation), handling of RAM pages
> remains unchanged here. Any MMIO mapped by domains however needs to be
> done consistently (all cachable mappings or all uncachable ones), in
> order to avoid Machine Check exceptions. Since converting existing
> cachable mappings to uncachable (at the time an uncachable mapping
> gets established) would in the PV case require tracking all mappings,
> allow MMIO to only get mapped uncachable (UC, UC-, or WC).
> 
> This also implies that in the PV case we mustn't use the L1 PTE update
> fast path when cachability flags get altered.
> 
> Since in the HVM case at least for now we want to continue honoring
> pinned cachability attributes for pages not mapped by the hypervisor,
> special case handling of r/o MMIO pages (forcing UC) gets added there.
> Arguably the counterpart change to p2m-pt.c may not be necessary, since
> UC- (which already gets enforced there) is probably strict enough.
> 
> Note that the shadow code changes include fixing the write protection
> of r/o MMIO ranges: shadow_l1e_remove_flags() and its siblings, other
> than l1e_remove_flags() and alike, return the new PTE (and hence
> ignoring their return values makes them no-ops).
> 
> This is CVE-2016-2270 / XSA-154.
> 
> Signed-off-by: Jan Beulich 
> Acked-by: Andrew Cooper 

...

> --- a/xen/arch/x86/hvm/mtrr.c
> +++ b/xen/arch/x86/hvm/mtrr.c
> @@ -770,8 +770,17 @@ int epte_get_entry_emt(struct domain *d,
>  if ( v->domain != d )
>  v = d->vcpu ? d->vcpu[0] : NULL;
>  
> -if ( !mfn_valid(mfn_x(mfn)) )
> +if ( !mfn_valid(mfn_x(mfn)) ||
> + rangeset_contains_range(mmio_ro_ranges, mfn_x(mfn),
> + mfn_x(mfn) + (1UL < order) - 1) )
> +{
> +*ipat = 1;
>  return MTRR_TYPE_UNCACHABLE;
> +}
> +
> +if ( rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
> + mfn_x(mfn) + (1UL < order) - 1) )
> +return -1;
>  
>  switch ( hvm_get_mem_pinned_cacheattr(d, gfn, order, ) )
>  {

This doesn't look right. That second 'if(rangeset_overlaps_range(…))'
is tautologically false, because if it *is* true, the first if()
statement happens first and it's never reached.

The reason I'm looking is because that first if() statement is
happening for MMIO regions where it probably shouldn't. This means that
guests are mapping MMIO BARs of assigned devices and getting *forced*
UC (because *ipat=1) instead of taking the if(direct_mmio) path
slightly further down — which wouldn't set the 'ignore PAT' bit, and
would thus allow them to enable WC through their PAT.

It makes me wonder if the first was actually intended to be
'!mfn_valid() && rangeset_contains_range(…)' — with logical && rather
than ||. That would make some sense because it's then explicitly
refusing to map pages which are in mmio_ro_ranges *and* mfn_valid().

And then there's a 'if (direct_mmio) return UC;' further down which
looks like it'd do the right thing for the use case I'm actually
testing. I may see if I can construct a straw man patch, but I'm kind
of unfamiliar with this code so it should be taken with a large pinch
of salt...

There is a separate question of whether it's actually safe to let the
guest map an MMIO page with both UC and WC simultaneously. Empirically,
it seems to be OK — I hacked a guest kernel not to enforce the mutual
exclusion, mapped the BAR with both UC and WC and ran two kernel
threads, reading and writing the whole BAR in a number of iterations.
The WC thread went a lot faster than the UC one, so it will have often
been touching the same locations as the UC thread as it 'overtook' it,
and nothing bad happened. This seems reasonable, as the dire warnings
and machine checks are more about *cached* vs. uncached mappings, not
WC vs. UC. But it would be good to have a definitive answer from Intel
and AMD about whether it's safe.

-- 
dwmw2

¹ Or is that the problem — is mfn_valid() supposed to be true for MMIO
BARs, and this is actually an SR-IOV problem with resource assignment
failing to do that for VFs?


smime.p7s
Description: S/MIME cryptographic signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC v2 3/7] firmware: port built-in section to linker table

2016-02-29 Thread David Woodhouse
On Fri, 2016-02-19 at 05:45 -0800, Luis R. Rodriguez wrote:
> This ports built-in firmware to use linker tables,
> this replaces the custom section solution with a
> generic solution.
> 
> This also demos the use of the .rodata (SECTION_RO)
> linker tables.
> 
> Tested with 0 built-in firmware, 1 and 2 built-in
> firmwares successfully.

I think we'd do better to rip this support out entirely. It just isn't
needed; firmware can live in an initramfs and don't even need *any*
actual running userspace support to load it from there these days, do
we?

-- 
dwmw2



smime.p7s
Description: S/MIME cryptographic signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7 0/9] virtio DMA API, yet again

2016-02-17 Thread David Woodhouse
On Wed, 2016-02-17 at 11:29 +0200, Michael S. Tsirkin wrote:
> 
> I was hoping for an explicit ack from David Woodhouse,
> but I guess the informal "let's not hold this up"
> that was sent on v5 will do.

Apologies; I was working under that assumption too.

-- 
dwmw2



smime.p7s
Description: S/MIME cryptographic signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks

2016-02-02 Thread David Woodhouse
On Tue, 2016-02-02 at 07:54 -0700, Alex Williamson wrote:
> 
> I first glance I like it, but there's a problem, it assumes there is a
> host driver for the device that will permanently release the device from
> the RMRR even after the device is unbound.  Currently we don't have a
> requirement that the user must first bind the device to a native host
> driver, unbind it, and only then is it eligible for device assignment.

It doesn't *have* to be a full native driver. It can be a PCI quirk
(the USB controllers could potentially do it that way, although they
don't). Or a stub 'shut it down' driver, potentially even done somehow
through VFIO.

But fundamentally, in all of these cases you have to do *something* to
stop the BIOS-controlled DMA. Otherwise the RMRR shouldn't have been
there in the first place, surely?

But for the gfx case... what *do* we have to do? Does the VMM (and the
VM's BIOS, between them) have to provision a "stolen" region of guest
memory and point the gfx framebuffer at that?

Once we have a proper handle on precisely what needs to happen, we can
have a better conversation about where/how to do that...

-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation



smime.p7s
Description: S/MIME cryptographic signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks

2016-02-02 Thread David Woodhouse
On Tue, 2016-02-02 at 06:42 +, Tian, Kevin wrote:
> > From: Kay, Allen M
> > Sent: Tuesday, February 02, 2016 8:04 AM
> > > 
> > > David notes in the latter commit above:
> > > 
> > > "We should be able to successfully assign graphics devices to guests too, 
> > > as
> > > long as the initial handling of stolen memory is reconfigured 
> > > appropriately."
> > > 
> > > What code is supposed to be doing that reconfiguration when a device is
> > > assigned?  Clearly we don't have it yet, making assignment of these 
> > > devices
> > > very unsafe.  It seems like vfio or IOMMU code  in the kernel needs device
> > > specific code to clear these settings to make it safe for userspace, then
> > > perhaps VM BIOS support to reallocate.  Is there any consistency across 
> > > IGD
> > > revisions for doing this?  Is there a spec?
> > > Thanks,

I haven't ever successfully assigned an IGD device to a VM myself, but
my understanding was that it *has* been done. So when the code was
changed to prevent assignment of devices afflicted by RMRRs (except USB
where we know it's OK), I just added the integrated graphics to that
same exception as USB, to preserve the status quo ante.

> I don't think stolen memory should be handled explicitly. If yes, it should be
> listed as a RMRR region so general RMRR setup will cover it. But as Allen
> pointed out, the whole RMRR becomes unnecessary if we target only secondary
> device for IGD.

Perhaps the best option is *not* to have special cases in the IOMMU
code for "those devices which can safely be assigned despite RMRRs".

Instead, let's let the device driver — or whatever — tell the IOMMU
code when it's *stopped* the firmware from (ab)using the device's DMA
facilities.

So when the USB code does the handoff thing to quiesce the firmware's
access to USB and take over in the OS, it would call the IOMMU function
to revoke the RMRR for the USB controller.

And if/when the graphics driver resets its device into a state where
it's no longer accessing stolen memory and can be assigned to a VM, it
can also call that 'RMRR revoke' function.

Likewise, if we teach device drivers to cancel whatever abominations
the HP firmware tends to set up behind the OS's back on other PCI
devices, we can cancel the RMRRs for those too.

Then the IOMMU code has a simple choice and no special cases — we can
assign a device iff it has no active RMRR.

-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation



smime.p7s
Description: S/MIME cryptographic signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v5 04/10] vring: Introduce vring_use_dma_api()

2016-02-01 Thread David Woodhouse
On Mon, 2016-02-01 at 07:39 -0800, Andy Lutomirski wrote:
> 
> >> Could we have an arch_vring_eschew_dma_api(dev) function which the
> >> affected architectures could provide (as a prelude to fixing it so that
> >> the DMA API does the right thing for *itself*)?
> >
> > I'm fine with this.
> 
> I modified vring_use_dma_api to take a vring_virtqueue* parameter to
> make this easier.
> 
> I'm a bit torn here.  I want to get the mechanism and the Xen part in,
> and there's unlikely to be much debate on those as a matter of
> principle.  I'd also like to flip as many arches over as possible, but
> that could be trickier.  Let me mull over this.

Let's queue the arch_vring_eschew_dma_api() thing up after this first
batch, and not hold it up any further.

-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation



smime.p7s
Description: S/MIME cryptographic signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v5 04/10] vring: Introduce vring_use_dma_api()

2016-02-01 Thread David Woodhouse
On Thu, 2016-01-28 at 18:31 -0800, Andy Lutomirski wrote:
> This is a kludge, but no one has come up with a a better idea yet.
> We'll introduce DMA API support guarded by vring_use_dma_api().
> Eventually we may be able to return true on more and more systems,
> and hopefully we can get rid of vring_use_dma_api() entirely some
> day.
> 
> Signed-off-by: Andy Lutomirski <l...@kernel.org>
> ---
>  drivers/virtio/virtio_ring.c | 24 
>  1 file changed, 24 insertions(+)
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index e12e385f7ac3..4b8dab4960bb 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -25,6 +25,30 @@
>  #include 
>  #include 
>  
> +/*
> + * The interaction between virtio and a possible IOMMU is a mess.
> + *
> + * On most systems with virtio, physical addresses match bus addresses,
> + * and it doesn't particularly matter whether we use the DMI API.
> + *
> + * On some sytems, including Xen and any system with a physical device
> + * that speaks virtio behind a physical IOMMU, we must use the DMA API
> + * for virtio DMA to work at all.
> + *
> + * On other systems, including SPARC and PPC64, virtio-pci devices are
> + * enumerated as though they are behind an IOMMU, but the virtio host
> + * ignores the IOMMU, so we must either pretend that the IOMMU isn't
> + * there or somehow map everything as the identity.
> + *
> + * For the time being, we preseve historic behavior and bypass the DMA
> + * API.
> + */

I spot at least three typos in there, FWIW. ('DMI API', 'sytems',
'preseve').

> +static bool vring_use_dma_api(void)
> +{
> + return false;
> +}
> +

I'd quite like to see this be an explicit opt-out for the known-broken
platforms. We've listed the SPARC and PPC64 issues. For x86 I need to
refresh my memory as a prelude to trying to fix it... was the issue
*just* that Qemu tends to ship with a broken BIOS that misdescribes the
virtio devices (and any assigned PCI devices) as being behind an IOMMU
when they're not, in the rare case that Qemu actually exposes its
partially-implemented virtual IOMMU to the guest?

Could we have an arch_vring_eschew_dma_api(dev) function which the
affected architectures could provide (as a prelude to fixing it so that
the DMA API does the right thing for *itself*)?

It would be functionally equivalent, but it would help to push the
workarounds to the right place — rather than entrenching them for ever
in tricky "OMG we need to audit what all the architectures do... let's
not touch it!" code.

-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation



smime.p7s
Description: S/MIME cryptographic signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v5 00/10] virtio DMA API, yet again

2016-01-29 Thread David Woodhouse
On Thu, 2016-01-28 at 18:31 -0800, Andy Lutomirski wrote:
> 
> To everyone else: we've waffled on this for way too long.  I think
> we should to get DMA API implementation in with a conservative
> policy like this rather than waiting until we achieve perfection.
> I'm tired of carrying these patches around.

Yeah, do it.

However,,, shouldn't the generic no-op DMA ops be checking the dma_mask
of the device and bitching if it can't reach the address in question?

Also, wasn't Christoph looking at making per-device DMA ops more
generic instead of an 'archdata' thing on basically every platform? Or
did I just imagine that part?

Not that I'm suggesting you make the s390 patch wait for that *instead*
of using archdata there, mind you. But I was kind of planning to let
the dust settle on this lot before I sort out the theoretical-except-
in-simulation issues with VT-d IOMMU covering *some* devices but not
all.

-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation



smime.p7s
Description: S/MIME cryptographic signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel