Re: [Xen-devel] [PATCH 10/13] x86/alternative: Support indirect call replacement

2017-11-17 Thread H. Peter Anvin
On 10/04/17 08:58, Josh Poimboeuf wrote:
> Add alternative patching support for replacing an instruction with an
> indirect call.  This will be needed for the paravirt alternatives.

I have a patchset that generalizes the alternatives in what I think is a
more robust way.  I really, really want to get rid of these hacks.  Let
me clean it up and post it...

-hpa


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] x86: PIE support and option to extend KASLR randomization

2017-09-22 Thread H. Peter Anvin
On 09/22/17 11:57, Kees Cook wrote:
> On Fri, Sep 22, 2017 at 11:38 AM, H. Peter Anvin <h...@zytor.com> wrote:
>> We lose EBX on 32 bits, but we don't lose RBX on 64 bits - since x86-64
>> has RIP-relative addressing there is no need for a dedicated PIC register.
> 
> FWIW, since gcc 5, the PIC register isn't totally lost. It is now
> reusable, and that seems to have improved performance:
> https://gcc.gnu.org/gcc-5/changes.html

It still talks about a PIC register on x86-64, which confuses me.
Perhaps older gcc's would allocate a PIC register under certain
circumstances, and then lose it for the entire function?

For i386, the PIC register is required by the ABI to be %ebx at the
point any PLT entry is called.  Not an issue with -mno-plt which goes
straight to the GOT, although in most cases there needs to be a PIC
register to find the GOT unless load-time relocation is permitted.

-hpa




Re: [Xen-devel] x86: PIE support and option to extend KASLR randomization

2017-09-22 Thread H. Peter Anvin
On 09/22/17 09:32, Ingo Molnar wrote:
> 
> BTW., I think things improved with ORC because with ORC we have RBP as an
> extra register and with PIE we lose RBX - so register pressure in code
> generation is lower.
> 

We lose EBX on 32 bits, but we don't lose RBX on 64 bits - since x86-64
has RIP-relative addressing there is no need for a dedicated PIC register.

I'm somewhat confused how we can have almost 1% overhead.  I
suspect that we end up making a GOT and maybe even a PLT for no good reason.

-hpa



Re: [Xen-devel] x86: PIE support and option to extend KASLR randomization

2017-09-22 Thread H. Peter Anvin
On 08/21/17 07:28, Peter Zijlstra wrote:
> 
> Ah, I see, this is large mode and that needs to use MOVABS to load 64bit
> immediates. Still, small RIP relative should be able to live at any
> point as long as everything lives inside the same 2G relative range, so
> would still allow the goal of increasing the KASLR range.
> 
> So I'm not seeing how we need large mode for that. That said, after
> reading up on all this, RIP relative will not be too pretty either,
> while CALL is naturally RIP relative, data still needs an explicit %rip
> offset, still loads better than the large model.
> 

The large model makes no sense whatsoever.  I think what we're actually
looking for is the small-PIC model.

Ingo asked:
> I.e. is there no GCC code generation mode where code can be placed anywhere
> in the canonical address space, yet call and jump distance is within 31 bits
> so that the generated code is fast?

That's the small-PIC model.  I think if all symbols are forced to hidden
then it won't even need a GOT/PLT.

We do need to consider how we want modules to fit into whatever model we
choose, though.  They can be adjacent, or we could go with a more
traditional dynamic link model where the modules can be separate, and
chained together with the main kernel via the GOT.

-hpa



Re: [Xen-devel] x86: PIE support and option to extend KASLR randomization

2017-08-27 Thread H. Peter Anvin
On 08/21/17 07:31, Peter Zijlstra wrote:
> On Tue, Aug 15, 2017 at 07:20:38AM -0700, Thomas Garnier wrote:
>> On Tue, Aug 15, 2017 at 12:56 AM, Ingo Molnar  wrote:
> 
>>> Have you considered a kernel with -mcmodel=small (or medium) instead of
>>> -fpie -mcmodel=large? We can pick a random 2GB window in the (non-kernel)
>>> canonical x86-64 address space to randomize the location of kernel text.
>>> The location of modules can be further randomized within that 2GB window.
>>
>> -mcmodel=small/medium assume you are on the low 32-bit. It generates
>> instructions where the virtual addresses have the high 32-bit to be
>> zero.
> 
> That's a compiler fail, right? Because the SDM states that for "CALL
> rel32" the 32bit displacement is sign extended on x86_64.
> 

No.  It is about whether you can do something like:

movl $variable, %eax /* rax = &variable */

or

addl %ecx,variable(,%rsi,4) /* variable[rsi] += ecx */
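
The distinction can be sketched in C. This is only an illustration under stated assumptions, not code from the thread: the helper names are hypothetical and the addresses mentioned afterwards are example values. `movl $variable, %eax` zero-extends its 32-bit immediate into %rax, while the displacement in `variable(,%rsi,4)` is sign-extended, so each form can only encode a symbol whose address survives the corresponding 32-bit round trip.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if addr can be produced by a zero-extended
 * 32-bit immediate, as loaded by "movl $sym, %eax". */
static bool fits_zero_extended32(uint64_t addr)
{
    return addr <= UINT64_C(0xffffffff);
}

/* Hypothetical helper: true if addr can be produced by a sign-extended
 * 32-bit immediate or displacement, as in "sym(,%rsi,4)".
 * That covers the low 2G and the top 2G of the address space. */
static bool fits_sign_extended32(uint64_t addr)
{
    return addr <= UINT64_C(0x7fffffff) ||
           addr >= UINT64_C(0xffffffff80000000);
}
```

Under these assumptions an address such as 0x400000 passes both checks, 0xffffffff81000000 (in the top 2G) passes only the sign-extended check, and an address below the top 2G such as 0xffffff0000000000 passes neither.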



Re: [Xen-devel] [RFC 06/22] kvm: Adapt assembly for PIE support

2017-07-19 Thread H. Peter Anvin

On July 19, 2017 3:58:07 PM PDT, Ard Biesheuvel <ard.biesheu...@linaro.org> wrote:
>On 19 July 2017 at 23:27, H. Peter Anvin <h...@zytor.com> wrote:
>> On 07/19/17 08:40, Thomas Garnier wrote:
>>>>
>>>> This doesn't look right.  It's accessing a per-cpu variable.  The
>>>> per-cpu section is an absolute, zero-based section and not subject
>to
>>>> relocation.
>>>
>>> PIE does not respect the zero-based section, it tries to have
>>> everything relative. Patch 16/22 also adapt per-cpu to work with PIE
>>> (while keeping the zero absolute design by default).
>>>
>>
>> This is silly.  The right thing is for PIE is to be explicitly
>absolute,
>> without (%rip).  The use of (%rip) memory references for percpu is
>just
>> an optimization.
>>
>
>Sadly, there is an issue in binutils that may prevent us from doing
>this as cleanly as we would want.
>
>For historical reasons, bfd.ld emits special symbols like
>__GLOBAL_OFFSET_TABLE__ as absolute symbols with a section index of
>SHN_ABS, even though it is quite obvious that they are relative like
>any other symbol that points into the image. Unfortunately, this means
>that binutils needs to emit R_X86_64_RELATIVE relocations even for
>SHN_ABS symbols, which means we lose the ability to use both absolute
>and relocatable symbols in the same PIE image (unless the reloc tool
>can filter them out)
>
>More info here:
>https://sourceware.org/bugzilla/show_bug.cgi?id=19818

The reloc tool already has the ability to filter symbols.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: [Xen-devel] [RFC 20/22] x86/relocs: Add option to generate 64-bit relocations

2017-07-19 Thread H. Peter Anvin

On July 19, 2017 4:25:56 PM PDT, Thomas Garnier <thgar...@google.com> wrote:
>On Wed, Jul 19, 2017 at 4:08 PM, H. Peter Anvin <h...@zytor.com> wrote:
>> On 07/19/17 15:47, Thomas Garnier wrote:
>>> On Wed, Jul 19, 2017 at 3:33 PM, H. Peter Anvin <h...@zytor.com>
>wrote:
>>>> On 07/18/17 15:33, Thomas Garnier wrote:
>>>>> The x86 relocation tool generates a list of 32-bit signed
>integers. There
>>>>> was no need to use 64-bit integers because all addresses where
>above the 2G
>>>>> top of the memory.
>>>>>
>>>>> This change add a large-reloc option to generate 64-bit unsigned
>integers.
>>>>> It can be used when the kernel plan to go below the top 2G and
>32-bit
>>>>> integers are not enough.
>>>>
>>>> Why on Earth?  This would only be necessary if the *kernel itself*
>was
>>>> more than 2G, which isn't going to happen for the forseeable
>future.
>>>
>>> Because the relocation integer is an absolute address, not an offset
>>> in the binary. Next iteration, I can try using a 32-bit offset for
>>> everyone.
>>
>> It is an absolute address *as the kernel was originally linked*, for
>> obvious reasons.
>
>Sure when the kernel was just above 0x8000, it doesn't
>work when it goes down to 0x. That's why using an
>offset might make more sense in general.
>
>>
>> -hpa
>>

What is the motivation for changing the pre linked address at all?
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: [Xen-devel] [RFC 16/22] x86/percpu: Adapt percpu for PIE support

2017-07-19 Thread H. Peter Anvin
On 07/19/17 19:21, H. Peter Anvin wrote:
> On 07/19/17 16:33, H. Peter Anvin wrote:
>>>
>>> I agree that it is odd but that's how the compiler generates code. I
>>> will re-explore PIC options with mcmodel=small or medium, as mentioned
>>> on other threads.
>>
>> Why should the way compiler generates code affect the way we do things
>> in assembly?
>>
>> That being said, the compiler now has support for generating this kind
>> of code explicitly via the __seg_gs pointer modifier.  That should let
>> us drop the __percpu_prefix and just use variables directly.  I suspect
>> we want to declare percpu variables as "volatile __seg_gs" to account
>> for the possibility of CPU switches.
>>
>> Older compilers won't be able to work with this, of course, but I think
>> that it is acceptable for those older compilers to not be able to
>> support PIE.
>>
> 
> Grump.  It turns out that the compiler doesn't do the right thing for
> symbols marked with the __seg_[fg]s markers.  __thread does the right
> thing, but __thread a) has %fs: hard-coded, still, and b) I believe can
> still cache %seg:0 arbitrarily long.

I filed this bug report for gcc:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81490

It might still be possible to work around this by playing really ugly
games with __thread, but I haven't yet figured out how best to do that.

-hpa



Re: [Xen-devel] [RFC 16/22] x86/percpu: Adapt percpu for PIE support

2017-07-19 Thread H. Peter Anvin
On 07/19/17 16:33, H. Peter Anvin wrote:
>>
>> I agree that it is odd but that's how the compiler generates code. I
>> will re-explore PIC options with mcmodel=small or medium, as mentioned
>> on other threads.
> 
> Why should the way compiler generates code affect the way we do things
> in assembly?
> 
> That being said, the compiler now has support for generating this kind
> of code explicitly via the __seg_gs pointer modifier.  That should let
> us drop the __percpu_prefix and just use variables directly.  I suspect
> we want to declare percpu variables as "volatile __seg_gs" to account
> for the possibility of CPU switches.
> 
> Older compilers won't be able to work with this, of course, but I think
> that it is acceptable for those older compilers to not be able to
> support PIE.
> 

Grump.  It turns out that the compiler doesn't do the right thing for
symbols marked with the __seg_[fg]s markers.  __thread does the right
thing, but __thread a) has %fs: hard-coded, still, and b) I believe can
still cache %seg:0 arbitrarily long.

-hpa




Re: [Xen-devel] [RFC 16/22] x86/percpu: Adapt percpu for PIE support

2017-07-19 Thread H. Peter Anvin
On 07/19/17 11:26, Thomas Garnier wrote:
> On Tue, Jul 18, 2017 at 8:08 PM, Brian Gerst  wrote:
>> On Tue, Jul 18, 2017 at 6:33 PM, Thomas Garnier  wrote:
>>> Percpu uses a clever design where the .percpu ELF section has a virtual
>>> address of zero and the relocation code avoid relocating specific
>>> symbols. It makes the code simple and easily adaptable with or without
>>> SMP support.
>>>
>>> This design is incompatible with PIE because generated code always try to
>>> access the zero virtual address relative to the default mapping address.
>>> It becomes impossible when KASLR is configured to go below -2G. This
>>> patch solves this problem by removing the zero mapping and adapting the GS
>>> base to be relative to the expected address. These changes are done only
>>> when PIE is enabled. The original implementation is kept as-is
>>> by default.
>>
>> The reason the per-cpu section is zero-based on x86-64 is to
>> workaround GCC hardcoding the stack protector canary at %gs:40.  So
>> this patch is incompatible with CONFIG_STACK_PROTECTOR.
> 
> Ok, that make sense. I don't want this feature to not work with
> CONFIG_CC_STACKPROTECTOR*. One way to fix that would be adding a GDT
> entry for gs so gs:40 points to the correct memory address and
> gs:[rip+XX] works correctly through the MSR.

What are you talking about?  A GDT entry and the MSR do the same thing,
except that a GDT entry is limited to an offset of 0-0x (which
doesn't work for us, obviously.)

> Given the separate
> discussion on mcmodel, I am going first to check if we can move from
> PIE to PIC with a mcmodel=small or medium that would remove the percpu
> change requirement. I tried before without success but I understand
> better percpu and other components so maybe I can make it work.

>> This is silly.  The right thing is for PIE is to be explicitly absolute,
>> without (%rip).  The use of (%rip) memory references for percpu is just
>> an optimization.
> 
> I agree that it is odd but that's how the compiler generates code. I
> will re-explore PIC options with mcmodel=small or medium, as mentioned
> on other threads.

Why should the way compiler generates code affect the way we do things
in assembly?

That being said, the compiler now has support for generating this kind
of code explicitly via the __seg_gs pointer modifier.  That should let
us drop the __percpu_prefix and just use variables directly.  I suspect
we want to declare percpu variables as "volatile __seg_gs" to account
for the possibility of CPU switches.

Older compilers won't be able to work with this, of course, but I think
that it is acceptable for those older compilers to not be able to
support PIE.

-hpa




Re: [Xen-devel] [RFC 20/22] x86/relocs: Add option to generate 64-bit relocations

2017-07-19 Thread H. Peter Anvin
On 07/19/17 15:47, Thomas Garnier wrote:
> On Wed, Jul 19, 2017 at 3:33 PM, H. Peter Anvin <h...@zytor.com> wrote:
>> On 07/18/17 15:33, Thomas Garnier wrote:
>>> The x86 relocation tool generates a list of 32-bit signed integers. There
>>> was no need to use 64-bit integers because all addresses where above the 2G
>>> top of the memory.
>>>
>>> This change add a large-reloc option to generate 64-bit unsigned integers.
>>> It can be used when the kernel plan to go below the top 2G and 32-bit
>>> integers are not enough.
>>
>> Why on Earth?  This would only be necessary if the *kernel itself* was
>> more than 2G, which isn't going to happen for the foreseeable future.
> 
> Because the relocation integer is an absolute address, not an offset
> in the binary. Next iteration, I can try using a 32-bit offset for
> everyone.

It is an absolute address *as the kernel was originally linked*, for
obvious reasons.

-hpa
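
A minimal sketch of this point, assuming a kernel linked in the top 2G; it is not the relocs tool or the decompressor code, and all names and addresses are hypothetical. Because each table entry records an address as linked, a sign-extended 32-bit integer can hold it no matter where the kernel is later placed; the randomization offset is applied separately as a 64-bit delta.

```c
#include <stdint.h>

/* Hypothetical helper: recover the full link-time address from a
 * 32-bit signed table entry by sign-extending it to 64 bits. */
static uint64_t entry_to_link_addr(int32_t entry)
{
    return (uint64_t)(int64_t)entry;
}

/* Hypothetical helper: move a link-time address to where the kernel
 * was actually loaded. The delta is computed in full 64-bit
 * (wrapping) arithmetic, so the load base may lie below the top 2G. */
static uint64_t relocate(uint64_t link_addr, uint64_t link_base,
                         uint64_t load_base)
{
    return link_addr + (load_base - link_base);
}
```

With these assumptions, the entry -0x7f000000 expands to the link-time address 0xffffffff81000000, and only the delta needs more than 32 bits when the load base moves far away from the link base.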




Re: [Xen-devel] [RFC 07/22] x86: relocate_kernel - Adapt assembly for PIE support

2017-07-19 Thread H. Peter Anvin
On 07/18/17 15:33, Thomas Garnier wrote:
> Change the assembly code to use only relative references of symbols for the
> kernel to be PIE compatible.
> 
> Position Independent Executable (PIE) support will allow to extended the
> KASLR randomization range below the -2G memory limit.
> 
> Signed-off-by: Thomas Garnier 
> ---
>  arch/x86/kernel/relocate_kernel_64.S | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S
> index 98111b38ebfd..da817d1628ac 100644
> --- a/arch/x86/kernel/relocate_kernel_64.S
> +++ b/arch/x86/kernel/relocate_kernel_64.S
> @@ -186,7 +186,7 @@ identity_mapped:
>   movq    %rax, %cr3
>   lea     PAGE_SIZE(%r8), %rsp
>   call    swap_pages
> - movq    $virtual_mapped, %rax
> + leaq    virtual_mapped(%rip), %rax
>   pushq   %rax
>   ret
>  

This is completely wrong.  The whole point is that %rip here is on an
identity-mapped page, which means that its offset to the actual symbol
is ill-defined.

The use of pushq/ret to do an indirect jump is bizarre, though, instead of:

pushq %r8
ret

one ought to simply do

jmpq *%r8

I think the author of this code was confused by the fact that we have to
use this construct to do a *far* jump.

There are some other very bizarre constructs in this file that I can
only assume come from clumsy porting from 32 bits, for example:

call 1f
1:
popq %r8
subq $(1b - relocate_kernel), %r8

... instead of the much simpler ...

leaq relocate_kernel(%rip), %r8

With this value in %r8 anyway, you can simply do:

leaq (virtual_mapped - relocate_kernel)(%r8), %rax
jmpq *%rax

This patchset scares me.  There seem to be a lot of places where you
have not been very aware of what is actually happening in the code, nor
done research about how the ABIs actually work and affect things.

Sorry.

-hpa



Re: [Xen-devel] [RFC 20/22] x86/relocs: Add option to generate 64-bit relocations

2017-07-19 Thread H. Peter Anvin
On 07/18/17 15:33, Thomas Garnier wrote:
> The x86 relocation tool generates a list of 32-bit signed integers. There
> was no need to use 64-bit integers because all addresses where above the 2G
> top of the memory.
> 
> This change add a large-reloc option to generate 64-bit unsigned integers.
> It can be used when the kernel plan to go below the top 2G and 32-bit
> integers are not enough.

Why on Earth?  This would only be necessary if the *kernel itself* was
more than 2G, which isn't going to happen for the foreseeable future.

-hpa




Re: [Xen-devel] [RFC 06/22] kvm: Adapt assembly for PIE support

2017-07-19 Thread H. Peter Anvin
On 07/19/17 08:40, Thomas Garnier wrote:
>>
>> This doesn't look right.  It's accessing a per-cpu variable.  The
>> per-cpu section is an absolute, zero-based section and not subject to
>> relocation.
> 
> PIE does not respect the zero-based section, it tries to have
> everything relative. Patch 16/22 also adapt per-cpu to work with PIE
> (while keeping the zero absolute design by default).
> 

This is silly.  The right thing is for PIE to be explicitly absolute,
without (%rip).  The use of (%rip) memory references for percpu is just
an optimization.

-hpa




Re: [Xen-devel] [RFC 21/22] x86/module: Add support for mcmodel large and PLTs

2017-07-18 Thread H. Peter Anvin
On 07/18/17 15:33, Thomas Garnier wrote:
> With PIE support and KASLR extended range, the modules may be further
> away from the kernel than before breaking mcmodel=kernel expectations.
> 
> Add an option to build modules with mcmodel=large. The modules generated
> code will make no assumptions on placement in memory.
> 
> Despite this option, modules still expect kernel functions to be within
> 2G and generate relative calls. To solve this issue, the PLT arm64 code
> was adapted for x86_64. When a relative relocation go outside its range,
> a dynamic PLT entry is used to correctly jump to the destination.

Why large as opposed to medium or medium-PIC?

-hpa





Re: [Xen-devel] [PATCH v7 3/3] x86: Make the GDT remapping read-only on 64-bit

2017-03-14 Thread H. Peter Anvin

On March 14, 2017 2:20:19 PM PDT, Thomas Garnier  wrote:
>On Tue, Mar 14, 2017 at 2:04 PM, Pavel Machek  wrote:
>> On Tue 2017-03-14 10:05:08, Thomas Garnier wrote:
>>> This patch makes the GDT remapped pages read-only to prevent
>corruption.
>>> This change is done only on 64-bit.
>>>
>>> The native_load_tr_desc function was adapted to correctly handle a
>>> read-only GDT. The LTR instruction always writes to the GDT TSS
>entry.
>>> This generates a page fault if the GDT is read-only. This change
>checks
>>> if the current GDT is a remap and swap GDTs as needed. This function
>was
>>> tested by booting multiple machines and checking hibernation works
>>> properly.
>>>
>>> KVM SVM and VMX were adapted to use the writeable GDT. On VMX, the
>>> per-cpu variable was removed for functions to fetch the original
>GDT.
>>> Instead of reloading the previous GDT, VMX will reload the fixmap
>GDT as
>>> expected. For testing, VMs were started and restored on multiple
>>> configurations.
>>>
>>> Signed-off-by: Thomas Garnier 
>>
>> Can we get the same change for 32-bit, too? Growing differences
>> between 32 and 64 bit are a bit of a problem...
>> Pavel
>
>It was discussed on previous versions that 32-bit read-only support
>would create issues that why it was favor for 64-bit only right now.
>
>>
>> --
>> (english) http://www.livejournal.com/~pavelmachek
>> (cesky, pictures)
>http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

We can't make the GDT read-only on 32 bits since we use task switches for 
last-resort recovery.  64 bits has IST instead.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: [Xen-devel] [PATCH v3 4/4] x86/asm: Rewrite sync_core() to use IRET-to-self

2016-12-06 Thread H. Peter Anvin
On 12/06/16 00:46, Jan Beulich wrote:
>> +
>> +#ifdef CONFIG_X86_32
>> +asm volatile (
>> +"pushfl\n\t"
>> +"pushl %%cs\n\t"
>> +"pushl $1f\n\t"
>> +"iret\n\t"
>> +"1:"
>> +: "+r" (__sp) : : "cc", "memory");
> 
> I don't thing EFLAGS (i.e. "cc") gets modified anywhere here. And
> the memory clobber would perhaps better be pulled out into an
> explicit barrier() invocation (making it more obvious what it's needed
> for)?
> 

Not to mention "cc" doesn't do anything on x86 at all.

-hpa
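
As a small illustration (a sketch, assuming GCC or Clang extended asm; the function name is hypothetical): on x86 the compiler treats the flags register as clobbered by every asm statement, so an instruction that changes EFLAGS needs no explicit "cc" clobber. Other architectures do require it, hence the portable fallback.

```c
/* Hypothetical example: add two integers with inline asm. */
static int add_asm(int a, int b)
{
#if defined(__i386__) || defined(__x86_64__)
    /* addl modifies EFLAGS, yet no "cc" clobber is listed: on x86 the
     * compiler already assumes every asm statement clobbers the flags. */
    __asm__ ("addl %1, %0" : "+r" (a) : "r" (b));
#else
    a += b;   /* portable fallback so the sketch builds on other targets */
#endif
    return a;
}
```

Listing "cc" here would still be accepted; it simply would not change the generated code on x86.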





Re: [Xen-devel] [PATCH v2 2/3] x86 Test and expose CPUID faulting capabilities in /proc/cpuinfo

2016-09-15 Thread H. Peter Anvin
On September 14, 2016 6:17:51 PM PDT, Andy Lutomirski wrote:
>On Wed, Sep 14, 2016 at 3:03 PM, Kyle Huey  wrote:
>> On Wed, Sep 14, 2016 at 2:35 PM, Dave Hansen
>>  wrote:
>>> On 09/14/2016 02:01 PM, Kyle Huey wrote:
>
>>> Is any of this useful to optimize away at compile-time?  We have
>config
>>> options for when we're running as a guest, and this seems like a
>feature
>>> that isn't available when running on bare metal.
>>
>> On the contrary, this is only available when we're on bare metal.
>> Neither Xen nor KVM virtualize CPUID faulting (although KVM correctly
>> suppresses MSR_PLATFORM_INFO's report of support for it).
>
>KVM could easily support this.  If rr starts using it, I think KVM
>*should* add support, possibly even for older CPUs that don't support
>the feature in hardware.
>
>It's too bad that x86 doesn't give us the instruction bytes on a
>fault.  Otherwise we could lazily switch this feature.
>
>--Andy

You can "always" examine the instruction bytes in memory... have to make sure 
you properly consider the impact of race conditions though.
-- 
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.



Re: [Xen-devel] [PATCH v1 7/7] tools: add userspace linker table sandbox

2016-08-22 Thread H. Peter Anvin

On August 22, 2016 5:07:39 PM PDT, "Luis R. Rodriguez" wrote:
>On Fri, Aug 19, 2016 at 03:31:47PM -0700, Kees Cook wrote:
>> On Fri, Aug 19, 2016 at 2:41 PM,   wrote:
>> > From: "Luis R. Rodriguez" 
>> >
>> > Add a userspace sandbox to allow easy experimentation and
>> > test extensions with linker tables, section ranges and the
>> > new section core definitions.
>> >
>> > The userspace sandbox tries to mimic the Linux kernel development
>> > flow as much as possible, it however relies on and uses libc.
>Support
>> > is currently only provided to x86_64.
>> >
>> > v4: this patch is new in this series -- added to the kernel as
>> > suggested by Boris, as otherwise it'd be really hard to keep
>> > an external userspace repository in sync.
>> >
>> > Signed-off-by: Luis R. Rodriguez 
>> > ---
>> >  Documentation/sections/linker-tables.rst   |   4 +-
>> >  MAINTAINERS|   1 +
>> >  include/linux/tables.h |   5 +-
>> >  tools/Makefile |   3 +-
>> >  .../arch/x86/include/generated/asm/section-core.h  |   1 +
>> >  tools/arch/x86/include/generated/ranges.h  |   1 +
>> >  tools/arch/x86/include/generated/tables.h  |   1 +
>> >  tools/include/asm-generic/ranges.h | 103 
>> >  tools/include/asm-generic/section-core.h   | 341
>+++
>> >  tools/include/asm-generic/tables.h |  50 ++
>> 
>> Aren't a bunch of these files exact duplicates of the headers in
>include/linux?
>
>Indeed... This a userspace tools/ architecture decision that was made
>long ago,
>so its not up to me, I am just following the strategy devised and
>picked up.
>Refer to 7d7d1bf1d1dabe435ef50efb051724b8664749cb ("perf bench: Copy
>kernel
>files needed to build mem{cpy,set} x86_64 benchmarks") for an example
>of
>previous similar work. By sharing header files this enable more tools/
>to be hacked on.
>
>  Luis

I think this is a legacy from before the uapi change that should really be
fixed.  If we need to export additional kernel structures for the tools, we
could define a third level if we really need it.
-- 
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.



Re: [Xen-devel] [PATCH v1 0/7] tools: add linker table userspace sandbox

2016-08-21 Thread H. Peter Anvin

On August 20, 2016 9:59:59 PM PDT, Rich Felker  wrote:
>On Fri, Aug 19, 2016 at 11:57:18PM -0500, Rob Landley wrote:
>> On 08/19/2016 04:41 PM, mcg...@kernel.org wrote:
>> > Please let me know if there are any issue or questions.
>> 
>> Only that this has been the majority of the traffic on the linux-sh
>> mailing list for over a month and I'm still not sure why anyone
>> should care.
>> 
>> I have no idea what problem it solves, despite reading a couple dozen
>> messages in the thread, and the most recent two 0/x intro messages.
>> Its purpose seems to be ensuring that lld.llvm.org has more work to do
>> if it ever wants to build the kernel without binutils?
>> 
>> I also am not certain why every revision of it is cc'd to linux-sh.
>> Is it generic linker infrastructure change, or is it something that
>> affects this architecture specifically? As far as I can tell nothing in
>> this most recent 7-patch series touches arch/sh at all, you just cc'd
>> our list because you think the work you're doing is _important_, not
>> that it's specifically relevant to us.
>
>Incidentally I'm happy to have been CC'd since this infrastructure is
>_really_ nice for doing things generically with device tree. The
>ability to add new linker-section-based tables without having to
>manually hack up linker script templates will make it so we can do
>things like adding tables for cache controllers or mmus (that are
>needed quite early in init and can't go through the platform device
>system).
>
>BTW we kinda lucked out that there was already the linker table
>infrastructure for cpu enable methods for smp; this patch series makes
>it so future stuff doesn't have to rely on luck or invasive changes.
>
>Rich

Incidentally, I want to use this for the RAID algorithms.


Re: [Xen-devel] [RFC v3 07/13] tables.h: add linker table support

2016-07-28 Thread H. Peter Anvin

On July 27, 2016 4:02:18 PM PDT, "Luis R. Rodriguez"  wrote:
>On Tue, Jul 26, 2016 at 12:30:14AM +0900, Masami Hiramatsu wrote:
>> On Fri, 22 Jul 2016 14:24:41 -0700
>> "Luis R. Rodriguez"  wrote:
>> 
>> > +/**
>> > + * LINKTABLE_RUN_ALL - iterate and run through all entries on a linker table
>> > + *
>> > + * @tbl: linker table
>> > + * @func: structure name for the function name we want to call.
>> > + * @args...: arguments to pass to func
>> > + *
>> > + * Example usage:
>> > + *
>> > + *   LINKTABLE_RUN_ALL(frobnicator_fns, some_run,);
>> > + */
>> > +#define LINKTABLE_RUN_ALL(tbl, func, args...)                  \
>> > +do {                                                           \
>> > +  size_t i;                                                    \
>> > +  for (i = 0; i < LINUX_SECTION_SIZE(tbl); i++)                \
>> > +          (tbl[i]).func (args);                                \
>> > +} while (0);
>> > +
>> > +/**
>> > + * LINKTABLE_RUN_ERR - run each linker table entry func and return error if any
>> > + *
>> > + * @tbl: linker table
>> > + * @func: structure name for the function name we want to call.
>> > + * @args...: arguments to pass to func
>> > + *
>> > + * Example usage:
>> > + *
>> > + *   unsigned int err = LINKTABLE_RUN_ERR(frobnicator_fns, some_run,);
>> > + */
>> > +#define LINKTABLE_RUN_ERR(tbl, func, args...)                  \
>> > +({                                                             \
>> > +  size_t i;                                                    \
>> > +  int err = 0;                                                 \
>> > +  for (i = 0; !err && i < LINUX_SECTION_SIZE(tbl); i++)        \
>> > +          err = (tbl[i]).func (args);                          \
>> > +  err;                                                         \
>> > +})
>> 
>> These iteration APIs are a bit dangerous, at least for these APIs we'd
>> better change name like as FUNCTABLE_RUN etc. because LINKTABLE can
>> contain not only function address but also some data (or address of
>> data).
>
>Sure will do, thanks for the review.
>
>  Luis

I don't know if they are dangerous.  Keep in mind C type checking is still 
present.
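
To make the type-checking point concrete, here is a minimal user-space
sketch.  It is not the macro from the patch: struct frob_ops, frob_table
and TABLE_RUN_ALL are hypothetical names, and an ordinary array stands in
for a linker-assembled table.  The macro only compiles when the named
member is callable with the given arguments.

```c
#include <stddef.h>

/* Hypothetical entry type: one callable member and one plain data member. */
struct frob_ops {
	int (*run)(int arg);
	int weight;		/* data, not callable */
};

static int calls;		/* counts how many entries were run */

static int run_a(int arg) { calls++; return arg + 1; }
static int run_b(int arg) { calls++; return arg + 2; }

/* An ordinary array stands in for the linker table (assumption). */
static const struct frob_ops frob_table[] = {
	{ run_a, 10 },
	{ run_b, 20 },
};

#define TABLE_SIZE(tbl) (sizeof(tbl) / sizeof((tbl)[0]))

/* Same shape as the macro under review: call member 'func' of each entry. */
#define TABLE_RUN_ALL(tbl, func, ...)				\
do {								\
	size_t i_;						\
	for (i_ = 0; i_ < TABLE_SIZE(tbl); i_++)		\
		(tbl)[i_].func(__VA_ARGS__);			\
} while (0)

static int run_all_demo(void)
{
	calls = 0;
	TABLE_RUN_ALL(frob_table, run, 5);	/* fine: run is int (*)(int) */
	/*
	 * TABLE_RUN_ALL(frob_table, weight, 5) does not compile:
	 * 'weight' is an int, so calling it is a type error.
	 */
	return calls;
}
```

Pointing the macro at a data member fails at build time rather than
misbehaving at run time.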


Re: [Xen-devel] [tip:x86/asm] x86/mm/xen: Suppress hugetlbfs in PV guests

2016-04-22 Thread H. Peter Anvin
On 04/22/2016 02:47 AM, tip-bot for Jan Beulich wrote:
> Commit-ID:  103f6112f253017d7062cd74d17f4a514ed4485c
> Gitweb: http://git.kernel.org/tip/103f6112f253017d7062cd74d17f4a514ed4485c
> Author: Jan Beulich 
> AuthorDate: Thu, 21 Apr 2016 00:27:04 -0600
> Committer:  Ingo Molnar 
> CommitDate: Fri, 22 Apr 2016 10:05:00 +0200
> 
> 
> diff --git a/arch/x86/include/asm/hugetlb.h b/arch/x86/include/asm/hugetlb.h
> index f8a29d2..e6a8613 100644
> --- a/arch/x86/include/asm/hugetlb.h
> +++ b/arch/x86/include/asm/hugetlb.h
> @@ -4,6 +4,7 @@
>  #include 
>  #include 
>  
> +#define hugepages_supported() cpu_has_pse
>  

Please don't use the cpu_has_* macros anymore, they are going away soon.

In this case it should be static_cpu_has(X86_FEATURE_PSE).

-hpa





Re: [Xen-devel] Intel MID / CE4100 - platform support - pnpbios support ?

2016-04-04 Thread H. Peter Anvin
On 04/04/16 11:24, Luis R. Rodriguez wrote:
> On Mon, Apr 04, 2016 at 09:01:06AM -0700, H. Peter Anvin wrote:
>> On 03/31/16 13:03, Luis R. Rodriguez wrote:
>>> Andy S, Peter, Thomas, Jiang (or who might know),
>>>
>>> Do Intel MID platforms exist with PNP BIOS support? What about CE4100?
>>> As it stands I don't see anything that would prevent this but I would
>>> suspect a possibility might be that it doesn't. I'm sanitizing some
>>> early boot code right now and pnpbios is one, and as I work on this,
>>> this has come up as a question for me.
>>>
>>
>> The "MID" platforms from a Linux platform perspective are the ones with
>> SFI and DT bootloaders, respectively; by definition they don't have
>> standard BIOS.
> 
> I see thanks, I ask as I'm currently removing a pnpbios replacing a
> paravirt_enabled() check to a more general disable pnpbios x86 platform quirk
> option, so far lguest and xen would use it but it would seem to me it was
> worthy to ask if MID and CE4100 subarchs also could have this set as well
> then.
> 
> It sounds like then at least for Intel MID I can disable pnpbios as well
> as a quirk as well. I can introduce that change separately in my series.
> 
> What about CE4100 ?
> 

The same, I am pretty sure.

-hpa





Re: [Xen-devel] Intel MID / CE4100 - platform support - pnpbios support ?

2016-04-04 Thread H. Peter Anvin
On 03/31/16 13:03, Luis R. Rodriguez wrote:
> Andy S, Peter, Thomas, Jiang (or who might know),
> 
> Do Intel MID platforms exist with PNP BIOS support? What about CE4100?
> As it stands I don't see anything that would prevent this but I would
> suspect a possibility might be that it doesn't. I'm sanitizing some
> early boot code right now and pnpbios is one, and as I work on this,
> this has come up as a question for me.
> 

The "MID" platforms from a Linux platform perspective are the ones with
SFI and DT bootloaders, respectively; by definition they don't have
standard BIOS.

-hpa





Re: [Xen-devel] [RFC v2 4/7] asm/sections: add a generic push_section_tbl()

2016-02-21 Thread H. Peter Anvin
On 02/19/16 13:06, Luis R. Rodriguez wrote:
>>
>> I think the \n\t is unnecessary.
> 
> Super! I wonder if we can just use this on s390 as well without it pooping?
> I ask as this would set a precedent.
> 

Ask Heiko, but I think just ; or \n ought to be fine.  I do not know of
*any* case where \t at the end of a string would ever be necessary, and
it would *always* be possible to replace it with a space in a pinch.
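
As a rough illustration, the directive string can be built so that it ends
in a bare newline.  This is only a sketch: STR, SECTION_TBL_NAME and
PUSH_SECTION_TBL are hypothetical helpers, and the section-name layout is
an assumption rather than what the patch's SECTION_TBL() produces.

```c
#include <string.h>

/* Hypothetical helpers; the ".tbl.<name>.<level>" layout is assumed. */
#define STR_(x) #x
#define STR(x) STR_(x)
#define SECTION_TBL_NAME(section, name, level) \
	STR(section) ".tbl." STR(name) "." STR(level)

/* Directive terminated by a newline only: no trailing "\t". */
#define PUSH_SECTION_TBL(section, name, level, flags) \
	".pushsection " SECTION_TBL_NAME(section, name, level) \
	", \"" STR(flags) "\"\n"

/* The text that would be pasted into an asm() statement. */
static const char push_directive[] =
	PUSH_SECTION_TBL(.rodata, demo, 05, a);

/* Returns 1 if the directive ends with exactly a newline terminator. */
static int ends_with_newline(const char *s)
{
	size_t len = strlen(s);

	return len > 0 && s[len - 1] == '\n';
}
```

Whatever follows in the same asm string starts on a fresh line, so a
leading tab there is purely cosmetic.  Whether ";" can replace the newline
depends on the assembler target.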

-hpa




Re: [Xen-devel] [RFC v2 2/7] tables.h: add linker table support

2016-02-19 Thread H. Peter Anvin
On 02/19/2016 05:45 AM, Luis R. Rodriguez wrote:
> +/**
> + * LINKTABLE_RUN_ERR - run each linker table entry func and return error if any
> + *
> + * @tbl: linker table
> + * @func: structure name for the function name we want to call.
> + * @args...: arguments to pass to func
> + *
> + * Example usage:
> + *
> + *   unsigned int err = LINKTABLE_RUN_ERR(frobnicator_fns, some_run,);
> + */
> +#define LINKTABLE_RUN_ERR(tbl, func, args...)                  \
> +({                                                             \
> + size_t i;                                                     \
> + int err = 0;                                                  \
> + for (i = 0; !err && i < LINKTABLE_SIZE(tbl); i++)             \
> +         err = (tbl[i]).func (args);                           \
> + err;                                                          \
> +})

This is wrong and pointless.  As written it returns the error code of
the last instance.

What I suggested for this macro was that we ought to exit the loop on error.

Furthermore:

1. Using an advancing pointer would make more sense than a counter.
2. .func doesn't make any sense -- it ought to be "func"; otherwise you
   can't call into either a table containing pure function pointers,
   or a field of a structure *pointed to* by the table, which is
   likely to be quite common.

So the idea is to use something like:

/* Array of function pointers */
LINKTABLE_RUN_ALL(frobnicator_fns, , arg1, arg2);

/* Array of structures */
LINKTABLE_RUN_ALL(frobnicator_fns, .some_run, arg1, arg2);

/* Array of structure pointers */
LINKTABLE_RUN_ALL(frobnicator_fns, ->some_run, arg1, arg2);
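
A minimal user-space sketch of that idea follows, written as an
error-returning variant.  It is an assumption-laden illustration, not the
kernel macro: TABLE_RUN_ERR, TABLE_END and the demo tables are hypothetical,
ordinary arrays stand in for linker tables, and __typeof__ is a GNU C
extension.  The accessor argument is pasted after the dereferenced entry
pointer, so it can be empty, ".member" or "->member".

```c
/* One past the last element of a real C array (stand-in for a table end). */
#define TABLE_END(tbl) ((tbl) + sizeof(tbl) / sizeof((tbl)[0]))

/*
 * Hypothetical sketch: walk the table with an advancing pointer and stop
 * at the first nonzero return value, which is left in 'err'.
 */
#define TABLE_RUN_ERR(err, tbl, accessor, ...)				\
do {									\
	__typeof__(&(tbl)[0]) p_;					\
	(err) = 0;							\
	for (p_ = (tbl); !(err) && p_ < TABLE_END(tbl); p_++)		\
		(err) = (*p_) accessor (__VA_ARGS__);			\
} while (0)

static int ncalls;	/* how many entries actually ran */

static int step_ok(int a, int b)   { (void)a; (void)b; ncalls++; return 0; }
static int step_fail(int a, int b) { ncalls++; return a - b; }

struct frob { int (*some_run)(int, int); };

/* Array of function pointers */
static int (*const fn_table[])(int, int) = { step_ok, step_fail, step_ok };

/* Array of structures */
static const struct frob struct_table[] = {
	{ step_ok }, { step_fail }, { step_ok },
};

/* Array of structure pointers */
static const struct frob frob_ok = { step_ok }, frob_bad = { step_fail };
static const struct frob *const ptr_table[] = { &frob_ok, &frob_bad, &frob_ok };

static int run_fn_table(void)
{
	int err;

	ncalls = 0;
	TABLE_RUN_ERR(err, fn_table, , 7, 3);
	return err;
}

static int run_struct_table(void)
{
	int err;

	ncalls = 0;
	TABLE_RUN_ERR(err, struct_table, .some_run, 7, 3);
	return err;
}

static int run_ptr_table(void)
{
	int err;

	ncalls = 0;
	TABLE_RUN_ERR(err, ptr_table, ->some_run, 7, 3);
	return err;
}
```

In each case the second entry fails, so the third entry is never called.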




Re: [Xen-devel] [RFC v2 2/7] tables.h: add linker table support

2016-02-19 Thread H. Peter Anvin
On 02/19/2016 05:45 AM, Luis R. Rodriguez wrote:
> +
> +/**
> + * DOC: Regular linker linker table constructors
> + *
> + * Regular constructors are expected to be used for valid linker table entries.
> + * Valid uses of weak entries other than the beginning and is currently
> + * untested but should in theory work.
> + */
> +
> +/**
> + * LINKTABLE_TEXT - Declares a linker table entry for execution
> + *
> + * @name: linker table name
> + * @level: order level
> + *
> + * Declares a linker table to be used for execution.
> + */
> +#define LINKTABLE_TEXT(name, level)  \
> +   __typeof__(name[0])   \
> +   __attribute__((used,  \
> +  __aligned__(LINKTABLE_ALIGNMENT(name)),\
> +  section(SECTION_TBL(SECTION_TEXT, name, level

I'm really confused by this.  Text should obviously be readonly, but I'm
not at all clear how this works here.

The issue with linktables for text is kind of confusing if nothing else;
Russell is right about that.  It doesn't prevent us from doing something
similar, but perhaps it ought to have a different name.

For one thing, priority level is meaningless for text, since it is not a
table that can be indexed into.

-hpa




Re: [Xen-devel] [RFC v2 4/7] asm/sections: add a generic push_section_tbl()

2016-02-19 Thread H. Peter Anvin
On 02/19/2016 05:45 AM, Luis R. Rodriguez wrote:
> With a generic linker tables solution in place we
> need a general asm solution for declaring entries
> with asm. The first easy target is to cover the C
> asm declarations, guard the header file for now
> and define a first generic entry push_section_tbl()
> to be used later for custom linker table annotations.
> 
> Signed-off-by: Luis R. Rodriguez 
> ---
>  include/asm-generic/sections.h | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h
> index af0254c09424..f5ea98bd85d2 100644
> --- a/include/asm-generic/sections.h
> +++ b/include/asm-generic/sections.h
> @@ -3,8 +3,10 @@
>  
>  /* References to section boundaries */
>  
> +#ifndef __ASSEMBLY__
>  #include 
>  #include 
> +#include 
>  
>  /*
>   * Usage guidelines:
> @@ -128,4 +130,12 @@ static inline bool init_section_intersects(void *virt, size_t size)
>   return memory_intersects(__init_begin, __init_end, virt, size);
>  }
>  
> +/*
> + * Some architectures do not like the "\t" at the end (s390), we should be
> + * able to generalize this further, but so far this covers most architectures.
> + */
> +#define push_section_tbl(section, name, level, flags) \
> + ".pushsection " SECTION_TBL(section,name,level) ",  \"" #flags "\"\n\t"
> +#endif
> +

I think the \n\t is unnecessary.

-hpa





Re: [Xen-devel] [RFC v2 0/7] linux: add linker tables

2016-02-19 Thread H. Peter Anvin
On 02/19/2016 05:45 AM, Luis R. Rodriguez wrote:
> This is my v2 of the original linker table work [0], now with
> six proof of concepts ports of existing code using custom section
> with custom linker script modifications:
> 
>   * DEFINE_LINKTABLE_TEXT(char, kprobes);
>   * DEFINE_LINKTABLE_DATA(struct jump_entry, __jump_table);
>   * DEFINE_LINKTABLE_DATA(struct _ddebug, __verbose);
>   * DEFINE_LINKTABLE_RO(struct builtin_fw, builtin_fw);
>   * DEFINE_LINKTABLE_INIT(struct x86_init_fn, x86_init_fns);
>   * DEFINE_LINKTABLE_INIT_DATA(unsigned long, _kprobe_blacklist);
> 
> I've tested all except jump tables, I'd appreciate some help with that.
> 

We should add support for read-mostly, probably.  In fact, some of these
probably *are* read-mostly.

-hpa







Re: [Xen-devel] [RFC v1 0/8] x86/init: Linux linker tables

2016-02-02 Thread H. Peter Anvin
On 01/22/2016 05:44 AM, Michael Matz wrote:
> Hi,
> 
> On Thu, 21 Jan 2016, H. Peter Anvin wrote:
> 
>> Something that confuses me is that gcc seems to give these sections the 
>> "aw" attributes which makes as complain.  This might be a gcc bug.
> 
> Workaround: use an (possibly empty) initializer:
> 
> struct foo {int i;};
> const struct foo
> __attribute__((used,section(".rodata.tbl.tablename.0"))) tablename[0] = {};
> 

Any forward progress on this?

-hpa





Re: [Xen-devel] [RFC v1 0/8] x86/init: Linux linker tables

2016-02-02 Thread H. Peter Anvin
On 02/02/2016 04:22 PM, Luis R. Rodriguez wrote:
>>
>> Should it be possible to reuse free_init_pages() and/or
>> free_reserved_area() only for routines (members in the array in this
>> case of a struct of fns) that don't meet our subarch once we're done
>> iterating over the routines and know we can discard things we know we
>> can drop? Through a cursory glance, *I think* its possible as-is, we
>> would just need easy access to the respective start and end addresses
>> and I guess there lies the challenge. Question is, is would that be
>> clean enough for us? Or are there other things you can think of that
>> perhaps might make this prospect cleaner later to add?
>>
>> I figure better ask now for architectural purposes than later after merged.
> 
> I don't think its needed we iron out in a solution *now* to be able to
> free code we know we won't need at run time but having a solid
> understanding adding this feature later without much impact to users
> might be worthy. As such I was pursuing a very basic proof of concept
> to ensure this is possible first given I didn't hear back if folks
> were sure this might be possible. I don't think a proof of concept
> should take long so just want to get fleshed out.
> 

This applies to the specific subarch use rather than generic linker
tables, right?

-hpa





Re: [Xen-devel] [PATCH v1 04/12] xen/hvmlite: Bootstrap HVMlite guest

2016-01-25 Thread H. Peter Anvin
On 01/25/16 13:12, Luis R. Rodriguez wrote:
>>
>> Perhaps, but someone would still have to set hardware_subarch. And
>> it's hvmlite_bootparams() that does it.
> 
> No, Xen would do it as well, essentially all of hvmlite_bootparams() could be
> done in Xen.
> 

Or a stub code.

-hpa





Re: [Xen-devel] [PATCH v1 04/12] xen/hvmlite: Bootstrap HVMlite guest

2016-01-23 Thread H. Peter Anvin
On January 23, 2016 7:34:33 AM PST, Konrad Rzeszutek Wilk wrote:
>
>>However, this stub belongs in Linux, not in the Xen toolstack.  That
>>way, when the Linux boot protocol is modified, both sides can be
>>updated accordingly.
>
>I would add that this idea is borrowed from the EFI stub code that
>Linux has which also constructs the boot parameter structure when
>invoked (either from firmware or from EFI shell).

There is a huge difference though: EFI is a widely used multivendor industry
standard.  You are talking about something Xen-specific, and which in good Xen
tradition isn't even documented, apparently (did we ever get documentation for
the hypervisor ABI?)

Asking "why burden Xen with something Linux-specific" is a pretty extreme case 
of the tail wagging the dog.

That being said, before any code can be put anywhere, it needs to be written.  
We can argue where to put it later.  We went through this process with the EFI 
stub, too: a standalone implementation (efilinux) first.



Re: [Xen-devel] [PATCH v1 04/12] xen/hvmlite: Bootstrap HVMlite guest

2016-01-23 Thread H. Peter Anvin
On January 23, 2016 8:12:23 AM PST, Konrad Rzeszutek Wilk 
<konrad.w...@oracle.com> wrote:
>On January 23, 2016 11:01:06 AM EST, "H. Peter Anvin" <h...@zytor.com>
>wrote:
>>On January 23, 2016 7:34:33 AM PST, Konrad Rzeszutek Wilk
>><konrad.w...@oracle.com> wrote:
>>>
>>>>However, this stub belongs in Linux, not in the Xen toolstack.  That
>>>>way, when the Linux boot protocol is modified, both sides can be
>>>>updated accordingly.
>>>
>>>I would add that this idea is borrowed from the EFI stub code that
>>>Linux has which also constructs the boot parameter structure when
>>>invoked (either from firmware or from EFI shell).
>>
>>There is a huge difference though: EFI is a widely used multivendor
>>industry standard.  You are talking about something Xen-specific, and
>>which in good Xen tradition isn't even documented, apparently (did we
>>ever get documentation for the hypervisor ABI?)
>>
>>Asking "why burden Xen with something Linux-specific" is a pretty
>>extreme case of the tail wagging the dog.
>>
>>That being said, before any code can be put anywhere, it needs to be
>>written.  We can argue where to put it later.  We went through this
>>process with the EFI stub, too: a standalone implementation (efilinux)
>>first.
>
>http://lists.xenproject.org/archives/html/xen-devel/2015-12/msg01793.html
>
>I believe is the latest version. Roger (CCed) has probably an updated
>one.

I suspect you should write a noninteractive bootloader as a reference 
implementation, and then consider porting Grub2 and maybe Syslinux to your ABI 
for those that want a full featured interactive bootloader compatible with the 
normal management.


Re: [Xen-devel] [RFC v1 0/8] x86/init: Linux linker tables

2016-01-22 Thread H. Peter Anvin
On 01/22/2016 05:44 AM, Michael Matz wrote:
> Hi,
> 
> On Thu, 21 Jan 2016, H. Peter Anvin wrote:
> 
>> Something that confuses me is that gcc seems to give these sections the 
>> "aw" attributes which makes as complain.  This might be a gcc bug.
> 
> Workaround: use an (possibly empty) initializer:
> 
> struct foo {int i;};
> const struct foo
> __attribute__((used,section(".rodata.tbl.tablename.0"))) tablename[0] = {};
> 

And indeed that works.  Awesome!  Much better than having to do an
assembly hack.

-hpa





Re: [Xen-devel] [RFC v1 4/8] x86/init: add linker table support

2016-01-21 Thread H. Peter Anvin
On 01/21/16 11:46, Luis R. Rodriguez wrote:
> On Thu, Jan 21, 2016 at 11:25 AM, H. Peter Anvin <h...@zytor.com> wrote:
>>> And that's exactly what HVMlite does. Most of this shim layer is setting
>>> up boot_params, after which we jump to standard x86 boot path (i.e.
>>> startup_{32|64}). With hardware_subarch set to zero.
>>
>> Which is the way to do it as long as the early code can be the same.
> 
> To be clear, with the subarch and linker table suggested in my patch
> series, it should be possible to have the same exact entry point, the
> Xen PV setup code could run early in the order. For instance in the
> linker table we could use the reserved order levels 01-09 for PV
> hypervisor code:
> 
> +/* Init order levels, we can start at 01 but reserve 01-09 for now */
> +#define X86_INIT_ORDER_EARLY   10
> +#define X86_INIT_ORDER_NORMAL  30
> +#define X86_INIT_ORDER_LATE    50
> 
> So perhaps X86_INIT_ORDER_PV as 05 later.
> 
> The standard x86 init would just then be:
> 
> asmlinkage __visible void __init x86_64_start_kernel(char * real_mode_data)
> {
> x86_init_fn_init_tables();
> x86_init_fn_early_init();
> }
> 
> The PV init code would kick in early and could parse the
> boot_params.hdr.hardware_subarch_data pointer as it sees fit.
> 

Right... we already do that.

I don't even think you need to initialize any tables.  At least on i386,
we have to do this in assembly code.  However, it is just a simple table
walk.  :)

-hpa





Re: [Xen-devel] [RFC v1 0/8] x86/init: Linux linker tables

2016-01-21 Thread H. Peter Anvin
On 12/17/15 20:40, H. Peter Anvin wrote:
>>
>> const struct foo
>> __attribute__((used,section(".rodata.tbl.tablename.0"))) tablename[0];
>>
>> const struct foo
>> __attribute__((used,section(".rodata.tbl.tablename.999")))
>> tablename__end[0];
>>

(Over)thinking about this some more, I suggest using the empty string
for the start and "~" for the end.  And, yes, I did check that ~ works
as part of a section name.

Something that confuses me is that gcc seems to give these sections the
"aw" attributes which makes as complain.  This might be a gcc bug.
Worst case we have to use an assembly statement to create these
sections; it isn't a big deal and shouldn't make it any more
architecture-specific.
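
The ordering argument can be checked in plain C.  The sketch below assumes
the linker sorts input section names bytewise (the way strcmp() compares
them) and uses a hypothetical ".rodata.tbl.demo.<suffix>" naming layout.
It sorts a shuffled list of names and shows that the empty suffix lands
first and "~" lands last.

```c
#include <stdlib.h>
#include <string.h>

/* Bytewise name comparison, as an assumed model of the linker's name sort. */
static int cmp_names(const void *a, const void *b)
{
	return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Hypothetical section names for one table, deliberately out of order. */
static const char *sections[] = {
	".rodata.tbl.demo.50",
	".rodata.tbl.demo.~",	/* end marker */
	".rodata.tbl.demo.10",
	".rodata.tbl.demo.",	/* start marker: empty suffix */
	".rodata.tbl.demo.99",
};

#define NSECTIONS (sizeof(sections) / sizeof(sections[0]))

/* Sort the names in place in ascending bytewise order. */
static void sort_sections(void)
{
	qsort(sections, NSECTIONS, sizeof(sections[0]), cmp_names);
}
```

The empty suffix is a strict prefix of every other name, and "~" (0x7e)
compares greater than any digit or ASCII letter, so no numeric level needs
to be reserved for the markers.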

-hpa




Re: [Xen-devel] [RFC v1 0/8] x86/init: Linux linker tables

2016-01-21 Thread H. Peter Anvin
On 01/21/16 14:25, Luis R. Rodriguez wrote:
> On Thu, Jan 21, 2016 at 1:37 PM, Konrad Rzeszutek Wilk
>  wrote:
>>> Sure, do we know if that ICC compatible? Do we care? There are a
>>> series of ICC hacks put in place on ipxe's original solution which
>>> I've folded in, it seems that works but if we care about ICC those
>>> folks should perhaps help review as well.
>>
>> I didn't know the kernel could even be compiled with ICC? Thought
>> only GCC worked?
> 
> I'm happy with that, just wanted to make sure I raise the flag concern
> given the icc hacks on the linker tables.
> 
>> Anyhow - it may be that those fixes were for quite old ICC versions.
>> Does the latest one manifest these oddities?
> 
> I am not sure, I yield to Michael as the author of the original ICC
> compatibility pieces. If we don't care about ICC let me know and I'll
> just drop the stuff. In lack of such statements I'll just keep the
> work arounds in place, but I'm more than thrilled to drop it.
> 

In general we let the ICC and Clang/LLVM teams communicate with us post
facto.  We can't just guess what their requirements are, especially
since they are likely to change between revisions.

-hpa





Re: [Xen-devel] [RFC v1 4/8] x86/init: add linker table support

2016-01-21 Thread H. Peter Anvin
On 01/21/16 12:05, Luis R. Rodriguez wrote:
> 
>> At least on i386,
>> we have to do this in assembly code.
> 
> Neat! How is that order kept?
> 

Right now subarch_entries[] is just an array indexed by subarch number
hardcoded in head_32.S.

However, if you have a list of (id, target) then you could just iterate
over it.
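
A C sketch of such an (id, target) walk might look like the following.
The real code would live in assembly in head_32.S; struct subarch_entry,
the enter_*() handlers and subarch_dispatch() are hypothetical names, and
a plain array stands in for whatever table the linker would assemble.

```c
#include <stddef.h>

typedef void (*subarch_entry_fn)(void);

/* Hypothetical table entry: subarch id plus the entry point to jump to. */
struct subarch_entry {
	unsigned int id;	/* compared against hardware_subarch */
	subarch_entry_fn target;
};

static int last_entered = -1;	/* records which handler ran (demo only) */

static void enter_default(void) { last_entered = 0; }
static void enter_lguest(void)  { last_entered = 1; }
static void enter_xen(void)     { last_entered = 2; }

/* Entries may appear in any order and ids need not be contiguous. */
static const struct subarch_entry subarch_table[] = {
	{ 2, enter_xen },
	{ 0, enter_default },
	{ 1, enter_lguest },
};

/* Walk the table; call the matching target, or return -1 if none matches. */
static int subarch_dispatch(unsigned int subarch)
{
	const struct subarch_entry *e;
	const struct subarch_entry *end =
		subarch_table + sizeof(subarch_table) / sizeof(subarch_table[0]);

	for (e = subarch_table; e < end; e++) {
		if (e->id == subarch) {
			e->target();
			return 0;
		}
	}
	return -1;
}
```

Unlike an array indexed by subarch number, this form lets each subarch
contribute its own entry without anyone maintaining a central list.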

-hpa





Re: [Xen-devel] [RFC v1 4/8] x86/init: add linker table support

2016-01-21 Thread H. Peter Anvin
On 01/21/16 16:25, Luis R. Rodriguez wrote:
>>
>> Basically, if the hardware is enumerable using standard PC mechanisms (PCI, 
>> ACPI) and doesn't need a special boot flow it should use type 0.
> 
> I can extend the documentation as part of this to be clear.
> 
> I will note though that this still leaves a gap on code which might
> want to access the question "are we in a virt environment, and if so
> on which one" in between really early boot and right before
> init_hypervisor_platform(). Or rather, given subarch can be used by
> Xen and lguest but not KVM it means KVM doesn't get to use it. It may
> not need it, but its also rather trivial to set up on qemu, and I have
> a patch for that if we wanted one for KVM. That would extend the
> definition of subarch a bit more, but as it stands today its use was
> rather limited. Heck, subarch_data is to this day unused.
> 

KVM is not a subarch, and Xen HVM isn't either; the subarch was meant to
be specifically to handle nonstandard boot entries; the CE4100 extension
was itself kind of a hack.

If you have a genuine need for a "hypervisor type" then that is a
separate thing and should be treated separately from subarch.  However,
you need to consider that some hypervisors can emulate other hypervisors
and you may have more than one hypervisor API available.

-hpa




Re: [Xen-devel] [RFC v1 4/8] x86/init: add linker table support

2016-01-21 Thread H. Peter Anvin
On 01/21/16 05:45, Boris Ostrovsky wrote:
>> I don't think the hypervisor should be setting Linux specific boot
>> related parameters, the boot ABI should be OS agnostic. IMHO, a small
>> shim should be added to Linux in order to set what Linux requires when
>> entering from a Xen entry point.

For the record, this is exactly what hardware_subarch is meant to do:
revector to a stub to do this kind of setup, for a subarchitecture which
doesn't have a natural stub like BIOS or EFI.  In the case of Xen PV, or
lguest, there is special care that has to be taken very early in the
path due to the nonstandard handling of page tables, which is another
reason for this field.

> And that's exactly what HVMlite does. Most of this shim layer is setting
> up boot_params, after which we jump to standard x86 boot path (i.e.
> startup_{32|64}). With hardware_subarch set to zero.

Which is the way to do it as long as the early code can be the same.

-hpa




Re: [Xen-devel] [RFC v1 4/8] x86/init: add linker table support

2016-01-21 Thread H. Peter Anvin
On 01/21/16 11:50, H. Peter Anvin wrote:
> 
> Right... we already do that.
> 
> I don't even think you need to initialize any tables.  At least on i386,
> we have to do this in assembly code.  However, it is just a simple table
> walk.  :)
> 

It might make more sense to make subarch its own table, though, although
I haven't looked at your code in enough detail to say.

-hpa





Re: [Xen-devel] [RFC v1 2/8] tables.h: add linker table support

2016-01-20 Thread H. Peter Anvin
On January 20, 2016 3:15:48 PM PST, Michael Brown  wrote:
>>> + * To solve this problem linker tables can be used on Linux, it enables you to
>>> + * always force compiling of select features that one wishes to avoid bit-rot
>>> + * while still enabling you to disable linking feature code into the final
>>> + * kernel image or building certain modules if the features have been disabled
>>> + * via Kconfig. The code is derivative of gPXE linker table's solution.
>
>I missed the start of this thread.  However, asking as the author of the
>original Etherboot/gPXE/iPXE linker table solution: please change all
>references from "gPXE" to "iPXE".
>
>Thanks,
>
>Michael

Yes, that request has already been made :)


Re: [Xen-devel] [RFC v1 4/8] x86/init: add linker table support

2016-01-20 Thread H. Peter Anvin
On 01/20/16 13:33, Luis R. Rodriguez wrote:
> 
> That's correct for PV and PVH, likewise when qemu is required for HVM
> qemu could set it. I have the qemu change done but that should only
> cover HVM. A common place to set this as well could be the hypervisor,
> but currently the hypervisor doesn't set any boot_params, instead a
> generic struct is passed and the kernel code (for any OS) is expected
> to interpret this and then set the required values for the OS in the
> init path. Long term though if we wanted to merge init further one way
> could be to have the hypervisor just set the zero page cleanly for the
> different modes. If we needed more data other than the
> hardware_subarch we also have the hardware_subarch_data, that's a u64
> , and how that is used would be up to the subarch. In Xen's case it
> could do what it wants with it. That would still mean perhaps defining
> as part of a Xen boot protocol a place where xen specific code can
> count on finding more Xen data passed by the hypervisor, the
> xen_start_info. That is, if we wanted to merge init paths this is
> something to consider.
> 
> One thing I considered on the question of who should set the zero page
> for Xen with the prospect of merging inits, or at least this subarch
> for both short term and long term are the obvious implications in
> terms of hypervisor / kernel / qemu combination requirements if the
> subarch is needed. Having it set in the kernel is an obvious immediate
> choice for PV / PVH but it means we can't merge init paths completely
> (down to asm inits), we'd still be able to merge some C init paths
> though, the first entry would still be different. Having the zero page
> set on the hypervisor would go long ways but it would mean a
> hypervisor change required.
> 
> These prospects are worth discussing, specially in light of Boris's
> hvmlite work.
> 

The above doesn't make sense to me.  hardware_subarch is really used
when the boot sequence is somehow nonstandard.  HVM probably doesn't
need that.

-hpa





Re: [Xen-devel] [RFC v1 4/8] x86/init: add linker table support

2016-01-20 Thread H. Peter Anvin
On January 20, 2016 2:12:49 PM PST, "Luis R. Rodriguez" 
<mcg...@do-not-panic.com> wrote:
>On Wed, Jan 20, 2016 at 1:41 PM, H. Peter Anvin <h...@zytor.com> wrote:
>> On 01/20/16 13:33, Luis R. Rodriguez wrote:
>>>
>>> That's correct for PV and PVH, likewise when qemu is required for HVM
>>> qemu could set it. I have the qemu change done but that should only
>>> cover HVM. A common place to set this as well could be the hypervisor,
>>> but currently the hypervisor doesn't set any boot_params, instead a
>>> generic struct is passed and the kernel code (for any OS) is expected
>>> to interpret this and then set the required values for the OS in the
>>> init path. Long term though if we wanted to merge init further one way
>>> could be to have the hypervisor just set the zero page cleanly for the
>>> different modes. If we needed more data other than the
>>> hardware_subarch we also have the hardware_subarch_data, that's a u64,
>>> and how that is used would be up to the subarch. In Xen's case it
>>> could do what it wants with it. That would still mean perhaps defining
>>> as part of a Xen boot protocol a place where xen specific code can
>>> count on finding more Xen data passed by the hypervisor, the
>>> xen_start_info. That is, if we wanted to merge init paths this is
>>> something to consider.
>>>
>>> One thing I considered on the question of who should set the zero page
>>> for Xen with the prospect of merging inits, or at least this subarch
>>> for both short term and long term are the obvious implications in
>>> terms of hypervisor / kernel / qemu combination requirements if the
>>> subarch is needed. Having it set in the kernel is an obvious immediate
>>> choice for PV / PVH but it means we can't merge init paths completely
>>> (down to asm inits), we'd still be able to merge some C init paths
>>> though, the first entry would still be different. Having the zero page
>>> set on the hypervisor would go long ways but it would mean a
>>> hypervisor change required.
>>>
>>> These prospects are worth discussing, especially in light of Boris's
>>> hvmlite work.
>>>
>>
>> The above doesn't make sense to me.  hardware_subarch is really used
>> when the boot sequence is somehow nonstandard.
>
>Thanks for the feedback -- as it stands today hardware_subarch is only
>used by lguest, Moorestown, and CE4100 even though we had definitions
>for it for Xen -- this is not used yet. Its documentation does make
>references to differences for a paravirtualized environment, and uses
>a few examples but doesn't go into great depths about restrictions so
>its limitations in how we could use it were not clear to me.
>
>>  HVM probably doesn't need that.
>
>Today HVM doesn't need it, but perhaps that is because it has not
>needed changes early on boot. Will it, or could it? I'd even invite us
>to consider the same for other hypervisors or PV hypervisors. I'll
>note that things like cpu_has_hypervisor() or derivatives
>(kvm_para_available() which is now used on drivers even, see
>sound/pci/intel8x0.c) requires init_hypervisor_platform() run, in
>terms of the x86 init sequence this is run pretty late at
>setup_arch(). Should code need to know hypervisor info anytime before
>that they have no generic option available.
>
>I'm fine if we want to restrict hardware_subarch but I'll note the
>semantics were not that explicit to delineate clear differences and I
>just wanted to highlight the current early boot restriction of
>cpu_has_hypervisor().
>
>  Luis

Basically, if the hardware is enumerable using standard PC mechanisms (PCI, 
ACPI) and doesn't need a special boot flow it should use type 0.
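
To make the rule of thumb above concrete, here is a minimal sketch. The numeric values are the ones the x86 boot protocol documents for hardware_subarch; the struct platform descriptor and the pick_subarch() helper are hypothetical and exist only for illustration, they are not kernel code.

```c
#include <stdbool.h>

/* hardware_subarch values as documented in the x86 boot protocol. */
enum x86_subarch {
	SUBARCH_PC        = 0,	/* default: standard PC boot flow */
	SUBARCH_LGUEST    = 1,
	SUBARCH_XEN       = 2,
	SUBARCH_INTEL_MID = 3,	/* Moorestown */
	SUBARCH_CE4100    = 4,
};

/* Hypothetical platform description, for illustration only. */
struct platform {
	bool enumerable_via_pci_acpi;	/* hardware is found through PCI/ACPI */
	bool needs_special_boot_flow;	/* entry or early init differs from a PC */
	enum x86_subarch special_type;	/* type to report if the flow is special */
};

/* Type 0 unless the platform really needs a nonstandard boot sequence. */
static enum x86_subarch pick_subarch(const struct platform *p)
{
	if (p->enumerable_via_pci_acpi && !p->needs_special_boot_flow)
		return SUBARCH_PC;
	return p->special_type;
}
```

Under this reading an HVM guest, which enumerates its hardware the standard way and boots like a PC, ends up with type 0 even though it runs on Xen.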
-- 
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.



Re: [Xen-devel] Unifying x86_64 / Xen init paths and reading hardware_subarch early

2016-01-15 Thread H. Peter Anvin
On January 15, 2016 4:43:04 PM PST, "Luis R. Rodriguez"  wrote:
>On Fri, Jan 15, 2016 at 03:47:25PM -0800, Andy Lutomirski wrote:
>> On Fri, Jan 15, 2016 at 2:08 PM, Luis R. Rodriguez 
>wrote:
>> > I will be respinning the generic Linux linker table solution [0]
>soon
>> > based on hpa's feedback again now that I'm back from vacation. As I
>do
>> > that though I wanted to highlight a feature I'm throwing into the
>> > linker table solution which I am not sure many have paid close
>> > attention to but I think is important to Xen. I'm making use of the
>> > zero page hardware_subarch to enable us to detect if we're a
>specific
>> > hypervisor solution *as early as is possible*. This has a few
>> > implications, short term it is designed to provide a proactive
>> > technical solution to bugs such as the cr4 shadow crash (see
>> > 5054daa285beaf706f051fbd395dc36c9f0f907f) and ensure that *new* x86
>> > features get a proper Xen implementation proactively *or* at the
>very
>> > least get annotated as unsupported properly, instead of having them
>> > crash and later finding out. A valid example here is Kasan, which
>to
>> > this day lacks proper Xen support. In the future, if the generic
>> > linker table solution gets merged, it would mean developers would
>have
>> > to *think* about if they support Xen or not at development time. It
>> > does this in a not-disruptive way to Xen / x86_64 but most
>> > *importantly* it does not extend pvops! This should avoid issues in
>> > cases of developer / maintainer bandwidth, should some new features
>be
>> > pushed onto Linux for x86_64 but a respective Xen solution is not
>> > addressed, and that was not caught early in patch review, such as
>with
>> > Kasan.
>> >
>> > [0]
>https://lkml.kernel.org/r/1450217797-19295-1-git-send-email-mcg...@do-not-panic.com
>> >
>> > Two things I'd like to request a bit of help with and review /
>consideration:
>> >
>> > 1) I'd like some advice on a curious problem I've stumbled on. I'd
>> > like to access hardware_subarch super early, and in my review with
>at
>> > least two x86 folks this *should* work:
>> >
>> > diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
>> > index c913b7eb5056..9168842821c8 100644
>> > --- a/arch/x86/kernel/head64.c
>> > +++ b/arch/x86/kernel/head64.c
>> > @@ -141,6 +141,7 @@ static void __init copy_bootdata(char
>*real_mode_data)
>> >
>> >  asmlinkage __visible void __init x86_64_start_kernel(char *
>real_mode_data)
>> >  {
>> > + struct boot_params *params = (struct boot_params
>*)__va(real_mode_data);
>> >   int i;
>> 
>> This is a mess :-p
>
>Agreed. Doing what I can without extending pvops though ;)
>
>> If you want to access real_mode_data before load_idt, you'll need to
>do:
>> 
>> for (i = 0; i < sizeof(boot_params); i += 4096)
>> early_make_pgtable((unsigned long)params + i);
>
>Thanks I'll give this a shot.
>
>> Of course, it's entirely possible that that will blow up if you try
>to
>> do it on Xen.
>
>I'll check, if its safe and if the subarch strategy is desirable to
>help with
>unifying init, then great. Otherwise we'd need to figure this out.
>
>> I think this would all be easier to understand if you try to separate
>> out the ideas of linker tables from the idea of rearranging early
>> init.
>
>Oh absolutely. The goal to unify init *or* to access subarch earlier
>provides
>slightly different gains and possibilities. This is why I am addressing
>this
>separately. Its important to highlight the prospects though given I
>think a few
>folks may not have realized what might be possible here...
>
>>  AFAICT the linker table thing is just an implementation detail.
>
>Indeed, but just as a linker table is one thing, the *use* of the
>linker table
>for x86 early init is another. It's a good example of how to use the
>linker
>tables though.
>
>The things I make mention of here are just possible *enhancements* of
>that work
>provided the subarch can be read earlier.  Another possibility which I
>also had
>not mentioned is the ability to also free annotated code on x86 init
>which we
>*know* for sure we don't need, much as __init code after we boot, only
>this
>could be done later at run time. That's also best technically
>considered later
>but perhaps worth mentioning now as a future possibility.
>
>Although the linker table series does not address unifying init, in
>this thread
>we are talking about the prospect of being able to do that in the
>future. Its
>best to consider this early than late.
>
>> If I understand right, you're trying to unify the Xen and native
>> startup as much as possible. 
>
>That ultimately is a possibility. The original patches don't do that
>though.
>They just pave the way with linker tables as baby steps.
>
>Without access to the subarch so early unifying init is not possible
>with a
>linker table solution though. As the series was posted though its late
>use
>(after load_idt()) still holds promise to help annotate 

Re: [Xen-devel] [RFC v1 0/8] x86/init: Linux linker tables

2015-12-17 Thread H. Peter Anvin
I think we can make this even more generic.  In particular, I would love
to see a solution for link tables that:

a) can be used for any kind of data structures, not just function
pointers (the latter is a specialization of the former);
b) doesn't need any changes to the linker scripts once the initial
enabling is done for any one architecture.

Key to this is to be able to define tables by name only, which is really
why SORT_BY_NAME() is used: the name sorts before the priority simply by
putting the name before the class.

Instead of .tbl.* naming of sections I think we should have the first
component be the type of section (.rodata, .data, .init_rodata,
.read_mostly etc.) which makes it easier to write a linker script that
properly sorts it into the right section.  The other thing is to take a
clue from the implementation in iPXE, which uses priority levels 00 and
99 (or we could use non-integers which sort appropriately instead of
using "real" levels) to contain the start and end symbols, which
eliminates any need for linker script modifications to add new tables.
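
As a rough illustration of the naming scheme described above: the section type comes first, then the table name, then the level, with 00 and 99 reserved for the start and end markers. This is only a sketch; the TBL_* macro names are made up here and do not come from iPXE or from any posted patch.

```c
/* Stringify after macro expansion. */
#define TBL_STR_(x)	#x
#define TBL_STR(x)	TBL_STR_(x)

/*
 * Hypothetical naming helper: .<section type>.tbl.<table name>.<level>
 * e.g. TBL_SECTION(rodata, x86_init_fns, 50)
 */
#define TBL_SECTION(type, name, level) \
	"." TBL_STR(type) ".tbl." TBL_STR(name) "." TBL_STR(level)

/* Start and end markers use the levels that bracket every real entry. */
#define TBL_SECTION_START(type, name)	TBL_SECTION(type, name, 00)
#define TBL_SECTION_END(type, name)	TBL_SECTION(type, name, 99)

static const char *const example_start = TBL_SECTION_START(rodata, x86_init_fns);
static const char *const example_entry = TBL_SECTION(rodata, x86_init_fns, 50);
static const char *const example_end   = TBL_SECTION_END(rodata, x86_init_fns);
```

Because the section type is the leading component, a single pattern per output section should be enough in the linker script, and adding a new table needs no script change at all.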

Making this a generic facility we could eventually eliminate a bunch of
ad hoc hacks we currently have.

Oh, and the link table feature should NOT be x86-specific.

-hpa




Re: [Xen-devel] [RFC v1 5/8] x86/init: move ebda reservations into linker table

2015-12-17 Thread H. Peter Anvin
On 12/17/15 12:48, Andy Lutomirski wrote:
> 
> I'm entirely ignorant of anything going on in gPXE/iPXE.
> 
> Can you explain what a linker table *does*?  It looks like all you've
> done in this patch is to move code around.  What actually happens?
> 

A linker table is a data structure that is stitched together from items
in multiple object files.

We already have a *bunch* of linker tables in Linux, mostly the init
tables, but they are all built in an ad hoc manner which requires linker
script modifications, which are of course per architecture.

My desire would be to make a general linker table facility so that a new
linker table can be implemented by changing C code only.
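
For readers who have not met the idea before, a small userspace sketch of a table stitched together by the linker follows. It deliberately uses a different mechanism from the one discussed in this thread: it assumes an ELF target with GNU ld or lld, which provide __start_<name> and __stop_<name> symbols for any section whose name is a valid C identifier. The init_fns section, struct init_fn and the INIT_FN() macro are hypothetical names.

```c
#include <stddef.h>

struct init_fn {
	const char *name;
	int (*fn)(void);
};

/*
 * Place one entry in the "init_fns" section.  "used" keeps the compiler
 * from discarding an entry that nothing references by name.
 */
#define INIT_FN(func) \
	static const struct init_fn init_fn_entry_##func \
	__attribute__((used, section("init_fns"))) = { #func, func }

static int setup_a(void) { return 1; }
static int setup_b(void) { return 2; }

/* In a real tree these lines could live in different object files. */
INIT_FN(setup_a);
INIT_FN(setup_b);

/* Provided by the linker (assumption: ELF with GNU ld or lld). */
extern const struct init_fn __start_init_fns[];
extern const struct init_fn __stop_init_fns[];

static size_t init_fn_count(void)
{
	return (size_t)(__stop_init_fns - __start_init_fns);
}

/* Walk the table as a plain array and add up the return values. */
static int run_init_fns(void)
{
	int sum = 0;

	for (const struct init_fn *p = __start_init_fns; p < __stop_init_fns; p++)
		sum += p->fn();
	return sum;
}
```

Adding a third entry only takes another INIT_FN() line in some C file, which is the property asked for above. This variant gives no control over ordering, though; that is what the sorted, level-suffixed section names are for.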

-hpa





Re: [Xen-devel] [RFC v1 0/8] x86/init: Linux linker tables

2015-12-17 Thread H. Peter Anvin
On 12/17/15 15:46, Luis R. Rodriguez wrote:
> 
> I explain why I do that there but the gist of it is that on Linux we may also
> want stronger semantics for specific linker table solutions, and solutions 
> such
> as those devised on the IOMMU init stuff do memmove() for sorting depending on
> semantics defined (in the simplest case here so far dependency between init
> sequences), this makes each set of sequences very subsystem specific. An issue
> with *one* subsystem could make things really bad for others. I thought about
> this quite a bit and figured its best left to the subsystem maintainers to
> decide.
> 

A table that needs sorting or other runtime handling is just a
read-write table for the purpose of the linker table construct.  It
presents to C as an array of initialized data.

> Perhaps a new sections.h file (you tell me) which documents the different
> section components:
> 
> /* document this *really* well */
> #define SECTION_RODATA        ".rodata"
> #define SECTION_INIT          ".init"
> #define SECTION_INIT_RODATA   ".init_rodata"
> #define SECTION_READ_MOSTLY   ".read_mostly"
> 
> Then on tables.h we add the section components support:

Yes, something like that.  How to macroize it cleanly is another matter;
we may want to use slightly different conventions than iPXE to match our
own codebase.

> #define __table(component, type, name) (component, type, name)
>
> #define __table_component(table) __table_extract_component table
> #define __table_extract_component(component, type, name) component
>
> #define __table_type(table) __table_extract_type table
> #define __table_extract_type(component, type, name) type
>
> #define __table_name(table) __table_extract_name table
> #define __table_extract_name(component, type, name) name
>
> #define __table_str(x) #x
>
> #define __table_section(table, idx) \
>     "." __table_component (table) ".tbl." __table_name (table) "." __table_str (idx)
>
> #define __table_entry(table, idx) \
>     __attribute__ ((__section__(__table_section(table, idx)), \
>                     __aligned__(__table_alignment(table))))
> 
> A user could then be something as follows:
> 
> #define X86_INIT_FNS \
>     __table(SECTION_INIT, struct x86_init_fn, "x86_init_fns")
> #define __x86_init_fn(order_level) __table_entry(X86_INIT_FNS, order_level)

Yes, but in particular the common case of function initialization tables
should be generic.

I'm kind of thinking a syntax like this:

DECLARE_LINKTABLE_RO(struct foo, tablename);
DEFINE_LINKTABLE_RO(struct foo, tablename);
LINKTABLE_RO(tablename,level) = /* contents */;
LINKTABLE_SIZE(tablename)

... which would turn into something like this once it goes through all
the preprocessing phases

/* DECLARE_LINKTABLE_RO */
extern const struct foo tablename[], tablename__end[];

/* DEFINE_LINKTABLE_RO */
DECLARE_LINKTABLE_RO(struct foo, tablename);

const struct foo __attribute__((used,section(".rodata.tbl.tablename.0")))
tablename[0];

const struct foo __attribute__((used,section(".rodata.tbl.tablename.999")))
tablename__end[0];

/* LINKTABLE_RO */
static const __typeof__(tablename)
__attribute__((used,section(".rodata.tbl.tablename.50")))
__tbl_tablename_12345

/* LINKTABLE_SIZE */
((tablename__end) - (tablename))

... and so on for all the possible sections where we may want tables.

Note: I used 0 and 999 above since they sort before and after all
possible 2-digit decimal numbers, but that's just cosmetic.
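
A quick way to check the ordering claim is to sort a few candidate section names the way a by-name sort would. The sketch below assumes the linker compares names bytewise, as strcmp() does; the names are the ones from the expansion above plus a made-up level 10 entry.

```c
#include <stdlib.h>
#include <string.h>

/* qsort() comparator over an array of C strings. */
static int cmp_names(const void *a, const void *b)
{
	return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Order section names the way a by-name sort is assumed to. */
static void sort_section_names(const char **names, size_t n)
{
	qsort(names, n, sizeof(*names), cmp_names);
}
```

Fed ".999", ".50", ".0" and ".10" suffixes in any order, this yields .0, .10, .50, .999: "0" is a prefix of, or compares below, every two-digit level, and "999" compares above all of them. It only works if every real level has exactly two digits; a single-digit level such as "5" would sort after "49".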

> If that's what you mean?
> 
> I'm a bit wary about having the linker sort any of the above SECTION_*'s, but
> if we're happy to do that perhaps a simple first step might be to see if 0-day
> bot would be happy with just the sort without any consequences to any
> architecture. Thoughts?

I don't see what is dangerous about it.  The section names are such that
a lexicographical sort will do the right thing, and we can simply use
SORT(.rodata.tbl.*) in the linker script, for example.

>> The other thing is to take a
>> clue from the implementation in iPXE, which uses priority levels 00 and
>> 99 (or we could use non-integers which sort appropriately instead of
>> using "real" levels) to contain the start and end symbols, which
>> eliminates any need for linker script modifications to add new tables.
> 
> This solution uses that as well. The only need for adding custom sections
> is when they have a requirement for a custom run time sort, and also to
> ensure they don't cause regressions on other subsystems if they have a buggy
> sort. The run time sorting is all subsystem specific and up to their own
> semantics.

Again, from a linker table POV this is nothing other than a read-write
table; there is a runtime function that then operates on that read-write
table.
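
To illustrate the point, here is a sketch of a runtime sort over such a table. An ordinary static array stands in for the linker-built read-write table, since to C both are just arrays; struct init_entry and its order field are hypothetical and far simpler than the dependency-based sort the IOMMU code uses.

```c
#include <stdlib.h>

struct init_entry {
	int order;		/* lower runs first */
	const char *name;
};

/* Stands in for a read-write linker table: to C it is just an array. */
static struct init_entry table[] = {
	{ 30, "late" },
	{ 10, "early" },
	{ 20, "middle" },
};

#define TABLE_SIZE (sizeof(table) / sizeof(table[0]))

static int cmp_order(const void *a, const void *b)
{
	const struct init_entry *x = a, *y = b;

	return (x->order > y->order) - (x->order < y->order);
}

/* The subsystem-specific part: an ordinary function run once at init. */
static void sort_table(void)
{
	qsort(table, TABLE_SIZE, sizeof(table[0]), cmp_order);
}
```

Nothing here depends on how the array was assembled, which is why a table with runtime sorting needs no special section of its own.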

-hpa




Re: [Xen-devel] [RFC v1 0/8] x86/init: Linux linker tables

2015-12-17 Thread H. Peter Anvin
On 12/17/15 20:25, H. Peter Anvin wrote:
> 
> /* DECLARE_LINKTABLE_RO */
> extern const struct foo tablename[], tablename__end[];
> 
> /* DEFINE_LINKTABLE_RO */
> DECLARE_LINKTABLE_RO(struct foo, tablename);
> 
> const struct foo __attribute__((used,section(".rodata.tbl.tablename.0")))
> tablename[0];
> 
> const struct foo __attribute__((used,section(".rodata.tbl.tablename.999")))
> tablename__end[0];
> 
> /* LINKTABLE_RO */
> static const __typeof__(tablename)
> __attribute__((used,section(".rodata.tbl.tablename.50")))
> __tbl_tablename_12345
> 
> /* LINKTABLE_SIZE */
> ((tablename__end) - (tablename))
> 
> ... and so on for all the possible sections where we may want tables.
> 

Come to think of it, we could even eliminate the need for a DEFINE
entirely if we made the start and end symbols static.  However, this
would generate an awful lot of identical-but-local symbols which
probably would make the linker slower and definitely would bloat the
debug data.

-hpa





Re: [Xen-devel] [RFC v1 0/8] x86/init: Linux linker tables

2015-12-15 Thread H. Peter Anvin
On December 15, 2015 2:16:29 PM PST, "Luis R. Rodriguez" 
 wrote:
>From: "Luis R. Rodriguez" 
>
> A long time ago in a galaxy far,
> far away...
>
>Konrad Rzeszutek Wilk posted patches which eventually got merged to
>help
>with modularizing the IOMMUs we have on x86 [0]. This work was done due
>to
>the complex relationship that exists on IOMMUs and the requirements on
>careful execution. The solution also provided a mechanism which
>jettisoned
>unused IOMMUs during run-time.
>
>During review, even though the code was merged, hpa did note that we
>tend
>to encounter this type of problem "often enough that we should
>implement a
>generic facility for it" [1], hpa acknowledged that it obviously has to
>be
>based on sections and even noted that perhaps we might be able in the
>future to
>automate its creation. He noted that the gPXE folks had done just this
>with
>linker tables and suggested that "presumably we'd need a few different
>flavors
>for init tables and so on, but this would make it a generic mechanism."
>
>The IOMMU code got merged and this was left on someone's mental
>backburner.
>I've had an itch to scratch recently to try to avoid issues which are
>possible
>if one does not jettison other code carefully due to the large
>complexity of
>implicit dependencies of certain code on x86 in particular with
>possible dead
>code on x86 due to paravirtualization, and the IOMMU jettison strategy
>turned
>out to be my favorite solution so far. I've taken on hpa's suggestions
>from
>back in the day to review gPXE's solution to see if we could embrace it
>on
>Linux for a generic section solution to help jettison code carefully.
>
>What this patch set does exactly:
>
>This RFC patch set attempts to add support for such a generic solution.
>In the end, it turns out that the best solution possible was the best
>of
>both worlds: a combination of what Konrad had implemented in addition
>to
>what Michael Brown had implemented on the gPXE front. The IOMMU
>solution
>enables simple semantic annotations for dependency relationships, this
>however requires a run time sort. The gPXE solution grants the option
>to
>simply sort at build time. One of gPXE's solution primary goals however
>was
>also to help avoid bit-rot on code that's possible from #ifdef'ery. The
>Linux
>linker table solution enables developers to pick and choose what they
>need, with linker tables being the simplest solution. Contrary to gPXE
>which strives to force compilation of all linker table solutions we
>let developers pick *when* they want this as part of their solution.
>As can be seen from the suggested x86 init specific use of linker
>tables
>proposed you can also take advantage of both, linker sorting, optional
>compilation when needed (at developer's discretion), and even careful
>semantics annotation for dependency / relationship annotations.
>Although
>the x86 init solution here is heavily inspired by the IOMMU solution it
>diverges with strong semantics, and a new optional subarchitecture
>annotation. Sorting of init sequences is structure specific, as such
>each subsystem must define their own solution unless semantics could
>be shared. I considered sharing semantics but in the end this proved
>pointless so this keeps things separate. A series of changes were made
>to the x86 init sequence in contrast to the IOMMU solution to be
>*extremely
>pedantic* on semantics, review of these changes can be studied on the
>table-init tree [2].
>
>Quick review of gPXE's solution and prospects on further changes:
>
>In my review from gPXE's solution it was not clear what hpa meant by
>gPXE folks having automated this process, they actually use linker
>tables
>all around, forcing compilation of *everything* and just do linking of
>enabled features at link time. You still need to build linker tables on
>your own. What I do see more potential for in the future is enabling to
>evolve stronger semantics over time, and this would also be subsystem
>specific.
>This will be evident in this patch set on the x86 init use of linker
>tables.
>I also see potential in strengthening semantics for linker sorting, any
>of
>these types of features however would impose requiring newer binutils.
>For
>instance, gPXE's linker solution currently relies on SORT(), that
>defaults to
>SORT_BY_NAME(). This sorts lexicographically, gPXE's solution uses two
>digits
>to enable SORT_BY_NAME()'s lexicographical sort to sort order by numeric
>priority. Since one is in control of order-level numbers one can
>provide
>guarantee that this sort should work as intended, however binutils also
>now has
>a SORT_BY_INIT_PRIORITY() which sorts specifically based on digits.
>SORT_BY_INIT_PRIORITY() was designed specifically for init_array
>sections
>though. Refer to the userspace mockup solution table-init git tree [2]
>commit
>6deba47ee1ad461e90 for more details on this.  One thing I can envision
>to help
>here further are 

Re: [Xen-devel] [PATCH] x86/mm: Skip the hypervisor range when walking PGD

2015-11-05 Thread H. Peter Anvin
On 11/05/15 10:56, Boris Ostrovsky wrote:
> The range between 0xffff800000000000 and 0xffff87ffffffffff is reserved
> for hypervisor and therefore we should not try to follow PGD's indexes
> corresponding to those addresses.
> 
> While this has always been a problem, with commit e1a58320a38d ("x86/mm:
> Warn on W^X mappings") ptdump_walk_pgd_level_core() can now be called
> during boot, causing a PV Xen guest to crash.
> 
> Reported-by: Sander Eikelenboom 
> Signed-off-by: Boris Ostrovsky 
> ---
>  arch/x86/mm/dump_pagetables.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
> index 1bf417e..756c921 100644
> --- a/arch/x86/mm/dump_pagetables.c
> +++ b/arch/x86/mm/dump_pagetables.c
> @@ -362,8 +362,13 @@ static void ptdump_walk_pgd_level_core(struct seq_file 
> *m, pgd_t *pgd,
>  bool checkwx)
>  {
>  #ifdef CONFIG_X86_64
> +/* ffff800000000000 - ffff87ffffffffff is reserved for hypervisor */
> +#define is_hypervisor_range(idx) (paravirt_enabled() && \
> +   (((idx) >= pgd_index(__PAGE_OFFSET) - 16) && \
> +((idx) < pgd_index(__PAGE_OFFSET))))
>   pgd_t *start = (pgd_t *) &init_level4_pgt;
>  #else
> +#define is_hypervisor_range(idx)   0
>   pgd_t *start = swapper_pg_dir;
>  #endif
>   pgprotval_t prot;
> @@ -381,7 +386,7 @@ static void ptdump_walk_pgd_level_core(struct seq_file 
> *m, pgd_t *pgd,
>  
>   for (i = 0; i < PTRS_PER_PGD; i++) {
>   st.current_address = normalize_addr(i * PGD_LEVEL_MULT);
> - if (!pgd_none(*start)) {
> + if (!pgd_none(*start) && !is_hypervisor_range(i)) {
>   if (pgd_large(*start) || !pgd_present(*start)) {
>   prot = pgd_flags(*start);
>   note_page(m, &st, __pgprot(prot), 1);
> 

Maybe we could use the max_lines field in the address_markers[] array?
We really shouldn't be mapping anything in the hypervisor space even on
native.
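
For anyone following the index arithmetic in the quoted patch, here is a standalone sketch of it. It assumes 4-level paging (a 39-bit PGD shift, 512 entries) and a direct-map base of 0xffff880000000000, which I believe matches x86-64 kernels of this period; PAGE_OFFSET_ASSUMED is a made-up name, and the paravirt_enabled() condition from the patch is left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define PGDIR_SHIFT	39	/* 4-level paging: one PGD entry maps 512 GiB */
#define PTRS_PER_PGD	512

/* Assumed direct-map base, for illustration only. */
#define PAGE_OFFSET_ASSUMED	UINT64_C(0xffff880000000000)

/* Index of the top-level page table entry covering addr. */
static unsigned int pgd_index(uint64_t addr)
{
	return (unsigned int)((addr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1));
}

/* True for the 16 PGD slots directly below the direct map. */
static bool is_hypervisor_range(unsigned int idx)
{
	unsigned int end = pgd_index(PAGE_OFFSET_ASSUMED);

	return idx >= end - 16 && idx < end;
}
```

With these assumptions the direct map starts at PGD index 272, so the skipped slots are 256 through 271, i.e. 16 entries of 512 GiB each covering ffff800000000000 to ffff87ffffffffff.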

-hpa




Re: [Xen-devel] [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops

2015-09-30 Thread H. Peter Anvin
On 09/21/2015 09:36 AM, Linus Torvalds wrote:
> 
> How many msr reads are so critical that the function call
> overhead would matter? Get rid of the inline version of the _safe()
> thing too, and put that thing there too.
> 

Probably only the ones that may go in the context switch path.

-hpa





Re: [Xen-devel] [PATCH 0/3] x86/paravirt: Fix baremetal paravirt MSR ops

2015-09-17 Thread H. Peter Anvin
However, the difference between one CONFIG and another is quite frankly crazy.  
We should explicitly use the safe versions where this is appropriate, and then 
yes, we should do this.

Yet another reason the paravirt code is batshit crazy.

On September 17, 2015 2:31:34 AM PDT, Borislav Petkov  wrote:
>On Thu, Sep 17, 2015 at 09:19:20AM +0200, Ingo Molnar wrote:
>> Most big distro kernels on bare metal have CONFIG_PARAVIRT=y (I
>checked Ubuntu and 
>> Fedora), so we are potentially exposing a lot of users to problems.
>
>+ SUSE.
>
>> Crashing the bootup on an unknown MSR is bad. Many MSR reads and
>writes are 
>> non-critical and returning the 'safe' result is much better than
>crashing or 
>> hanging the bootup.
>
>... and prepending all MSR accesses with feature/CPUID checks is
>probably almost
>impossible.

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: [Xen-devel] [PATCH v5 2/4] x86/ldt: Make modify_ldt synchronous

2015-08-13 Thread H. Peter Anvin
On 07/27/2015 10:29 PM, Andy Lutomirski wrote:
> modify_ldt has questionable locking and does not synchronize
> threads.  Improve it: redesign the locking and synchronize all
> threads' LDTs using an IPI on all modifications.
>
> This will dramatically slow down modify_ldt in multithreaded
> programs, but there shouldn't be any multithreaded programs that
> care about modify_ldt's performance in the first place.
>

<nitpick>

... except 32-bit programs compiled with one specific version of glibc.
Do we care?  I don't think so.

</nitpick>





Re: [Xen-devel] [PATCH 0/8] Use correctly the Xen memory terminologies in Linux

2015-07-28 Thread H. Peter Anvin
On 07/28/2015 08:02 AM, Julien Grall wrote:
> Hi all,
>
> This patch series aims to use the memory terminologies described in
> include/linux/mm.h [1] for Linux xen code.
>
> Linux is mistakenly using MFN when GFN is meant, I suspect this is because the
> first support of Xen was for PV. This has brought some misimplementation
> of memory helpers on ARM and made the developer confused about the expected
> behavior.
>
> For instance, with pfn_to_mfn, we expect to get a MFN based on the name.
> Although, if we look at the implementation on x86, it's returning a GFN.
> Most of the callers are also using it this way.
>
> The first 2 patches of this series are ARM related in order to remove
> PV specific helpers which should not be used and to fix the implementation of
> pfn_to_mfn.
>
> The rest of the series is here to rename most of the usage in the common code
> of MFN to GFN. I also took the opportunity to replace most of the calls to
> pfn_to_gfn in the common code by page_to_gfn to avoid constructions such
> as pfn_to_gfn(page_to_pfn(...).
>
> Note the one xen-blkfront will be dropped by 64K series [2], I can include it
> if necessary.
 

Can we actually get some documentation for Xen before starting to change
names around?

-hpa





Re: [Xen-devel] [PATCH] x86/cpu: Fix SMAP check in PVOPS environments

2015-06-04 Thread H. Peter Anvin
On 06/04/2015 12:55 PM, Rusty Russell wrote:
 
> Yeah, hard cases make bad law.
>
> I'm not too unhappy with this fix; ideally we'd rename save_fl and
> restore_fl to save_eflags_if and restore_eflags_if too.
 

I would be fine with this... but please document what the bloody
semantics of pvops is actually supposed to be.

-hpa





Re: [Xen-devel] [PATCH] x86/cpu: Fix SMAP check in PVOPS environments

2015-06-04 Thread H. Peter Anvin
On 06/03/2015 02:31 AM, Andrew Cooper wrote:
> There appears to be no formal statement of what pv_irq_ops.save_fl() is
> supposed to return precisely.  Native returns the full flags, while lguest and
> Xen only return the Interrupt Flag, and both have comments by the
> implementations stating that only the Interrupt Flag is looked at.  This may
> have been true when initially implemented, but no longer is.
>
> To make matters worse, the Xen PVOP leaves the upper bits undefined, making
> the BUG_ON() undefined behaviour.  Experimentally, this now trips for 32bit PV
> guests on Broadwell hardware.  The BUG_ON() is consistent for an individual
> build, but not consistent for all builds.  It has also been a sitting timebomb
> since SMAP support was introduced.
>
> Use native_save_fl() instead, which will obtain an accurate view of the AC
> flag.

Could we fix the Xen pvops wrapper instead to not do things like this?
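
The mismatch under discussion can be modelled in a few lines. This is a userspace model only: the two model_*_save_fl() functions are hypothetical stand-ins for the native and PV implementations, and the PV model clears the upper bits, whereas the description above says Xen leaves them undefined. The bit positions of IF (bit 9) and AC (bit 18) are the architectural ones.

```c
#include <stdbool.h>

#define X86_EFLAGS_IF	0x00000200UL	/* bit 9: interrupt enable */
#define X86_EFLAGS_AC	0x00040000UL	/* bit 18: alignment check, used by SMAP */

/* Models a native save_fl(): the full flags word comes back unchanged. */
static unsigned long model_native_save_fl(unsigned long eflags)
{
	return eflags;
}

/* Models a PV save_fl() that reports IF only (other bits cleared here). */
static unsigned long model_pv_save_fl(unsigned long eflags)
{
	return eflags & X86_EFLAGS_IF;
}

static bool ac_set(unsigned long flags)
{
	return (flags & X86_EFLAGS_AC) != 0;
}
```

With both IF and AC set, ac_set() is true on the native result and false on the PV one, so a check on AC cannot go through a hook that is only defined for IF. Either the hook's contract or the caller has to change.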

-hpa





Re: [Xen-devel] [PATCHv1] x86: don't schedule when handling #NM exception

2015-03-05 Thread H. Peter Anvin
On 06/23/2014 06:08 AM, Konrad Rzeszutek Wilk wrote:
 On Wed, Mar 19, 2014 at 08:02:22AM -0700, H. Peter Anvin wrote:
 On 03/19/2014 06:21 AM, Konrad Rzeszutek Wilk wrote:

 The following patch does the always eager allocation.  It's a fixup of
 Suresh's original patch.


 Hey Peter,

 I think this is the solution you were looking for?

 Or are there some other subtle issues that you think lurk around?


 Ah, I managed to miss it (mostly because it was buried *inside* another
 email and didn't change the subject line... I really dislike that mode
 of delivering a patch.
 
 Let me roll up some of these patchset and send them as git send-email.
 

 Let me see if the issues have been fixed.  Still wondering if there is a
 way we can get away without the boot_func hack...
 
 I have to confess I don't even remember what the 'if the issues have been
 fixed' is referring to?
 

Hi Konrad... it looks like this got left waiting for you and got forgotten?

-hpa





Re: [Xen-devel] [PATCH v2 2/2] x86/xen: allow privcmd hypercalls to be preempted

2014-12-10 Thread H. Peter Anvin
On 12/10/2014 03:34 PM, Luis R. Rodriguez wrote:
 diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
 index 344b63f..40b5c0c 100644
 --- a/arch/x86/kernel/entry_32.S
 +++ b/arch/x86/kernel/entry_32.S
 @@ -982,7 +982,28 @@ ENTRY(xen_hypervisor_callback)
  ENTRY(xen_do_upcall)
  1:   mov %esp, %eax
   call xen_evtchn_do_upcall
 +#ifdef CONFIG_PREEMPT
   jmp  ret_from_intr
 +#else
 + GET_THREAD_INFO(%ebp)
 +#ifdef CONFIG_VM86
 + movl PT_EFLAGS(%esp), %eax  # mix EFLAGS and CS
 + movb PT_CS(%esp), %al
 + andl $(X86_EFLAGS_VM | SEGMENT_RPL_MASK), %eax
 +#else
 + movl PT_CS(%esp), %eax
 + andl $SEGMENT_RPL_MASK, %eax
 +#endif
 + cmpl $USER_RPL, %eax
 + jae resume_userspace# returning to v8086 or userspace
 + DISABLE_INTERRUPTS(CLBR_ANY)
 + cmpb $0,PER_CPU_VAR(xen_in_preemptible_hcall)
 + jz resume_kernel
 + movb $0,PER_CPU_VAR(xen_in_preemptible_hcall)
 + call cond_resched_irq
 + movb $1,PER_CPU_VAR(xen_in_preemptible_hcall)
 + jmp resume_kernel
 +#endif /* CONFIG_PREEMPT */
   CFI_ENDPROC
  ENDPROC(xen_hypervisor_callback)
  
 diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
 index c0226ab..0ccdd06 100644
 --- a/arch/x86/kernel/entry_64.S
 +++ b/arch/x86/kernel/entry_64.S
 @@ -1170,7 +1170,23 @@ ENTRY(xen_do_hypervisor_callback)   # 
 do_hypervisor_callback(struct *pt_regs)
   popq %rsp
   CFI_DEF_CFA_REGISTER rsp
   decl PER_CPU_VAR(irq_count)
 +#ifdef CONFIG_PREEMPT
   jmp  error_exit
 +#else
 + movl %ebx, %eax
 + RESTORE_REST
 + DISABLE_INTERRUPTS(CLBR_NONE)
 + TRACE_IRQS_OFF
 + GET_THREAD_INFO(%rcx)
 + testl %eax, %eax
 + je error_exit_user
 + cmpb $0,PER_CPU_VAR(xen_in_preemptible_hcall)
 + jz retint_kernel
 + movb $0,PER_CPU_VAR(xen_in_preemptible_hcall)
 + call cond_resched_irq
 + movb $1,PER_CPU_VAR(xen_in_preemptible_hcall)
 + jmp retint_kernel
 +#endif /* CONFIG_PREEMPT */
   CFI_ENDPROC
  END(xen_do_hypervisor_callback)
  
 @@ -1398,6 +1414,7 @@ ENTRY(error_exit)
   GET_THREAD_INFO(%rcx)
   testl %eax,%eax
   jne retint_kernel
 +error_exit_user:
   LOCKDEP_SYS_EXIT_IRQ
   movl TI_flags(%rcx),%edx
   movl $_TIF_WORK_MASK,%edi

You're adding a bunch of code for the *non*-preemptive case here... why?

-hpa


