Re: [patch] paravirt: VDSO page is essential

2007-03-06 Thread Jeremy Fitzhardinge
Zachary Amsden wrote:
> Yes, I don't have a problem with your patch, I just wish I had been
> cc'd on it.  Fixing this is rather tricky, but I believe no strange
> build magic is required, it can be done in kernel init code.  Still
> building my SUSE 9.0 guest to test.  SUSE 9.0 is one of those that
> requires COMPAT_VDSO, yes?

Where do we stand on VDSO reloc?  Are you turning this into a proper
patch, or shall I have a look at it?

J
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-06 Thread Jeremy Fitzhardinge
Roland McGrath wrote:
> For everything else to work, it needs to be set by changing __FIXADDR_TOP,
> which seems to be done by calling reserve_top_address early enough.
> It looks like that needs to be properly tied into paravirt_ops somehow.
>   

The startup code for whatever hypervisor you're running makes a call to
reserve_top to reserve the appropriate amount of address space.  This
happens very early.

J
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-06 Thread Roland McGrath
> -# define VDSO_PRELINK  VDSO_HIGH_BASE
> +# ifndef CONFIG_XEN
> +#  define VDSO_PRELINK VDSO_HIGH_BASE
> +# else
> +#  define VDSO_PRELINK (0UL - FIX_VDSO * PAGE_SIZE)
> +# endif
> 
> should be Kconfig driven, not #ifdef driven, due to cleanliness and also 
> because lguest wants to have the same thing. Plus:

In fact, with the relocate_vdso stuff it doesn't matter what VDSO_PRELINK
is at compile time.  It saves the small amount of startup cost if it
matches the runtime address, but that is probably not noticeable.

> furthermore, there should be a paravirt_ops method to chose the 
> relocation address, unless i'm missing something. On the native kernel 
> that address will default to 0xe000. (if CONFIG_COMPAT_VDSO is 
> selected)

For everything else to work, it needs to be set by changing __FIXADDR_TOP,
which seems to be done by calling reserve_top_address early enough.
It looks like that needs to be properly tied into paravirt_ops somehow.


Thanks,
Roland
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-06 Thread Ingo Molnar

* Roland McGrath <[EMAIL PROTECTED]> wrote:

> > Jan Beulich just posted a patch to do just this - relocate the 
> > vdso's ELF header.  If that's all that's really required to keep 
> > COMPAT_VDSO viable under PARAVIRT, then it seems like the way to go.
> 
> I found 
> http://marc.theaimsgroup.com/?l=xen-devel=117309332600075=2 and 
> that must be the one you meant.  The ELF-grokking form of that is 
> exactly what I had in mind.  The "find relocs with cmp" scheme is 
> pretty silly, but also works fine.  It trades tweaky ELF knowledge 
> with tweaky fragile build methods, but it's all about the same to me.

this looks good to me too in principle, the #else branch. But the actual 
implementation will have to be redone quite a bit i fear. Some details: 
relocate_vdso() needs some major coding style cleanups. This bit:

-# define VDSO_PRELINK  VDSO_HIGH_BASE
+# ifndef CONFIG_XEN
+#  define VDSO_PRELINK VDSO_HIGH_BASE
+# else
+#  define VDSO_PRELINK (0UL - FIX_VDSO * PAGE_SIZE)
+# endif

should be Kconfig driven, not #ifdef driven, due to cleanliness and also 
because lguest wants to have the same thing. Plus:

+#if defined(CONFIG_XEN) && defined(CONFIG_COMPAT_VDSO)

i'd just make this depend on CONFIG_COMPAT_VDSO, always. Same here:

+#if defined(CONFIG_XEN) && defined(CONFIG_COMPAT_VDSO)
+static void __init relocate_vdso

just make this driven in the normal CONFIG_COMPAT_VDSO case too - even 
though we 'prelink' the VDSO to the usual address - we better run 
through the same code all the time and reduce the number of variants as 
much as possible.

furthermore, there should be a paravirt_ops method to chose the 
relocation address, unless i'm missing something. On the native kernel 
that address will default to 0xe000. (if CONFIG_COMPAT_VDSO is 
selected)

this way there will only be two main variants to worry about: compat and 
modern (which is the current status quo anyway), instead of 4-5 
variants.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-06 Thread Ingo Molnar

* Roland McGrath [EMAIL PROTECTED] wrote:

  Jan Beulich just posted a patch to do just this - relocate the 
  vdso's ELF header.  If that's all that's really required to keep 
  COMPAT_VDSO viable under PARAVIRT, then it seems like the way to go.
 
 I found 
 http://marc.theaimsgroup.com/?l=xen-develm=117309332600075w=2 and 
 that must be the one you meant.  The ELF-grokking form of that is 
 exactly what I had in mind.  The find relocs with cmp scheme is 
 pretty silly, but also works fine.  It trades tweaky ELF knowledge 
 with tweaky fragile build methods, but it's all about the same to me.

this looks good to me too in principle, the #else branch. But the actual 
implementation will have to be redone quite a bit i fear. Some details: 
relocate_vdso() needs some major coding style cleanups. This bit:

-# define VDSO_PRELINK  VDSO_HIGH_BASE
+# ifndef CONFIG_XEN
+#  define VDSO_PRELINK VDSO_HIGH_BASE
+# else
+#  define VDSO_PRELINK (0UL - FIX_VDSO * PAGE_SIZE)
+# endif

should be Kconfig driven, not #ifdef driven, due to cleanliness and also 
because lguest wants to have the same thing. Plus:

+#if defined(CONFIG_XEN)  defined(CONFIG_COMPAT_VDSO)

i'd just make this depend on CONFIG_COMPAT_VDSO, always. Same here:

+#if defined(CONFIG_XEN)  defined(CONFIG_COMPAT_VDSO)
+static void __init relocate_vdso

just make this driven in the normal CONFIG_COMPAT_VDSO case too - even 
though we 'prelink' the VDSO to the usual address - we better run 
through the same code all the time and reduce the number of variants as 
much as possible.

furthermore, there should be a paravirt_ops method to chose the 
relocation address, unless i'm missing something. On the native kernel 
that address will default to 0xe000. (if CONFIG_COMPAT_VDSO is 
selected)

this way there will only be two main variants to worry about: compat and 
modern (which is the current status quo anyway), instead of 4-5 
variants.

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-06 Thread Roland McGrath
 -# define VDSO_PRELINK  VDSO_HIGH_BASE
 +# ifndef CONFIG_XEN
 +#  define VDSO_PRELINK VDSO_HIGH_BASE
 +# else
 +#  define VDSO_PRELINK (0UL - FIX_VDSO * PAGE_SIZE)
 +# endif
 
 should be Kconfig driven, not #ifdef driven, due to cleanliness and also 
 because lguest wants to have the same thing. Plus:

In fact, with the relocate_vdso stuff it doesn't matter what VDSO_PRELINK
is at compile time.  It saves the small amount of startup cost if it
matches the runtime address, but that is probably not noticeable.

 furthermore, there should be a paravirt_ops method to chose the 
 relocation address, unless i'm missing something. On the native kernel 
 that address will default to 0xe000. (if CONFIG_COMPAT_VDSO is 
 selected)

For everything else to work, it needs to be set by changing __FIXADDR_TOP,
which seems to be done by calling reserve_top_address early enough.
It looks like that needs to be properly tied into paravirt_ops somehow.


Thanks,
Roland
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-06 Thread Jeremy Fitzhardinge
Roland McGrath wrote:
 For everything else to work, it needs to be set by changing __FIXADDR_TOP,
 which seems to be done by calling reserve_top_address early enough.
 It looks like that needs to be properly tied into paravirt_ops somehow.
   

The startup code for whatever hypervisor you're running makes a call to
reserve_top to reserve the appropriate amount of address space.  This
happens very early.

J
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-06 Thread Jeremy Fitzhardinge
Zachary Amsden wrote:
 Yes, I don't have a problem with your patch, I just wish I had been
 cc'd on it.  Fixing this is rather tricky, but I believe no strange
 build magic is required, it can be done in kernel init code.  Still
 building my SUSE 9.0 guest to test.  SUSE 9.0 is one of those that
 requires COMPAT_VDSO, yes?

Where do we stand on VDSO reloc?  Are you turning this into a proper
patch, or shall I have a look at it?

J
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Ingo Molnar

* Zachary Amsden <[EMAIL PROTECTED]> wrote:

> > yeah. (plus my patches of course that remove the current 
> > papering-over hackery and restores COMPAT_VDSO.)
> 
> Yes, I don't have a problem with your patch, I just wish I had been 
> cc'd on it. [...]

(i Cc:-ed you to the other ones - i simply forgot and bounced it to you 
a few hours down the line - sorry!)

> [...] Fixing this is rather tricky, but I believe no strange build 
> magic is required, it can be done in kernel init code.  Still building 
> my SUSE 9.0 guest to test.  SUSE 9.0 is one of those that requires 
> COMPAT_VDSO, yes?

yeah, and a handful of other ones. It depends on the glibc version: 
early vdso glibs were buggy and assumed a few things about the vdso, so 
they would segfault on the new-style vdso which is fully relocatable 
(and hence mergable into the vma space, randomizable, etc.).

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Zachary Amsden

Ingo Molnar wrote:

* Jeremy Fitzhardinge <[EMAIL PROTECTED]> wrote:

  

I had just sent this out for internal review...
  
I think Jan's approach is better if it works (since there's no 
compromise), but this is better if you want to get something working 
in the near term.



yeah. (plus my patches of course that remove the current papering-over 
hackery and restores COMPAT_VDSO.)
  


Yes, I don't have a problem with your patch, I just wish I had been cc'd 
on it.  Fixing this is rather tricky, but I believe no strange build 
magic is required, it can be done in kernel init code.  Still building 
my SUSE 9.0 guest to test.  SUSE 9.0 is one of those that requires 
COMPAT_VDSO, yes?


Thanks,

Zach
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Ingo Molnar

* Jeremy Fitzhardinge <[EMAIL PROTECTED]> wrote:

> > I had just sent this out for internal review...
> 
> I think Jan's approach is better if it works (since there's no 
> compromise), but this is better if you want to get something working 
> in the near term.

yeah. (plus my patches of course that remove the current papering-over 
hackery and restores COMPAT_VDSO.)

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Jeremy Fitzhardinge
Zachary Amsden wrote:
> Jeremy Fitzhardinge wrote:
>> I think Jan's approach is better if it works (since there's no
>> compromise), but this is better if you want to get something working in
>> the near term.
>>   
>
> Is Jan dynamically relinking the vdso at runtime?  
Yes.  His patch adds a runtime relocation for the vdso.
http://marc.theaimsgroup.com/?l=xen-devel=117309332600075=2

J
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Zachary Amsden

Jeremy Fitzhardinge wrote:

I think Jan's approach is better if it works (since there's no
compromise), but this is better if you want to get something working in
the near term.
  


Is Jan dynamically relinking the vdso at runtime?  Because that is what 
we will need.  Adding support for COMPAT_VDSO creates a major 
performance problem for us.  We need a way to force the VDSO to be 
enabled with a boot parameter.


Zach
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Jeremy Fitzhardinge
Zachary Amsden wrote:
> Rusty Russell wrote:
>> On Tue, 2007-03-06 at 00:28 +1100, Rusty Russell wrote:
>>  
>>> On Mon, 2007-03-05 at 13:06 +0100, Ingo Molnar wrote:
>>>    
>>>> Subject: [patch] paravirt: VDSO page is essential
>>>> From: Ingo Molnar <[EMAIL PROTECTED]>
>>>>
>>>> commit 3bbf54725467d604698721384d858b5983b87e8f disables the VDSO
>>>> for CONFIG_PARAVIRT kernels. This #ifdeffery was a bad change: the
>>>> VDSO is an essential component of Linux, and this change forces all
>>>> of them to use int $0x80 - including sane ones like KVM. (If a
>>>> hypervisor does not handle the VDSO properly then it can work
>>>> things around via the vdso=0 boot option. Or CONFIG_PARAVIRT should
>>>> not have been merged. But in any case, it is a basic taste issue:
>>>> we DO NOT #ifdef around core features like this!)
>>>>   
>>> I agree with the criticism, dislike the snarly comments, and disagree
>>> with this patch.
>>> 
>>
>> And my patch was pretty crack-induced too.  Sorry.
>>
>> I shouldn't have been thinking about using CONFIG options at all: we
>> should simply disable the vdso if CONFIG_COMPAT_VDSO=y when we
>> *actually* reserve top memory.
>>
>> This still need some work (doing that now), but do people like the idea?
>>
>> The current "vdso_disabled" flag merely disabled the ELF note, so it
>> needs to be made a little stronger, to not set up the vdso at all.
>>   
>
> I had just sent this out for internal review...
>

I think Jan's approach is better if it works (since there's no
compromise), but this is better if you want to get something working in
the near term.

J
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Rusty Russell
On Mon, 2007-03-05 at 17:03 -0800, Zachary Amsden wrote:
> Rusty Russell wrote:
> > This still need some work (doing that now), but do people like the idea?
> >
> I had just sent this out for internal review...

Spooky!

I was just testing with lguest, but I'll do so with your patch instead.

Thanks,
Rusty.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Zachary Amsden

Rusty Russell wrote:

On Tue, 2007-03-06 at 00:28 +1100, Rusty Russell wrote:
  

On Mon, 2007-03-05 at 13:06 +0100, Ingo Molnar wrote:


Subject: [patch] paravirt: VDSO page is essential
From: Ingo Molnar <[EMAIL PROTECTED]>

commit 3bbf54725467d604698721384d858b5983b87e8f disables the VDSO for 
CONFIG_PARAVIRT kernels. This #ifdeffery was a bad change: the VDSO is 
an essential component of Linux, and this change forces all of them to 
use int $0x80 - including sane ones like KVM. (If a hypervisor does not 
handle the VDSO properly then it can work things around via the vdso=0 
boot option. Or CONFIG_PARAVIRT should not have been merged. But in any 
case, it is a basic taste issue: we DO NOT #ifdef around core features 
like this!)
  

I agree with the criticism, dislike the snarly comments, and disagree
with this patch.



And my patch was pretty crack-induced too.  Sorry.

I shouldn't have been thinking about using CONFIG options at all: we
should simply disable the vdso if CONFIG_COMPAT_VDSO=y when we
*actually* reserve top memory.

This still need some work (doing that now), but do people like the idea?

The current "vdso_disabled" flag merely disabled the ELF note, so it
needs to be made a little stronger, to not set up the vdso at all.
  


I had just sent this out for internal review...


COMPAT_VDSO is incompatible with PARAVIRT for most implementations, as they
must relocate the fixmap to make room for a hypervisor.  So allow COMPAT_VDSO
kernels to relocate the fixmap as well, just disable the VDSO if they do so.

Signed-off-by: Zachary Amsden <[EMAIL PROTECTED]>

diff -r fad0910252d2 arch/i386/kernel/sysenter.c
--- a/arch/i386/kernel/sysenter.c   Mon Mar 05 15:24:04 2007 -0800
+++ b/arch/i386/kernel/sysenter.c   Mon Mar 05 15:27:31 2007 -0800
@@ -74,7 +74,12 @@ static struct page *syscall_pages[1];
 
 int __init sysenter_setup(void)
 {
-   void *syscall_page = (void *)get_zeroed_page(GFP_ATOMIC);
+   void *syscall_page;
+
+   if (!vdso_enabled)
+   return 0;
+
+   syscall_page = (void *)get_zeroed_page(GFP_ATOMIC);
syscall_pages[0] = virt_to_page(syscall_page);
 
 #ifdef CONFIG_COMPAT_VDSO
@@ -106,6 +111,11 @@ int arch_setup_additional_pages(struct l
struct mm_struct *mm = current->mm;
unsigned long addr;
int ret;
+
+   if (!vdso_enabled) {
+   current->mm->context.vdso = (void *)~0UL;
+   return 0;
+   }
 
down_write(>mmap_sem);
addr = get_unmapped_area(NULL, 0, PAGE_SIZE, 0, 0);
diff -r fad0910252d2 arch/i386/mm/pgtable.c
--- a/arch/i386/mm/pgtable.cMon Mar 05 15:24:04 2007 -0800
+++ b/arch/i386/mm/pgtable.cMon Mar 05 16:06:31 2007 -0800
@@ -144,10 +144,8 @@ void set_pmd_pfn(unsigned long vaddr, un
 }
 
 static int fixmaps;
-#ifndef CONFIG_COMPAT_VDSO
 unsigned long __FIXADDR_TOP = 0xf000;
 EXPORT_SYMBOL(__FIXADDR_TOP);
-#endif
 
 void __set_fixmap (enum fixed_addresses idx, unsigned long phys, pgprot_t 
flags)
 {
@@ -174,11 +172,13 @@ void reserve_top_address(unsigned long r
printk(KERN_INFO "Reserving virtual address space above 0x%08x\n",
   (int)-reserve);
 #ifdef CONFIG_COMPAT_VDSO
-   BUG_ON(reserve != 0);
-#else
+   if (reserve != 0) {
+   printk(KERN_WARNING "Compat VDSO is incompatible with fixmap 
relocation - disabling VDSO\n");
+   vdso_enabled = 0;
+   }
+#endif
__FIXADDR_TOP = -reserve - PAGE_SIZE;
__VMALLOC_RESERVE += reserve;
-#endif
 }
 
 pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
diff -r fad0910252d2 include/asm-i386/elf.h
--- a/include/asm-i386/elf.hMon Mar 05 15:24:04 2007 -0800
+++ b/include/asm-i386/elf.hMon Mar 05 15:44:43 2007 -0800
@@ -137,7 +137,7 @@ extern int dump_task_extended_fpu (struc
 
 #ifdef CONFIG_COMPAT_VDSO
 # define VDSO_COMPAT_BASE  VDSO_HIGH_BASE
-# define VDSO_PRELINK  VDSO_HIGH_BASE
+# define VDSO_PRELINK  0xe000UL
 #else
 # define VDSO_COMPAT_BASE  VDSO_BASE
 # define VDSO_PRELINK  0
diff -r fad0910252d2 include/asm-i386/fixmap.h
--- a/include/asm-i386/fixmap.h Mon Mar 05 15:24:04 2007 -0800
+++ b/include/asm-i386/fixmap.h Mon Mar 05 15:59:30 2007 -0800
@@ -14,19 +14,6 @@
 #define _ASM_FIXMAP_H
 
 
-/* used by vmalloc.c, vsyscall.lds.S.
- *
- * Leave one empty page between vmalloc'ed areas and
- * the start of the fixmap.
- */
-#ifndef CONFIG_COMPAT_VDSO
-extern unsigned long __FIXADDR_TOP;
-#else
-#define __FIXADDR_TOP  0xf000
-#define FIXADDR_USER_START __fix_to_virt(FIX_VDSO)
-#define FIXADDR_USER_END   __fix_to_virt(FIX_VDSO - 1)
-#endif
-
 #ifndef __ASSEMBLY__
 #include 
 #include 
@@ -35,6 +22,15 @@ extern unsigned long __FIXADDR_TOP;
 #ifdef CONFIG_HIGHMEM
 #include 
 #include 
+#endif
+
+/* used by vmalloc.c, vsyscall.lds.S, elf.h, pgtable.c */
+extern unsigned long __FIXADDR_TOP

Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Rusty Russell
On Tue, 2007-03-06 at 00:28 +1100, Rusty Russell wrote:
> On Mon, 2007-03-05 at 13:06 +0100, Ingo Molnar wrote:
> > Subject: [patch] paravirt: VDSO page is essential
> > From: Ingo Molnar <[EMAIL PROTECTED]>
> > 
> > commit 3bbf54725467d604698721384d858b5983b87e8f disables the VDSO for 
> > CONFIG_PARAVIRT kernels. This #ifdeffery was a bad change: the VDSO is 
> > an essential component of Linux, and this change forces all of them to 
> > use int $0x80 - including sane ones like KVM. (If a hypervisor does not 
> > handle the VDSO properly then it can work things around via the vdso=0 
> > boot option. Or CONFIG_PARAVIRT should not have been merged. But in any 
> > case, it is a basic taste issue: we DO NOT #ifdef around core features 
> > like this!)
> 
> I agree with the criticism, dislike the snarly comments, and disagree
> with this patch.

And my patch was pretty crack-induced too.  Sorry.

I shouldn't have been thinking about using CONFIG options at all: we
should simply disable the vdso if CONFIG_COMPAT_VDSO=y when we
*actually* reserve top memory.

This still need some work (doing that now), but do people like the idea?

The current "vdso_disabled" flag merely disabled the ELF note, so it
needs to be made a little stronger, to not set up the vdso at all.

diff -r f75715e64a3b arch/i386/Kconfig
--- a/arch/i386/Kconfig Tue Mar 06 00:04:50 2007 +1100
+++ b/arch/i386/Kconfig Tue Mar 06 09:30:36 2007 +1100
@@ -893,9 +893,10 @@ config COMPAT_VDSO
 config COMPAT_VDSO
bool "Compat VDSO support"
default y
-   depends on !PARAVIRT
-   help
- Map the VDSO to the predictable old-style address too.
+   help
+ Map the VDSO to the predictable old-style address too, or
+ in the case of a VMI/Xen/lguest virtualized guest, don't create
+ the VDSO at all.
---help---
  Say N here if you are running a sufficiently recent glibc
  version (2.3.3 or later), to remove the high-mapped
diff -r f75715e64a3b arch/i386/kernel/sysenter.c
--- a/arch/i386/kernel/sysenter.c   Tue Mar 06 00:04:50 2007 +1100
+++ b/arch/i386/kernel/sysenter.c   Tue Mar 06 09:25:47 2007 +1100
@@ -27,11 +27,7 @@
  * Should the kernel map a VDSO page into processes and pass its
  * address down to glibc upon exec()?
  */
-#ifdef CONFIG_PARAVIRT
-unsigned int __read_mostly vdso_enabled = 0;
-#else
 unsigned int __read_mostly vdso_enabled = 1;
-#endif
 
 EXPORT_SYMBOL_GPL(vdso_enabled);
 
@@ -51,7 +47,7 @@ void enable_sep_cpu(void)
int cpu = get_cpu();
struct tss_struct *tss = _cpu(init_tss, cpu);
 
-   if (!boot_cpu_has(X86_FEATURE_SEP)) {
+   if (!boot_cpu_has(X86_FEATURE_SEP) || !vdso_enabled) {
put_cpu();
return;
}
@@ -74,7 +70,12 @@ static struct page *syscall_pages[1];
 
 int __init sysenter_setup(void)
 {
-   void *syscall_page = (void *)get_zeroed_page(GFP_ATOMIC);
+   void *syscall_page;
+
+   if (!vdso_enabled)
+   return 0;
+
+   syscall_page = (void *)get_zeroed_page(GFP_ATOMIC);
syscall_pages[0] = virt_to_page(syscall_page);
 
 #ifdef CONFIG_COMPAT_VDSO
@@ -106,6 +107,9 @@ int arch_setup_additional_pages(struct l
struct mm_struct *mm = current->mm;
unsigned long addr;
int ret;
+
+   if (!vdso_enabled)
+   return 0;
 
down_write(>mmap_sem);
addr = get_unmapped_area(NULL, 0, PAGE_SIZE, 0, 0);
diff -r f75715e64a3b arch/i386/mm/pgtable.c
--- a/arch/i386/mm/pgtable.cTue Mar 06 00:04:50 2007 +1100
+++ b/arch/i386/mm/pgtable.cTue Mar 06 09:32:51 2007 +1100
@@ -144,10 +144,8 @@ void set_pmd_pfn(unsigned long vaddr, un
 }
 
 static int fixmaps;
-#ifndef CONFIG_COMPAT_VDSO
 unsigned long __FIXADDR_TOP = 0xf000;
 EXPORT_SYMBOL(__FIXADDR_TOP);
-#endif
 
 void __set_fixmap (enum fixed_addresses idx, unsigned long phys, pgprot_t 
flags)
 {
@@ -174,11 +172,12 @@ void reserve_top_address(unsigned long r
printk(KERN_INFO "Reserving virtual address space above 0x%08x\n",
   (int)-reserve);
 #ifdef CONFIG_COMPAT_VDSO
-   BUG_ON(reserve != 0);
-#else
+   /* We can't have both reserved space and VDSO at 0xE000. */
+   if (reserve)
+   vdso_enabled = 0;
+#endif
__FIXADDR_TOP = -reserve - PAGE_SIZE;
__VMALLOC_RESERVE += reserve;
-#endif
 }
 
 pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
diff -r f75715e64a3b include/asm-i386/fixmap.h
--- a/include/asm-i386/fixmap.h Tue Mar 06 00:04:50 2007 +1100
+++ b/include/asm-i386/fixmap.h Tue Mar 06 09:29:15 2007 +1100
@@ -19,10 +19,8 @@
  * Leave one empty page between vmalloc'ed areas and
  * the start of the fixmap.
  */
-#ifndef CONFIG_COMPAT_VDSO
 extern unsigned long __FIXADDR_TOP;
-#else
-#define __FIXADDR_TOP  0xf000
+#ifdef CONFIG_COMPAT_VDSO
 #d

Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Jeremy Fitzhardinge
Roland McGrath wrote:
>> Jan Beulich just posted a patch to do just this - relocate the vdso's
>> ELF header.  If that's all that's really required to keep COMPAT_VDSO
>> viable under PARAVIRT, then it seems like the way to go.
>> 
>
> I found http://marc.theaimsgroup.com/?l=xen-devel=117309332600075=2 and
> that must be the one you meant.  The ELF-grokking form of that is exactly
> what I had in mind.  The "find relocs with cmp" scheme is pretty silly, but
> also works fine.  It trades tweaky ELF knowledge with tweaky fragile build
> methods, but it's all about the same to me.  
>   

That's the one.  I think the C version is the one to go with.

J
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Roland McGrath
> Jan Beulich just posted a patch to do just this - relocate the vdso's
> ELF header.  If that's all that's really required to keep COMPAT_VDSO
> viable under PARAVIRT, then it seems like the way to go.

I found http://marc.theaimsgroup.com/?l=xen-devel=117309332600075=2 and
that must be the one you meant.  The ELF-grokking form of that is exactly
what I had in mind.  The "find relocs with cmp" scheme is pretty silly, but
also works fine.  It trades tweaky ELF knowledge with tweaky fragile build
methods, but it's all about the same to me.  


Thanks,
Roland
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Jeremy Fitzhardinge
Roland McGrath wrote:
> Does the old userland compatibility you're concerned about really need the
> vdso to be at 0xfe000 in particular, or just need it to be at a fixed
> address that matches the phdrs inside the image?  My recollection of the old
> glibc's limitation was that it expected the image's phdrs to match its load
> address.  The xen kernels used to change this to 0xd000 or something,
> and AFAIK that was fine.  If that's all that's needed, it is not so hard to
> adjust the vDSO contents at boot time (phdrs, shdrs, and symbols; no code
> contents use the absolute address).  Under CONFIG_COMPAT_VDSO, it can see
> where the paravirt moved the fixmap to, and apply adjustments.
>   

Jan Beulich just posted a patch to do just this - relocate the vdso's
ELF header.  If that's all that's really required to keep COMPAT_VDSO
viable under PARAVIRT, then it seems like the way to go.

J
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Roland McGrath
Does the old userland compatibility you're concerned about really need the
vdso to be at 0xfe000 in particular, or just need it to be at a fixed
address that matches the phdrs inside the image?  My recollection of the old
glibc's limitation was that it expected the image's phdrs to match its load
address.  The xen kernels used to change this to 0xd000 or something,
and AFAIK that was fine.  If that's all that's needed, it is not so hard to
adjust the vDSO contents at boot time (phdrs, shdrs, and symbols; no code
contents use the absolute address).  Under CONFIG_COMPAT_VDSO, it can see
where the paravirt moved the fixmap to, and apply adjustments.

That said, I don't think this is worth all that much effort since
CONFIG_COMPAT_VDSO is not really desireable for most people.  I think
disabling the vdso under CONFIG_COMPAT_VDSO+CONFIG_PARAVIRT is survivable
(just don't set CONFIG_COMPAT_VDSO for a system you want to be optimal).


Thanks,
Roland
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Zachary Amsden

Ingo Molnar wrote:

there's no need to disable the VDSO for old userspace ...
  


Well, apart from the obvious question to which nobody actually knows the 
answer, (how many people run old user space that required 
CONFIG_COMPAT_VDSO), what do you think of reversing the boot option?


vdso=enabled (default - turn on VDSO on normal boots)
vdso=disabled (turn off VDSO unconditionally)
[vdso=compat] (default for COMPAT_VDSO - keep VDSO only when mapped at 
compat location.  Note the option is not required to be implemented 
because it is logically implied from vdso=enabled && COMPAT_VDSO and the 
default boot behavior)
vdso=force (keep VDSO even when moved to a new location and COMPAT_VDSO 
is enabled).


In our case, installing VMware tools in the guest would then detect if 
userspace supports VDSO or if it requires COMPAT_VDSO and would then set 
boot parameters for the kernel appropriately.  And the native boot and 
kvm paravirt-ops boot are completely unaffected.


Zach
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Zachary Amsden

Andi Kleen wrote:

The boot option is already there, but boot options for is not my
idea of user friendly binary compatibility.
  


Yes, not friendly.  Perhaps we can reverse the boot option?  I.e make 
vdso_enable=force re-activate the vdso even if it gets moved by a 
hypervisor and COMPAT_VDSO was compiled in.


But how many actual users are affected by these old glibc's that we need 
to care so much about them running new technology, especially if that 
technology can be built with the ability to warn them that they probably 
need to boot with vdso=disabled?



and a printk warning if you take #GPs and kill the init  
proc, because for us, this is not an expected support scenario.  We 
would much rather support the VDSO by default in paravirt kernels even 
with COMPAT_VDSO turned on.



But you can't have it at the compatible fixed address, right?
  


While technically possible, is is not possible to do that anytime soon, 
and the drawbacks might overshadow the performance gain of the VDSO - 
and I don't think any of the other hypervisors (lhype, Xen) will be able 
to do that either.  KVM can do it, only because it relies on hardware virt.


Zach
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Ingo Molnar

* Zachary Amsden <[EMAIL PROTECTED]> wrote:

> [...]  I can't comment on Ingo's patch as he failed to -cc me on it,

it's on lkml, i'll bounce it to you separately.

> despite the fact that we are the only ones affected by it. [...]

actually, no, my motivation for it was KVM (a Linux based hypervisor and 
its paravirtual interface). You said that VMI wont hinder Linux 
development and design in any way and can adopt to whatever change the 
upstream kernel does, so i'm taking your word for that.

> [...] We are not concerned so much with supporting legacy user land 
> deployments, as we expect for the most part, the install scenario to 
> happen when upgrading to new distros.

that's not really the right argument. This obviously affects the host 
kernel too if it has CONFIG_PARAVIRT enabled. The other option is the 
removal (or temporary disabling) of the whole CONFIG_VMI and 
CONFIG_PARAVIRT stuff if you cannot get into minimal release shape, it's 
experimental stuff after all.

> What we really need to do is to be able to detect an old user land and 
> drop VDSO support when that is found.  [...]

no. What you need is to keep existing mechanisms and not hack off the 
vdso and CONFIG_COMPAT_VDSO...

> [...] But since we can't do that, the next best thing is to allow the 
> hypervisor to choose whatever workaround it wants when it moves the 
> fixmap and compat_vdso was enabled.  In our case, the workaround we 
> will want is a boot option to disable VDSO for old user land, [...]

there's no need to disable the VDSO for old userspace ...

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Andi Kleen
 We are not concerned so much 
> with supporting legacy user land deployments,

I am concerned about that. I won't merge any patches that break compatibility
by default.

> > What would probably work is to somehow decide at runtime if a hypervisor
> > is there or not and then set vdso default based on that. I guess that
> > detection would be hypervisor specific though and probably would
> > need paravirt ops extensions.
> >   
> 
> What we really need to do is to be able to detect an old user land and 
> drop VDSO support when that is found.  

Rusty implemented that, but it was widely considered too ugly
(and it was not 100% reliable e.g. with chroots) 

> But since we can't do that, the  
> next best thing is to allow the hypervisor to choose whatever workaround 
> it wants when it moves the fixmap and compat_vdso was enabled.  In our 
> case, the workaround we will want is a boot option to disable VDSO for 
> old user land,

The boot option is already there, but boot options for is not my
idea of user friendly binary compatibility.

> and a printk warning if you take #GPs and kill the init  
> proc, because for us, this is not an expected support scenario.  We 
> would much rather support the VDSO by default in paravirt kernels even 
> with COMPAT_VDSO turned on.

But you can't have it at the compatible fixed address, right?

-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Zachary Amsden

Andi Kleen wrote:

But this still means I would need to decide between a PARAVIRT
kernel that either supports xen/VMI or cannot boot old user land without
weird options.  I don't think that's the correct solution. The goal
is a single binary that runs everywhere and is still compatible.
  


Rusty's patch looks a step in the right direction to me.  I can't 
comment on Ingo's patch as he failed to -cc me on it, despite the fact 
that we are the only ones affected by it.  We are not concerned so much 
with supporting legacy user land deployments, as we expect for the most 
part, the install scenario to happen when upgrading to new distros.



What would probably work is to somehow decide at runtime if a hypervisor
is there or not and then set vdso default based on that. I guess that
detection would be hypervisor specific though and probably would
need paravirt ops extensions.
  


What we really need to do is to be able to detect an old user land and 
drop VDSO support when that is found.  But since we can't do that, the 
next best thing is to allow the hypervisor to choose whatever workaround 
it wants when it moves the fixmap and compat_vdso was enabled.  In our 
case, the workaround we will want is a boot option to disable VDSO for 
old user land, and a printk warning if you take #GPs and kill the init 
proc, because for us, this is not an expected support scenario.  We 
would much rather support the VDSO by default in paravirt kernels even 
with COMPAT_VDSO turned on.


Zach
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Ingo Molnar

* Avi Kivity <[EMAIL PROTECTED]> wrote:

> kvm paravirt is only used by developers at this time.  I don't have a 
> problem with solving this in the 2.6.22 timeframe.

oh, certainly, with a KVM hat on i'd agree. But i also have my 
cares-about-i386-arch-cleanliness and cares-about-Linux-maintainability 
hats on ;-) I only mentioned KVM as an example that it /can/ be done 
cleanly.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Avi Kivity

Ingo Molnar wrote:

* Andi Kleen <[EMAIL PROTECTED]> wrote:

  
I think we would need to have a paravirt ops callback to decide this 
first. But it doesn't look critical to me anyways.



well, it's critical to me in two ways: 1) to make the i386 paravirt code 
clean 2) to have a proper VDSO for a KVM paravirtual guest. The original 
change is also bad because it changes how a Linux guest behaves: it 
turns off the vdso by default, and disables the compat VDSO. I.e. it's a 
bad performance step backwards if CONFIG_PARAVIRT is enabled (uses int 
$0x80 instead of sysenter), which hurts only KVM and basically none of 
the other hypervisors. It also muddifies the VDSO picture wrt. 
virtualization.


  


kvm paravirt is only used by developers at this time.  I don't have a 
problem with solving this in the 2.6.22 timeframe.


--
error compiling committee.c: too many arguments to function

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Ingo Molnar

* Andi Kleen <[EMAIL PROTECTED]> wrote:

> I think we would need to have a paravirt ops callback to decide this 
> first. But it doesn't look critical to me anyways.

well, it's critical to me in two ways: 1) to make the i386 paravirt code 
clean 2) to have a proper VDSO for a KVM paravirtual guest. The original 
change is also bad because it changes how a Linux guest behaves: it 
turns off the vdso by default, and disables the compat VDSO. I.e. it's a 
bad performance step backwards if CONFIG_PARAVIRT is enabled (uses int 
$0x80 instead of sysenter), which hurts only KVM and basically none of 
the other hypervisors. It also muddifies the VDSO picture wrt. 
virtualization.

i.e. it hurts the sane stuff and benefits the crappy stuff, and my 
change undoes that. That's enough for me to call it critical ;-)

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Andi Kleen
On Mon, Mar 05, 2007 at 02:48:54PM +0100, Ingo Molnar wrote:
> 
> * Andi Kleen <[EMAIL PROTECTED]> wrote:
> 
> > The problem is not the syscall instruction, but some problem with the 
> > way the vDSO mapping is set up with CONFIG_PARAVIRT that broke older 
> > glibc.
> 
> the problem is that CONFIG_PARAVIRT silently turns off 
> CONFIG_COMPAT_VDSO, which of course breaks 'old' glibc. This too is a 
> must-have fix for v2.6.21.

We seem to have different definitions of must-have.

I think we would need to have a paravirt ops callback to decide this
first. But it doesn't look critical to me anyways.

-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Ingo Molnar

* Andi Kleen <[EMAIL PROTECTED]> wrote:

> The problem is not the syscall instruction, but some problem with the 
> way the vDSO mapping is set up with CONFIG_PARAVIRT that broke older 
> glibc.

the problem is that CONFIG_PARAVIRT silently turns off 
CONFIG_COMPAT_VDSO, which of course breaks 'old' glibc. This too is a 
must-have fix for v2.6.21.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Ingo Molnar

* Andi Kleen <[EMAIL PROTECTED]> wrote:

> > VDSO is only a problem if (1) the hypervisor wants to reserve the 
> > top virtual address space (CONFIG_PARAVIRT=y), and (2) the glibc is 
> > old and
> 
> It broke the boot even with native hardware, no hypervisor

yeah - i just sent the fix for that regression - by making 
CONFIG_COMPAT_VDSO usable again.

basically, we want to avoid all the compatibility mess that exists on 
the hypervisor side to enter the Linux kernel.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Ingo Molnar

* Rusty Russell <[EMAIL PROTECTED]> wrote:

> -#ifdef CONFIG_PARAVIRT
> +#if defined(CONFIG_COMPAT_VDSO) && defined(CONFIG_RESERVE_TOP)

NACK - my patch is quite a bit simpler and yours only increases the
#ifdef jungle. If there's any complication of the VDSO coming from some
other hypervisor support patch then I will judge that in full context, 
when it's submitted. Meanwhile, my patch is a must-have for v2.6.21.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Andi Kleen
> VDSO is only a problem if (1) the hypervisor wants to reserve the top
> virtual address space (CONFIG_PARAVIRT=y), and (2) the glibc is old and

It broke the boot even with native hardware, no hypervisor

>  
> +config RESERVE_TOP
> + bool
> + help
> +   Many hypervisors want to reserve some amount of the top of
> +   virtual address space.  Unfortunately, old glibc needs the
> +   vdso page there, so we must disable vdso if COMPAT_VDSO is
> +   enabled as well as this option.

But this still means I would need to decide between a PARAVIRT
kernel that either supports xen/VMI or cannot boot old user land without
weird options.  I don't think that's the correct solution. The goal
is a single binary that runs everywhere and is still compatible.

What would probably work is to somehow decide at runtime if a hypervisor
is there or not and then set vdso default based on that. I guess that
detection would be hypervisor specific though and probably would
need paravirt ops extensions.

-Andi

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Rusty Russell
On Mon, 2007-03-05 at 15:00 +0200, Avi Kivity wrote:
> Ingo Molnar wrote:
> > * Avi Kivity <[EMAIL PROTECTED]> wrote:
> > but yes, i agree that the hypervisor should have the ability to patch 
> > the syscall instruction of both the hypervisor interface and of the VDSO 
> > interface. But this wasnt implemented like that, and the #ifdef quirk 
> > just /prevents/ a sane solution like that from ever getting done the 
> > right way.
> >   
> 
> Rusty, shouldn't this be a one-liner?  No need to involve the hypervisor 
> here; the guest can s/syscall/int 80/ on its vdso page like it patches 
> cli and its ilk.

Probably, but this is a red herring: see previous reply.  Andi was a
little overzealous w/ CONFIG_PARAVIRT & COMPAT_VDSO, that's all.

I've never thought of replacing the syscall insn.  I'll see if I can
come up with a good reason to want to 8)

Rusty.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Andi Kleen
> Can't paravirt patch the syscall instruction like it does the rest of 
> the kernel?

The problem is not the syscall instruction, but some problem with
the way the vDSO mapping is set up with CONFIG_PARAVIRT that broke older glibc.

-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Rusty Russell
On Mon, 2007-03-05 at 13:06 +0100, Ingo Molnar wrote:
> Subject: [patch] paravirt: VDSO page is essential
> From: Ingo Molnar <[EMAIL PROTECTED]>
> 
> commit 3bbf54725467d604698721384d858b5983b87e8f disables the VDSO for 
> CONFIG_PARAVIRT kernels. This #ifdeffery was a bad change: the VDSO is 
> an essential component of Linux, and this change forces all of them to 
> use int $0x80 - including sane ones like KVM. (If a hypervisor does not 
> handle the VDSO properly then it can work things around via the vdso=0 
> boot option. Or CONFIG_PARAVIRT should not have been merged. But in any 
> case, it is a basic taste issue: we DO NOT #ifdef around core features 
> like this!)

I agree with the criticism, dislike the snarly comments, and disagree
with this patch.

VDSO is only a problem if (1) the hypervisor wants to reserve the top
virtual address space (CONFIG_PARAVIRT=y), and (2) the glibc is old and
can't handle a VDSO mapped anywhere but 0xE000
(CONFIG_COMPAT_VDSO=y).

Now, KVM wants to use CONFIG_PARAVIRT=y but not reserve_top_address(),
so we should split the config option.  Let's not get too excited because
we kept it simple.  Patch (untested, but fairly simple) below.

BTW, I had a patch to do a runtime test (old glibc causes init to
assert, then disable vdso and try again): everyone hated it.

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>

diff -r f75715e64a3b arch/i386/Kconfig
--- a/arch/i386/Kconfig Tue Mar 06 00:04:50 2007 +1100
+++ b/arch/i386/Kconfig Tue Mar 06 00:20:44 2007 +1100
@@ -218,9 +218,18 @@ config PARAVIRT
  However, when run without a hypervisor the kernel is
  theoretically slower.  If in doubt, say N.
 
+config RESERVE_TOP
+   bool
+   help
+ Many hypervisors want to reserve some amount of the top of
+ virtual address space.  Unfortunately, old glibc needs the
+ vdso page there, so we must disable vdso if COMPAT_VDSO is
+ enabled as well as this option.
+
 config VMI
bool "VMI Paravirt-ops support"
depends on PARAVIRT && !NO_HZ
+   select RESERVE_TOP
default y
help
  VMI provides a paravirtualized interface to multiple hypervisors
@@ -893,9 +902,10 @@ config COMPAT_VDSO
 config COMPAT_VDSO
bool "Compat VDSO support"
default y
-   depends on !PARAVIRT
-   help
- Map the VDSO to the predictable old-style address too.
+   help
+ Map the VDSO to the predictable old-style address too, or
+ in the case of a VMI/Xen/lguest virtualized guest, don't create
+ the VDSO at all.
---help---
  Say N here if you are running a sufficiently recent glibc
  version (2.3.3 or later), to remove the high-mapped
diff -r f75715e64a3b arch/i386/kernel/sysenter.c
--- a/arch/i386/kernel/sysenter.c   Tue Mar 06 00:04:50 2007 +1100
+++ b/arch/i386/kernel/sysenter.c   Tue Mar 06 00:21:42 2007 +1100
@@ -27,7 +27,7 @@
  * Should the kernel map a VDSO page into processes and pass its
  * address down to glibc upon exec()?
  */
-#ifdef CONFIG_PARAVIRT
+#if defined(CONFIG_COMPAT_VDSO) && defined(CONFIG_RESERVE_TOP)
 unsigned int __read_mostly vdso_enabled = 0;
 #else
 unsigned int __read_mostly vdso_enabled = 1;
diff -r f75715e64a3b arch/i386/mm/pgtable.c
--- a/arch/i386/mm/pgtable.cTue Mar 06 00:04:50 2007 +1100
+++ b/arch/i386/mm/pgtable.cTue Mar 06 00:06:00 2007 +1100
@@ -173,7 +173,7 @@ void reserve_top_address(unsigned long r
BUG_ON(fixmaps > 0);
printk(KERN_INFO "Reserving virtual address space above 0x%08x\n",
   (int)-reserve);
-#ifdef CONFIG_COMPAT_VDSO
+#ifndef CONFIG_RESERVE_TOP
BUG_ON(reserve != 0);
 #else
__FIXADDR_TOP = -reserve - PAGE_SIZE;


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Andi Kleen
On Mon, Mar 05, 2007 at 01:06:31PM +0100, Ingo Molnar wrote:
> Subject: [patch] paravirt: VDSO page is essential
> From: Ingo Molnar <[EMAIL PROTECTED]>
> 
> commit 3bbf54725467d604698721384d858b5983b87e8f disables the VDSO for 
> CONFIG_PARAVIRT kernels. This #ifdeffery was a bad change: the VDSO is 

Well it was the change that made my test machine (with SUSE 9.0 userland)
work with CONFIG_PARAVIRT on. If you have a better solution please post
it.

> an essential component of Linux, and this change forces all of them to 
> use int $0x80 - including sane ones like KVM. (If a hypervisor does not 
> handle the VDSO properly then it can work things around via the vdso=0 

No hypervisor involved, just CONFIG_PARAVIRT=y on bare hardware.

> boot option. Or CONFIG_PARAVIRT should not have been merged. But in any 
> case, it is a basic taste issue: we DO NOT #ifdef around core features 
> like this!)

We set sensible defaults for full backwards compatibility. 

-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Avi Kivity

Ingo Molnar wrote:

* Avi Kivity <[EMAIL PROTECTED]> wrote:

  

-#ifdef CONFIG_PARAVIRT
-unsigned int __read_mostly vdso_enabled = 0;
-#else
unsigned int __read_mostly vdso_enabled = 1;
-#endif
  


  
Can't paravirt patch the syscall instruction like it does the rest of 
the kernel?



we want to keep the guest as simple and unmodified as possible. And all 
this #ifdef jungle /will/ bite back. Especially if the change goes in 
with zero explanation like it did:


[PATCH] paravirt: Disable vdso by default when CONFIG_PARAVIRT is enabled

They don't work together and this way even glibc still works.

i rather want an experimental feature (CONFIG_PARAVIRT) broken on some 
hypervisors for a bit than an entire body of guest OSs getting used to 
the "you dont have to deal with this VDSO annoyance by default" quirk 
forever ...
  


Sure, I agree with this patch.  I'm talking about an alternate solution 
so Xen can work with the vdso instead of #ifdefing away the kernel.


but yes, i agree that the hypervisor should have the ability to patch 
the syscall instruction of both the hypervisor interface and of the VDSO 
interface. But this wasnt implemented like that, and the #ifdef quirk 
just /prevents/ a sane solution like that from ever getting done the 
right way.
  


Rusty, shouldn't this be a one-liner?  No need to involve the hypervisor 
here; the guest can s/syscall/int 80/ on its vdso page like it patches 
cli and its ilk.


--
error compiling committee.c: too many arguments to function

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Ingo Molnar

* Avi Kivity <[EMAIL PROTECTED]> wrote:

> >-#ifdef CONFIG_PARAVIRT
> >-unsigned int __read_mostly vdso_enabled = 0;
> >-#else
> > unsigned int __read_mostly vdso_enabled = 1;
> >-#endif

> Can't paravirt patch the syscall instruction like it does the rest of 
> the kernel?

we want to keep the guest as simple and unmodified as possible. And all 
this #ifdef jungle /will/ bite back. Especially if the change goes in 
with zero explanation like it did:

[PATCH] paravirt: Disable vdso by default when CONFIG_PARAVIRT is enabled

They don't work together and this way even glibc still works.

i rather want an experimental feature (CONFIG_PARAVIRT) broken on some 
hypervisors for a bit than an entire body of guest OSs getting used to 
the "you dont have to deal with this VDSO annoyance by default" quirk 
forever ...

but yes, i agree that the hypervisor should have the ability to patch 
the syscall instruction of both the hypervisor interface and of the VDSO 
interface. But this wasnt implemented like that, and the #ifdef quirk 
just /prevents/ a sane solution like that from ever getting done the 
right way.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Avi Kivity

Ingo Molnar wrote:

Subject: [patch] paravirt: VDSO page is essential
From: Ingo Molnar <[EMAIL PROTECTED]>

commit 3bbf54725467d604698721384d858b5983b87e8f disables the VDSO for 
CONFIG_PARAVIRT kernels. This #ifdeffery was a bad change: the VDSO is 
an essential component of Linux, and this change forces all of them to 
use int $0x80 - including sane ones like KVM. (If a hypervisor does not 
handle the VDSO properly then it can work things around via the vdso=0 
boot option. Or CONFIG_PARAVIRT should not have been merged. But in any 
case, it is a basic taste issue: we DO NOT #ifdef around core features 
like this!)


Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 arch/i386/kernel/sysenter.c |4 
 1 file changed, 4 deletions(-)

Index: linux/arch/i386/kernel/sysenter.c
===
--- linux.orig/arch/i386/kernel/sysenter.c
+++ linux/arch/i386/kernel/sysenter.c
@@ -27,11 +27,7 @@
  * Should the kernel map a VDSO page into processes and pass its
  * address down to glibc upon exec()?
  */
-#ifdef CONFIG_PARAVIRT
-unsigned int __read_mostly vdso_enabled = 0;
-#else
 unsigned int __read_mostly vdso_enabled = 1;
-#endif
 
 EXPORT_SYMBOL_GPL(vdso_enabled);
 
  


Can't paravirt patch the syscall instruction like it does the rest of 
the kernel?


[is someone keeping track of the number of patchsites? e.g. at what date 
will the entire kernel be generated at boot time?]



--
error compiling committee.c: too many arguments to function

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[patch] paravirt: VDSO page is essential

2007-03-05 Thread Ingo Molnar
Subject: [patch] paravirt: VDSO page is essential
From: Ingo Molnar <[EMAIL PROTECTED]>

commit 3bbf54725467d604698721384d858b5983b87e8f disables the VDSO for 
CONFIG_PARAVIRT kernels. This #ifdeffery was a bad change: the VDSO is 
an essential component of Linux, and this change forces all of them to 
use int $0x80 - including sane ones like KVM. (If a hypervisor does not 
handle the VDSO properly then it can work things around via the vdso=0 
boot option. Or CONFIG_PARAVIRT should not have been merged. But in any 
case, it is a basic taste issue: we DO NOT #ifdef around core features 
like this!)

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 arch/i386/kernel/sysenter.c |4 
 1 file changed, 4 deletions(-)

Index: linux/arch/i386/kernel/sysenter.c
===
--- linux.orig/arch/i386/kernel/sysenter.c
+++ linux/arch/i386/kernel/sysenter.c
@@ -27,11 +27,7 @@
  * Should the kernel map a VDSO page into processes and pass its
  * address down to glibc upon exec()?
  */
-#ifdef CONFIG_PARAVIRT
-unsigned int __read_mostly vdso_enabled = 0;
-#else
 unsigned int __read_mostly vdso_enabled = 1;
-#endif
 
 EXPORT_SYMBOL_GPL(vdso_enabled);
 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[patch] paravirt: VDSO page is essential

2007-03-05 Thread Ingo Molnar
Subject: [patch] paravirt: VDSO page is essential
From: Ingo Molnar [EMAIL PROTECTED]

commit 3bbf54725467d604698721384d858b5983b87e8f disables the VDSO for 
CONFIG_PARAVIRT kernels. This #ifdeffery was a bad change: the VDSO is 
an essential component of Linux, and this change forces all of them to 
use int $0x80 - including sane ones like KVM. (If a hypervisor does not 
handle the VDSO properly then it can work things around via the vdso=0 
boot option. Or CONFIG_PARAVIRT should not have been merged. But in any 
case, it is a basic taste issue: we DO NOT #ifdef around core features 
like this!)

Signed-off-by: Ingo Molnar [EMAIL PROTECTED]
---
 arch/i386/kernel/sysenter.c |4 
 1 file changed, 4 deletions(-)

Index: linux/arch/i386/kernel/sysenter.c
===
--- linux.orig/arch/i386/kernel/sysenter.c
+++ linux/arch/i386/kernel/sysenter.c
@@ -27,11 +27,7 @@
  * Should the kernel map a VDSO page into processes and pass its
  * address down to glibc upon exec()?
  */
-#ifdef CONFIG_PARAVIRT
-unsigned int __read_mostly vdso_enabled = 0;
-#else
 unsigned int __read_mostly vdso_enabled = 1;
-#endif
 
 EXPORT_SYMBOL_GPL(vdso_enabled);
 
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Avi Kivity

Ingo Molnar wrote:

Subject: [patch] paravirt: VDSO page is essential
From: Ingo Molnar [EMAIL PROTECTED]

commit 3bbf54725467d604698721384d858b5983b87e8f disables the VDSO for 
CONFIG_PARAVIRT kernels. This #ifdeffery was a bad change: the VDSO is 
an essential component of Linux, and this change forces all of them to 
use int $0x80 - including sane ones like KVM. (If a hypervisor does not 
handle the VDSO properly then it can work things around via the vdso=0 
boot option. Or CONFIG_PARAVIRT should not have been merged. But in any 
case, it is a basic taste issue: we DO NOT #ifdef around core features 
like this!)


Signed-off-by: Ingo Molnar [EMAIL PROTECTED]
---
 arch/i386/kernel/sysenter.c |4 
 1 file changed, 4 deletions(-)

Index: linux/arch/i386/kernel/sysenter.c
===
--- linux.orig/arch/i386/kernel/sysenter.c
+++ linux/arch/i386/kernel/sysenter.c
@@ -27,11 +27,7 @@
  * Should the kernel map a VDSO page into processes and pass its
  * address down to glibc upon exec()?
  */
-#ifdef CONFIG_PARAVIRT
-unsigned int __read_mostly vdso_enabled = 0;
-#else
 unsigned int __read_mostly vdso_enabled = 1;
-#endif
 
 EXPORT_SYMBOL_GPL(vdso_enabled);
 
  


Can't paravirt patch the syscall instruction like it does the rest of 
the kernel?


[is someone keeping track of the number of patchsites? e.g. at what date 
will the entire kernel be generated at boot time?]



--
error compiling committee.c: too many arguments to function

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Ingo Molnar

* Avi Kivity [EMAIL PROTECTED] wrote:

 -#ifdef CONFIG_PARAVIRT
 -unsigned int __read_mostly vdso_enabled = 0;
 -#else
  unsigned int __read_mostly vdso_enabled = 1;
 -#endif

 Can't paravirt patch the syscall instruction like it does the rest of 
 the kernel?

we want to keep the guest as simple and unmodified as possible. And all 
this #ifdef jungle /will/ bite back. Especially if the change goes in 
with zero explanation like it did:

[PATCH] paravirt: Disable vdso by default when CONFIG_PARAVIRT is enabled

They don't work together and this way even glibc still works.

i rather want an experimental feature (CONFIG_PARAVIRT) broken on some 
hypervisors for a bit than an entire body of guest OSs getting used to 
the you dont have to deal with this VDSO annoyance by default quirk 
forever ...

but yes, i agree that the hypervisor should have the ability to patch 
the syscall instruction of both the hypervisor interface and of the VDSO 
interface. But this wasnt implemented like that, and the #ifdef quirk 
just /prevents/ a sane solution like that from ever getting done the 
right way.

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Avi Kivity

Ingo Molnar wrote:

* Avi Kivity [EMAIL PROTECTED] wrote:

  

-#ifdef CONFIG_PARAVIRT
-unsigned int __read_mostly vdso_enabled = 0;
-#else
unsigned int __read_mostly vdso_enabled = 1;
-#endif
  


  
Can't paravirt patch the syscall instruction like it does the rest of 
the kernel?



we want to keep the guest as simple and unmodified as possible. And all 
this #ifdef jungle /will/ bite back. Especially if the change goes in 
with zero explanation like it did:


[PATCH] paravirt: Disable vdso by default when CONFIG_PARAVIRT is enabled

They don't work together and this way even glibc still works.

i rather want an experimental feature (CONFIG_PARAVIRT) broken on some 
hypervisors for a bit than an entire body of guest OSs getting used to 
the you dont have to deal with this VDSO annoyance by default quirk 
forever ...
  


Sure, I agree with this patch.  I'm talking about an alternate solution 
so Xen can work with the vdso instead of #ifdefing away the kernel.


but yes, i agree that the hypervisor should have the ability to patch 
the syscall instruction of both the hypervisor interface and of the VDSO 
interface. But this wasnt implemented like that, and the #ifdef quirk 
just /prevents/ a sane solution like that from ever getting done the 
right way.
  


Rusty, shouldn't this be a one-liner?  No need to involve the hypervisor 
here; the guest can s/syscall/int 80/ on its vdso page like it patches 
cli and its ilk.


--
error compiling committee.c: too many arguments to function

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Andi Kleen
On Mon, Mar 05, 2007 at 01:06:31PM +0100, Ingo Molnar wrote:
 Subject: [patch] paravirt: VDSO page is essential
 From: Ingo Molnar [EMAIL PROTECTED]
 
 commit 3bbf54725467d604698721384d858b5983b87e8f disables the VDSO for 
 CONFIG_PARAVIRT kernels. This #ifdeffery was a bad change: the VDSO is 

Well it was the change that made my test machine (with SUSE 9.0 userland)
work with CONFIG_PARAVIRT on. If you have a better solution please post
it.

 an essential component of Linux, and this change forces all of them to 
 use int $0x80 - including sane ones like KVM. (If a hypervisor does not 
 handle the VDSO properly then it can work things around via the vdso=0 

No hypervisor involved, just CONFIG_PARAVIRT=y on bare hardware.

 boot option. Or CONFIG_PARAVIRT should not have been merged. But in any 
 case, it is a basic taste issue: we DO NOT #ifdef around core features 
 like this!)

We set sensible defaults for full backwards compatibility. 

-Andi
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Andi Kleen
 Can't paravirt patch the syscall instruction like it does the rest of 
 the kernel?

The problem is not the syscall instruction, but some problem with
the way the vDSO mapping is set up with CONFIG_PARAVIRT that broke older glibc.

-Andi
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Rusty Russell
On Mon, 2007-03-05 at 15:00 +0200, Avi Kivity wrote:
 Ingo Molnar wrote:
  * Avi Kivity [EMAIL PROTECTED] wrote:
  but yes, i agree that the hypervisor should have the ability to patch 
  the syscall instruction of both the hypervisor interface and of the VDSO 
  interface. But this wasnt implemented like that, and the #ifdef quirk 
  just /prevents/ a sane solution like that from ever getting done the 
  right way.

 
 Rusty, shouldn't this be a one-liner?  No need to involve the hypervisor 
 here; the guest can s/syscall/int 80/ on its vdso page like it patches 
 cli and its ilk.

Probably, but this is a red herring: see previous reply.  Andi was a
little overzealous w/ CONFIG_PARAVIRT  COMPAT_VDSO, that's all.

I've never thought of replacing the syscall insn.  I'll see if I can
come up with a good reason to want to 8)

Rusty.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Ingo Molnar

* Andi Kleen [EMAIL PROTECTED] wrote:

 The problem is not the syscall instruction, but some problem with the 
 way the vDSO mapping is set up with CONFIG_PARAVIRT that broke older 
 glibc.

the problem is that CONFIG_PARAVIRT silently turns off 
CONFIG_COMPAT_VDSO, which of course breaks 'old' glibc. This too is a 
must-have fix for v2.6.21.

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Andi Kleen
On Mon, Mar 05, 2007 at 02:48:54PM +0100, Ingo Molnar wrote:
 
 * Andi Kleen [EMAIL PROTECTED] wrote:
 
  The problem is not the syscall instruction, but some problem with the 
  way the vDSO mapping is set up with CONFIG_PARAVIRT that broke older 
  glibc.
 
 the problem is that CONFIG_PARAVIRT silently turns off 
 CONFIG_COMPAT_VDSO, which of course breaks 'old' glibc. This too is a 
 must-have fix for v2.6.21.

We seem to have different definitions of must-have.

I think we would need to have a paravirt ops callback to decide this
first. But it doesn't look critical to me anyways.

-Andi
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Ingo Molnar

* Andi Kleen [EMAIL PROTECTED] wrote:

 I think we would need to have a paravirt ops callback to decide this 
 first. But it doesn't look critical to me anyways.

well, it's critical to me in two ways: 1) to make the i386 paravirt code 
clean 2) to have a proper VDSO for a KVM paravirtual guest. The original 
change is also bad because it changes how a Linux guest behaves: it 
turns off the vdso by default, and disables the compat VDSO. I.e. it's a 
bad performance step backwards if CONFIG_PARAVIRT is enabled (uses int 
$0x80 instead of sysenter), which hurts only KVM and basically none of 
the other hypervisors. It also muddifies the VDSO picture wrt. 
virtualization.

i.e. it hurts the sane stuff and benefits the crappy stuff, and my 
change undoes that. That's enough for me to call it critical ;-)

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Avi Kivity

Ingo Molnar wrote:

* Andi Kleen [EMAIL PROTECTED] wrote:

  
I think we would need to have a paravirt ops callback to decide this 
first. But it doesn't look critical to me anyways.



well, it's critical to me in two ways: 1) to make the i386 paravirt code 
clean 2) to have a proper VDSO for a KVM paravirtual guest. The original 
change is also bad because it changes how a Linux guest behaves: it 
turns off the vdso by default, and disables the compat VDSO. I.e. it's a 
bad performance step backwards if CONFIG_PARAVIRT is enabled (uses int 
$0x80 instead of sysenter), which hurts only KVM and basically none of 
the other hypervisors. It also muddifies the VDSO picture wrt. 
virtualization.


  


kvm paravirt is only used by developers at this time.  I don't have a 
problem with solving this in the 2.6.22 timeframe.


--
error compiling committee.c: too many arguments to function

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Ingo Molnar

* Avi Kivity [EMAIL PROTECTED] wrote:

 kvm paravirt is only used by developers at this time.  I don't have a 
 problem with solving this in the 2.6.22 timeframe.

oh, certainly, with a KVM hat on i'd agree. But i also have my 
cares-about-i386-arch-cleanliness and cares-about-Linux-maintainability 
hats on ;-) I only mentioned KVM as an example that it /can/ be done 
cleanly.

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Rusty Russell
On Mon, 2007-03-05 at 13:06 +0100, Ingo Molnar wrote:
 Subject: [patch] paravirt: VDSO page is essential
 From: Ingo Molnar [EMAIL PROTECTED]
 
 commit 3bbf54725467d604698721384d858b5983b87e8f disables the VDSO for 
 CONFIG_PARAVIRT kernels. This #ifdeffery was a bad change: the VDSO is 
 an essential component of Linux, and this change forces all of them to 
 use int $0x80 - including sane ones like KVM. (If a hypervisor does not 
 handle the VDSO properly then it can work things around via the vdso=0 
 boot option. Or CONFIG_PARAVIRT should not have been merged. But in any 
 case, it is a basic taste issue: we DO NOT #ifdef around core features 
 like this!)

I agree with the criticism, dislike the snarly comments, and disagree
with this patch.

VDSO is only a problem if (1) the hypervisor wants to reserve the top
virtual address space (CONFIG_PARAVIRT=y), and (2) the glibc is old and
can't handle a VDSO mapped anywhere but 0xE000
(CONFIG_COMPAT_VDSO=y).

Now, KVM wants to use CONFIG_PARAVIRT=y but not reserve_top_address(),
so we should split the config option.  Let's not get too excited because
we kept it simple.  Patch (untested, but fairly simple) below.

BTW, I had a patch to do a runtime test (old glibc causes init to
assert, then disable vdso and try again): everyone hated it.

Signed-off-by: Rusty Russell [EMAIL PROTECTED]

diff -r f75715e64a3b arch/i386/Kconfig
--- a/arch/i386/Kconfig Tue Mar 06 00:04:50 2007 +1100
+++ b/arch/i386/Kconfig Tue Mar 06 00:20:44 2007 +1100
@@ -218,9 +218,18 @@ config PARAVIRT
  However, when run without a hypervisor the kernel is
  theoretically slower.  If in doubt, say N.
 
+config RESERVE_TOP
+   bool
+   help
+ Many hypervisors want to reserve some amount of the top of
+ virtual address space.  Unfortunately, old glibc needs the
+ vdso page there, so we must disable vdso if COMPAT_VDSO is
+ enabled as well as this option.
+
 config VMI
bool VMI Paravirt-ops support
depends on PARAVIRT  !NO_HZ
+   select RESERVE_TOP
default y
help
  VMI provides a paravirtualized interface to multiple hypervisors
@@ -893,9 +902,10 @@ config COMPAT_VDSO
 config COMPAT_VDSO
bool Compat VDSO support
default y
-   depends on !PARAVIRT
-   help
- Map the VDSO to the predictable old-style address too.
+   help
+ Map the VDSO to the predictable old-style address too, or
+ in the case of a VMI/Xen/lguest virtualized guest, don't create
+ the VDSO at all.
---help---
  Say N here if you are running a sufficiently recent glibc
  version (2.3.3 or later), to remove the high-mapped
diff -r f75715e64a3b arch/i386/kernel/sysenter.c
--- a/arch/i386/kernel/sysenter.c   Tue Mar 06 00:04:50 2007 +1100
+++ b/arch/i386/kernel/sysenter.c   Tue Mar 06 00:21:42 2007 +1100
@@ -27,7 +27,7 @@
  * Should the kernel map a VDSO page into processes and pass its
  * address down to glibc upon exec()?
  */
-#ifdef CONFIG_PARAVIRT
+#if defined(CONFIG_COMPAT_VDSO)  defined(CONFIG_RESERVE_TOP)
 unsigned int __read_mostly vdso_enabled = 0;
 #else
 unsigned int __read_mostly vdso_enabled = 1;
diff -r f75715e64a3b arch/i386/mm/pgtable.c
--- a/arch/i386/mm/pgtable.cTue Mar 06 00:04:50 2007 +1100
+++ b/arch/i386/mm/pgtable.cTue Mar 06 00:06:00 2007 +1100
@@ -173,7 +173,7 @@ void reserve_top_address(unsigned long r
BUG_ON(fixmaps  0);
printk(KERN_INFO Reserving virtual address space above 0x%08x\n,
   (int)-reserve);
-#ifdef CONFIG_COMPAT_VDSO
+#ifndef CONFIG_RESERVE_TOP
BUG_ON(reserve != 0);
 #else
__FIXADDR_TOP = -reserve - PAGE_SIZE;


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Andi Kleen
 VDSO is only a problem if (1) the hypervisor wants to reserve the top
 virtual address space (CONFIG_PARAVIRT=y), and (2) the glibc is old and

It broke the boot even with native hardware, no hypervisor

  
 +config RESERVE_TOP
 + bool
 + help
 +   Many hypervisors want to reserve some amount of the top of
 +   virtual address space.  Unfortunately, old glibc needs the
 +   vdso page there, so we must disable vdso if COMPAT_VDSO is
 +   enabled as well as this option.

But this still means I would need to decide between a PARAVIRT
kernel that either supports xen/VMI or cannot boot old user land without
weird options.  I don't think that's the correct solution. The goal
is a single binary that runs everywhere and is still compatible.

What would probably work is to somehow decide at runtime if a hypervisor
is there or not and then set vdso default based on that. I guess that
detection would be hypervisor specific though and probably would
need paravirt ops extensions.

-Andi

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Ingo Molnar

* Rusty Russell [EMAIL PROTECTED] wrote:

 -#ifdef CONFIG_PARAVIRT
 +#if defined(CONFIG_COMPAT_VDSO)  defined(CONFIG_RESERVE_TOP)

NACK - my patch is quite a bit simpler and yours only increases the
#ifdef jungle. If there's any complication of the VDSO coming from some
other hypervisor support patch then I will judge that in full context, 
when it's submitted. Meanwhile, my patch is a must-have for v2.6.21.

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Ingo Molnar

* Andi Kleen [EMAIL PROTECTED] wrote:

  VDSO is only a problem if (1) the hypervisor wants to reserve the 
  top virtual address space (CONFIG_PARAVIRT=y), and (2) the glibc is 
  old and
 
 It broke the boot even with native hardware, no hypervisor

yeah - i just sent the fix for that regression - by making 
CONFIG_COMPAT_VDSO usable again.

basically, we want to avoid all the compatibility mess that exists on 
the hypervisor side to enter the Linux kernel.

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Roland McGrath
Does the old userland compatibility you're concerned about really need the
vdso to be at 0xfe000 in particular, or just need it to be at a fixed
address that matches the phdrs inside the image?  My recollection of the old
glibc's limitation was that it expected the image's phdrs to match its load
address.  The xen kernels used to change this to 0xd000 or something,
and AFAIK that was fine.  If that's all that's needed, it is not so hard to
adjust the vDSO contents at boot time (phdrs, shdrs, and symbols; no code
contents use the absolute address).  Under CONFIG_COMPAT_VDSO, it can see
where the paravirt moved the fixmap to, and apply adjustments.

That said, I don't think this is worth all that much effort since
CONFIG_COMPAT_VDSO is not really desireable for most people.  I think
disabling the vdso under CONFIG_COMPAT_VDSO+CONFIG_PARAVIRT is survivable
(just don't set CONFIG_COMPAT_VDSO for a system you want to be optimal).


Thanks,
Roland
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Jeremy Fitzhardinge
Roland McGrath wrote:
 Does the old userland compatibility you're concerned about really need the
 vdso to be at 0xfe000 in particular, or just need it to be at a fixed
 address that matches the phdrs inside the image?  My recollection of the old
 glibc's limitation was that it expected the image's phdrs to match its load
 address.  The xen kernels used to change this to 0xd000 or something,
 and AFAIK that was fine.  If that's all that's needed, it is not so hard to
 adjust the vDSO contents at boot time (phdrs, shdrs, and symbols; no code
 contents use the absolute address).  Under CONFIG_COMPAT_VDSO, it can see
 where the paravirt moved the fixmap to, and apply adjustments.
   

Jan Beulich just posted a patch to do just this - relocate the vdso's
ELF header.  If that's all that's really required to keep COMPAT_VDSO
viable under PARAVIRT, then it seems like the way to go.

J
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Roland McGrath
 Jan Beulich just posted a patch to do just this - relocate the vdso's
 ELF header.  If that's all that's really required to keep COMPAT_VDSO
 viable under PARAVIRT, then it seems like the way to go.

I found http://marc.theaimsgroup.com/?l=xen-develm=117309332600075w=2 and
that must be the one you meant.  The ELF-grokking form of that is exactly
what I had in mind.  The find relocs with cmp scheme is pretty silly, but
also works fine.  It trades tweaky ELF knowledge with tweaky fragile build
methods, but it's all about the same to me.  


Thanks,
Roland
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Jeremy Fitzhardinge
Roland McGrath wrote:
 Jan Beulich just posted a patch to do just this - relocate the vdso's
 ELF header.  If that's all that's really required to keep COMPAT_VDSO
 viable under PARAVIRT, then it seems like the way to go.
 

 I found http://marc.theaimsgroup.com/?l=xen-develm=117309332600075w=2 and
 that must be the one you meant.  The ELF-grokking form of that is exactly
 what I had in mind.  The find relocs with cmp scheme is pretty silly, but
 also works fine.  It trades tweaky ELF knowledge with tweaky fragile build
 methods, but it's all about the same to me.  
   

That's the one.  I think the C version is the one to go with.

J
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Zachary Amsden

Andi Kleen wrote:

But this still means I would need to decide between a PARAVIRT
kernel that either supports xen/VMI or cannot boot old user land without
weird options.  I don't think that's the correct solution. The goal
is a single binary that runs everywhere and is still compatible.
  


Rusty's patch looks a step in the right direction to me.  I can't 
comment on Ingo's patch as he failed to -cc me on it, despite the fact 
that we are the only ones affected by it.  We are not concerned so much 
with supporting legacy user land deployments, as we expect for the most 
part, the install scenario to happen when upgrading to new distros.



What would probably work is to somehow decide at runtime if a hypervisor
is there or not and then set vdso default based on that. I guess that
detection would be hypervisor specific though and probably would
need paravirt ops extensions.
  


What we really need to do is to be able to detect an old user land and 
drop VDSO support when that is found.  But since we can't do that, the 
next best thing is to allow the hypervisor to choose whatever workaround 
it wants when it moves the fixmap and compat_vdso was enabled.  In our 
case, the workaround we will want is a boot option to disable VDSO for 
old user land, and a printk warning if you take #GPs and kill the init 
proc, because for us, this is not an expected support scenario.  We 
would much rather support the VDSO by default in paravirt kernels even 
with COMPAT_VDSO turned on.


Zach
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Andi Kleen
 We are not concerned so much 
 with supporting legacy user land deployments,

I am concerned about that. I won't merge any patches that break compatibility
by default.

  What would probably work is to somehow decide at runtime if a hypervisor
  is there or not and then set vdso default based on that. I guess that
  detection would be hypervisor specific though and probably would
  need paravirt ops extensions.

 
 What we really need to do is to be able to detect an old user land and 
 drop VDSO support when that is found.  

Rusty implemented that, but it was widely considered too ugly
(and it was not 100% reliable e.g. with chroots) 

 But since we can't do that, the  
 next best thing is to allow the hypervisor to choose whatever workaround 
 it wants when it moves the fixmap and compat_vdso was enabled.  In our 
 case, the workaround we will want is a boot option to disable VDSO for 
 old user land,

The boot option is already there, but boot options for is not my
idea of user friendly binary compatibility.

 and a printk warning if you take #GPs and kill the init  
 proc, because for us, this is not an expected support scenario.  We 
 would much rather support the VDSO by default in paravirt kernels even 
 with COMPAT_VDSO turned on.

But you can't have it at the compatible fixed address, right?

-Andi
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Ingo Molnar

* Zachary Amsden [EMAIL PROTECTED] wrote:

 [...]  I can't comment on Ingo's patch as he failed to -cc me on it,

it's on lkml, i'll bounce it to you separately.

 despite the fact that we are the only ones affected by it. [...]

actually, no, my motivation for it was KVM (a Linux based hypervisor and 
its paravirtual interface). You said that VMI wont hinder Linux 
development and design in any way and can adopt to whatever change the 
upstream kernel does, so i'm taking your word for that.

 [...] We are not concerned so much with supporting legacy user land 
 deployments, as we expect for the most part, the install scenario to 
 happen when upgrading to new distros.

that's not really the right argument. This obviously affects the host 
kernel too if it has CONFIG_PARAVIRT enabled. The other option is the 
removal (or temporary disabling) of the whole CONFIG_VMI and 
CONFIG_PARAVIRT stuff if you cannot get into minimal release shape, it's 
experimental stuff after all.

 What we really need to do is to be able to detect an old user land and 
 drop VDSO support when that is found.  [...]

no. What you need is to keep existing mechanisms and not hack off the 
vdso and CONFIG_COMPAT_VDSO...

 [...] But since we can't do that, the next best thing is to allow the 
 hypervisor to choose whatever workaround it wants when it moves the 
 fixmap and compat_vdso was enabled.  In our case, the workaround we 
 will want is a boot option to disable VDSO for old user land, [...]

there's no need to disable the VDSO for old userspace ...

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Zachary Amsden

Andi Kleen wrote:

The boot option is already there, but boot options for is not my
idea of user friendly binary compatibility.
  


Yes, not friendly.  Perhaps we can reverse the boot option?  I.e make 
vdso_enable=force re-activate the vdso even if it gets moved by a 
hypervisor and COMPAT_VDSO was compiled in.


But how many actual users are affected by these old glibc's that we need 
to care so much about them running new technology, especially if that 
technology can be built with the ability to warn them that they probably 
need to boot with vdso=disabled?



and a printk warning if you take #GPs and kill the init  
proc, because for us, this is not an expected support scenario.  We 
would much rather support the VDSO by default in paravirt kernels even 
with COMPAT_VDSO turned on.



But you can't have it at the compatible fixed address, right?
  


While technically possible, is is not possible to do that anytime soon, 
and the drawbacks might overshadow the performance gain of the VDSO - 
and I don't think any of the other hypervisors (lhype, Xen) will be able 
to do that either.  KVM can do it, only because it relies on hardware virt.


Zach
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Zachary Amsden

Ingo Molnar wrote:

there's no need to disable the VDSO for old userspace ...
  


Well, apart from the obvious question to which nobody actually knows the 
answer, (how many people run old user space that required 
CONFIG_COMPAT_VDSO), what do you think of reversing the boot option?


vdso=enabled (default - turn on VDSO on normal boots)
vdso=disabled (turn off VDSO unconditionally)
[vdso=compat] (default for COMPAT_VDSO - keep VDSO only when mapped at 
compat location.  Note the option is not required to be implemented 
because it is logically implied from vdso=enabled  COMPAT_VDSO and the 
default boot behavior)
vdso=force (keep VDSO even when moved to a new location and COMPAT_VDSO 
is enabled).


In our case, installing VMware tools in the guest would then detect if 
userspace supports VDSO or if it requires COMPAT_VDSO and would then set 
boot parameters for the kernel appropriately.  And the native boot and 
kvm paravirt-ops boot are completely unaffected.


Zach
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Rusty Russell
On Tue, 2007-03-06 at 00:28 +1100, Rusty Russell wrote:
 On Mon, 2007-03-05 at 13:06 +0100, Ingo Molnar wrote:
  Subject: [patch] paravirt: VDSO page is essential
  From: Ingo Molnar [EMAIL PROTECTED]
  
  commit 3bbf54725467d604698721384d858b5983b87e8f disables the VDSO for 
  CONFIG_PARAVIRT kernels. This #ifdeffery was a bad change: the VDSO is 
  an essential component of Linux, and this change forces all of them to 
  use int $0x80 - including sane ones like KVM. (If a hypervisor does not 
  handle the VDSO properly then it can work things around via the vdso=0 
  boot option. Or CONFIG_PARAVIRT should not have been merged. But in any 
  case, it is a basic taste issue: we DO NOT #ifdef around core features 
  like this!)
 
 I agree with the criticism, dislike the snarly comments, and disagree
 with this patch.

And my patch was pretty crack-induced too.  Sorry.

I shouldn't have been thinking about using CONFIG options at all: we
should simply disable the vdso if CONFIG_COMPAT_VDSO=y when we
*actually* reserve top memory.

This still need some work (doing that now), but do people like the idea?

The current vdso_disabled flag merely disabled the ELF note, so it
needs to be made a little stronger, to not set up the vdso at all.

diff -r f75715e64a3b arch/i386/Kconfig
--- a/arch/i386/Kconfig Tue Mar 06 00:04:50 2007 +1100
+++ b/arch/i386/Kconfig Tue Mar 06 09:30:36 2007 +1100
@@ -893,9 +893,10 @@ config COMPAT_VDSO
 config COMPAT_VDSO
bool Compat VDSO support
default y
-   depends on !PARAVIRT
-   help
- Map the VDSO to the predictable old-style address too.
+   help
+ Map the VDSO to the predictable old-style address too, or
+ in the case of a VMI/Xen/lguest virtualized guest, don't create
+ the VDSO at all.
---help---
  Say N here if you are running a sufficiently recent glibc
  version (2.3.3 or later), to remove the high-mapped
diff -r f75715e64a3b arch/i386/kernel/sysenter.c
--- a/arch/i386/kernel/sysenter.c   Tue Mar 06 00:04:50 2007 +1100
+++ b/arch/i386/kernel/sysenter.c   Tue Mar 06 09:25:47 2007 +1100
@@ -27,11 +27,7 @@
  * Should the kernel map a VDSO page into processes and pass its
  * address down to glibc upon exec()?
  */
-#ifdef CONFIG_PARAVIRT
-unsigned int __read_mostly vdso_enabled = 0;
-#else
 unsigned int __read_mostly vdso_enabled = 1;
-#endif
 
 EXPORT_SYMBOL_GPL(vdso_enabled);
 
@@ -51,7 +47,7 @@ void enable_sep_cpu(void)
int cpu = get_cpu();
struct tss_struct *tss = per_cpu(init_tss, cpu);
 
-   if (!boot_cpu_has(X86_FEATURE_SEP)) {
+   if (!boot_cpu_has(X86_FEATURE_SEP) || !vdso_enabled) {
put_cpu();
return;
}
@@ -74,7 +70,12 @@ static struct page *syscall_pages[1];
 
 int __init sysenter_setup(void)
 {
-   void *syscall_page = (void *)get_zeroed_page(GFP_ATOMIC);
+   void *syscall_page;
+
+   if (!vdso_enabled)
+   return 0;
+
+   syscall_page = (void *)get_zeroed_page(GFP_ATOMIC);
syscall_pages[0] = virt_to_page(syscall_page);
 
 #ifdef CONFIG_COMPAT_VDSO
@@ -106,6 +107,9 @@ int arch_setup_additional_pages(struct l
struct mm_struct *mm = current-mm;
unsigned long addr;
int ret;
+
+   if (!vdso_enabled)
+   return 0;
 
down_write(mm-mmap_sem);
addr = get_unmapped_area(NULL, 0, PAGE_SIZE, 0, 0);
diff -r f75715e64a3b arch/i386/mm/pgtable.c
--- a/arch/i386/mm/pgtable.cTue Mar 06 00:04:50 2007 +1100
+++ b/arch/i386/mm/pgtable.cTue Mar 06 09:32:51 2007 +1100
@@ -144,10 +144,8 @@ void set_pmd_pfn(unsigned long vaddr, un
 }
 
 static int fixmaps;
-#ifndef CONFIG_COMPAT_VDSO
 unsigned long __FIXADDR_TOP = 0xf000;
 EXPORT_SYMBOL(__FIXADDR_TOP);
-#endif
 
 void __set_fixmap (enum fixed_addresses idx, unsigned long phys, pgprot_t 
flags)
 {
@@ -174,11 +172,12 @@ void reserve_top_address(unsigned long r
printk(KERN_INFO Reserving virtual address space above 0x%08x\n,
   (int)-reserve);
 #ifdef CONFIG_COMPAT_VDSO
-   BUG_ON(reserve != 0);
-#else
+   /* We can't have both reserved space and VDSO at 0xE000. */
+   if (reserve)
+   vdso_enabled = 0;
+#endif
__FIXADDR_TOP = -reserve - PAGE_SIZE;
__VMALLOC_RESERVE += reserve;
-#endif
 }
 
 pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
diff -r f75715e64a3b include/asm-i386/fixmap.h
--- a/include/asm-i386/fixmap.h Tue Mar 06 00:04:50 2007 +1100
+++ b/include/asm-i386/fixmap.h Tue Mar 06 09:29:15 2007 +1100
@@ -19,10 +19,8 @@
  * Leave one empty page between vmalloc'ed areas and
  * the start of the fixmap.
  */
-#ifndef CONFIG_COMPAT_VDSO
 extern unsigned long __FIXADDR_TOP;
-#else
-#define __FIXADDR_TOP  0xf000
+#ifdef CONFIG_COMPAT_VDSO
 #define FIXADDR_USER_START __fix_to_virt(FIX_VDSO)
 #define FIXADDR_USER_END   __fix_to_virt(FIX_VDSO - 1)
 #endif


-
To unsubscribe from this list

Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Zachary Amsden

Rusty Russell wrote:

On Tue, 2007-03-06 at 00:28 +1100, Rusty Russell wrote:
  

On Mon, 2007-03-05 at 13:06 +0100, Ingo Molnar wrote:


Subject: [patch] paravirt: VDSO page is essential
From: Ingo Molnar [EMAIL PROTECTED]

commit 3bbf54725467d604698721384d858b5983b87e8f disables the VDSO for 
CONFIG_PARAVIRT kernels. This #ifdeffery was a bad change: the VDSO is 
an essential component of Linux, and this change forces all of them to 
use int $0x80 - including sane ones like KVM. (If a hypervisor does not 
handle the VDSO properly then it can work things around via the vdso=0 
boot option. Or CONFIG_PARAVIRT should not have been merged. But in any 
case, it is a basic taste issue: we DO NOT #ifdef around core features 
like this!)
  

I agree with the criticism, dislike the snarly comments, and disagree
with this patch.



And my patch was pretty crack-induced too.  Sorry.

I shouldn't have been thinking about using CONFIG options at all: we
should simply disable the vdso if CONFIG_COMPAT_VDSO=y when we
*actually* reserve top memory.

This still need some work (doing that now), but do people like the idea?

The current vdso_disabled flag merely disabled the ELF note, so it
needs to be made a little stronger, to not set up the vdso at all.
  


I had just sent this out for internal review...


COMPAT_VDSO is incompatible with PARAVIRT for most implementations, as they
must relocate the fixmap to make room for a hypervisor.  So allow COMPAT_VDSO
kernels to relocate the fixmap as well, just disable the VDSO if they do so.

Signed-off-by: Zachary Amsden [EMAIL PROTECTED]

diff -r fad0910252d2 arch/i386/kernel/sysenter.c
--- a/arch/i386/kernel/sysenter.c   Mon Mar 05 15:24:04 2007 -0800
+++ b/arch/i386/kernel/sysenter.c   Mon Mar 05 15:27:31 2007 -0800
@@ -74,7 +74,12 @@ static struct page *syscall_pages[1];
 
 int __init sysenter_setup(void)
 {
-   void *syscall_page = (void *)get_zeroed_page(GFP_ATOMIC);
+   void *syscall_page;
+
+   if (!vdso_enabled)
+   return 0;
+
+   syscall_page = (void *)get_zeroed_page(GFP_ATOMIC);
syscall_pages[0] = virt_to_page(syscall_page);
 
 #ifdef CONFIG_COMPAT_VDSO
@@ -106,6 +111,11 @@ int arch_setup_additional_pages(struct l
struct mm_struct *mm = current-mm;
unsigned long addr;
int ret;
+
+   if (!vdso_enabled) {
+   current-mm-context.vdso = (void *)~0UL;
+   return 0;
+   }
 
down_write(mm-mmap_sem);
addr = get_unmapped_area(NULL, 0, PAGE_SIZE, 0, 0);
diff -r fad0910252d2 arch/i386/mm/pgtable.c
--- a/arch/i386/mm/pgtable.cMon Mar 05 15:24:04 2007 -0800
+++ b/arch/i386/mm/pgtable.cMon Mar 05 16:06:31 2007 -0800
@@ -144,10 +144,8 @@ void set_pmd_pfn(unsigned long vaddr, un
 }
 
 static int fixmaps;
-#ifndef CONFIG_COMPAT_VDSO
 unsigned long __FIXADDR_TOP = 0xf000;
 EXPORT_SYMBOL(__FIXADDR_TOP);
-#endif
 
 void __set_fixmap (enum fixed_addresses idx, unsigned long phys, pgprot_t 
flags)
 {
@@ -174,11 +172,13 @@ void reserve_top_address(unsigned long r
printk(KERN_INFO Reserving virtual address space above 0x%08x\n,
   (int)-reserve);
 #ifdef CONFIG_COMPAT_VDSO
-   BUG_ON(reserve != 0);
-#else
+   if (reserve != 0) {
+   printk(KERN_WARNING Compat VDSO is incompatible with fixmap 
relocation - disabling VDSO\n);
+   vdso_enabled = 0;
+   }
+#endif
__FIXADDR_TOP = -reserve - PAGE_SIZE;
__VMALLOC_RESERVE += reserve;
-#endif
 }
 
 pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
diff -r fad0910252d2 include/asm-i386/elf.h
--- a/include/asm-i386/elf.hMon Mar 05 15:24:04 2007 -0800
+++ b/include/asm-i386/elf.hMon Mar 05 15:44:43 2007 -0800
@@ -137,7 +137,7 @@ extern int dump_task_extended_fpu (struc
 
 #ifdef CONFIG_COMPAT_VDSO
 # define VDSO_COMPAT_BASE  VDSO_HIGH_BASE
-# define VDSO_PRELINK  VDSO_HIGH_BASE
+# define VDSO_PRELINK  0xe000UL
 #else
 # define VDSO_COMPAT_BASE  VDSO_BASE
 # define VDSO_PRELINK  0
diff -r fad0910252d2 include/asm-i386/fixmap.h
--- a/include/asm-i386/fixmap.h Mon Mar 05 15:24:04 2007 -0800
+++ b/include/asm-i386/fixmap.h Mon Mar 05 15:59:30 2007 -0800
@@ -14,19 +14,6 @@
 #define _ASM_FIXMAP_H
 
 
-/* used by vmalloc.c, vsyscall.lds.S.
- *
- * Leave one empty page between vmalloc'ed areas and
- * the start of the fixmap.
- */
-#ifndef CONFIG_COMPAT_VDSO
-extern unsigned long __FIXADDR_TOP;
-#else
-#define __FIXADDR_TOP  0xf000
-#define FIXADDR_USER_START __fix_to_virt(FIX_VDSO)
-#define FIXADDR_USER_END   __fix_to_virt(FIX_VDSO - 1)
-#endif
-
 #ifndef __ASSEMBLY__
 #include linux/kernel.h
 #include asm/acpi.h
@@ -35,6 +22,15 @@ extern unsigned long __FIXADDR_TOP;
 #ifdef CONFIG_HIGHMEM
 #include linux/threads.h
 #include asm/kmap_types.h
+#endif
+
+/* used by vmalloc.c, vsyscall.lds.S, elf.h, pgtable.c */
+extern unsigned long __FIXADDR_TOP;
+
+/* used for dumping

Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Rusty Russell
On Mon, 2007-03-05 at 17:03 -0800, Zachary Amsden wrote:
 Rusty Russell wrote:
  This still need some work (doing that now), but do people like the idea?
 
 I had just sent this out for internal review...

Spooky!

I was just testing with lguest, but I'll do so with your patch instead.

Thanks,
Rusty.


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Jeremy Fitzhardinge
Zachary Amsden wrote:
 Rusty Russell wrote:
 On Tue, 2007-03-06 at 00:28 +1100, Rusty Russell wrote:
  
 On Mon, 2007-03-05 at 13:06 +0100, Ingo Molnar wrote:

 Subject: [patch] paravirt: VDSO page is essential
 From: Ingo Molnar [EMAIL PROTECTED]

 commit 3bbf54725467d604698721384d858b5983b87e8f disables the VDSO
 for CONFIG_PARAVIRT kernels. This #ifdeffery was a bad change: the
 VDSO is an essential component of Linux, and this change forces all
 of them to use int $0x80 - including sane ones like KVM. (If a
 hypervisor does not handle the VDSO properly then it can work
 things around via the vdso=0 boot option. Or CONFIG_PARAVIRT should
 not have been merged. But in any case, it is a basic taste issue:
 we DO NOT #ifdef around core features like this!)
   
 I agree with the criticism, dislike the snarly comments, and disagree
 with this patch.
 

 And my patch was pretty crack-induced too.  Sorry.

 I shouldn't have been thinking about using CONFIG options at all: we
 should simply disable the vdso if CONFIG_COMPAT_VDSO=y when we
 *actually* reserve top memory.

 This still need some work (doing that now), but do people like the idea?

 The current vdso_disabled flag merely disabled the ELF note, so it
 needs to be made a little stronger, to not set up the vdso at all.
   

 I had just sent this out for internal review...


I think Jan's approach is better if it works (since there's no
compromise), but this is better if you want to get something working in
the near term.

J
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Zachary Amsden

Jeremy Fitzhardinge wrote:

I think Jan's approach is better if it works (since there's no
compromise), but this is better if you want to get something working in
the near term.
  


Is Jan dynamically relinking the vdso at runtime?  Because that is what 
we will need.  Adding support for COMPAT_VDSO creates a major 
performance problem for us.  We need a way to force the VDSO to be 
enabled with a boot parameter.


Zach
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Jeremy Fitzhardinge
Zachary Amsden wrote:
 Jeremy Fitzhardinge wrote:
 I think Jan's approach is better if it works (since there's no
 compromise), but this is better if you want to get something working in
 the near term.
   

 Is Jan dynamically relinking the vdso at runtime?  
Yes.  His patch adds a runtime relocation for the vdso.
http://marc.theaimsgroup.com/?l=xen-develm=117309332600075w=2

J
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Ingo Molnar

* Jeremy Fitzhardinge [EMAIL PROTECTED] wrote:

  I had just sent this out for internal review...
 
 I think Jan's approach is better if it works (since there's no 
 compromise), but this is better if you want to get something working 
 in the near term.

yeah. (plus my patches of course that remove the current papering-over 
hackery and restores COMPAT_VDSO.)

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Zachary Amsden

Ingo Molnar wrote:

* Jeremy Fitzhardinge [EMAIL PROTECTED] wrote:

  

I had just sent this out for internal review...
  
I think Jan's approach is better if it works (since there's no 
compromise), but this is better if you want to get something working 
in the near term.



yeah. (plus my patches of course that remove the current papering-over 
hackery and restores COMPAT_VDSO.)
  


Yes, I don't have a problem with your patch, I just wish I had been cc'd 
on it.  Fixing this is rather tricky, but I believe no strange build 
magic is required, it can be done in kernel init code.  Still building 
my SUSE 9.0 guest to test.  SUSE 9.0 is one of those that requires 
COMPAT_VDSO, yes?


Thanks,

Zach
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] paravirt: VDSO page is essential

2007-03-05 Thread Ingo Molnar

* Zachary Amsden [EMAIL PROTECTED] wrote:

  yeah. (plus my patches of course that remove the current 
  papering-over hackery and restores COMPAT_VDSO.)
 
 Yes, I don't have a problem with your patch, I just wish I had been 
 cc'd on it. [...]

(i Cc:-ed you to the other ones - i simply forgot and bounced it to you 
a few hours down the line - sorry!)

 [...] Fixing this is rather tricky, but I believe no strange build 
 magic is required, it can be done in kernel init code.  Still building 
 my SUSE 9.0 guest to test.  SUSE 9.0 is one of those that requires 
 COMPAT_VDSO, yes?

yeah, and a handful of other ones. It depends on the glibc version: 
early vdso glibs were buggy and assumed a few things about the vdso, so 
they would segfault on the new-style vdso which is fully relocatable 
(and hence mergable into the vma space, randomizable, etc.).

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/