CVS commit: src/sys/arch/xen/x86

2019-06-07 Thread Cherry G. Mathew
Module Name:src
Committed By:   cherry
Date:   Fri Jun  7 12:43:52 UTC 2019

Modified Files:
src/sys/arch/xen/x86: xen_intr.c

Log Message:
Fix build for the XEN3_PVHVM kernel by conditionally compiling
redundant functions x86_enable_intr()/x86_disable_intr()


To generate a diff of this commit:
cvs rdiff -u -r1.16 -r1.17 src/sys/arch/xen/x86/xen_intr.c

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.

Modified files:

Index: src/sys/arch/xen/x86/xen_intr.c
diff -u src/sys/arch/xen/x86/xen_intr.c:1.16 src/sys/arch/xen/x86/xen_intr.c:1.17
--- src/sys/arch/xen/x86/xen_intr.c:1.16	Thu May  9 17:09:51 2019
+++ src/sys/arch/xen/x86/xen_intr.c	Fri Jun  7 12:43:52 2019
@@ -1,4 +1,4 @@
-/*	$NetBSD: xen_intr.c,v 1.16 2019/05/09 17:09:51 bouyer Exp $	*/
+/*	$NetBSD: xen_intr.c,v 1.17 2019/06/07 12:43:52 cherry Exp $	*/
 
 /*-
  * Copyright (c) 1998, 2001 The NetBSD Foundation, Inc.
@@ -30,7 +30,7 @@
  */
 
 #include 
-__KERNEL_RCSID(0, "$NetBSD: xen_intr.c,v 1.16 2019/05/09 17:09:51 bouyer Exp $");
+__KERNEL_RCSID(0, "$NetBSD: xen_intr.c,v 1.17 2019/06/07 12:43:52 cherry Exp $");
 
 #include 
 #include 
@@ -99,6 +99,7 @@ xen_spllower(int nlevel)
 }
 
 
+#if !defined(XENPVHVM)
 void
 x86_disable_intr(void)
 {
@@ -117,6 +118,8 @@ x86_enable_intr(void)
 		hypervisor_force_callback();
 }
 
+#endif /* !XENPVHVM */
+
 u_long
 xen_read_psl(void)
 {






Re: CVS commit: src/sys/arch/x86

2019-02-13 Thread Cherry G . Mathew
(resent to source-changes-d@)

"Maxime Villard"  writes:


[...]

>
> Contrary to AMD-SVM, Intel-VMX uses a different set of PTE bits from
> native, and this has three important consequences:
>
>  - We can't use the native PTE bits, so each time we want to modify the
>page tables, we need to know whether we're dealing with a native pmap
>or an EPT pmap. This is accomplished with callbacks, that handle
>everything PTE-related.
>

I like this approach - perhaps it's a way to separate out other similar
pmaps (OT).

>  - There is no recursive slot possible, so we can't use pmap_map_ptes().
>Rather, we walk down the EPT trees via the direct map, and that's
>actually a lot simpler (and probably faster too...).
>

Does this mean that nvmm hosts have to have __HAVE_DIRECT_MAP ?
If so, it might be worth having a separate kernel conf rather than
GENERIC (I don't know how this works now). I recently built an amd64
kernel without __HAVE_DIRECT_MAP and was quite impressed that it
actually booted. That's a nice-to-have feature.

>  - The kernel is never mapped in an EPT pmap. An EPT pmap cannot be loaded
>on the host. This has two sub-consequences: at creation time we must
>zero out all of the top-level PTEs, and at destruction time we force
>the page out of the pool cache and into the pool, to ensure that a next
>allocation will invoke pmap_pdp_ctor() to create a native pmap and not
>recycle some stale EPT entries.

Can you not use a separate poolcache ? This could isolate host/guest
related memory pressure as well ?

-- 
~cherry


Re: CVS commit: src/sys/arch/x86/x86

2019-01-06 Thread Cherry G . Mathew
Maxime Villard  writes:

> Can we do something about it now? It's been more than a week, and the issue is
> still there. NVMM still doesn't modload, same for procfs, and GENERIC_KASLR
> doesn't work either.
>
> This needs to be fixed now, and we should not start adding random hacks all
> over the place (like the one below).
>

Sorry for the delay - I've rolled back the changes.

http://mail-index.netbsd.org/source-changes/2019/01/06/msg102113.html

The XEN related ones I will roll back if enough people complain. I'm
meanwhile investigating other options.

Thanks for your patience.
-- 
~cherry


Re: CVS commit: src/sys/arch

2018-12-30 Thread Cherry G . Mathew
Christoph Badura  writes:

> On Tue, Dec 25, 2018 at 11:45:42AM +0530, Cherry G.Mathew wrote:
>> Joerg Sonnenberger  writes:
>> > On Sat, Dec 22, 2018 at 09:27:22PM +, Cherry G. Mathew wrote:
>> >> Modified Files:
>> >>   src/sys/arch/amd64/amd64: cpufunc.S
>> >>   src/sys/arch/i386/i386: cpufunc.S i386func.S
>> >>   src/sys/arch/xen/x86: xenfunc.c
>> >> Log Message:
>> >> Introduce a weak alias method of exporting different implementations
>> >> of the same API.
>> > Why? As in: what's the advantage of making them weak.
>> I'd posted separately before this commit explaining the rationale.
>
> It took me a while to check the most obvious places and figure out which
> message it might have been.  A more precise reference would have helped.
>
>> Here's the scenario above:
>> 
>> let's take lcr3(). On native this is a ring0 operation, implemented in
>> assembler. On xen, this is a PV call. Currently there's no need to have
>> both implementations in a single binary, since PV and native binaries
>> are distinct. With PVHVM, we have a situation where at boot time, the
>> native version is required, and after XEN is detected, we can use the
>> XEN version of the call. I've taken the approach of naming the
>> individual functions distinctly, eg: i386_lcr3(), or xen_lcr3(),
>> and exporting them weakly as the consumed version, ie; lcr3().
>> 
>> What happens is that when the individual binary is compiled, the
>> appropriate weakly exported symbol takes over, and things work as
>> expected. When  the combined binary for PVHVM is required, we write a
>> strongly exported "switch" function, called lcr3() (I haven't committed
>> this yet) which overrides both the weak symbols. It can then do the
>> runtime calls by calling into the appropriate i386_lcr3() or xen_lcr3(),
>> depending on the mode in which it is running.
>
> I don't find this argument for weak aliases convincing.  You plan to
> write the function that makes a runtime decision between the
> implementations anyway.  You might as well write that function now and avoid
> another bit of hidden magic.
>

The other options have been suggested earlier (function pointers and
hotpatching).

Function pointers require reprototyping every function required (I don't
have an exhaustive list yet). Hotpatching isn't useful in that it will
overwrite the default version, so we'll have to hotpatch twice. Once to
override the default, and then to make a copy so that the default is
available. It's also ugly, so for me this is the least preferable
method.

> You can have multiple versions of that "switch" function and select the
> appropriate one with config(8) logic.  

It's too late for that. Things like lcr3() need to work way before
config(8) is a twinkle in boot(8)'s eye.

> Or you can select with #ifdef.  Somewhere you have to place the logic
> for conditional compilation/linking.  Having that logic concentrated
> in a single place instead of N seems preferable to me.

Yes, it's pretty intense - see x86/mainbus.c:mainbus_attach() for a
sample of what is to come.

-- 
~cherry


Re: Weak x86 aliases

2018-12-28 Thread Cherry G . Mathew
Maxime Villard  writes:

> Le 28/12/2018 à 15:06, Cherry G.Mathew a écrit :
>> Maxime Villard  writes:
>>> Le 28/12/2018 à 14:57, Cherry G.Mathew a écrit :
 Maxime Villard  writes:
>> Introduce a weak alias method of exporting different implementations
>> of the same API.
>
> Please revert or fix this change.

 I'm not sure what the fix is - do you have a suggestion ?
>>>
>>> either re-apply it without using weak symbols, or change the modloader
>>> to accept weak symbols
>>
>> I don't like the imperative in your tone. NVMM is the user of modloader,
>> not PVHVM. So if you feel like your usecase needs fixing, I'd say it's
>> your problem - or don't use modules, but see below.
>
> Wut? Yes my suggestions are either we re-apply the change without using
> weak symbols or we change the modloader to accept weak symbols.
>

Would a __strong_alias() fix this for you ? ie; is aliasing problematic
in itself ?

-- 
~cherry


Re: Weak x86 aliases

2018-12-28 Thread Cherry G . Mathew
John Nemeth  writes:

> On Dec 28, 11:52pm, "Mathew, Cherry G." wrote:
> } On December 28, 2018 9:54:11 PM GMT+05:30, John Nemeth  
> wrote:
> } >On Dec 28,  7:36pm, "Cherry G.Mathew" wrote:
> } >} Maxime Villard  writes:
> } >} > Le 28/12/2018 à 14:57, Cherry G.Mathew a écrit :
> } >} >> Maxime Villard  writes:
> } >}  Introduce a weak alias method of exporting different
> } >implementations
> } >}  of the same API.
> } >} >>>
> } >} >>> Please revert or fix this change.
> } >} >>
> } >} >> I'm not sure what the fix is - do you have a suggestion ?
> } >} >
> } >} > either re-apply it without using weak symbols, or change the
> } >modloader
> } >} > to accept weak symbols
> } >} 
> } >} I don't like the imperative in your tone. NVMM is the user of
> } >modloader,
> } >} not PVHVM. So if you feel like your usecase needs fixing, I'd say
> } >it's
> } >} your problem - or don't use modules, but see below.
> } >
> } > I suspect there's a language issue here due to people using
> } >English as a second language.  However, I don't see an imperative
> } >(command) here.  You asked for suggestions on how to fix a problem.
> } >He answered your question with a couple of suggestions.  That's
> } >all.
> } >
> } > Also, I would argue that the kernel uses modloader, not the
> } >module.  In any event, as mentioned, it is your change that broke
> } >things...
> } 
> } Based on Jason's reply I suspect I've broken modules on Xen too.
> } ISTR that you did some work in this area. If you did, can you
> } comment?
>
>  I mostly worked on the infrastructure to build a seperate set
> of modules for xen and xen-pae.  On amd64, Xen modules didn't work
> due to missing modmap, which maxv fixed.  On i386, only simple
> modules worked due to not being able to find symbols.  I really
> don't know a whole lot about linkers and loaders (manipulating
> makefiles is much simpler).
>
>  However, having said that, I suspect that your work with PVHVM
> (and presumably PVH after that) will make the idea of having seperate
> modules for Xen obsolete.  I.e. it appears to me that we're headed
> to a world where a single kernel will work both on raw iron and
> under Xen.
>

I would like to maintain PV as is. It has advantages that the PVHVM
stuff doesn't have.

-- 
~cherry


Re: Weak x86 aliases

2018-12-28 Thread Cherry G . Mathew
Cherry G. Mathew  writes:


[...]

> I think I'll revert these for now, 

PS: I'm in transit and probably not the best time to do this - if you
feel like it's urgent enough, please feel free to rollback. I'll fix
breakage on my end once the dust settles. I'll be in a commitable place
in ~48hrs.

-- 
~cherry


Re: Weak x86 aliases

2018-12-28 Thread Cherry G . Mathew
Maxime Villard  writes:

> Le 28/12/2018 à 14:57, Cherry G.Mathew a écrit :
>> Maxime Villard  writes:
 Introduce a weak alias method of exporting different implementations
 of the same API.
>>>
>>> Please revert or fix this change.
>>
>> I'm not sure what the fix is - do you have a suggestion ?
>
> either re-apply it without using weak symbols, or change the modloader
> to accept weak symbols
>

I don't like the imperative in your tone. NVMM is the user of modloader,
not PVHVM. So if you feel like your usecase needs fixing, I'd say it's
your problem - or don't use modules, but see below.

>>> The kernel modules that use these functions can't be modloaded
>>> anymore, because weak symbols aren't resolved.
>>>
>>> Eg, NVMM can't be modloaded anymore, because it uses rcr0 among others.
>>>
>>
>> I think I'll revert these for now, because PVHVM doesn't/shouldn't use
>> them anyway, but I'd like to know how to fix this properly. modload not
>> working due to __weak_alias() sounds like something we don't do properly
>> in the modload path.
>
> There is a check in kobj_sym_lookup():
>
> 926   case STB_WEAK:
> 927   kobj_error(ko, "weak symbols not supported");
> 928   return EINVAL;
>
> To resolve correctly I guess we need to take the strongest of the
> duplicated symbols.
>

I'll look into this unless someone else beats me to it - but for now I'm
ok for the specific change that breaks things for NVMM to be rolled
back.

(Please see other email about timeframe).


-- 
~cherry


Re: Weak x86 aliases

2018-12-28 Thread Cherry G . Mathew
Maxime Villard  writes:

>> Introduce a weak alias method of exporting different implementations
>> of the same API.
>
> Please revert or fix this change. 

I'm not sure what the fix is - do you have a suggestion ?

> The kernel modules that use these functions can't be modloaded
> anymore, because weak symbols aren't resolved.
>
> Eg, NVMM can't be modloaded anymore, because it uses rcr0 among others.
>

I think I'll revert these for now, because PVHVM doesn't/shouldn't use
them anyway, but I'd like to know how to fix this properly. modload not
working due to __weak_alias() sounds like something we don't do properly
in the modload path.

-- 
~cherry


Re: CVS commit: src/sys/arch

2018-12-24 Thread Cherry G . Mathew
Joerg Sonnenberger  writes:

> On Sat, Dec 22, 2018 at 09:27:22PM +0000, Cherry G. Mathew wrote:
>> Module Name: src
>> Committed By:cherry
>> Date:Sat Dec 22 21:27:22 UTC 2018
>> 
>> Modified Files:
>>  src/sys/arch/amd64/amd64: cpufunc.S
>>  src/sys/arch/i386/i386: cpufunc.S i386func.S
>>  src/sys/arch/xen/x86: xenfunc.c
>> 
>> Log Message:
>> Introduce a weak alias method of exporting different implementations
>> of the same API.
>
> Why? As in: what's the advantage of making them weak.
>

I'd posted separately before this commit explaining the rationale.

Here's the scenario above:

let's take lcr3(). On native this is a ring0 operation, implemented in
assembler. On xen, this is a PV call. Currently there's no need to have
both implementations in a single binary, since PV and native binaries
are distinct. With PVHVM, we have a situation where at boot time, the
native version is required, and after XEN is detected, we can use the
XEN version of the call. I've taken the approach of naming the
individual functions distinctly, eg: i386_lcr3(), or xen_lcr3(),
and exporting them weakly as the consumed version, ie; lcr3().

What happens is that when the individual binary is compiled, the
appropriate weakly exported symbol takes over, and things work as
expected. When  the combined binary for PVHVM is required, we write a
strongly exported "switch" function, called lcr3() (I haven't committed
this yet) which overrides both the weak symbols. It can then do the
runtime calls by calling into the appropriate i386_lcr3() or xen_lcr3(),
depending on the mode in which it is running.

In the specific case above however, I realised that most of the
functions I converted are not appropriate for use in PVHVM as the native
versions are likely to be more efficient. I may roll them back once
things stabilise.

I hope this was a useful explanation.
-- 
~cherry


Re: CVS commit: src/sys/arch/xen/x86

2016-08-07 Thread Cherry G. Mathew
On 2 August 2016 at 19:51, Maxime Villard  wrote:

> Module Name:src
> Committed By:   maxv
> Date:   Tue Aug  2 14:21:53 UTC 2016
>
> Modified Files:
> src/sys/arch/xen/x86: x86_xpmap.c
>
> Log Message:
> Map the kernel text, rodata and data+bss independently on Xen, with
> respectively RX, R and RW.
>
>
>
Hi - wondering why you're getting more divergence from generic x86 - is
there a way to do this (and the pg_nx stuff for eg:) without having to
special case this in Xen ?

-- 
~~Cherry


Re: CVS commit: src/sys/external/bsd/drm/dist/shared-core

2014-09-07 Thread Cherry G. Mathew
Hi Matt,

This doesn't fix drm2 i915 breakage. Could you revisit this please, since
it's in external/ ?

Thanks,


On 5 September 2014 15:10, Matt Thomas m...@netbsd.org wrote:

 Module Name:src
 Committed By:   matt
 Date:   Fri Sep  5 09:40:44 UTC 2014

 Modified Files:
 src/sys/external/bsd/drm/dist/shared-core: i915_drv.h
 i915_suspend.c

 Log Message:
 Rename enum pipe to enum pipe so it won't conflict with struct pipe.


 To generate a diff of this commit:
 cvs rdiff -u -r1.5 -r1.6
 src/sys/external/bsd/drm/dist/shared-core/i915_drv.h
 cvs rdiff -u -r1.6 -r1.7 \
 src/sys/external/bsd/drm/dist/shared-core/i915_suspend.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.




-- 
~~Cherry


Re: CVS commit: src/sys/arch

2014-06-13 Thread Cherry G . Mathew
Hi Paul,

Thanks for working on this.

 Paul == Paul Goyette p...@whooppee.com writes:
 Date: Fri, 13 Jun 2014 17:52:53 -0700 (PDT)
 From: Paul Goyette p...@whooppee.com
 
 I modified the newer patch so that things still compile on non-XEN
 kernels, and added the version check to an additional invocation of
 xen_pagezero().  The patch is attached, and I have verified that it
 successfully boots under the pre-3.4 hypervisor.
 
 Is there any reason why this should not be committed?
 

[...]

 
 Index: src/sys/arch/x86/x86/pmap.c
 ===
 RCS file: /cvsroot/src/sys/arch/x86/x86/pmap.c,v
 retrieving revision 1.182
 diff -u -p -r1.182 pmap.c
 --- src/sys/arch/x86/x86/pmap.c   6 May 2014 04:26:23 -   1.182
 +++ src/sys/arch/x86/x86/pmap.c   14 Jun 2014 00:51:56 -
 @@ -211,6 +211,15 @@ __KERNEL_RCSID(0, $NetBSD: pmap.c,v 1.1
  #ifdef XEN
  #include xen/xen-public/xen.h
  #include xen/hypervisor.h
 +
 +/* 
 + * Does the hypervisor we're running on support an api
 + * call at the requested version number ?
 + */
 +#define XEN_VERSION_SUPPORTED(major, minor)  \
 + (XEN_MAJOR(xen_version) > (major) ||\
 +  (XEN_MAJOR(xen_version) == (major) &&\
 +   XEN_MINOR(xen_version) >= (minor)))
  #endif
  

This should probably go in include/hypervisor.h, right under the
definitions for XEN_MAJOR() and XEN_MINOR()

  /*
 @@ -3013,9 +3022,11 @@ pmap_zero_page(paddr_t pa)
  {
  #if defined(__HAVE_DIRECT_MAP)
   pagezero(PMAP_DIRECT_MAP(pa));
 -#elif defined(XEN)
 - xen_pagezero(pa);
  #else
 +#if defined(XEN)
 + if (XEN_VERSION_SUPPORTED(3, 4))
 + xen_pagezero(pa);
 +#endif

The version check should probably be inside xen_pagezero() but that
would require a tiny bit of re-org in pmap.c since the direct map code
is equally culpable.

IMO this can wait for a post-commit sweep.

Looks ok to me otherwise.

Cheers,
-- 
Cherry


Re: CVS commit: src/sys/arch

2014-06-12 Thread Cherry G . Mathew
 Paul == Paul Goyette p...@whooppee.com writes:

 Module Name: src Committed By: cherry Date: Tue May 6 04:26:24
 UTC 2014
 
 Modified Files: src/sys/arch/x86/x86: pmap.c
 src/sys/arch/xen/include: xenpmap.h src/sys/arch/xen/x86:
 x86_xpmap.c
 
 Log Message: Use the hypervisor to copy/zero pages. This saves us
 the extra overheads of setting up temporary kernel
 mapping/unmapping.
 
 riz@ reports savings of about 2s on a 120s kernel build.

Paul This commit seems to have broken the ability to boot NetBSD
Paul under at least some hypervisors.  My VPS at prgmr.com boots
Paul fine with a kernel from just before this change.  However,
Paul with this commit, the following panic occurs:

Sorry, the first patch was incorrect ( it skips over some newer legit
versions! ) - thanks abs@.

Thanks to Gary Duzan for the initial suggestion and abs@ for catching
the error in my first patch.

-- 
Cherry

diff -r f0203e3974e7 sys/arch/x86/x86/pmap.c
--- a/sys/arch/x86/x86/pmap.c	Sat May 31 20:20:36 2014 +0900
+++ b/sys/arch/x86/x86/pmap.c	Thu Jun 12 15:46:00 2014 +0900
@@ -211,6 +211,15 @@
 #ifdef XEN
 #include xen/xen-public/xen.h
 #include xen/hypervisor.h
+
+/* 
+ * Does the hypervisor we're running on support an api
+ * call at the requested version number ?
+ */
+#define XEN_VERSION_SUPPORTED(major, minor)		\
+	(XEN_MAJOR(xen_version) > (major) ||		\
+	 (XEN_MAJOR(xen_version) == (major) &&		\
+	  XEN_MINOR(xen_version) >= (minor)))
 #endif
 
 /*
@@ -3097,8 +3106,11 @@
 
 	memcpy((void *)dstva, (void *)srcva, PAGE_SIZE);
 #elif defined(XEN)
-	xen_copy_page(srcpa, dstpa);
-#else
+	if (XEN_VERSION_SUPPORTED(3, 4)) {
+		xen_copy_page(srcpa, dstpa);
+		return;
+	}
+#endif
 	pt_entry_t *spte;
 	pt_entry_t *dpte;
 	void *csrcva;
@@ -3128,7 +3140,6 @@
 	pmap_pte_flush();
 #endif
 	kpreempt_enable();
-#endif
 }
 
 static pt_entry_t *
@@ -4108,8 +4119,11 @@
 #ifdef __HAVE_DIRECT_MAP
 		pagezero(PMAP_DIRECT_MAP(*paddrp));
 #elif defined(XEN)
-		xen_pagezero(*paddrp);
-#else
+		if (XEN_VERSION_SUPPORTED(3, 4)) {
+			xen_pagezero(*paddrp);
+			return true;
+		}
+#endif
 		kpreempt_disable();
 		pmap_pte_set(early_zero_pte,
 		pmap_pa2pte(*paddrp) | PG_V | PG_RW | PG_k);
@@ -4121,7 +4135,6 @@
 		pmap_pte_flush();
 #endif /* defined(DIAGNOSTIC) */
 		kpreempt_enable();
-#endif
 	} else {
 		/* XXX */
 		ptp = uvm_pagealloc(NULL, 0, NULL,


Re: CVS commit: src/sys/arch

2014-06-11 Thread Cherry G . Mathew
 Paul == Paul Goyette p...@whooppee.com writes:

 Module Name: src Committed By: cherry Date: Tue May 6 04:26:24
 UTC 2014
 
 Modified Files: src/sys/arch/x86/x86: pmap.c
 src/sys/arch/xen/include: xenpmap.h src/sys/arch/xen/x86:
 x86_xpmap.c
 
 Log Message: Use the hypervisor to copy/zero pages. This saves us
 the extra overheads of setting up temporary kernel
 mapping/unmapping.
 
 riz@ reports savings of about 2s on a 120s kernel build.

Paul This commit seems to have broken the ability to boot NetBSD
Paul under at least some hypervisors.  My VPS at prgmr.com boots
Paul fine with a kernel from just before this change.  However,
Paul with this commit, the following panic occurs:


[...]

Hi,

Does the attached patch work for you ?

-- 
Cherry



diff -r f0203e3974e7 sys/arch/x86/x86/pmap.c
--- a/sys/arch/x86/x86/pmap.c	Sat May 31 20:20:36 2014 +0900
+++ b/sys/arch/x86/x86/pmap.c	Thu Jun 12 14:15:30 2014 +0900
@@ -3097,8 +3097,12 @@
 
 	memcpy((void *)dstva, (void *)srcva, PAGE_SIZE);
 #elif defined(XEN)
-	xen_copy_page(srcpa, dstpa);
-#else
+	if (XEN_MAJOR(xen_version) >= 3 &&
+	    XEN_MINOR(xen_version) >= 4) {
+		xen_copy_page(srcpa, dstpa);
+		return;
+	}
+#endif
 	pt_entry_t *spte;
 	pt_entry_t *dpte;
 	void *csrcva;
@@ -3128,7 +3132,6 @@
 	pmap_pte_flush();
 #endif
 	kpreempt_enable();
-#endif
 }
 
 static pt_entry_t *
@@ -4108,8 +4111,12 @@
 #ifdef __HAVE_DIRECT_MAP
 		pagezero(PMAP_DIRECT_MAP(*paddrp));
 #elif defined(XEN)
+	if (XEN_MAJOR(xen_version) >= 3 &&
+	    XEN_MINOR(xen_version) >= 4) {
 		xen_pagezero(*paddrp);
-#else
+		return true;
+	}
+#endif
 		kpreempt_disable();
 		pmap_pte_set(early_zero_pte,
 		pmap_pa2pte(*paddrp) | PG_V | PG_RW | PG_k);
@@ -4121,7 +4128,6 @@
 		pmap_pte_flush();
 #endif /* defined(DIAGNOSTIC) */
 		kpreempt_enable();
-#endif
 	} else {
 		/* XXX */
 		ptp = uvm_pagealloc(NULL, 0, NULL,


Re: CVS commit: src/sys/arch/xen

2013-01-12 Thread Cherry G. Mathew
Hi Manuel,

On 13 January 2013 02:39, Manuel Bouyer bou...@netbsd.org wrote:
 Module Name:src
 Committed By:   bouyer
 Date:   Sat Jan 12 21:09:10 UTC 2013

 Modified Files:
 src/sys/arch/xen/include: hypervisor.h
 src/sys/arch/xen/x86: hypervisor_machdep.c
 src/sys/arch/xen/xen: evtchn.c

 Log Message:
 Revert these commits from november 2012:
 http://mail-index.netbsd.org/source-changes/2012/11/25/msg039125.html
 http://mail-index.netbsd.org/source-changes/2012/11/25/msg039126.html

This commit has no functional difference. Are you sure it needs to be
backed out ?

 http://mail-index.netbsd.org/source-changes/2012/11/25/msg039142.html




 they cause a i386PAE domU to hang while running ATF tests, as shown in
 http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/

 (we should pay more attention to test results, myself first).

I was happy to back these out myself, but I couldn't reproduce your bug.


-- 
~Cherry


Re: CVS commit: src/sys/uvm

2012-09-05 Thread Cherry G. Mathew
On 4 September 2012 12:32, Mindaugas Rasiukevicius rm...@netbsd.org wrote:
 Matt Thomas m...@3am-software.com wrote:

 On Sep 3, 2012, at 3:33 PM, Mindaugas Rasiukevicius wrote:

  Matt Thomas m...@netbsd.org wrote:
  Module Name:   src
  Committed By:  matt
  Date:  Mon Sep  3 19:53:43 UTC 2012
 
  Modified Files:
 src/sys/uvm: uvm_km.c uvm_map.c
 
  Log Message:
  Switch to a spin lock (uvm_kentry_lock) which, fortunately, was sitting
  there unused.
 
  - pmap_growkernel() may use adaptive locks, which cannot be acquired
  with the spin lock held; so the change breaks at least x86 and alpha.
 
  - Why in the caller?  I think it would be better do leave it for the
  pmaps, e.g. they may re-use the locks which already provide the
  necessary protection and which need to be taken anyway (like in x86
  pmap).

 uvm_maxkaddr need a lock for its updating

 growkernel can be called uvm_km_mem_alloc which might be called
 at interrupt level.

 The second point stands, but I see you already fixed it - thanks!

 As for pmap_growkernel() being called from interrupt context - right, then
 it seems Xen is broken, as its path in pmap_growkernel() acquires adaptive
 pmaps_lock and might call pool_cache_invalidate() which can block..


I don't see a xen specific lock being taken in pmap_growkernel()

-- 
~Cherry


Re: CVS commit: src/tests/ipf

2012-03-27 Thread Cherry G. Mathew
On 27 March 2012 22:38, Alan Barrett a...@cequrux.com wrote:
 On Tue, 27 Mar 2012, Jukka Ruohonen wrote:

 Modified Files:
        src/tests/ipf: t_filter_exec.sh t_filter_parse.sh t_nat_exec.sh
            t_nat_ipf_exec.sh

 Log Message:
 Mark the failing tests as broken. XXX: If no one is willing to maintain
 the ipf tests, these should be removed.


 I object to this.  If ipf fails its tests, then the fact should be made
 clear in the test reports, not hidden by disabling the tests.
 I don't know whether the bugs are in ipf or in the tests, but either way,
 removing or disabling the tests seems to me to be counter-productive.

 --apb (Alan Barrett)

+1

-- 
~Cherry


Re: CVS commit: src/lib/libc/stdio

2012-03-27 Thread Cherry G. Mathew
On 28 March 2012 07:53, Christos Zoulas chris...@astron.com wrote:
 In article 20120327202907.gt26...@bigmac.stderr.spb.ru,
 Valeriy E. Ushakov u...@stderr.spb.ru wrote:

But that is not what the code was.  The code was:

    char c; if (c == CHAR_MAX) ...

and *that* is portable.  As I said in another mail to this thread that
went unanswered, it is literally schizophrenic of lint to complain
about it.

 How can lint know that if (c == 255) is portable? Because CHAR_MAX
 gets expanded by cpp to 255 before lint parses the file.


(How) does the compiler (not) catch this one ?

-- 
~Cherry


Re: CVS commit: src/sys/arch/i386/i386

2012-03-04 Thread Cherry G. Mathew
On 5 March 2012 02:14, Manuel Bouyer bou...@netbsd.org wrote:
 Module Name:    src
 Committed By:   bouyer
 Date:           Sun Mar  4 20:44:17 UTC 2012

 Modified Files:
        src/sys/arch/i386/i386: machdep.c

 Log Message:
 Don't try to uvm_page_physload() the tmpgdt page: this always fails because
 only one physical segment is configured for Xen, and it's probably not
 worth it to create a second physseg with a single page (uvm has optimisations
 for the VM_PHYSSEG_MAX == 1 case)




I see, so we're potentially leaking 2 pages at boot now


-- 
~Cherry


Re: CVS commit: src/sys/arch

2012-02-25 Thread Cherry G. Mathew
On 25 February 2012 17:32, Manuel Bouyer bou...@antioche.eu.org wrote:
 On Sat, Feb 25, 2012 at 10:30:30AM +0530, Cherry G. Mathew wrote:
 I've made a few changes to pmap.c where it looks harmless to do so,
 but are in favour of consistency.
 ftp://ftp.netbsd.org/pub/NetBSD/misc/cherry/tmp/xen-set-pte.diff

 Did you test it ? I have a vague memory that using spl() in early
 allocation routines would cause a trap (maybe because curcpu() doesn't
 work yet) but maybe your recent changes fixed it.

Yes and yes.

 Also you can change
 #ifndef XEN
       const pd_entry_t pteflags = PG_V | PG_RW;
 #else /* XEN */
       const pd_entry_t pteflags = PG_u | PG_V | PG_RW;
 #endif /* XEN */

 to:
        const pd_entry_t pteflags = PG_k | PG_V | PG_RW;

 (PG_k is defined to PG_u on Xen/amd64 and 0 otherwise).


Ah cool, that would remove those ugly #ifdefs

Cheers,
-- 
~Cherry


Re: CVS commit: src/sys/arch/x86/x86

2012-02-25 Thread Cherry G. Mathew
On 26 February 2012 01:33, Cherry G. Mathew che...@netbsd.org wrote:
 Module Name:    src
 Committed By:   cherry
 Date:           Sat Feb 25 20:03:58 UTC 2012

 Modified Files:
        src/sys/arch/x86/x86: pmap.c

 Log Message:
 Revert previous since it does a redundant xpq queue flush.
 xen_bcast_invlpg() flushes the queue for us.

As bouyer@ kindly pointed out offline.

Cheers,

-- 
~Cherry


Re: CVS commit: src/sys/arch/x86/x86

2012-02-24 Thread Cherry G. Mathew
Hi Thomas,

On 24 February 2012 14:21, Thomas Klausner w...@netbsd.org wrote:
 Hi Cherry!
 On Fri, Feb 24, 2012 at 08:44:45AM +, Cherry G. Mathew wrote:
 Module Name:  src
 Committed By: cherry
 Date:         Fri Feb 24 08:44:44 UTC 2012

 Modified Files:
       src/sys/arch/x86/x86: pmap.c

 Log Message:
 kernel page attribute change should be reflected on all cpus, since
 the page is going to be released after the _dtor() hook is called.

 Can you please explain why you reverted the previous version and now use 
 xen_bcast*?
  Thomas

That was a typo that I committed by mistake. xpq_bcast*() does not
exist. Sorry about that.

Cheers,

-- 
~Cherry


Re: CVS commit: src/sys/arch

2012-02-24 Thread Cherry G. Mathew
On 22 February 2012 18:31, Manuel Bouyer bou...@antioche.eu.org wrote:
 On Wed, Feb 22, 2012 at 06:05:21PM +0530, Cherry G. Mathew wrote:

 I meant we could make it work, (it would already for amd64/xen since
 cpu_init_msrs() is called from cpu_hatch()) since xen has its own cpu.c

 i don't know if we can do the same for i386.

It wasn't fun, but I managed to do it.

btw, do you see a gdt page leaked between machdep.c:initgdt() and
gdt.c:gdt_init() ?

 Also xpq_cpu() is time-critical; I guess a function pointer call is faster
 than a test.

Well, as a bonus of the early %gs/%fs setup now, I'm thinking of
pruning the xpq_queue_update_xxx() in favour of pmap_set_xxx(). Also,
I'll revisiting the atomicity guarantees (eg: pmap_pte_cas() of these
functions, once we only start using them.

Cheers,
-- 
~Cherry


Re: CVS commit: src/sys/arch

2012-02-24 Thread Cherry G. Mathew
On 24 February 2012 15:33, Manuel Bouyer bou...@antioche.eu.org wrote:
 On Fri, Feb 24, 2012 at 03:00:03PM +0530, Cherry G. Mathew wrote:
 On 22 February 2012 18:31, Manuel Bouyer bou...@antioche.eu.org wrote:
  On Wed, Feb 22, 2012 at 06:05:21PM +0530, Cherry G. Mathew wrote:
 
  I meant we could make it work, (it would already for amd64/xen since
  cpu_init_msrs() is called from cpu_hatch()) since xen has its own cpu.c
 
  i don't know if we can do the same for i386.

 It wasn't fun, but I managed to do it.

 btw, do you see a gdt page leaked between machdep.c:initgdt() and
 gdt.c:gdt_init() ?

 I can't see initgdt(), did you remove it ?

No, it's right there:
http://nxr.netbsd.org/xref/src/sys/arch/i386/i386/machdep.c#1096



  Also xpq_cpu() is time-critical; I guess a function pointer call is faster
  than a test.

 Well, as a bonus of the early %gs/%fs setup now, I'm thinking of
 pruning the xpq_queue_update_xxx() in favour of pmap_set_xxx(). Also,
 I'll revisiting the atomicity guarantees (eg: pmap_pte_cas() of these
 functions, once we only start using them.

 AFAIK they're already all used by pmap ?


Mostly, but not everywere.

 What I want to look at is *why* they're used. In some case it's only
 to collect PG_M/PG_D bits, and Xen has another, maybe more efficient
 mechanism for that. This may allow us to batch more MMU updates.

 Also, I want to look using more multicalls. This may decrease the
 number of hypercalls significantly.


I wonder if there's a way to align that with pmap(9) assumptions.
Quoting the manpage:

 In order to cope with hardware architectures that make the invalidation
 of virtual address mappings expensive (e.g., TLB invalidations, TLB
 shootdown operations for multiple processors), the pmap module is allowed
 to delay mapping invalidation or protection operations until such time as
 they are actually necessary.  The functions that are allowed to delay
 such actions are pmap_enter(), pmap_remove(), pmap_protect(),
 pmap_kenter_pa(), and pmap_kremove().  Callers of these functions must
 use the pmap_update() function to notify the pmap module that the
 mappings need to be made correct.  Since the pmap module is provided with
 information as to which processors are using a given physical map, the
 pmap module may use whatever optimizations it has available to reduce the
 expense of virtual-to-physical mapping synchronization.

Since the XPQ can be regarded as a kind of TLB, I'm guessing we can
attempt to marry the two apis ?

Cheers,
-- 
~Cherry


Re: CVS commit: src/sys/arch

2012-02-24 Thread Cherry G. Mathew
On 24 February 2012 18:29, Manuel Bouyer bou...@antioche.eu.org wrote:
 On Fri, Feb 24, 2012 at 06:14:11PM +0530, Cherry G. Mathew wrote:
 On 24 February 2012 15:33, Manuel Bouyer bou...@antioche.eu.org wrote:
  On Fri, Feb 24, 2012 at 03:00:03PM +0530, Cherry G. Mathew wrote:
  On 22 February 2012 18:31, Manuel Bouyer bou...@antioche.eu.org wrote:
   On Wed, Feb 22, 2012 at 06:05:21PM +0530, Cherry G. Mathew wrote:
  
   I meant we could make it work, (it would already for amd64/xen since
   cpu_init_msrs() is called from cpu_hatch()) since xen has its own cpu.c
  
   i don't know if we can do the same for i386.
 
  It wasn't fun, but I managed to do it.
 
  btw, do you see a gdt page leaked between machdep.c:initgdt() and
  gdt.c:gdt_init() ?
 
  I can't see initgdt(), did you remove it ?

 No, it's right there:
 http://nxr.netbsd.org/xref/src/sys/arch/i386/i386/machdep.c#1096

 OK, I was looking in amd64 code.
 Yes, it's quite possible that we waste a page here.


 
 
  Also xpq_cpu() is time-critical; I guess a function pointer call is
  faster than a test.
 
  Well, as a bonus of the early %gs/%fs setup now, I'm thinking of
  pruning the xpq_queue_update_xxx() in favour of pmap_set_xxx(). Also,
  I'll revisit the atomicity guarantees (eg: pmap_pte_cas()) of these
  functions, once we start using only them.
 
  AFAIK they're already all used by pmap ?
 

 Mostly, but not everywhere.

 there are places where they're not used on purpose (e.g. because we
 know taking the lock or raising the IPL is not needed).


I've made a few changes to pmap.c where it looks harmless to do so,
and which are in favour of consistency.
ftp://ftp.netbsd.org/pub/NetBSD/misc/cherry/tmp/xen-set-pte.diff



  What I want to look at is *why* they're used. In some case it's only
  to collect PG_M/PG_D bits, and Xen has another, maybe more efficient
  mechanism for that. This may allow us to batch more MMU updates.
 
  Also, I want to look using more multicalls. This may decrease the
  number of hypercalls significantly.
 

 I wonder if there's a way to align that with pmap(9) assumptions.
 Quoting the manpage:

      In order to cope with hardware architectures that make the invalidation
      of virtual address mappings expensive (e.g., TLB invalidations, TLB
      shootdown operations for multiple processors), the pmap module is
      allowed to delay mapping invalidation or protection operations until
      such time as they are actually necessary.  The functions that are
      allowed to delay such actions are pmap_enter(), pmap_remove(),
      pmap_protect(), pmap_kenter_pa(), and pmap_kremove().  Callers of
      these functions must use the pmap_update() function to notify the
      pmap module that the mappings need to be made correct.  Since the
      pmap module is provided with information as to which processors are
      using a given physical map, the pmap module may use whatever
      optimizations it has available to reduce the expense of
      virtual-to-physical mapping synchronization.
 
 Since the XPQ can be regarded as a kind of TLB, I'm guessing we can
 attempt to marry the two apis ?

 This is more or less what I had in mind. But, for the cases where
 we need atomic operations, pmap_update() is not appropriate ...


Very cool,

Cheers,

-- 
~Cherry


Re: CVS commit: src/sys/arch

2012-02-22 Thread Cherry G . Mathew
 Manuel == Manuel Bouyer bou...@antioche.eu.org writes:

Manuel On Wed, Feb 22, 2012 at 10:15:50AM +0530, Cherry G. Mathew wrote:
 Hi Manuel,
 
 On 22 February 2012 00:40, Manuel Bouyer bou...@netbsd.org wrote:
 Module Name:    src
 Committed By:   bouyer
 Date:           Tue Feb 21 19:10:13 UTC 2012

 Modified Files:
        src/sys/arch/x86/x86: pmap.c
        src/sys/arch/xen/x86: cpu.c

 Log Message:
 Avoid early use of xen_kpm_sync(); locks are not available at this time.
 Don't call cpu_init() twice.

 Makes LOCKDEBUG kernels boot again


 To generate a diff of this commit:
 cvs rdiff -u -r1.166 -r1.167 src/sys/arch/x86/x86/pmap.c

 @@ -4183,14 +4183,30 @@
 pte = pmap_pa2pte(pa) | PG_k | PG_V | PG_RW;
  #ifdef XEN
 xpq_queue_pte_update(xpmap_ptetomach(&pdep[i]), pte);
 -   if (level == PTP_LEVELS) {
  #if defined(PAE) || defined(__x86_64__)
 -   if (i >= PDIR_SLOT_KERN) {
 +   if (level == PTP_LEVELS && i >= PDIR_SLOT_KERN) {
 +   if (__predict_true(
 +   cpu_info_primary.ci_flags & CPUF_PRESENT)) {
 
 
 Can we use that as a canonical MI method in Xen to detect if the
 tls (gs/fs) has been setup ?

Manuel probably not; it works here because we want to know if we
Manuel already probed CPUs; I'm not sure CPUF_PRESENT will guarantee
Manuel that curcpu() will work.

I meant we could make it work, (it would already for amd64/xen since
cpu_init_msrs() is called from cpu_hatch()) since xen has its own cpu.c

Cheers,
-- 
Cherry


Re: CVS commit: src/sys/arch

2012-02-21 Thread Cherry G. Mathew
Hi Manuel,

On 22 February 2012 00:40, Manuel Bouyer bou...@netbsd.org wrote:
 Module Name:    src
 Committed By:   bouyer
 Date:           Tue Feb 21 19:10:13 UTC 2012

 Modified Files:
        src/sys/arch/x86/x86: pmap.c
        src/sys/arch/xen/x86: cpu.c

 Log Message:
 Avoid early use of xen_kpm_sync(); locks are not available at this time.
 Don't call cpu_init() twice.

 Makes LOCKDEBUG kernels boot again


 To generate a diff of this commit:
 cvs rdiff -u -r1.166 -r1.167 src/sys/arch/x86/x86/pmap.c

@@ -4183,14 +4183,30 @@
pte = pmap_pa2pte(pa) | PG_k | PG_V | PG_RW;
 #ifdef XEN
xpq_queue_pte_update(xpmap_ptetomach(&pdep[i]), pte);
-   if (level == PTP_LEVELS) {
 #if defined(PAE) || defined(__x86_64__)
-   if (i >= PDIR_SLOT_KERN) {
+   if (level == PTP_LEVELS && i >= PDIR_SLOT_KERN) {
+   if (__predict_true(
+   cpu_info_primary.ci_flags & CPUF_PRESENT)) {


Can we use that as a canonical MI method in Xen to detect if the tls
(gs/fs) has been setup ? If so, I can then cleanup the xpq_cpu() mess.

Cheers,
-- 
~Cherry


Re: CVS commit: src/sys/arch

2012-01-12 Thread Cherry G. Mathew
 Hi Brian,

On 13 January 2012 01:19, Cherry G. Mathew che...@netbsd.org wrote:
 Module Name:    src
 Committed By:   cherry
 Date:           Thu Jan 12 19:49:37 UTC 2012

 Modified Files:
        src/sys/arch/amd64/amd64: machdep.c
        src/sys/arch/i386/i386: machdep.c
        src/sys/arch/xen/x86: x86_xpmap.c

 Log Message:
 relocate pte_lock initialisation to the earliest points after %fs is first 
 usable in the XEN bootpath


 To generate a diff of this commit:
 cvs rdiff -u -r1.173 -r1.174 src/sys/arch/amd64/amd64/machdep.c
 cvs rdiff -u -r1.716 -r1.717 src/sys/arch/i386/i386/machdep.c
 cvs rdiff -u -r1.37 -r1.38 src/sys/arch/xen/x86/x86_xpmap.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.


These should fix the crashes with LOCKDEBUG.

Thanks for reporting them.

-- 
~Cherry


Re: CVS commit: src/sys/arch

2012-01-08 Thread Cherry G. Mathew
On 7 January 2012 22:06, David Holland
dholland-sourcechan...@netbsd.org wrote:
 On Sat, Jan 07, 2012 at 12:36:32PM +0100, Manuel Bouyer wrote:
    the problem is that now the code performs different operations
    based upon DIAGNOSTIC or not.  it's not about whether it's wrong
    or right, but that it's different.  DIAGNOSTIC shouldn't do more
    or different things, just check stuff and assert if bad.
   
    if the hack works for DIAGNOSTIC, it seems generally right and
    should be enabled for everyone now, until the real fix is
    implemented.
  
   Seconded. Either that, or the DIAGNOSTIC printf which spams the
   console should be removed (I'm not sure why it has been added in the
   first place)

 Yeah... though I don't actually know what it's about, ISTM that if
 there's a harmless spammy printout, the right hack until things can be
 fixed properly is to silence the printout.

 Removing mappings can be expensive.


Reverted.

Thanks,
-- 
~Cherry


Re: CVS commit: src/sys/arch

2012-01-06 Thread Cherry G. Mathew
On 6 January 2012 20:56, David Holland
dholland-sourcechan...@netbsd.org wrote:
 On Fri, Jan 06, 2012 at 03:15:28PM +, Cherry G. Mathew wrote:
   Modified Files:
        src/sys/arch/x86/x86: pmap.c
        src/sys/arch/xen/x86: cpu.c
  
   Log Message:
   Address those pesky DIAGNOSTIC messages. \n
   Take a performance hit at fork() for not DTRT. \n
   Note: Only applicable for kernels built with options DIAGNOSTIC \n
  
   [...]
   +#ifdef DIAGNOSTIC
   +               pmap_kremove(object, PAGE_SIZE);
   +#endif /* DIAGNOSTIC */
   [plus two more like that]

 Uh... even if that's correct, which doesn't seem too likely on the
 surface of things, it doesn't seem desirable.


I agree. The correct fix is to implement pmap_protect() or some such
for kernel entries.

The diagnostic message that it triggers is very verbose for XEN
consoles, which ships with DIAGNOSTIC set by default (not sure why
this is the case).

The pmap_kremove()s are harmless, because we know that they're not in
use elsewhere.

Cheers,
-- 
~Cherry


Re: CVS commit: src/sys/arch

2012-01-06 Thread Cherry G. Mathew
On 7 January 2012 05:53, Greg Troxel g...@ir.bbn.com wrote:

 David Holland dholland-sourcechan...@netbsd.org writes:

 On Fri, Jan 06, 2012 at 03:15:28PM +, Cherry G. Mathew wrote:
   Modified Files:
      src/sys/arch/x86/x86: pmap.c
      src/sys/arch/xen/x86: cpu.c
  
   Log Message:
   Address those pesky DIAGNOSTIC messages. \n
   Take a performance hit at fork() for not DTRT. \n
   Note: Only applicable for kernels built with options DIAGNOSTIC \n
  
   [...]
   +#ifdef DIAGNOSTIC
   +               pmap_kremove(object, PAGE_SIZE);
   +#endif /* DIAGNOSTIC */
   [plus two more like that]

 Uh... even if that's correct, which doesn't seem too likely on the
 surface of things, it doesn't seem desirable.

 I'm not sure if it's what David meant, but it seems wrong to have
 different behavior based on DIAGNOSTIC.  I think DIAGNOSTIC is supposed
 to be about enabling inexpensive consistency checks, only.  So if it's
 necessary to have consistency to have that call, it should be
 unconditional.  And if not, it shouldn't exist in any case.


 Do you understand precisely what's going on?  Is this about omitting
 steps necessary for consistency, but in the case where the new state
 will be immediately discarded?


From my reply to David: The pmap_kremove()s are harmless, because we
know that they're not in use elsewhere.

Please see the diffs in context. The one in pmap.c is in
pmap_pdp_ctor() - this call constructs the PDP frame for a new pmap -
this happens at fork() as a consequence of a new address space being
setup by uvm. At this point nothing else is using the PDP frame, and
the pmaps list is locked, so it can't get accidentally traversed or
loaded (this would be an error anyway, as the PDP would crash the cpu
on which it is loaded, since setup is incomplete at this stage).

The other ones are in xen/x86/cpu.c:pmap_cpu_init_late() which is part
of vcpu setup. The frame entries in this case are CPU local and can't
be accessed by other cpus (except during setup), so we know that the
frame is not in use.

In any case, these diffs are all XEN specific, and I'm pretty sure I
know what I'm doing, unless I've missed something glaringly obvious  -
please let me know.

The correct solution to this is to update pmap_protect() to handle
kernel entries.

-- 
~Cherry


Re: CVS commit: src/sys/arch

2012-01-06 Thread Cherry G. Mathew
On 7 January 2012 12:31, matthew green m...@eterna.com.au wrote:

  On Fri, Jan 06, 2012 at 03:15:28PM +, Cherry G. Mathew wrote:
    Modified Files:
       src/sys/arch/x86/x86: pmap.c
       src/sys/arch/xen/x86: cpu.c
   
    Log Message:
    Address those pesky DIAGNOSTIC messages. \n
    Take a performance hit at fork() for not DTRT. \n
    Note: Only applicable for kernels built with options DIAGNOSTIC \n
   
    [...]
    +#ifdef DIAGNOSTIC
    +               pmap_kremove(object, PAGE_SIZE);
    +#endif /* DIAGNOSTIC */
    [plus two more like that]
 
  Uh... even if that's correct, which doesn't seem too likely on the
  surface of things, it doesn't seem desirable.
 
  I'm not sure if it's what David meant, but it seems wrong to have
  different behavior based on DIAGNOSTIC.  I think DIAGNOSTIC is supposed
  to be about enabling inexpensive consistency checks, only.  So if it's
  necessary to have consistency to have that call, it should be
  unconditional.  And if not, it shouldn't exist in any case.
 
 
  Do you understand precisely what's going on?  Is this about omitting
  steps necessary for consistency, but in the case where the new state
  will be immediately discarded?
 

 From my reply to David: The pmap_kremove()s are harmless, because we
 know that they're not in use elsewhere.

 Please see the diffs in context. The one in pmap.c is in
 pmap_pdp_ctor() - this call constructs the PDP frame for a new pmap -
 this happens at fork() as a consequence of a new address space being
 setup by uvm. At this point nothing else is using the PDP frame, and
 the pmaps list is locked, so it can't get accidentally traversed or
 loaded (this would be an error anyway, as the PDP would crash the cpu
 on which it is loaded, since setup is incomplete at this stage).

 The other ones are in xen/x86/cpu.c:pmap_cpu_init_late() which is part
 of vcpu setup. The frame entries in this case are CPU local and can't
 be accessed by other cpus (except during setup), so we know that the
 frame is not in use.

 In any case, these diffs are all XEN specific, and I'm pretty sure I
 know what I'm doing, unless I've missed something glaringly obvious  -
 please let me know.

 The correct solution to this is to update pmap_protect() to handle
 kernel entries.

 the problem is that now the code performs different operations
 based upon DIAGNOSTIC or not.  it's not about whether it's wrong
 or right, but that it's different.  DIAGNOSTIC shouldn't do more
 or different things, just check stuff and assert if bad.

 if the hack works for DIAGNOSTIC, it seems generally right and
 should be enabled for everyone now, until the real fix is
 implemented.



Ah, right - sorry, I got distracted a bit by the description of the
paper-over.
I'll remove the #ifdef's presently.

-- 
~Cherry


Re: CVS commit: src/sys/arch

2011-12-30 Thread Cherry G. Mathew
On 8 November 2011 05:50, Jean-Yves Migeon jeanyves.mig...@free.fr wrote:
 On 06.11.2011 16:18, Cherry G. Mathew wrote:
 Module Name:  src
 Committed By: cherry
 Date:         Sun Nov  6 15:18:19 UTC 2011

 Modified Files:
       src/sys/arch/amd64/include: pmap.h
       src/sys/arch/i386/include: pmap.h
       src/sys/arch/x86/include: cpu.h
       src/sys/arch/x86/x86: pmap.c
       src/sys/arch/xen/x86: cpu.c x86_xpmap.c

 Log Message:
 [merging from cherry-xenmp] make pmap_kernel() shadow PMD per-cpu and MP
 aware.

 Some comments.

 -#ifdef PAE
 -/* Address of the static kernel's L2 page */
 -pd_entry_t *pmap_kl2pd;
 -paddr_t pmap_kl2paddr;
 -#endif
 -
 -
 [snip]
   #ifdef PAE
 -     uint32_t        ci_pae_l3_pdirpa; /* PA of L3 PD */
 +     paddr_t ci_pae_l3_pdirpa; /* PA of L3 PD */
       pd_entry_t *    ci_pae_l3_pdir; /* VA pointer to L3 PD */
   #endif

 [snip]
 +
 +#if defined(PAE)
 +     ci->ci_pae_l3_pdir = (paddr_t *)uvm_km_alloc(kernel_map, PAGE_SIZE, 0,
 +         UVM_KMF_WIRED | UVM_KMF_ZERO | UVM_KMF_NOWAIT);
 +
 +     if (ci->ci_pae_l3_pdir == NULL) {
 +             panic("%s: failed to allocate L3 per-cpu PD for CPU %d\n",
 +                   __func__, cpu_index(ci));
 +     }

 Please keep ci_pae_l3_pdir to a uint32_t and back out its paddr_t type.


Done.

Forgot to attribute your suggestion + this thread in the commit log. I
hope this makes up for it :-)
-- 
~Cherry


Re: CVS commit: src/sys/arch/xen/xen

2011-12-07 Thread Cherry G. Mathew
On 7 December 2011 19:19, Christoph Egger ceg...@netbsd.org wrote:
 Module Name:    src
 Committed By:   cegger
 Date:           Wed Dec  7 13:49:04 UTC 2011

 Modified Files:
        src/sys/arch/xen/xen: evtchn.c

 Log Message:
 build fix: add back sys/malloc.h. malloc(9) is still in use.


Hi, sorry about this, I missed it - can you move the malloc() in
pirq_establish() to kmem_alloc() instead ? iiuc, we're moving away
from malloc() so this is a regression.

Cheers,

-- 
~Cherry


Re: CVS commit: src/sys/arch

2011-11-20 Thread Cherry G. Mathew
On 21 November 2011 01:11, Jean-Yves Migeon j...@netbsd.org wrote:
 Module Name:    src
 Committed By:   jym
 Date:           Sun Nov 20 19:41:27 UTC 2011

 Modified Files:
        src/sys/arch/x86/include: pmap.h

...

+#ifdef PAE
+void   pmap_map_recursive_entries(void);
+void   pmap_unmap_recursive_entries(void);
+#endif /* PAE */
+
 #endif /* XEN */


Since that's xen specific, can that go in xenpmap.h ?

Cheers,
-- 
~Cherry


Re: CVS commit: src/sys/arch

2011-11-20 Thread Cherry G. Mathew
On 21 November 2011 02:45, Jean-Yves Migeon jeanyves.mig...@free.fr wrote:


 Observed in QEMU, Virtualbox, and my amd64 spare host.

Thank you, I will look into this presently.


-- 
~Cherry


Re: CVS commit: src/sys/arch

2011-11-08 Thread Cherry G. Mathew
On 8 November 2011 05:50, Jean-Yves Migeon jeanyves.mig...@free.fr wrote:
 On 06.11.2011 16:18, Cherry G. Mathew wrote:
 Module Name:  src
 Committed By: cherry
 Date:         Sun Nov  6 15:18:19 UTC 2011

 Modified Files:
       src/sys/arch/amd64/include: pmap.h
       src/sys/arch/i386/include: pmap.h
       src/sys/arch/x86/include: cpu.h
       src/sys/arch/x86/x86: pmap.c
       src/sys/arch/xen/x86: cpu.c x86_xpmap.c

 Log Message:
 [merging from cherry-xenmp] make pmap_kernel() shadow PMD per-cpu and MP
 aware.

 Some comments.


...
 -/* flag to be used for kernel mappings: PG_u on Xen/amd64, 0 otherwise */
-#if defined(XEN) && defined(__x86_64__)
 -#define PG_k PG_u
 -#else
 -#define PG_k 0
 -#endif
 -

 Are you sure that all the mapping sites are safe (PT/PD bits), given the
 pmap split between pmap/xen_xpmap.c?


Ok, I realise I've broken the build with this one - apologies. For
some odd reason my tree built ok ( even after nuking the obj dir)

The current bandaid by christos and njoly is incorrect. I propose
re-exporting PG_k to x86/include/pmap.h until (if ?) the xen pmap is
completely independent of the x86 one.

Please let me know if there are objections to this patch below:

Thanks,

-- 
~Cherry



Index: arch/x86/include/pmap.h
===
RCS file: /cvsroot/src/sys/arch/x86/include/pmap.h,v
retrieving revision 1.44
diff -u -r1.44 pmap.h
--- arch/x86/include/pmap.h 6 Nov 2011 11:40:47 -   1.44
+++ arch/x86/include/pmap.h 8 Nov 2011 13:35:16 -
@@ -173,6 +173,13 @@
((pmap)->pm_pdirpa[0] + (index) * sizeof(pd_entry_t))
 #endif

+/* flag to be used for kernel mappings: PG_u on Xen/amd64, 0 otherwise */
+#if defined(XEN) && defined(__x86_64__)
+#define PG_k PG_u
+#else
+#define PG_k 0
+#endif
+
 /*
  * MD flags that we use for pmap_enter and pmap_kenter_pa:
  */
Index: arch/x86/x86/pmap.c
===
RCS file: /cvsroot/src/sys/arch/x86/x86/pmap.c,v
retrieving revision 1.140
diff -u -r1.140 pmap.c
--- arch/x86/x86/pmap.c 8 Nov 2011 12:44:29 -   1.140
+++ arch/x86/x86/pmap.c 8 Nov 2011 13:35:20 -
@@ -211,11 +211,6 @@
 #include <xen/hypervisor.h>
 #endif

-/* If this is not needed anymore it should be GC'ed */
-#ifndef PG_k
-#define        PG_k    0
-#endif
-
 /*
  * general info:
  *
Index: arch/xen/include/xenpmap.h
===
RCS file: /cvsroot/src/sys/arch/xen/include/xenpmap.h,v
retrieving revision 1.30
diff -u -r1.30 xenpmap.h
--- arch/xen/include/xenpmap.h  6 Nov 2011 11:40:47 -   1.30
+++ arch/xen/include/xenpmap.h  8 Nov 2011 13:35:20 -
@@ -34,13 +34,6 @@
 #include "opt_xen.h"
 #endif

-/* flag to be used for kernel mappings: PG_u on Xen/amd64, 0 otherwise */
-#if defined(XEN) && defined(__x86_64__)
-#define PG_k PG_u
-#else
-#define PG_k 0
-#endif
-
 #define        INVALID_P2M_ENTRY       (~0UL)

 void xpq_queue_machphys_update(paddr_t, paddr_t);
Index: arch/xen/x86/xen_pmap.c
===
RCS file: /cvsroot/src/sys/arch/xen/x86/xen_pmap.c,v
retrieving revision 1.7
diff -u -r1.7 xen_pmap.c
--- arch/xen/x86/xen_pmap.c 6 Nov 2011 11:40:47 -   1.7
+++ arch/xen/x86/xen_pmap.c 8 Nov 2011 13:35:21 -
@@ -142,13 +142,6 @@
 #include <xen/hypervisor.h>
 #endif

-/* flag to be used for kernel mappings: PG_u on Xen/amd64, 0 otherwise */
-#if defined(XEN)  defined(__x86_64__)
-#define PG_k PG_u
-#else
-#define PG_k 0
-#endif
-
 #define COUNT(x)   /* nothing */

 static pd_entry_t * const alternate_pdes[] = APDES_INITIALIZER;


Re: CVS commit: src/sys/arch

2011-11-08 Thread Cherry G. Mathew
On 8 November 2011 15:20, Jean-Yves Migeon jeanyves.mig...@free.fr wrote:
 On Tue, 8 Nov 2011 08:49:22 +0530, Cherry G. Mathew wrote:

 Please keep ci_pae_l3_pdir to a uint32_t and back out its paddr_t type.

 Unless you can convince me that your code will do the right thing(tm), the
 PDP for PAE should be allocated below the 4GiB physical boundary; unless
 mistaken, the uvm_km_alloc(kernel_map, ...) does not enforce that. Besides,
 %cr3 is still a 32 bits entity under PAE. Trying to stash a 64 bits paddr_t
 in %cr3 is a really bad idea.

 See comments http://nxr.netbsd.org/xref/src/sys/arch/x86/x86/pmap.c#1588

 This doesn't hold true for Xen. See:
 http://lxr.xensource.com/lxr/source/xen/arch/x86/mm.c#L509

 Xen does the >4GB translation for the VM by maintaining its own
 shadow. I just took the easier route (after initially using the x86
 pmap <4GB cr3 code), but if you're thinking of a unified
 xen/non-xen implementation we'll have to re-think this one.

 Okay; then all users of cr3 have to be documented (_especially_ the native
 x86 code), as there are at least two places where there's implicit uint64_t =>
 uint32_t truncation: lcr3() and pcb_cr3. A comment above these should be
 enough. Same goes for having paddr_t for L3 PD in struct cpu_info.


hmm... I wonder if it would be cleaner to do this within an #ifdef
XEN/#endif ?

 - are you sure that these have to be
 defined(PAE) || defined(__x86_64__) ?


That's a crude way of making pmap_cpu_init_late() do nothing for i386
(non-pae) since i386 doesn't need shadow PMDs.


 - please use for (i = 0; i < PTP_LEVELS - 1; i++ ) { ... }. I will have to
 identify these blocks later when GENERIC will include both PAE and !PAE
 pmaps. PTP_LEVELS makes it a lot easier.


 ok, will watch out for it - do you want to do this one yourself ?

 Please do :)

ok, I'll take flak for further breakage ;-P

Cheers,
-- 
~Cherry


Re: CVS commit: src/sys/arch

2011-11-08 Thread Cherry G. Mathew
On 8 November 2011 19:11, Cherry G. Mathew cherry.g.mat...@gmail.com wrote:
 On 8 November 2011 05:50, Jean-Yves Migeon jeanyves.mig...@free.fr wrote:
 On 06.11.2011 16:18, Cherry G. Mathew wrote:
 Module Name:  src
 Committed By: cherry
 Date:         Sun Nov  6 15:18:19 UTC 2011

 Modified Files:
       src/sys/arch/amd64/include: pmap.h
       src/sys/arch/i386/include: pmap.h
       src/sys/arch/x86/include: cpu.h
       src/sys/arch/x86/x86: pmap.c
       src/sys/arch/xen/x86: cpu.c x86_xpmap.c

 Log Message:
 [merging from cherry-xenmp] make pmap_kernel() shadow PMD per-cpu and MP
 aware.

 Some comments.


 ...
 -/* flag to be used for kernel mappings: PG_u on Xen/amd64, 0 otherwise */
 -#if defined(XEN) && defined(__x86_64__)
 -#define PG_k PG_u
 -#else
 -#define PG_k 0
 -#endif
 -

 Are you sure that all the mapping sites are safe (PT/PD bits), given the
 pmap split between pmap/xen_xpmap.c?


 Ok, I realise I've broken the build with this one - apologies. For
 some odd reason my tree built ok ( even after nuking the obj dir)

 The current bandaid by christos and njoly is incorrect. I propose
 re-exporting PG_k to x86/include/pmap.h until (if ?) the xen pmap is
 completely independent of the x86 one.

 Please let me know if there are objections to this patch below:

 Thanks,

 --
 ~Cherry



 Index: arch/x86/include/pmap.h
 ===
 RCS file: /cvsroot/src/sys/arch/x86/include/pmap.h,v
 retrieving revision 1.44
 diff -u -r1.44 pmap.h
 --- arch/x86/include/pmap.h     6 Nov 2011 11:40:47 -       1.44
 +++ arch/x86/include/pmap.h     8 Nov 2011 13:35:16 -
 @@ -173,6 +173,13 @@
        ((pmap)->pm_pdirpa[0] + (index) * sizeof(pd_entry_t))
  #endif

 +/* flag to be used for kernel mappings: PG_u on Xen/amd64, 0 otherwise */
 +#if defined(XEN) && defined(__x86_64__)
 +#define PG_k PG_u
 +#else
 +#define PG_k 0
 +#endif
 +
  /*
  * MD flags that we use for pmap_enter and pmap_kenter_pa:
  */
 Index: arch/x86/x86/pmap.c
 ===
 RCS file: /cvsroot/src/sys/arch/x86/x86/pmap.c,v
 retrieving revision 1.140
 diff -u -r1.140 pmap.c
 --- arch/x86/x86/pmap.c 8 Nov 2011 12:44:29 -       1.140
 +++ arch/x86/x86/pmap.c 8 Nov 2011 13:35:20 -
 @@ -211,11 +211,6 @@
  #include <xen/hypervisor.h>
  #endif

 -/* If this is not needed anymore it should be GC'ed */
 -#ifndef PG_k
 -#define        PG_k    0
 -#endif
 -
  /*
  * general info:
  *
 Index: arch/xen/include/xenpmap.h
 ===
 RCS file: /cvsroot/src/sys/arch/xen/include/xenpmap.h,v
 retrieving revision 1.30
 diff -u -r1.30 xenpmap.h
 --- arch/xen/include/xenpmap.h  6 Nov 2011 11:40:47 -       1.30
 +++ arch/xen/include/xenpmap.h  8 Nov 2011 13:35:20 -
 @@ -34,13 +34,6 @@
  #include "opt_xen.h"
  #endif

 -/* flag to be used for kernel mappings: PG_u on Xen/amd64, 0 otherwise */
 -#if defined(XEN) && defined(__x86_64__)
 -#define PG_k PG_u
 -#else
 -#define PG_k 0
 -#endif
 -
  #define        INVALID_P2M_ENTRY       (~0UL)

  void xpq_queue_machphys_update(paddr_t, paddr_t);
 Index: arch/xen/x86/xen_pmap.c
 ===
 RCS file: /cvsroot/src/sys/arch/xen/x86/xen_pmap.c,v
 retrieving revision 1.7
 diff -u -r1.7 xen_pmap.c
 --- arch/xen/x86/xen_pmap.c     6 Nov 2011 11:40:47 -       1.7
 +++ arch/xen/x86/xen_pmap.c     8 Nov 2011 13:35:21 -
 @@ -142,13 +142,6 @@
  #include <xen/hypervisor.h>
  #endif

 -/* flag to be used for kernel mappings: PG_u on Xen/amd64, 0 otherwise */
 -#if defined(XEN) && defined(__x86_64__)
 -#define PG_k PG_u
 -#else
 -#define PG_k 0
 -#endif
 -
  #define COUNT(x)       /* nothing */

  static pd_entry_t * const alternate_pdes[] = APDES_INITIALIZER;


Committed.

Thanks,

-- 
~Cherry


Re: CVS commit: src/sys/arch

2011-11-08 Thread Cherry G. Mathew
Hi,

On 9 November 2011 02:02, Jean-Yves Migeon jeanyves.mig...@free.fr wrote:
 On 08.11.2011 14:53, Cherry G. Mathew wrote:

 On 8 November 2011 15:20, Jean-Yves Migeon jeanyves.mig...@free.fr wrote:

 On Tue, 8 Nov 2011 08:49:22 +0530, Cherry G. Mathew wrote:

 Please keep ci_pae_l3_pdir to a uint32_t and back out its
 paddr_t type.

 Unless you can convince me that your code will do the right
 thing(tm), the PDP for PAE should be allocated below the
 4GiB physical boundary; unless mistaken, the
 uvm_km_alloc(kernel_map, ...) does not enforce that. Besides,
 %cr3 is still a 32 bits entity under PAE. Trying to stash a 64
 bits paddr_t in %cr3 is a really bad idea.

 See comments

 http://nxr.netbsd.org/xref/src/sys/arch/x86/x86/pmap.c#1588

 This doesn't hold true for Xen. See:
 http://lxr.xensource.com/lxr/source/xen/arch/x86/mm.c#L509

 Xen does the >4GB translation for the VM by maintaining its own
 shadow. I just took the easier route (after initially using the
 x86 pmap <4GB cr3 code), but if you're thinking of a unified
 xen/non-xen implementation we'll have to re-think this one.

 Okay; then all users of cr3 have to be documented (_especially_ the
 native x86 code), as there are at least two places where there's
 implicit uint64_t => uint32_t truncation: lcr3() and pcb_cr3. A
 comment above these should be enough. Same goes for having paddr_t
 for L3 PD in struct cpu_info.


 hmm... I wonder if it would be cleaner to do this within an #ifdef
 XEN/#endif ?

 The cleanest way would be to share the code between x86 and Xen, keep
 the allocation below 4GiB boundary for both, and use it everywhere in
 the pmap code. Only the manipulation of the vcpu_guest_context_t
 ctrlregs members would have to force this use.


Fair enough. Although the 4G tests would be a bit deceptive (since
they're cosmetic) - I guess leaving a note in the code about the
rationale behind this will help.

 - are you sure that these have to be defined(PAE) ||
 defined(__x86_64__) ?


 That's a crude way of making pmap_cpu_init_late() do nothing for
 i386 (non-pae) since i386 doesn't need shadow PMDs.

 In that case, test against i386_use_pae (0 == !PAE, 1 == PAE), and
 simply return if !PAE. Avoid having loonnng macro blocks with
 different levels of #ifdef. It's fairly difficult to untangle and
 unpleasant to read.


I agree - this  looks better.


Thanks,

-- 
~Cherry


Re: CVS commit: src/sys/arch

2011-11-07 Thread Cherry G. Mathew
Hi,

On 8 November 2011 05:50, Jean-Yves Migeon jeanyves.mig...@free.fr wrote:
 On 06.11.2011 16:18, Cherry G. Mathew wrote:
 Module Name:  src
 Committed By: cherry
 Date:         Sun Nov  6 15:18:19 UTC 2011

 Modified Files:
       src/sys/arch/amd64/include: pmap.h
       src/sys/arch/i386/include: pmap.h
       src/sys/arch/x86/include: cpu.h
       src/sys/arch/x86/x86: pmap.c
       src/sys/arch/xen/x86: cpu.c x86_xpmap.c

 Log Message:
 [merging from cherry-xenmp] make pmap_kernel() shadow PMD per-cpu and MP
 aware.

 Some comments.

 -#ifdef PAE
 -/* Address of the static kernel's L2 page */
 -pd_entry_t *pmap_kl2pd;
 -paddr_t pmap_kl2paddr;
 -#endif
 -
 -
 [snip]
   #ifdef PAE
 -     uint32_t        ci_pae_l3_pdirpa; /* PA of L3 PD */
 +     paddr_t ci_pae_l3_pdirpa; /* PA of L3 PD */
       pd_entry_t *    ci_pae_l3_pdir; /* VA pointer to L3 PD */
   #endif

 [snip]
 +
 +#if defined(PAE)
 +     ci-ci_pae_l3_pdir = (paddr_t *)uvm_km_alloc(kernel_map, PAGE_SIZE,
 0,
 +         UVM_KMF_WIRED | UVM_KMF_ZERO | UVM_KMF_NOWAIT);
 +
 +     if (ci-ci_pae_l3_pdir == NULL) {
 +             panic(%s: failed to allocate L3 per-cpu PD for CPU %d\n,
 +                   __func__, cpu_index(ci));
 +     }

 Please keep ci_pae_l3_pdir to a uint32_t and back out its paddr_t type.

 Unless you can convince me that your code will do the right thing(tm), the
 PDP for PAE should be allocated below the 4GiB physical boundary; unless
 mistaken, the uvm_km_alloc(kernel_map, ...) does not enforce that. Besides,
 %cr3 is still a 32 bits entity under PAE. Trying to stash a 64 bits paddr_t
 in %cr3 is a really bad idea.

 See comments http://nxr.netbsd.org/xref/src/sys/arch/x86/x86/pmap.c#1588


This doesn't hold true for Xen. See:
http://lxr.xensource.com/lxr/source/xen/arch/x86/mm.c#L509


Xen does the >4GB translation for the VM by maintaining its own
shadow. I just took the easier route (after initially using the x86
pmap <4GB cr3 code), but if you're thinking of a unified
xen/non-xen implementation we'll have to re-think this one.

 -/* flag to be used for kernel mappings: PG_u on Xen/amd64, 0 otherwise */
 -#if defined(XEN) && defined(__x86_64__)
 -#define PG_k PG_u
 -#else
 -#define PG_k 0
 -#endif
 -

 Are you sure that all the mapping sites are safe (PT/PD bits), given the
 pmap split between pmap/xen_xpmap.c?


No. I haven't reviewed this in that context - however the use of PG_k
is exactly the same in xen as it was before this commit.

 On a side note, please stick to KNF especially for 80 columns.


Noted, thanks.

 [snip]
 +void
 +pmap_cpu_init_late(struct cpu_info *ci)
 +{
 +#if defined(PAE) || defined(__x86_64__)
 +     /*
 +      * The BP has already its own PD page allocated during early
 +      * MD startup.
 +      */
 +
 +     if (ci == &cpu_info_primary)
 +             return;
 +
 +     KASSERT(ci != NULL);

 +     ci->ci_pae_l3_pdirpa = vtophys((vaddr_t) ci->ci_pae_l3_pdir);
 +     KASSERT(ci->ci_pae_l3_pdirpa != 0);
 +
 +     /* Initialise L2 entries 0 - 2: Point them to pmap_kernel() */
 +     ci->ci_pae_l3_pdir[0] =
 +         xpmap_ptom_masked(pmap_kernel()->pm_pdirpa[0]) | PG_V;
 +     ci->ci_pae_l3_pdir[1] =
 +         xpmap_ptom_masked(pmap_kernel()->pm_pdirpa[1]) | PG_V;
 +     ci->ci_pae_l3_pdir[2] =
 +         xpmap_ptom_masked(pmap_kernel()->pm_pdirpa[2]) | PG_V;
 +#endif /* PAE */

 - are you sure that these have to be
 defined(PAE) || defined(__x86_64__) ?

 - please use for (i = 0; i < PTP_LEVELS - 1; i++ ) { ... }. I will have to
 identify these blocks later when GENERIC will include both PAE and !PAE
 pmaps. PTP_LEVELS makes it a lot easier.


ok, will watch out for it - do you want to do this one yourself?

 That's all for the first quick review :) Thanks for starting the merge of
 your branch.


Thanks for this - looking forward to more :-)

-- 
~Cherry


Re: CVS commit: [cherry-xenmp] src/sys/arch/xen/xen

2011-11-04 Thread Cherry G. Mathew
Hi Manuel,

On 23 October 2011 04:12, Manuel Bouyer bou...@netbsd.org wrote:
 Module Name:    src
 Committed By:   bouyer
 Date:           Sat Oct 22 22:42:21 UTC 2011

 Modified Files:
        src/sys/arch/xen/xen [cherry-xenmp]: clock.c

 Log Message:
 More improvements:
 - Precise what tmutex protects
 - springle some KASSERT(mutex_owned(tmutex))
 - the system time uses the CPU timecounter to get some precision;
  don't mix local CPU timecounter with VCPU0's time, it won't work well.
  Always use curcpu()'s time.


 To generate a diff of this commit:
 cvs rdiff -u -r1.54.6.6 -r1.54.6.7 src/sys/arch/xen/xen/clock.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.



Thanks for this - I'm curious to know if this fixes the clock drift
that has been reported on port-xen@

Thanks,
-- 
~Cherry


Re: CVS commit: [cherry-xenmp] src/sys/arch

2011-09-01 Thread Cherry G. Mathew
On 31/08/2011, Jean-Yves Migeon jeanyves.mig...@free.fr wrote:
 On 30.08.2011 14:53, Cherry G. Mathew wrote:
 Module Name: src
 Committed By:cherry
 Date:Tue Aug 30 12:53:46 UTC 2011

 Modified Files:
  src/sys/arch/i386/i386 [cherry-xenmp]: machdep.c
  src/sys/arch/xen/x86 [cherry-xenmp]: cpu.c x86_xpmap.c

 Log Message:
 Add per-cpu mmu queues

 Thanks for looking into it!


I do owe you a reply on tech-kern@ :-)

Will get to it once I have more data!

Thanks,
-- 
~Cherry


Re: CVS commit: src/sys/arch/xen

2011-08-29 Thread Cherry G. Mathew
Cc:ing tech-kern, to get wider feedback.
Thread started here: 
http://mail-index.netbsd.org/source-changes-d/2011/08/21/msg003897.html

 JM == Jean-Yves Migeon jeanyves.mig...@free.fr writes:

JM On Mon, 22 Aug 2011 12:47:40 +0200, Manuel Bouyer wrote:
 This is slightly more complicated than it appears. Some of the
 ops in a per-cpu queue may have ordering dependencies with
 other cpu queues, and I think this would be hard to express
 trivially. (an example would be a pte update on one queue, and
 reading the same pte on another queue - these cases are
 quite analogous (although completely unrelated)
 

Hi,

So I had a better look at this - implemented per-cpu queues and messed
with locking a bit:


 read don't go through the xpq queue, don't they ?

JM Nope, PTE are directly obtained from the recursive mappings
JM (vtopte/kvtopte).

Let's call this out of band reads. But see below for in-band reads.

JM Content is obviously only writable by hypervisor (so it can
JM keep control of his mapping alone).
 I think this is similar to a tlb flush but the other way round, I
 guess we could use a IPI for this too.

JM IIRC that's what the current native x86 code does: it uses an
JM IPI to signal other processors that a shootdown is necessary.

Xen's TLB_FLUSH operation is synchronous, and doesn't require an IPI
(within the domain), which makes the queue ordering even more important
(to make sure that stale ptes are not reloaded before the per-cpu queue
has made progress). Yes, we can implement a roundabout ipi driven
queueflush + tlbflush scheme (described below), but that would be
performance sensitive, and the basic issue won't go away, imho.

Let's stick to the xpq ops for a second, ignoring out-of-band reads
(for which I agree that your assertion, that locking needs to be done at
a higher level, holds true).

The question here, really is, what are the global ordering requirements
of per-cpu memory op queues, given the following basic ops:

i) write memory (via MMU_NORMAL_PT_UPDATE, MMU_MACHPHYS_UPDATE)
ii) read memory
via:
MMUEXT_PIN_L1_TABLE
MMUEXT_PIN_L2_TABLE
MMUEXT_PIN_L3_TABLE
MMUEXT_PIN_L4_TABLE
MMUEXT_UNPIN_TABLE
MMUEXT_NEW_BASEPTR
MMUEXT_TLB_FLUSH_LOCAL
MMUEXT_INVLPG_LOCAL
MMUEXT_TLB_FLUSH_MULTI
MMUEXT_INVLPG_MULTI
MMUEXT_TLB_FLUSH_ALL
MMUEXT_INVLPG_ALL
MMUEXT_FLUSH_CACHE
MMUEXT_NEW_USER_BASEPTR

(ie; anything that will cause the processor to re-read data updated
by another cpu, for eg: via a pte update from i) above)

There's two ways I can think of fixing this:

a) *before* queueing a local read-op, synchronously flush queues on all
   other CPUs via ipis. This is slightly racy, but can be done, I think.

   An optimisation for invlpg could be to implement a scoreboard that
   watches mem. locations that have been queued for update on any
   cpu. Scan through the scoreboard for the memory range we're
   invlpg-ing. If it's not there, there's no need to flush any queues on
   other cpus.

b) read-ops wait on a global condvar. If it's set, a write-op that needs
   flushing is pending. Wait (with optional timeout and ipi-nudge)
   until the remote queue is flushed. When flushing a queue, send a
   cv_broadcast to any waiters.

Option b) is slightly better than my current scheme which is to lock
any and all mmu-ops and operate the queues synchronously (via
XENDEBUG_SYNC).

I cannot think of anything else, other than ad-hoc locking + queue
flushing, which could be hard to maintain and debug in the long run.


 I'm thinking that it might be easier and more justifiable to
 nuke the current queue scheme and implement shadow page tables,
 which would fit more naturally and efficiently with CAS pte
 updates, etc.
 
  I'm not sure this would completely fix the issue: with shadow
 page tables you can't use a CAS to assure atomic operation with
 the hardware TLB, as this is, precisely, a shadow PT and not the
 one used by hardware.

Definitely worth looking into, I imho. I'm not very comfortable with
the queue based scheme for MP.

the CAS doesn't provide any guarantees with the TLB on native h/w,
afaict. If you do a CAS pte update, and the update succeeded, it's a
good idea to invalidate + shootdown anyway (even on baremetal).

Do let me know your thoughts,

Cheers,
-- 
Cherry


Re: CVS commit: src/sys/arch/xen

2011-08-29 Thread Cherry G. Mathew
Hi Manuel,

 Manuel == Manuel Bouyer bou...@antioche.eu.org writes:


[...]

 
 Xen's TLB_FLUSH operation is synchronous, and doesn't require an
 IPI (within the domain), which makes the queue ordering even more
 important (to make sure that stale ptes are not reloaded before
 the per-cpu queue has made progress). Yes, we can implement a
 roundabout ipi driven queueflush + tlbflush scheme(described
 below), but that would be performance sensitive, and the basic
 issue won't go away, imho.
 
 Let's stick to the xpq ops for a second, ignoring out-of-band
 reads (for which I agree that your assertion, that locking needs
 to be done at a higher level, holds true).
 
 The question here, really is, what are the global ordering
 requirements of per-cpu memory op queues, given the following
 basic ops:
 
 i) write memory (via MMU_NORMAL_PT_UPDATE, MMU_MACHPHYS_UPDATE)
 ii) read memory via: MMUEXT_PIN_L1_TABLE MMUEXT_PIN_L2_TABLE
 MMUEXT_PIN_L3_TABLE MMUEXT_PIN_L4_TABLE MMUEXT_UNPIN_TABLE

Manuel This is when adding/removing a page table from a pmap. When
Manuel this occurs, the pmap is locked, isn't it ?

 MMUEXT_NEW_BASEPTR MMUEXT_NEW_USER_BASEPTR

Manuel This is a context switch

 MMUEXT_TLB_FLUSH_LOCAL MMUEXT_INVLPG_LOCAL MMUEXT_TLB_FLUSH_MULTI
 MMUEXT_INVLPG_MULTI MMUEXT_TLB_FLUSH_ALL MMUEXT_INVLPG_ALL
 MMUEXT_FLUSH_CACHE

Manuel This may, or may not, cause a read. This usually happens
Manuel after updating the pmap, and I guess this also happens with
Manuel the pmap locked (I have not carefully checked).

Manuel So couldn't we just use the pmap lock for this ?  I suspect
Manuel the same lock will also be needed for out of band reads at
Manuel some point (right now it's protected by splvm()).

I'm a bit confused now - are we assuming that the pmap lock protects the
(pte update op queue-push(es) + pmap_pte_flush()) as a single atomic
operation (ie; no invlpg/tlbflush or out-of-band-read can occur between
the update(s) and the pmap_pte_flush()) ?

If so, I think I've slightly misunderstood the scope of the mmu queue
design - I assumed that the queue is longer-lived, and the sync points
(for the queue flush) can span across pmap locking - a sort of lazy pte
update, with the queue being flushed at out-of-band or in-band read
time (I guess that won't work though - how does one know when the
hardware walks the page table?). It seems that the queue is meant for
pte updates in loops, for eg. quickly followed by a flush. Is this
correct?

If so, there's just one hazard afaict - the synchronous TLB_FLUSH_MULTI
could beat the race between the queue update and the queue flush via
pmap_tlb_shootnow() (see pmap_tlb.c on the cherry-xenmp branch), and *if*
other CPUs reload their TLBs before the flush, they'll have stale info.

So the important question (rmind@ ?) is, is pmap_tlb_shootnow()
guaranteed to be always called with the pmap lock held ?

In real life, I removed the global xpq_queue_lock() and the pmap was
falling apart. So a bit of debugging ahead.

 [...]
 
  I'm thinking that it might be easier and more justifiable to
  nuke the current queue scheme and implement shadow page
 tables,  which would fit more naturally and efficiently with
 CAS pte  updates, etc.
  
  I'm not sure this would completely fix the issue: with shadow
  page tables you can't use a CAS to assure atomic operation
 with  the hardware TLB, as this is, precisely, a shadow PT and
 not the  one used by hardware.
 
 Definitely worth looking into, I imho. I'm not very comfortable
 with the queue based scheme for MP.
 
 the CAS doesn't provide any guarantees with the TLB on native
 h/w, afaict.

Manuel What makes you think so ? I think the hw TLB also does CAS
Manuel to update referenced and dirty bits in the PTE, otherwise we
Manuel couldn't rely on these bits; this would be bad especially
Manuel for the dirty bit.

Yes, I missed that one (which is much of the point of the CAS in the
first place!), you're right.

 If you do a CAS pte update, and the update succeeded, it's a good
 idea to invalidate + shootdown anyway (even on baremetal).

Manuel Yes, of course inval is needed after updating the PTE. But
Manuel using a true CAS is important to get the refereced and dirty
Manuel bits right.

I haven't looked into this in detail, but I thought that xen disconnects
the shadow (presumably with correct update/dirty bits)  and flushes the
TLB when a write to the shadow occurs ? In which case these bits will be
in sync until the next h/w access ? I'm speculating, I haven't looked at
the xen code for this yet.

-- 
Cherry


Re: CVS commit: src/sys/arch/xen

2011-08-21 Thread Cherry G. Mathew
 JM == Jean-Yves Migeon jeanyves.mig...@free.fr writes:

JM On 21.08.2011 12:26, Jean-Yves Migeon wrote:
 - second, the lock is not placed at the correct abstraction level
 IMHO, it is way too high in the caller/callee hierarchy. It
 should remain hidden from most consumers of the xpq_queue API,
 and should only be used to protect the xpq_queue array together
 with its counters (and everything that isn't safe for all memory
 operations happening in xpq).
 

I agree - part of the reason I did this was to make sure I didn't mess
with the current queueing scheme before having a working
implementation/PoC.

 Reason behind this is that your lock protects calls to hypervisor
 MMU operations, which are hypercalls (hence, a slow operation
 with regard to kernel). You are serializing lots of memory
 operations, something that should not happen from a performance
 point of view (some may take a faire amount of cycles to
 complete, like TLB flushes). I'd expect all Xen MMU hypercalls to
 be reentrant.

I agree - it's not meant to be an efficient implementation - just a
correct one.

JM An alternative would be to have per-CPU xpq_queue[] also. This
JM is not completely stupid, xpq_queue is meant as a way to put
JM multiple MMU operations in a queue asynchronously before issuing
JM only one hypercall to handle them all.

This is slightly more complicated than it appears. Some of the ops in
a per-cpu queue may have ordering dependencies with other cpu queues,
and I think this would be hard to express trivially. (an example would
be a pte update on one queue, and reading the same pte on another
queue - these cases are quite analogous (although completely unrelated)
to classic RAW and other ordering dependencies in out-of-order execution
scenarios due to pipelining, etc.).

I'm thinking that it might be easier and more justifiable to nuke the
current queue scheme and implement shadow page tables, which would fit
more naturally and efficiently with CAS pte updates, etc.

Otherwise, we'll have to re-invent scoreboarding or other dynamic
scheduling tricks - I think that's a bit overkill :-)

Cheers,
-- 
Cherry


Re: CVS commit: src/sys/arch

2011-08-10 Thread Cherry G. Mathew
On 10 August 2011 13:39, Cherry G. Mathew che...@netbsd.org wrote:
 Module Name:    src
 Committed By:   cherry
 Date:           Wed Aug 10 11:39:46 UTC 2011

 Modified Files:
        src/sys/arch/amd64/amd64: fpu.c machdep.c
        src/sys/arch/i386/isa: npx.c
        src/sys/arch/x86/x86: x86_machdep.c
        src/sys/arch/xen/conf: files.xen
        src/sys/arch/xen/include: intr.h
 Added Files:
        src/sys/arch/xen/x86: xen_ipi.c

 Log Message:
 xen ipi infrastructure

Hi,

I'd appreciate feedback specifically for multiuser boot of
i386/conf/XEN3* please, with this change in place.

Many Thanks,
-- 
~Cherry


Re: CVS commit: [cherry-xenmp] src/sys/arch/xen/x86

2011-06-27 Thread Cherry G. Mathew
On 27 June 2011 12:23, Cherry G. Mathew che...@netbsd.org wrote:
 Module Name:    src
 Committed By:   cherry
 Date:           Mon Jun 27 10:23:21 UTC 2011

 Modified Files:
        src/sys/arch/xen/x86 [cherry-xenmp]: x86_xpmap.c

 Log Message:
 Add xpq locking around xpq-QUEUE_TLB_FLUSH()


sorry, that was meant to be:
Add xpq locking around xpq_queue_tlb_flush()

cvs surgery done.

Thanks,

-- 
~Cherry


Re: CVS commit: [cherry-xenmp] src/sys/arch/xen

2011-06-26 Thread Cherry G. Mathew

 On 6/26/2011 3:00 PM, Jukka Ruohonen wrote:

On Sun, Jun 26, 2011 at 12:56:33PM +, Cherry G. Mathew wrote:

Module Name:src
Committed By:   cherry
Date:   Sun Jun 26 12:56:33 UTC 2011

Modified Files:
src/sys/arch/xen/include [cherry-xenmp]: intr.h
src/sys/arch/xen/xen [cherry-xenmp]: hypervisor.c

Log Message:
Unbreak uniprocessor build

I've asked this privately few times, but what -- if any -- is the rationale
of #ifdef MULTIPROCESSOR in 2011 (for x86 MD, at least)?



This has to do with the way the Xen domU kernel probes APs. In the UP 
case, one assumes that at least one cpu exists ( :-) ), and the attach 
succeeds. In the MP case, there is no way to figure out the configured 
number of vcpus, since xenstore is not accessible at that point (see 
comments within the file). So we brute force, by telling Xen to bring 
up as many cpus as it lets us. In the case of domU, Xen will allow up 
to the maximum number of vcpus specified in the VM config file. The 
reason I've put hypervisor_vcpu_print() within the #ifdef is that it 
needs to be quiet, since there's no way to know if the probe succeeded 
until the vcpu_match() function returns.


So in short - it's a hack. I agree that the #ifdef should go if possible.

Cherry

--
Arms, drugs and spirituality – these are the three big businesses in the 
world ~ Javed Akhtar


Re: CVS commit: src/sys/arch/xen/xen

2011-04-18 Thread Cherry G. Mathew
On 18 April 2011 04:21, Mindaugas Rasiukevicius rm...@netbsd.org wrote:
 Masao Uebayashi uebay...@tombi.co.jp wrote:
 On Mon, Apr 18, 2011 at 03:04:32AM +, Mindaugas Rasiukevicius wrote:
  Module Name:        src
  Committed By:       rmind
  Date:               Mon Apr 18 03:04:31 UTC 2011
 
  Modified Files:
      src/sys/arch/xen/xen: balloon.c
 
  Log Message:
  balloon_xenbus_attach: use KM_SLEEP for allocation.
 
  Note: please do not use KM_NOSLEEP.

 And, according to yamt@, KM_SLEEP can fail in the current design...

 IIRC yamt@ fixed it a year or few ago.


And in the more specific immediate context:
http://mail-index.netbsd.org/port-xen/2011/04/07/msg006613.html


-- 
~Cherry


Re: CVS commit: src/sys/arch/xen/xen

2011-04-18 Thread Cherry G. Mathew
On 18 April 2011 08:00, Cherry G. Mathew cherry.g.mat...@gmail.com wrote:
 On 18 April 2011 04:21, Mindaugas Rasiukevicius rm...@netbsd.org wrote:
 Masao Uebayashi uebay...@tombi.co.jp wrote:
 On Mon, Apr 18, 2011 at 03:04:32AM +, Mindaugas Rasiukevicius wrote:
  Module Name:        src
  Committed By:       rmind
  Date:               Mon Apr 18 03:04:31 UTC 2011
 
  Modified Files:
      src/sys/arch/xen/xen: balloon.c
 
  Log Message:
  balloon_xenbus_attach: use KM_SLEEP for allocation.
 
  Note: please do not use KM_NOSLEEP.

 And, according to yamt@, KM_SLEEP can fail in the current design...

 IIRC yamt@ fixed it a year or few ago.


 And in the more specific immediate context:
 http://mail-index.netbsd.org/port-xen/2011/04/07/msg006613.html



Hi,

PS: Can you please revert ? Unless it's breaking anything else, or you
have a fix for the problem mentioned in the thread above, ie;

Cheers,
-- 
~Cherry


Re: CVS commit: src/sys/arch/xen/xen

2011-04-18 Thread Cherry G. Mathew
On 18 April 2011 09:04, Jean-Yves Migeon jeanyves.mig...@free.fr wrote:
 On 18.04.2011 09:23, Cherry G. Mathew wrote:
 On 18 April 2011 08:00, Cherry G. Mathew cherry.g.mat...@gmail.com wrote:
 On 18 April 2011 04:21, Mindaugas Rasiukevicius rm...@netbsd.org wrote:
 Masao Uebayashi uebay...@tombi.co.jp wrote:
 On Mon, Apr 18, 2011 at 03:04:32AM +, Mindaugas Rasiukevicius wrote:
 Module Name:        src
 Committed By:       rmind
 Date:               Mon Apr 18 03:04:31 UTC 2011

 Modified Files:
     src/sys/arch/xen/xen: balloon.c

 Log Message:
 balloon_xenbus_attach: use KM_SLEEP for allocation.

 Note: please do not use KM_NOSLEEP.

 And, according to yamt@, KM_SLEEP can fail in the current design...

 IIRC yamt@ fixed it a year or few ago.

 And in the more specific immediate context:
 http://mail-index.netbsd.org/port-xen/2011/04/07/msg006613.html

 PS: Can you please revert ? Unless it's breaking anything else, or you
 have a fix for the problem mentioned in the thread above, ie;

 Uho, I forgot to mention in my commit log that I fixed it. I am
 allocating bpg_entries via pool_cache(9), and the constructor
 bpge_ctor() will return an error if uvm(9) fails to find a free page. In
 that case, the thread will just bail out and start waiting again.

Ah right, I missed your commit, sorry.

Thanks,

-- 
~Cherry