Re: CVS commit: src/sys/arch/x86/x86

2015-10-09 Thread Jean-Yves Migeon
Le 09/10/2015 06:49, Masanobu SAITOH a écrit :
> On 2015/10/06 6:10, Jean-Yves Migeon wrote:
>>> Log Message:
>>> kmem_free() the address returned by kmem_alloc().  found by Brainy.
>>> use the newly aligned location if we needed it.  found by kre.
>>>
>>>
>>> To generate a diff of this commit:
>>> cvs rdiff -u -r1.8 -r1.9 src/sys/arch/x86/x86/cpu_ucode_intel.c
>>
>> IMHO this should be pulled-up to -6 and -7.
>>
>> Any argument against? If the old code worked, it's pure luck.
> 
>  netbsd-6 doesn't support the microcode update function for Intel
> CPU. That bug should be pulled up to netbsd-7 (and netbsd-7-1) branch.

Makes sense. I'll check and ask for this pullup.

-- 
Jean-Yves Migeon


Re: CVS commit: src/sys/arch/x86/x86

2015-10-05 Thread Jean-Yves Migeon
Le 04/10/2015 19:52, matthew green a écrit :
> Module Name:  src
> Committed By: mrg
> Date: Sun Oct  4 17:52:50 UTC 2015
> 
> Modified Files:
>   src/sys/arch/x86/x86: cpu_ucode_intel.c
> 
> Log Message:
> kmem_free() the address returned by kmem_alloc().  found by Brainy.
> use the newly aligned location if we needed it.  found by kre.
> 
> 
> To generate a diff of this commit:
> cvs rdiff -u -r1.8 -r1.9 src/sys/arch/x86/x86/cpu_ucode_intel.c

IMHO this should be pulled-up to -6 and -7.

Any argument against? If the old code worked, it's pure luck.

-- 
Jean-Yves Migeon


Re: CVS commit: src/sys/arch/xen

2012-06-24 Thread Jean-Yves Migeon
On 24.06.2012 23:44, Christoph Egger wrote:
> On 24.06.12 20:31, Jean-Yves Migeon wrote:
> 
>> Module Name:src
>> Committed By:jym
>> Date:Sun Jun 24 18:31:53 UTC 2012
>>
>> Modified Files:
>> src/sys/arch/xen/include: xenpmap.h
>> src/sys/arch/xen/x86: xen_pmap.c
>>
>> Log Message:
>> Enable the map/unmap recursive mapping functions for all Xen ports for
>> save/restore.
>>
>> For an unknown reason (to me) Xen refuses to update VM translations
>> when the entry is pointing back to itself (which is precisely
>> what our recursive VM model does). So enable the functions that take
>> care of this, which will avoid all sort of memory corruption upon restore
>> leading domU to trample upon itself.
> 
> Please report this Xen behaviour to xen-devel.

It has always been a tricky step anyway; it already happened with PAE.

I am not even sure that Xen is at fault. IIRC the translations fixup
step is done by the low level tools upon restore, not the hypervisor
directly.

IMHO correcting the fixup step will never be possible nor perfect
anyway; save operation is like a snapshot: it loses all the history of
how the mappings were created. It can get accepted by hypervisor when
you validate it in a given order, and not when done it is done in another.

-- 
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/sys/arch/x86/x86

2012-05-16 Thread Jean-Yves Migeon

Le 16/05/12 10:45, Christoph Egger a écrit :

On 05/13/12 13:24, Martin Husemann wrote:


On Sun, May 13, 2012 at 01:04:15PM +0200, Jean-Yves Migeon wrote:

Are you sure that moving to low priority xcalls is the way to go? You
can end up with CPUs not being updated because they are offline.


Curiously, while I could reproduce the crash before this commit, I am
unable to reproduce it in any testing without the actual ucode update
happening - and I can not spot a bug in the xcall code that tries to make
sure the number of cpus that did run the callback is == the expected count
before returning.

This clearly needs full analyzis.


I am pleased to revert this change once this xcall(9) issue has been
fixed.


Sure, however I can't see where the xcall(9) code goes wrong. Care to 
give more details, please? I cannot reproduce it on my side.


I am using xcall(9) to flush CPU-bound pool caches and having this sort 
of bug can definitely cause serious cache incoherency that are hard to 
track down afterwards.


Is it specific to high priority xcalls?

--
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/sys/arch/x86/x86

2012-05-13 Thread Jean-Yves Migeon

Le 13/05/12 13:24, Martin Husemann a écrit :

On Sun, May 13, 2012 at 01:04:15PM +0200, Jean-Yves Migeon wrote:

Are you sure that moving to low priority xcalls is the way to go? You
can end up with CPUs not being updated because they are offline.


Curiously, while I could reproduce the crash before this commit, I am
unable to reproduce it in any testing without the actual ucode update
happening - and I can not spot a bug in the xcall code that tries to make
sure the number of cpus that did run the callback is == the expected count
before returning.

This clearly needs full analyzis.


Any quick pointer on the crash in question so I can take a look? I 
probably missed it on port-i386/amd64.


--
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/sys/arch/x86/x86

2012-05-13 Thread Jean-Yves Migeon

Le 10/05/12 14:35, Christoph Egger a écrit :

+   /* Check for race:
+* When we booted with -x a cpu might already finished
+* (why doesn't xc_wait() wait for *all* cpus?)
+* and sc->sc_blob is already freed.
+* In this case the calculation below touches
+* non-allocated memory and results in a crash.
+*/


xc_wait should wait for all CPUs, except low priority xcalls and offline 
CPUs.


Are you sure that moving to low priority xcalls is the way to go? You 
can end up with CPUs not being updated because they are offline.


--
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/sys/arch/x86/include

2012-05-05 Thread Jean-Yves Migeon

Le 05/05/12 17:08, Jean-Yves Migeon a écrit :

Module Name:src
Committed By:   jym
Date:   Sat May  5 15:08:29 UTC 2012

Modified Files:
src/sys/arch/x86/include: specialreg.h

Log Message:
Add latest CR4 bits:

[snip]

 From Intel® 64 and IA-32 Architectures Software Developer’s Manual,
March 2012.

Align declarations.

CPUID_* bits for these features follow.


Sorry, I was not very clear: the CPUID feature flags follow these 
declarations, ie. they are already present in headers (and can be used 
to check the feature's availability).


CR4 values can be used to enable them when needed.

--
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/tests/lib/libc/gen

2012-04-23 Thread Jean-Yves Migeon

On Mon, 23 Apr 2012 19:02:04 +0100, David Laight wrote:

On Mon, Apr 23, 2012 at 10:09:09AM +0200, Jean-Yves Migeon wrote:

>
> But, as has been pointed out before, code in libc will generate
> alignment traps - because it is faster that way.

I did not know -- care to give an example? #AC exception from amd64 
was
not working, so I would guess that this was not used for this port 
at least.


memcpy() between misaligned addresses will do misaligned reads, 
writes

or both. (aligning reads is best on amd cpus)

memset() will do an unaligned store to the last word of the buffer
prior to filling the rest of the buffer. (IIRC memcpy/memmove do 
similar.)


Any access to a packed structure relies on misaligned transfers.
eg: the 'sector number' fields in the disk mbr.
in 64bit mode any code that has to match the alignment of
64bit items from 32bit code.

Basically, on i386 and amd64 (and other cpus that allow misaligned
accesses) if a buffer has to match an external representation
that contains misaligned items, then misaligned tranfers are very
likely to be used.

So although the cpu supports the trap, the os doesn't expect
it to be enabled.


I see; thanks for the details. That explains why this feature is rarily 
seen or used on x86.


--
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/tests/lib/libc/gen

2012-04-23 Thread Jean-Yves Migeon

Le 23/04/12 09:48, Martin Husemann a écrit :

On Sun, Apr 22, 2012 at 08:25:11PM +, Christos Zoulas wrote:

There is no portable way to do this; sigbus according to ToG does not
define what si_addr contains.


I read it differently:

The  header shall define the siginfo_t type as a structure,
which shall include at least the following members:

int   si_signo  Signal number.
int   si_code   Signal code.
int   si_errno  If non-zero, an errno value associated with
 this signal, as described in.
pid_t si_pidSending process ID.
uid_t si_uidReal user ID of sending process.
void *si_addr   Address of faulting instruction.
int   si_status Exit value or signal.
long  si_band   Band event for SIGPOLL.
union sigval  si_value  Signal value.


This is from:

http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/signal.h.html


I think we should modify the test to use RAS_START() and RAS_END() to verify
the faulting code address and ignore the associated data address.


Careful here; Some Other Operating Systems report data address fault 
with si_addr + SIGBUS, not instruction fault.


--
jym@


Re: CVS commit: src/tests/lib/libc/gen

2012-04-23 Thread Jean-Yves Migeon

Le 22/04/12 18:38, David Laight a écrit :
> On Sun, Apr 22, 2012 at 02:22:30PM +0100, Jean-Yves Migeon wrote:
>>
>> Hmm, siginfo(2) states the following:
>>
>>   For SIGBUS and SIGSEGV the siginfo structure contains the
>> following additional members:
>>
>> void *si_addr;
>> int si_trap;
>>
>>   si_addr contains the address of the faulting data
>
> That certainly needs an 'if available' comment.

I will add it later.

>>   and si_trap contains a hardware specific reason.
>
> But, as has been pointed out before, code in libc will generate
> alignment traps - because it is faster that way.

I did not know -- care to give an example? #AC exception from amd64 was 
not working, so I would guess that this was not used for this port at least.


--
jym@


Re: CVS commit: src/tests/lib/libc/gen

2012-04-22 Thread Jean-Yves Migeon

On Sun, 22 Apr 2012 08:52:26 +, Martin Husemann wrote:

Module Name:src
Committed By:   martin
Date:   Sun Apr 22 08:52:26 UTC 2012

Modified Files:
src/tests/lib/libc/gen: t_siginfo.c

Log Message:
Do not compare si_addr (address of faulting instruction) against the
unaligned data address causing the fault - this will always fail.
If anybody knows a portable way to get the data address involved in 
the

fault, please fix the test case as originally intended.


Hmm, siginfo(2) states the following:

 For SIGBUS and SIGSEGV the siginfo structure contains the 
following addi-

 tional members:

   void *si_addr;
   int si_trap;

 si_addr contains the address of the faulting data and si_trap 
contains a

 hardware specific reason.


If the address of the instruction is returned and not the address of 
the faulting data, either the man page is incorrect, or ambiguous.


Concerning x86 (Christoph can perhaps confirm here), I am not aware of 
any CPU that report the faulty address; rather, %cr2 contains the 
address of the fault that happened just before, typically a page fault. 
IMHO this is one of the reason why the mechanism never got much traction 
under x86: it's unreliable.


Reliably obtaining the address would require to fetch eip/rip and 
decode the faulty opcode. Not impossible, but do we want that?


Regarding other architectures, I do not know. I would expect the 
si_addr to contain the address of the unaligned data access, not the one 
from the faulting instruction. SIGBUS is used, not SIGILL.


--
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/sys/arch/amd64/amd64

2012-04-21 Thread Jean-Yves Migeon

Le 21/04/12 23:25, Christos Zoulas a écrit :

In article<4f930a8c.6040...@free.fr>,
Jean-Yves Migeon  wrote:

Le 21/04/12 20:52, Christos Zoulas a écrit :

Module Name:src
Committed By:   christos
Date:   Sat Apr 21 18:52:37 UTC 2012

Modified Files:
src/sys/arch/amd64/amd64: vector.S

Log Message:
Alignment fault traps push the error code automatically, so don't use

ZTRAP!

Meh, the fix was awaiting Paul testing... Alright, so I guess this one
is right.


Even if Paul's testing discovered that the fix did not work for the emulator,
wouldn't you commit it so that at least things work on real hardware?


It's the other way around; the bug was rather harmless in VMs (kills the 
process with a SIGILL), while it force-reboot the host on a native platform.


I could not know that the fix works on real hardware, that's why I was 
waiting for Paul's response.



Do you want me to ask for a pull-up?


Sure, thanks.


Will do.

--
jym@


Re: CVS commit: src/sys/arch/amd64/amd64

2012-04-21 Thread Jean-Yves Migeon

Le 21/04/12 20:52, Christos Zoulas a écrit :
> Module Name:   src
> Committed By:  christos
> Date:  Sat Apr 21 18:52:37 UTC 2012
>
> Modified Files:
>src/sys/arch/amd64/amd64: vector.S
>
> Log Message:
> Alignment fault traps push the error code automatically, so don't use 
ZTRAP!


Meh, the fix was awaiting Paul testing... Alright, so I guess this one 
is right.


Do you want me to ask for a pull-up?

--
jym@


Re: CVS commit: src/tests/lib/libc/gen (address alignment)

2012-04-21 Thread Jean-Yves Migeon

Le 21/04/12 19:47, Christoph Egger a écrit :
>> rip 0x0 and rsp 0x50202 look really abnormal to me. I'll have a look in
>> FreeBSD, that's probably a group of exceptions that have to be handled
>> differently.
>
> rip 0x0 often means that a function pointer has been called which is 
NULL.

>
> Christoph

Yep, but the bug seems to be a displaced stack here; the information is 
pushed correctly, but with an offset. Looking at FreeBSD interrupt code, 
some exceptions have the tf_err value already pushed by the CPU, so no 
need to do it twice.


I have sent a small patch to Paul for testing, it fixes the bug in my 
VM. Hope that this fixes the bug natively too.


--
jym@


Re: CVS commit: src/tests/lib/libc/gen (address alignment)

2012-04-21 Thread Jean-Yves Migeon

Le 21/04/12 16:31, Jean-Yves Migeon a écrit :

Okay, thanks for the report. So this rules out Virtual Box, it seems to
happen on native amd64 too.

I am taking a look right now.


This seems to be a bug in the trap handling code. The signal is caught
correctly (it reaches T_ALIGNFLT|T_USER in trap()), but things blow up
just after: we end signalling the process with a SIGILL (which does not
come from trap()).

Using 32 bits compat mode (cc -m 32) also causes the crash. So something
in e_trapsignal() or userret() goes wrong. Still digging.


Interesting. The "uncatchable" SIGILL comes from sendsig_siginfo [1] and 
is used in a way that the signal will force-exit the program (when the 
copyout fails). It's a "feature" I was not aware of.


Anyway, upon entering trap() for this exception (trap11), it looks like 
the trapframe is a mess:


pid 470 (malign): BUS/SEGV (107) at rip 0 addr 7f7ff7701020
rip 0x0 rsp 0x50202 rfl 0x1f
rdi 0x7f7ff7701080 rsi 0x8 rdx 0xf7701078
rcx 0x0 r8 0x7 r9 0x7f7ff770178
[...]

rip 0x0 and rsp 0x50202 look really abnormal to me. I'll have a look in 
FreeBSD, that's probably a group of exceptions that have to be handled 
differently.


Given the stack dump above, I tend to believe that #AC was never really 
stressed.


[1] http://nxr.netbsd.org/xref/src/sys/arch/amd64/amd64/machdep.c#695

--
jym@


Re: CVS commit: src/tests/lib/libc/gen (address alignment)

2012-04-21 Thread Jean-Yves Migeon

Le 21/04/12 14:50, Jean-Yves Migeon a écrit :

The machine did not drop into ddb, it simply rebooted. Unfortunately
it did not leave a core dump behind, so I don't have much to look at
just yet. When I get home later today, I will try to get more info.

BTW, this occurred while running the ATF test from a non-privileged
user, so if there's a bug lurking in these recent changes, it could be
considered to be a security vulnerability - non-priv user should not
be able to crash the box...

:)


Okay, thanks for the report. So this rules out Virtual Box, it seems to
happen on native amd64 too.

I am taking a look right now.


This seems to be a bug in the trap handling code. The signal is caught 
correctly (it reaches T_ALIGNFLT|T_USER in trap()), but things blow up 
just after: we end signalling the process with a SIGILL (which does not 
come from trap()).


Using 32 bits compat mode (cc -m 32) also causes the crash. So something 
in e_trapsignal() or userret() goes wrong. Still digging.


--
jym@


Re: CVS commit: othersrc/share/examples/ec2

2012-04-21 Thread Jean-Yves Migeon

Le 20/04/12 23:01, Jeff Rizzo a écrit :

Module Name:othersrc
Committed By:   riz
Date:   Fri Apr 20 21:01:03 UTC 2012

Added Files:
othersrc/share/examples/ec2: README TODO build_ec2_img.sh create-new.sh
create_ec2_ami.sh set_ec2_env.sh
othersrc/share/examples/ec2/files: bootstrap.sh ec2_bootmail
ec2_firstboot ec2_init motd spec.in

Log Message:
Example scripts and supporting files for creating Amazon EC2 AMIs
(Amazon Machine Images) running NetBSD.  Requires an Amazon EC2 account.
Please test and extend these scripts!

The original scripts were written by Jean-Yves Migeon, heavily modified


Thanks for your work on this, riz. Heavily appreciated.

--
jym@


Re: CVS commit: src/tests/lib/libc/gen (address alignment)

2012-04-21 Thread Jean-Yves Migeon

On Fri, 20 Apr 2012 15:33:05 -0700 (PDT), Paul Goyette wrote:

On Fri, 20 Apr 2012, Paul Goyette wrote:

XXX I would appreciate if someone could test it under a real amd64 
host with an up-to-date kernel, so I can reasonably assume that the 
culprit is Virtual Box and not our amd64 port (my test machine being 
off line I cannot do it myself). Results from other arches would be a 
plus too.


I just updated one of my machines (running a bulldozer FX-8150) to 
-current from an hour ago (with the most up-to-date rev of psl.h) and 
ran atf-run on the libc/gen tests.  The machine hung solid at this 
point:


...
t_siginfo (28/32): 8 test cases
sigalarm: [1.006950s] Passed.
sigbus_adraln:

and I am no longer able to telnet into the box from elsewhere.  I am 
not physically at the machine, so I cannot tell if it has dropped into 
ddb.



The machine did not drop into ddb, it simply rebooted.  Unfortunately
it did not leave a core dump behind, so I don't have much to look at
just yet.  When I get home later today, I will try to get more info.

BTW, this occurred while running the ATF test from a non-privileged
user, so if there's a bug lurking in these recent changes, it could 
be

considered to be a security vulnerability - non-priv user should not
be able to crash the box...

:)


Okay, thanks for the report. So this rules out Virtual Box, it seems to 
happen on native amd64 too.


I am taking a look right now.

--
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src or a tale on NetBSD/usermode

2011-12-24 Thread Jean-Yves Migeon
o plant a "int 0x80" in a MMAP_NOSYSCALLS
>> executable region, just make it to a "call __syscall".  At the expense of a
>> few more arguments, you will get the same result.
> 
> It depends on the implementation. Do you f.e. allow the linkage of this code
> to functions outside a designated list, or outside a designated area? If it
> manages to find __syscall by itself in its host program and patch up a direct
> call to that constant then yes it could call the OS. Static linking and
> strip(1) is your friend then.
>
> In NetBSD/usermode it would then still only be
> able to call the NetBSD/usermode kernel and not the host kernel.

I think that this needs clarification. Please correct my mistakes:
- NetBSD/usermode kernel is started as a userland process, and uses
POSIX API to setup its environment.
- it then proceeds to setting the userland memory regions with
MMAP_NOSYSCALLS flags, so userland cannot make direct syscalls to host
kernel
- passes execution to init(1) and userland
- all userland code making direct syscalls, this raises a SIGILL each
time which gives a chance to the usermode kernel to handle the userland
syscalls.

Right? So how can you implement the MMAP_NOSYSCALLS step on other POSIXy
systems?

Merry Christmas to you (and everyone else too) :)

-- 
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src

2011-12-21 Thread jean-Yves Migeon

On Wed, 21 Dec 2011 16:47:49 +0100, Reinoud Zandijk wrote:
The patch is written to allow for multiple non-UVM flags to be 
attached to
mappings and allow the kernel to react on them. NetBSD/usermode uses 
this to
disallow system calls to be made from within mapped regions and get 
them
returned as illegal instructions so it can analyse and emulate the 
system
calls. To prevent every process to be scrutinized this way a process 
flag has
been introduced to mark if a process needs this check since the 
detection

involve acuiring a lock to walk the uvm map.


Why make this a memory-level property, and not a process-level 
property? If you want to proxy syscalls between host and usermode 
kernel, why make it exclusive to certain mem regions? I am probably 
missing something with the way usermode processes, usermode kernel host 
kernel interact.


On the enhancing security argument, malicious source code could 
trigger
compiler bugs that allow for code to be modified or otherwise 
manipulated to
issue system calls where they shouldn't. Although it wouldn't 
nessiarily pose

a system security issue, it could be used for extracting info or for
malicious behaviour where with the patch it would simply bomb out.


That's the part I have trouble with. It looks like a weaker form of W^X 
(or PaX's mprotect), and I can't see the "additional" security benefits.


Malicious code is free to trigger compiler bugs that can make calls to 
valid memory areas. If you manage to plant a "int 0x80" in a 
MMAP_NOSYSCALLS executable region, just make it to a "call __syscall". 
At the expense of a few more arguments, you will get the same result.


As for the panic in sys_mmap(), as pointed out by Joerg and David 
Young, yes,

that should return a EOPNOTSUPP or an EINVAL. Panicing is indeed far
too crude
and i'll change that.

Hope this answers most of your questions.


Waiting for mines :)

--
Jean-Yves Migeon
j...@netbsd.org


Re: CVS commit: src

2011-12-20 Thread Jean-Yves Migeon

On 20.12.2011 16:39, Reinoud Zandijk wrote:

Module Name:src
Committed By:   reinoud
Date:   Tue Dec 20 15:39:36 UTC 2011

Modified Files:
src/lib/libc/sys: mmap.2
src/sys/sys: mman.h proc.h
src/sys/uvm: uvm_extern.h uvm_map.c uvm_mmap.c

Log Message:
Add a MAP_NOSYSCALLS flag to mmap. This flag prohibits executing of system
calls from the mapped region. This can be used for emulation perposed or for
extra security in the case of generated code.


IMHO, this change should have been discussed first.

Can you please elaborate on its usage? I fail to see the point about 
emulation, and even more so about the alleged extra security where this 
can be trivially bypassed. Return to libfoo and ROP are quite mainstream 
techniques these days...


--
Jean-Yves Migeon
j...@netbsd.org


Re: CVS commit: src/sys

2011-12-04 Thread Jean-Yves Migeon

On 04.12.2011 21:07, Alan Barrett wrote:

On Sun, 04 Dec 2011, Jean-Yves Migeon wrote:

Log Message:
Implement the register/deregister/evaluation API for secmodel(9). It
allows registration of callbacks that can be used later for
cross-secmodel "safe" communication.


Where and when was this discussed?


See commit log:

http://mail-index.netbsd.org/tech-security/2011/11/29/msg000422.html

--
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/sys/arch

2011-11-20 Thread Jean-Yves Migeon

On 19.11.2011 18:13, Cherry G. Mathew wrote:

Module Name:src
Committed By:   cherry
Date:   Sat Nov 19 17:13:39 UTC 2011

Modified Files:
src/sys/arch/x86/include: cpu.h
src/sys/arch/xen/include: hypervisor.h
src/sys/arch/xen/x86: hypervisor_machdep.c
src/sys/arch/xen/xen: evtchn.c

Log Message:
[merging from cherry-xenmp] bring in bouyer@'s changes via:
http://mail-index.netbsd.org/source-changes/2011/10/22/msg028271.html
 From the Log:
Log Message:
Various interrupt fixes, mainly:
keep a per-cpu mask of enabled events, and use it to get pending events.
A cpu-specific event (all of them at this time) should not be ever masked
by another CPU, because it may prevent the target CPU from seeing it
(the clock events all fires at once for example).


This patch breaks various xenstore/xenbus related functions, like 
ballooning. Steps to reproduce:

- connect to a dom0 system
- try ballooning up or down, with:
sysctl -w machdep.xen.balloon.target=10

All newly created processes will then stay in tstile, and the sysctl 
never returns, waiting on "rplq" wait channel.


Observed in QEMU, Virtualbox, and my amd64 spare host.

--
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/lib/libutil

2011-11-17 Thread Jean-Yves Migeon

On 13.11.2011 23:03, Christos Zoulas wrote:

Module Name:src
Committed By:   christos
Date:   Sun Nov 13 22:03:34 UTC 2011

Modified Files:
src/lib/libutil: Makefile
Added Files:
src/lib/libutil: getfstypename.3

Log Message:
add manual page


Just wondering: is there a different rule applicable to man pages for 
4-clause vs 2-clause BSD? I occasionally see new man pages written with 
a 4-clause BSD, however, most newly written code is 2-clause.


--
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/sys/arch

2011-11-09 Thread Jean-Yves Migeon

On Wed, 9 Nov 2011 07:52:13 +0530, Cherry G. Mathew wrote:
The cleanest way would be to share the code between x86 and Xen, 
keep
the allocation "below 4GiB" boundary for both, and use it everywhere 
in

the pmap code. Only the manipulation of the vcpu_guest_context_t
ctrlregs members would have to force this use.


Fair enough. Although the <4G tests would be a bit deceptive (since
they're cosmetic) - I guess leaving a note in the code about the
rationale behind this will help.


Well, the cpu_info struct will be shared by all implementations of x86 
eventually (at least for PAE & !PAE, native & Xen). So having lots of 
#ifdef XEN/#ifdef PAE means more work for me later on :)



- are you sure that these have to be "defined(PAE) ||
defined(__x86_64__)" ?


That's a crude way of making pmap_cpu_init_late() do nothing for
i386 (non-pae) since i386 doesn't need shadow PMDs.


In that case, test against "i386_use_pae" (0 == !PAE, 1 == PAE), and
simply return if !PAE. Avoid having loonnng macro blocks with
different levels of #ifdef. It's fairly difficult to untangle and
unpleasant to read.


I agree - this  looks better.


I always wondered whether i386_use_pae should be set for amd64. 
Strictly speaking, PAE is enabled when we are running in long mode. I 
set the sysctl(7) a while ago for NX regression tests in ATF, but not 
the variable. Maybe I should.


--
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/sys/arch

2011-11-08 Thread Jean-Yves Migeon

On 08.11.2011 14:53, Cherry G. Mathew wrote:

On 8 November 2011 15:20, Jean-Yves Migeon

wrote:

On Tue, 8 Nov 2011 08:49:22 +0530, Cherry G. Mathew wrote:


Please keep ci_pae_l3_pdir to a uint32_t and back out its
paddr_t type.

Unless you can convince me that your code will do the right
thing(tm), the PDP for PAE should to be allocated below the
4GiB physical boundary; unless mistaken, the
uvm_km_alloc(kernel_map, ...) does not enforce that. Besides,
%cr3 is still a 32 bits entity under PAE. Trying to stash a 64
bits paddr_t in %cr3 is a really bad idea.

See comments

http://nxr.netbsd.org/xref/src/sys/arch/x86/x86/pmap.c#1588


This doesn't hold true for Xen. See:
http://lxr.xensource.com/lxr/source/xen/arch/x86/mm.c#L509

Xen does the<4GB translation for the VM by maintaining its own
shadow. I just took the easier route ( after initially using the
x86 pmap<  4GB cr3 code ), but if you're thinking of a unified a
xen/non-xen implementation we'll have to re-think this one.


Okay; then all users of cr3 have to be documented (_especially_
the

native

x86 code), as there at least two places where there's implicit

uint64_t =>

uint32_t truncation: lcr3() and pcb_cr3. A comment above these
should be enough. Same goes for having paddr_t for L3 PD in struct
cpu_info.



hmm... I wonder if it would be "cleaner" to do this within a  #ifdef
XEN/#endif ?


The cleanest way would be to share the code between x86 and Xen, keep
the allocation "below 4GiB" boundary for both, and use it everywhere in
the pmap code. Only the manipulation of the vcpu_guest_context_t
ctrlregs members would have to force this use.

Avoiding the use of macros is a big plus; it helps modularity.


- are you sure that these have to be "defined(PAE) ||
defined(__x86_64__)" ?



That's a crude way of making pmap_cpu_init_late() do nothing for
i386 (non-pae) since i386 doesn't need shadow PMDs.


In that case, test against "i386_use_pae" (0 == !PAE, 1 == PAE), and
simply return if !PAE. Avoid having loonnng macro blocks with
different levels of #ifdef. It's fairly difficult to untangle and
unpleasant to read.

Macros should only be used to fix compile-time issues (like: symbol
missing), or help code readability by reuse. Not for jumping around C
code and take shortcuts. This has to be done at execution time rather
than compile time.


- please use "for (i = 0; i<  PTP_LEVELS - 1; i++ ) { ... }".
I  will have
to identify these blocks later when GENERIC will include both
PAE and !PAE pmaps. PTP_LEVELS makes it a lot easier.



ok, will watchout for it - do you want to do this one yourself ?


Please do :)


ok, I'll take flak for further breakage ;-P


Expect stress testing for MP in the near future :o

--
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/sys/arch

2011-11-08 Thread Jean-Yves Migeon

On Tue, 8 Nov 2011 08:49:22 +0530, Cherry G. Mathew wrote:
Please keep ci_pae_l3_pdir to a uint32_t and back out its paddr_t 
type.


Unless you can convince me that your code will do the right 
thing(tm), the
PDP for PAE should to be allocated below the 4GiB physical boundary; 
unless
mistaken, the uvm_km_alloc(kernel_map, ...) does not enforce that. 
Besides,
%cr3 is still a 32 bits entity under PAE. Trying to stash a 64 bits 
paddr_t

in %cr3 is a really bad idea.

See comments 
http://nxr.netbsd.org/xref/src/sys/arch/x86/x86/pmap.c#1588


This doesn't hold true for Xen. See:
http://lxr.xensource.com/lxr/source/xen/arch/x86/mm.c#L509

Xen does the <4GB translation for the VM by maintaining its own
shadow. I just took the easier route ( after initially using the x86
pmap < 4GB cr3 code ), but if you're thinking of a unified a
xen/non-xen implementation we'll have to re-think this one.


Okay; then all users of cr3 have to be documented (_especially_ the 
native x86 code), as there at least two places where there's implicit 
uint64_t => uint32_t truncation: lcr3() and pcb_cr3. A comment above 
these should be enough. Same goes for having paddr_t for L3 PD in struct 
cpu_info.



- are you sure that these have to be
"defined(PAE) || defined(__x86_64__)" ?

- please use "for (i = 0; i < PTP_LEVELS - 1; i++ ) { ... }". I will 
have to
identify these blocks later when GENERIC will include both PAE and 
!PAE

pmaps. PTP_LEVELS makes it a lot easier.



ok, will watchout for it - do you want to do this one yourself ?


Please do :)

--
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/sys/arch

2011-11-07 Thread Jean-Yves Migeon

On 06.11.2011 16:18, Cherry G. Mathew wrote:
> Module Name:   src
> Committed By:  cherry
> Date:  Sun Nov  6 15:18:19 UTC 2011
>
> Modified Files:
>src/sys/arch/amd64/include: pmap.h
>src/sys/arch/i386/include: pmap.h
>src/sys/arch/x86/include: cpu.h
>src/sys/arch/x86/x86: pmap.c
>src/sys/arch/xen/x86: cpu.c x86_xpmap.c
>
> Log Message:
> [merging from cherry-xenmp] make pmap_kernel() shadow PMD per-cpu and 
MP aware.


Some comments.

> -#ifdef PAE
> -/* Address of the static kernel's L2 page */
> -pd_entry_t *pmap_kl2pd;
> -paddr_t pmap_kl2paddr;
> -#endif
> -
> -
[snip]
>   #ifdef PAE
> -  uint32_tci_pae_l3_pdirpa; /* PA of L3 PD */
> +  paddr_t ci_pae_l3_pdirpa; /* PA of L3 PD */
>pd_entry_t *ci_pae_l3_pdir; /* VA pointer to L3 PD */
>   #endif
>
[snip]
> +
> +#if defined(PAE)
> +  ci->ci_pae_l3_pdir = (paddr_t *)uvm_km_alloc(kernel_map, PAGE_SIZE, 0,
> +  UVM_KMF_WIRED | UVM_KMF_ZERO | UVM_KMF_NOWAIT);
> +
> +  if (ci->ci_pae_l3_pdir == NULL) {
> +  panic("%s: failed to allocate L3 per-cpu PD for CPU %d\n",
> +__func__, cpu_index(ci));
> +  }

Please keep ci_pae_l3_pdir to a uint32_t and back out its paddr_t type.

Unless you can convince me that your code will do the right thing(tm), 
the PDP for PAE should to be allocated below the 4GiB physical boundary; 
unless mistaken, the uvm_km_alloc(kernel_map, ...) does not enforce 
that. Besides, %cr3 is still a 32 bits entity under PAE. Trying to stash 
a 64 bits paddr_t in %cr3 is a really bad idea.


See comments http://nxr.netbsd.org/xref/src/sys/arch/x86/x86/pmap.c#1588

> -/* flag to be used for kernel mappings: PG_u on Xen/amd64, 0 
otherwise */

> -#if defined(XEN)&&  defined(__x86_64__)
> -#define PG_k PG_u
> -#else
> -#define PG_k 0
> -#endif
> -

Are you sure that all the mapping sites are safe (PT/PD bits), given the 
pmap split between pmap/xen_xpmap.c?


On a side note, please stick to KNF especially for 80 columns.

[snip]
> +void
> +pmap_cpu_init_late(struct cpu_info *ci)
> +{
> +#if defined(PAE) || defined(__x86_64__)
> +  /*
> +   * The BP has already its own PD page allocated during early
> +   * MD startup.
> +   */
> +
> +  if (ci ==&cpu_info_primary)
> +  return;
> +
> +  KASSERT(ci != NULL);

> +  ci->ci_pae_l3_pdirpa = vtophys((vaddr_t) ci->ci_pae_l3_pdir);
> +  KASSERT(ci->ci_pae_l3_pdirpa != 0);
> +
> +  /* Initialise L2 entries 0 - 2: Point them to pmap_kernel() */
> +  ci->ci_pae_l3_pdir[0] =
> +  xpmap_ptom_masked(pmap_kernel()->pm_pdirpa[0]) | PG_V;
> +  ci->ci_pae_l3_pdir[1] =
> +  xpmap_ptom_masked(pmap_kernel()->pm_pdirpa[1]) | PG_V;
> +  ci->ci_pae_l3_pdir[2] =
> +  xpmap_ptom_masked(pmap_kernel()->pm_pdirpa[2]) | PG_V;
> +#endif /* PAE */

- are you sure that these have to be
"defined(PAE) || defined(__x86_64__)" ?

- please use "for (i = 0; i < PTP_LEVELS - 1; i++ ) { ... }". I will 
have to identify these blocks later when GENERIC will include both PAE 
and !PAE pmaps. PTP_LEVELS makes it a lot easier.


That's all for the first quick review :) Thanks for starting the merge 
of your branch.


--
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/sys/kern

2011-10-24 Thread Jean-Yves Migeon

On 24.10.2011 03:53, YAMAMOTO Takashi wrote:

Log Message:
Turn a workqueue(9) name into an array in the struct workqueue, rather
than a const char *. This avoids keeping a reference to a string
owned by caller (string could be allocated on stack).


what needs it?


Me, in a not-so-distant future. The purpose of this change is to allow 
passing strings that could be constructed that way:


char name[16];
...
snprintf(name, sizeof(name), "xbdback%d.%d", domid, instance);
workqueue_create(&foo, name, funcfoo, NULL, PRI_NONE, SPL_BIO, 0);

I'd like to construct workqueues based on a context; the old code does 
not permit that, unless you build and store the string somewhere, 
forcing the caller to *know* that it only keeps a pointer and does not 
copy the content. This will get misused, sooner or later.


--
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src

2011-09-29 Thread Jean-Yves Migeon

On 29.09.2011 22:05, Alexander Nasonov wrote:

Jukka Ruohonen wrote:

What are the options to handle cases like this?

While looking for examples, I also noticed few bugs where the possibly
blocking kmem_free(9) is used while holding a mutex (e.g. ras_purgeall() in
kern_ras.c).


Disclaimer: I'm a complete newbie in this area.

Blocking kmem is not necessarily a bug is you hold an adaptive lock.
Your mutex is IPL_NONE, if I remember correctly. You probably don't need
to be smart.


I am not sure that this lock will be released if the LWP needs to sleep 
for the allocation. Which means that the lock will block other LWPs from 
running even when they do not need to allocate memory.


--
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/sys/arch

2011-09-20 Thread Jean-Yves Migeon

On 21.09.2011 00:23, Matt Thomas wrote:

XXX PAE suspend does not work in amd64 currently, due to (yet
again!) page validation issues with hypervisor. Will fix.


Got it, pool cache invalidation is not working as expected during
save. I remember discussing this matter with Mindaugas a while
back, and due to pool_cache(9) limitations back then the code was
commented out (see the #if 0 ... #endif part in
pool_cache_invalidate).

I had to implement a pool_cache_invalidate_local(), but this wasn't
technically appreciated by many as it made pool_cache abstraction
leaky (and they were right)

I'll discuss this matter again, I think I have an alternative
solution for that one. I may hit a pmf(9) limitation though :/


Maybe add pmf hook for cpu device_t in mi_cpu_attach


That's the idea; that way I could write initial code for vcpu suspend 
when Xen MP gets committed.


Unfortunately, the limitation I currently have is for invalidation: only 
the CPU can invalidate its own caches, so I have to teach the pmf hook 
to somehow execute the "pool_cache(9) deplete" on the right CPU when 
pmf_system_suspend is called, or use xcall(9).


rmind@ implemented high priority xcall(9) some time ago, so IMHO the 
pool_cache(9) invalidation should now be able to handle all kind of 
pools situations, including high IPL pools. But I'll have to check this 
one first before uncommenting the xcall part in pool_cache_invalidate(). 
There's already MD code that tried to circumvent the invalidation 
limitation (eg. not handling per-CPU pools, like x86 pmap_create 
"try_again" loop).


--
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/sys/arch

2011-09-20 Thread Jean-Yves Migeon

On 20.09.2011 02:12, Jean-Yves Migeon wrote:

Module Name:src
Committed By:   jym
Date:   Tue Sep 20 00:12:25 UTC 2011

Modified Files:

[snip]

Log Message:
Merge jym-xensuspend branch in -current. ok bouyer@.

Goal: save/restore support in NetBSD domUs, for i386, i386 PAE and amd64.

[snip]

XXX PAE suspend does not work in amd64 currently, due to (yet again!)
page validation issues with hypervisor. Will fix.


Got it, pool cache invalidation is not working as expected during save. 
I remember discussing this matter with Mindaugas a while back, and due 
to pool_cache(9) limitations back then the code was commented out (see 
the #if 0 ... #endif part in pool_cache_invalidate).


I had to implement a pool_cache_invalidate_local(), but this wasn't 
technically appreciated by many as it made pool_cache abstraction leaky 
(and they were right)


I'll discuss this matter again, I think I have an alternative solution 
for that one. I may hit a pmf(9) limitation though :/


Anyway, for the reckless people that already tried my code and reported 
breaks: yes, I am aware of that specific issue :)


--
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/etc

2011-09-06 Thread Jean-Yves Migeon
On 06.09.2011 21:29, Thomas Klausner wrote:
>> If the source ('-s' of postinstall(8)) is not specified at command line,
>> according to code it takes "/usr/src" as default. Do you have a
>> stale/old rc.conf(5) file in there, /usr/src/etc/defaults/rc.conf?
> 
> I have the CVS checkout there that was used to build the snapshot I
> installed, so yes there is a file there but no it is not stale.
> 
>> Now, if this case should be handled properly is another question. There
>> are multiple solutions, all of them end up tweaking postinstall(8):
>> - raise a warning when overriding certain configuration files with older
>> versions (perhaps by using the CVS tags?)
> 
> It's not an older version.
> 
>> - make source an explicit arg to etcupdate/postinstall
>> - something else?
> 
> Or perhaps prefer ./etc to /usr/src/etc, that would solve the issue
> for me at least :)

postinstall(8) and etcupdate(8) all take /usr/src/ as default, so I
won't change that.

The problem is that rc.conf is now generated at build time, and I am
always using postinstall(8) via -s /usr/cvs/dest (my default DESTDIR),
so this went unnoticed during my tests as the correct /etc/defaults gets
parsed.

I'll add the required code to handle this; thanks for reporting!

-- 
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/etc

2011-09-06 Thread Jean-Yves Migeon
On 06.09.2011 12:54, Thomas Klausner wrote:
> On Mon, Aug 22, 2011 at 06:54:07PM +0000, Jean-Yves Migeon wrote:
>> Module Name: src
>> Committed By:jym
>> Date:Mon Aug 22 18:54:06 UTC 2011
>>
>> [snip]
>>
>> Log Message:
>> Modify etc/defaults/Makefile so that architectures can specify an additional
>> rc.conf file. This one should reside under etc/etc.${MACHINE}/, and will
>> get automatically appended to etc/defaults/rc.conf at build time if present.
>>
>>[snip]
>>
>> See also 
>> http://mail-index.netbsd.org/tech-userlevel/2011/07/25/msg005246.html
> 
> In a snapshot from a few hours ago I installed, "postinstall fix"
> without any other arguments removed the architecture specific section
> (the rc.conf included in the build output directory is fine).
>  Thomas

Hmmm, /etc/defaults/rc.conf should be transparent to postinstall(8), as
it gets installed via  during build.sh.

If the source ('-s' of postinstall(8)) is not specified at command line,
according to code it takes "/usr/src" as default. Do you have a
stale/old rc.conf(5) file in there, /usr/src/etc/defaults/rc.conf?

Now, if this case should be handled properly is another question. There
are multiple solutions, all of them end up tweaking postinstall(8):
- raise a warning when overriding certain configuration files with older
versions (perhaps by using the CVS tags?)
- make source an explicit arg to etcupdate/postinstall
- something else?

-- 
Jean-Yves Migeon
j...@netbsd.org


Re: CVS commit: [cherry-xenmp] src/sys/arch

2011-08-30 Thread Jean-Yves Migeon
On 30.08.2011 14:53, Cherry G. Mathew wrote:
> Module Name:  src
> Committed By: cherry
> Date: Tue Aug 30 12:53:46 UTC 2011
> 
> Modified Files:
>   src/sys/arch/i386/i386 [cherry-xenmp]: machdep.c
>   src/sys/arch/xen/x86 [cherry-xenmp]: cpu.c x86_xpmap.c
> 
> Log Message:
> Add per-cpu mmu queues

Thanks for looking into it!

-- 
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/share/man/man4

2011-08-29 Thread Jean-Yves Migeon
On 29.08.2011 22:27, Manuel Bouyer wrote:
> On Mon, Aug 29, 2011 at 03:17:16PM +0100, Jean-Yves Migeon wrote:
>> Both; usually I am using the VGA-emulated display, but sometimes
>> (like I did lately) I switch to console, so I can keep a reasonable
>> amount of history of the dom0 output in my terminal.
> 
> The the dom0 kernel is using xencons, and the ddb sequence is '+'.
> See the cn_set_magic() call in xencons.c

Yes, I can confirm now. I'll update ddb(4) to reflect this.

-- 
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/sys/arch/xen

2011-08-29 Thread Jean-Yves Migeon
On 29.08.2011 15:03, Cherry G. Mathew wrote:
> I'm a bit confused now - are we assuming that the pmap lock protects the
> (pte update op queue-push(es) + pmap_pte_flush()) as a single atomic
> operation (ie; no invlpg/tlbflush or out-of-band-read can occur between
> the update(s) and the pmap_pte_flush()) ?
> 
> If so, I think I've slightly misunderstood the scope of the mmu queue
> design - I assumed that the queue is longer-lived, and the sync points
> (for the queue flush) can span across pmap locking - a sort of lazy pte
> update, with the queue being flushed at out-of-band or in-band read
> time ( I guess that won't work though - how does one know when the
> hardware walks the page table ?) . It seems that the queue is meant for
> pte updates in loops, for eg:, quickly followed by a flush. Is this
> correct ? 

IMHO, this should be regarded this way, and nothing else. x86 and xen
pmap(9) share a lot in common, low level operations (like these: PT/PD
editing, TLB flushes, MMU updates...) should not leak through this
abstraction.

Said differently, the way Xen handles MMU must remain transparent to
pmap, except in a few places. Remember, although we are adding a level
of indirection through hypervisor, the calls should remain close to
native x86 semantic.

However, for convenience, Xen offers "multiseats" MMU hypercalls, where
you can schedule more than one op at a time to avoid unneeded context
switches, like in the pmap_alloc_level() function. This is our
problematic part.

> If so, there's just one hazard afaict - the synchronous TLB_FLUSH_MULTI
> could beat the race between the queue update and the queue flush via
> pmap_tlb_shootnow() (see pmap_tlb.c on the cherry-xenmp branch, and *if*
> other CPUs reload their TLBs before the flush, they'll have stale info.

What stale info? If a VCPU queue isn't empty while another VCPU has
scheduled a TLB_FLUSH_MULTI op, the stale content of the queue will
eventually be depleted later after a pmap_pte_flush() followed by a
local invalidation. This is the part that should be carefully reviewed.

For clarity, I would expect a queue to be empty when leaving pmap (e.g.
when releasing its lock). Assertions might be necessary to catch all
corner cases.

> So the important question (rmind@ ?) is, is pmap_tlb_shootnow()
> guaranteed to be always called with the pmap lock held ?
> 
> In real life, I removed the global xpq_queue_lock() and the pmap was
> falling apart. So a bit of debugging ahead.

Did you keep the same queue for all CPUs? If that was the case, yes,
this is a call for trouble.

Anyway, we can't put a giant lock around xpq_queue. This doesn't make
any sense in a MP system, especially for operations that are frequently
called and still may take a while to complete. Just imagine: all CPUs
waiting for one to finish its TLB flush before they can edit their PD/PT
again... ouch.

-- 
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/share/man/man4

2011-08-29 Thread Jean-Yves Migeon

On Mon, 29 Aug 2011 15:25:03 +0200, Manuel Bouyer wrote:

On Mon, Aug 29, 2011 at 12:45:27PM +0100, Jean-Yves Migeon wrote:

Hmm, I'll have to cross-check that one this afternoon. IIRC, I am
also using the the default "break" command when I am running the
dom0 inside QEMU, and '+' is only used when I want to break in
the domU (through xencons(4)).


But when running inside qemu, you're using qemu's VGA display,
not its serial port, right ?


Both; usually I am using the VGA-emulated display, but sometimes (like 
I did lately) I switch to console, so I can keep a reasonable amount of 
history of the dom0 output in my terminal.



in this case I guess the dom0 is using wscons, not the xencons.
xencons is (should) be used by dom0 only on serial console.


--
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/share/man/man4

2011-08-29 Thread Jean-Yves Migeon

On Mon, 29 Aug 2011 11:46:06 +0200, Manuel Bouyer wrote:

On Mon, Aug 29, 2011 at 09:01:47AM +0200, Jean-Yves Migeon wrote:

What kind of console is attaching for you in dom0? I can't see how
'+' would get wired in by default. At least when either started 
from

bare metal, or QEMU.


'+' is used for serial console (or, to be more precise, xencons
which is what is used by dom0 when the hypervisor is configured to 
use serial
console), where break won't work because the hypervisor doesn't 
forward

it to the dom0.


Hmm, I'll have to cross-check that one this afternoon. IIRC, I am also 
using the the default "break" command when I am running the dom0 inside 
QEMU, and '+' is only used when I want to break in the domU (through 
xencons(4)).


In that case, I'll have to make adjustments. Having the same key 
sequence for both dom0 and domU is... problematic.


--
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/share/man/man4

2011-08-29 Thread Jean-Yves Migeon
On 29.08.2011 06:30, Christoph Egger wrote:
> On 29.08.11 06:26, Christoph Egger wrote:
>> On 29.08.11 00:09, Jean-Yves Migeon wrote:
>>> Module Name:src
>>> Committed By:   jym
>>> Date:   Sun Aug 28 22:09:37 UTC 2011
>>>
>>> Modified Files:
>>> src/share/man/man4: ddb.4
>>>
>>> Log Message:
>>> Be more precise for Xen: + is only valid for Xen domUs. dom0 uses
>>> the same key sequences as i386/amd64.
>>
>> + also works for me in a dom0.
> 
> I think I have to mention I am using serial console regularly.

What kind of console is attaching for you in dom0? I can't see how
'+' would get wired in by default. At least when either started from
bare metal, or QEMU.

-- 
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/sys/arch/usermode/usermode

2011-08-27 Thread Jean-Yves Migeon
On 27.08.2011 20:28, Joerg Sonnenberger wrote:
> On Sat, Aug 27, 2011 at 08:13:28PM +0200, Jean-Yves Migeon wrote:
>> On 27.08.2011 19:57, Reinoud Zandijk wrote:
>>> Fix copystring routines to NOT just copy all since not all space might be
>>> writable. This can be fixed by implementing/importing strnlen(3) in the 
>>> kernel
>>
>> Any reason no to? If there's none, I can do it.
>>
>> At first sight it's straightforward to add to common/, and I am more at
>> peace knowing that we have a "valid" strnlen() in kernel rather than a
>> bogus macro that may spread elsewhere...
> 
> Or it could use memchr for that same purpose.

Right; note that the question about strnlen(3) remains valid though:
strlen/strnlen appear in the same man page. So one's could expect to
have both accessible, even in kernel.

-- 
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/sys/arch/usermode/usermode

2011-08-27 Thread Jean-Yves Migeon
On 27.08.2011 19:57, Reinoud Zandijk wrote:
> Fix copystring routines to NOT just copy all since not all space might be
> writable. This can be fixed by implementing/importing strnlen(3) in the kernel

Any reason no to? If there's none, I can do it.

At first sight it's straightforward to add to common/, and I am more at
peace knowing that we have a "valid" strnlen() in kernel rather than a
bogus macro that may spread elsewhere...

-- 
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/sys/arch/xen

2011-08-22 Thread Jean-Yves Migeon

On Mon, 22 Aug 2011 12:47:40 +0200, Manuel Bouyer wrote:
This is slightly more complicated than it appears. Some of the "ops" 
in
a per-cpu queue may have ordering dependencies with other cpu 
queues,
and I think this would be hard to express trivially. (an example 
would
be a pte update on one queue, and reading the same pte read on 
another
queue - these cases are quite analogous (although completely 
unrelated)


read don't go through the xpq queue, don't they ?


Nope, PTE are directly obtained from the recursive mappings 
(vtopte/kvtopte).


Content is "obviously" only writable by hypervisor (so it can keep 
control of his mapping alone).



I think this is similar to a tlb flush but the other way round,
I guess we could use a IPI for this too.


IIRC that's what the current native x86 code does: it uses an IPI to 
signal other processors that a shootdown is necessary.


I'm thinking that it might be easier and more justifiable to nuke 
the
current queue scheme and implement shadow page tables, which would 
fit

more naturally and efficiently with CAS pte updates, etc.


I'm not sure this would completely fis the issue: with shadow page 
tables
you can't use a CAS to assure atomic operation with the hardware TLB, 
as

this is, precisely, a shadow PT and not the one used by hardware.


--
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/sys/arch/xen

2011-08-21 Thread Jean-Yves Migeon
On 21.08.2011 21:39, Cherry G. Mathew wrote:
> JM> An alternative would be to have per-CPU xpq_queue[] also. This
> JM> is not completely stupid, xpq_queue is meant as a way to put
> JM> multiple MMU operations in a queue asynchronously before issuing
> JM> only one hypercall to handle them all.
> 
> This is slightly more complicated than it appears. Some of the "ops" in
> a per-cpu queue may have ordering dependencies with other cpu queues,
> and I think this would be hard to express trivially. (an example would
> be a pte update on one queue, and reading the same pte read on another
> queue - these cases are quite analogous (although completely unrelated)
> to classic RAW and other ordering dependencies in out-of-order execution
> scenarios due to pipelining, etc.).

Yes; however this should be handled by the pmap layer. xpq_queue is
fairly low level and has no access to all the knowledge/semantic
required to ensure proper PT/PD synchronization.

It is rather a way to handle multiple MMU operations into one hypercall,
like readv/writev can do for multiple read/write(s).

Mapping different pmaps in the same VA space requires to lock them, so
you should not see multiple CPUs concurrently reading/writing shared
entries. One thing remains with the old => new PTE updates that need
CAS, but this is Xen's mission to do that correctly as it manages the
MMU for us. All we have to ensure is that there's no stale operation in
the queue before leaving pmap.

> I'm thinking that it might be easier and more justifiable to nuke the
> current queue scheme and implement shadow page tables, which would fit
> more naturally and efficiently with CAS pte updates, etc.

Beware with those; shadow page tables were mainly designed with Linux in
mind, and the way it manages virtual memory is completely different to
ours: it maps the entire physical memory in the kernel virtual space
(with tricks when there's more than 1GiB involved), while we use
recursive mappings. And Xen has problems validating these.

-- 
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/sys/arch/xen

2011-08-21 Thread Jean-Yves Migeon
On 21.08.2011 12:26, Jean-Yves Migeon wrote:
> - second, the lock is not placed at the correct abstraction level IMHO,
> it is way too high in the caller/callee hierarchy. It should remain
> hidden from most consumers of the xpq_queue API, and should only be used
> to protect the xpq_queue array together with its counters (and
> everything that isn't safe for all memory operations happening in xpq).
> 
> Reason behind this is that your lock protects calls to hypervisor MMU
> operations, which are hypercalls (hence, a "slow" operation with regard
> to kernel). You are serializing lots of memory operations, something
> that should not happen from a performance point of view (some may take a
> faire amount of cycles to complete, like TLB flushes). I'd expect all
> Xen MMU hypercalls to be reentrant.

An alternative would be to have per-CPU xpq_queue[] also. This is not
completely stupid, xpq_queue is meant as a way to put multiple MMU
operations in a queue asynchronously before issuing only one hypercall
to handle them all.

-- 
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/sys/arch/xen

2011-08-21 Thread Jean-Yves Migeon
On 10.08.2011 11:50, Cherry G. Mathew wrote:
> Module Name:  src
> Committed By: cherry
> Date: Wed Aug 10 09:50:37 UTC 2011
> 
> Modified Files:
>   src/sys/arch/xen/include: xenpmap.h
>   src/sys/arch/xen/x86: x86_xpmap.c
> 
> Log Message:
> Introduce locking primitives for Xen pte operations, and xen helper calls for 
> MP related MMU ops

Hi Cherry,

Sorry for this very late comment, as I was AFK most of the time for the
last three weeks.

I have two concerns with your patch:

- first, you introduce locking primitives based on simple_lock(9). This
mechanism ought to go in the not-so-distant future, so I would replace
all these with a properly chosen mutex, likely at IPL_VM (depending on
the priority of all consumers of the xpq_*() functions). This must be
carefully reviewed first.

- second, the lock is not placed at the correct abstraction level IMHO,
it is way too high in the caller/callee hierarchy. It should remain
hidden from most consumers of the xpq_queue API, and should only be used
to protect the xpq_queue array together with its counters (and
everything that isn't safe for all memory operations happening in xpq).

Reason behind this is that your lock protects calls to hypervisor MMU
operations, which are hypercalls (hence, a "slow" operation with regard
to kernel). You are serializing lots of memory operations, something
that should not happen from a performance point of view (some may take a
faire amount of cycles to complete, like TLB flushes). I'd expect all
Xen MMU hypercalls to be reentrant.

-- 
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/sys

2011-07-30 Thread Jean-Yves Migeon
On 30.07.2011 19:01, Christos Zoulas wrote:
> Module Name:  src
> Committed By: christos
> Date: Sat Jul 30 17:01:05 UTC 2011
> 
> Modified Files:
>   src/sys/conf: files
>   src/sys/kern: init_main.c kern_lwp.c kern_synch.c
> Added Files:
>   src/sys/kern: subr_pserialize.c
>   src/sys/sys: pserialize.h
> 
> Log Message:
> Add an implementation of passive serialization as described in expired
> US patent 4809168. This is a reader / writer synchronization mechanism,
> designed for lock-less read operations.

Interesting! What would be the direct consumers of such an
implementation? For example, can pserialize be a drop-in replacement for
everything that is rwlock(9) based, or are they limitations?

(I am aware that passive serialization is not equivalent to RCU).

I remember reading articles mentioning how "wonderful" lockless
algorithms are, except in situation where the additional bus
locking/atomic ops involved did not really improve the situation in
highly concurrent systems (and could even make it worse).

-- 
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/sys/arch

2011-07-18 Thread Jean-Yves Migeon

On Mon, 18 Jul 2011 07:41:30 +0100, David Laight wrote:

On Mon, Jul 18, 2011 at 02:18:54AM +0200, Jean-Yves Migeon wrote:

On 18.07.2011 02:00, David Young wrote:
>> Can we please use ansi function definitions in newly committed 
code?

>
> This was tedious enough without converting to ANSI function 
definitions.

> A good job for Coccinelle (spatch)?

Sadly, no: last time I tried (when moving kvm(3) code to ANSI 
style), I
had to do it manually. It's even the opposite, spatch has issues 
when
parsing non-ANSI declarations, so you have to do the conversion all 
by

yourself first...


I've a sed script that will change most of them
see ftp.netbsd.org:~dsl/protoz


Ah yes, forgot about it. Works pretty well, thanks for pointing it out.

--
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/sys/arch

2011-07-17 Thread Jean-Yves Migeon
On 18.07.2011 02:00, David Young wrote:
>> Can we please use ansi function definitions in newly committed code?
> 
> This was tedious enough without converting to ANSI function definitions.
> A good job for Coccinelle (spatch)?

Sadly, no: last time I tried (when moving kvm(3) code to ANSI style), I
had to do it manually. It's even the opposite, spatch has issues when
parsing non-ANSI declarations, so you have to do the conversion all by
yourself first...

-- 
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/crypto/external/bsd/openssl/lib/libcrypto/arch/x86_64

2011-07-17 Thread Jean-Yves Migeon
On 17.07.2011 23:18, David Laight wrote:
>>> The .byte streams are required for the inclusion of the AES NI
>>> instructions, which are not supported with our current gcc version.
>>>
>>> Should be fixed once we have stabilized gcc 4.5 (dunno about other
>>> compilers though, especially pcc).
>>
>> That doesn't make any sense. This are *assembler* instructions, not GCC
>> intrinsics.

nm, sorry; was thinking about gas, not gcc.

> Also, having looked at the file, even if it is using instructions that
> the assembler can't process, it is a horrid mess.
> There are much better ways to specify instructions than just .byte sequences.
> Even if you aren't using CPP, the assmembler will support local constants
> and expressions.
> Even a few comments would help.

IIRC, this is the code as generated by the Perl scripts in openssl (byte
streams and the resulting ugliness are neither my own nor spz@).
I tend to steer away from manipulating code (particularly crypto) when I
don't have good knowledge of it. And this is far from being the case for
me with OpenSSL.

Anyway, I'll look into it next week for cleanup.

-- 
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/crypto/external/bsd/openssl/lib/libcrypto/arch/x86_64

2011-07-17 Thread Jean-Yves Migeon
On 17.07.2011 21:48, Joerg Sonnenberger wrote:
> Module Name:  src
> Committed By: joerg
> Date: Sun Jul 17 19:48:31 UTC 2011
> 
> Modified Files:
>   src/crypto/external/bsd/openssl/lib/libcrypto/arch/x86_64: aes.inc
> 
> Log Message:
> Disable Clang's integrated assembler for the AES-NI files for now.
> Somewhere in this mess of .byte streams, corruption happens. Disassembly
> only shows slightly different filling of alignment sequences, further
> analysis is needed.
> 
> XXX This should be rewritten to be proper assembler code

The .byte streams are required for the inclusion of the AES NI
instructions, which are not supported with our current gcc version.

Should be fixed once we have stabilized gcc 4.5 (dunno about other
compilers though, especially pcc).

-- 
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/usr.bin/pmap

2011-06-25 Thread Jean-Yves Migeon
On 24.06.2011 00:50, Christos Zoulas wrote:
> Module Name:  src
> Committed By: christos
> Date: Thu Jun 23 22:50:54 UTC 2011
> 
> Modified Files:
>   src/usr.bin/pmap: main.c
> 
> Log Message:
> Don't give out information about processes we can't control.

Thanks to Aleksey and you for fixing the procfs leak.

I wonder whether pmap's code is the right place to check for
"information" access control. It's difficult to modify except by
patching the source, does not protect from abusing/finding exploits to
circumvent the check (any executable that has kmem sgid rights is a
target), and there are other potential tools usable out there (lsof(1),
maybe?).

Isn't it something that rather fits the kauth(9) ACLs?

-- 
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/sys/arch/xen/xen

2011-05-15 Thread Jean-Yves Migeon
On 15.05.2011 21:11, Mindaugas Rasiukevicius wrote:
> This is not correct.  Atomic op might decrease the reference count to
> 1 while other thread executes xbdi_put() before xbdi_refcnt is fetched,
> thus decreasing it to 0.  In such case, both threads would fetch 0.
> 
> The following is a correct way:
> 
>   if (atomic_dec_uint_nv(&xbdip->xbdi_refcnt) == 0)
>   xbdback_finish_disconnect(xbdip);
> 
> Also, it seems there is no need for xbdi_refcnt to be volatile.

Good point; I was pondering about the _nv() version when I made the
change, but forgot about the possible concurrency regarding refcnt fetch
(not actually possible as port-xen is not MP, but will become soonish)

I'll remove the volatile declaration too, only xbdi_put/_get use the
refcnt for G/C anyway.

Thanks for pointing that out!

-- 
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/sys/arch/xen/xen

2011-04-18 Thread Jean-Yves Migeon
On 18.04.2011 10:35, Mindaugas Rasiukevicius wrote:
> We used to check the return of big size allocations, when kmem(9) could
> fail even with KM_SLEEP due to KVA starvation.  However, pretty much all
> kernel does not perform checks for smaller allocations and since the bug
> was fixed - we are no longer checking for big ones as well.
> 
> This applies to all allocators.

So, any thread sleeping for an allocation cannot be interrupted?

-- 
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/sys/arch/xen/xen

2011-04-18 Thread Jean-Yves Migeon
On 18.04.2011 05:04, Mindaugas Rasiukevicius wrote:
> Module Name:  src
> Committed By: rmind
> Date: Mon Apr 18 03:04:31 UTC 2011
> 
> Modified Files:
>   src/sys/arch/xen/xen: balloon.c
> 
> Log Message:
> balloon_xenbus_attach: use KM_SLEEP for allocation.
> 
> Note: please do not use KM_NOSLEEP.

Ah yes, forgot about this one, thanks.

Although I am still unsure about the check, in-kernel NULL deref is...
problematic.

I am not so sure whether it is safe to assume non-NULL return if caller
can sleep. It's something that ought to be specified for all available
memoryallocators(9) especially as the code behind can evolve (hey, Lars
:) ).

-- 
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/sys/arch/xen/xen

2011-04-18 Thread Jean-Yves Migeon
On 18.04.2011 09:23, Cherry G. Mathew wrote:
> On 18 April 2011 08:00, Cherry G. Mathew  wrote:
>> On 18 April 2011 04:21, Mindaugas Rasiukevicius  wrote:
>>> Masao Uebayashi  wrote:
>>>> On Mon, Apr 18, 2011 at 03:04:32AM +, Mindaugas Rasiukevicius wrote:
>>>>> Module Name:src
>>>>> Committed By:   rmind
>>>>> Date:   Mon Apr 18 03:04:31 UTC 2011
>>>>>
>>>>> Modified Files:
>>>>> src/sys/arch/xen/xen: balloon.c
>>>>>
>>>>> Log Message:
>>>>> balloon_xenbus_attach: use KM_SLEEP for allocation.
>>>>>
>>>>> Note: please do not use KM_NOSLEEP.
>>>>
>>>> And, according to yamt@, KM_SLEEP can fail in the current design...
>>>
>>> IIRC yamt@ fixed it a year or few ago.
>>
>> And in the more specific immediate context:
>> http://mail-index.netbsd.org/port-xen/2011/04/07/msg006613.html
>>
> PS: Can you please revert ? Unless it's breaking anything else, or you
> have a fix for the problem mentioned in the thread above, ie;

Uho, I forgot to mention in my commit log that I fixed it. I am
allocating bpg_entries via pool_cache(9), and the constructor
bpge_ctor() will return an error if uvm(9) fails to find a free page. In
that case, the thread will just bail out and start waiting again.

-- 
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/sys/arch/xen/xen

2011-04-06 Thread Jean-Yves Migeon
On 06.04.2011 20:01, Manuel Bouyer wrote:
> You could also use
> xvifxiy (e.g. xvif5i2, where i stands for 'index').
> or any other letter ...

Huh hmm indeed... I wonder why I did not think about this approach before...

-- 
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/sys/arch/xen/xen

2011-04-06 Thread Jean-Yves Migeon
On Wed, 6 Apr 2011 16:29:22 +, Taylor R Campbell 
 wrote:

Date: Mon, 04 Apr 2011 23:46:19 +0200
   From: Jean-Yves Migeon 

   The newer scripts for Xen read the interface value from the 
vifname
   entry in Xenstore, so changing the name is not that critical. But 
this
   should be stabilized sooner than later. Personally, I find '_' 
rather
   ugly in an interface name... but I am open to suggestions, even 
fixing

   rc.d/network if it needs to.

One could declare that ifnames must match [a-z][-a-z0-9]*[0-9], and
then in /etc/rc.d/network pass them through `tr - _' before 
evaluating

`args=\$ifconfig_$int'.  This guarantees that setting ifconfig_ifN in
/etc/rc.conf always works, and depends only on agreeing to the ifname
policy and ensuring that all the existing ifnames actually follow it.

Of course, tr might not be available yet in /etc/rc.d/network, but
someone wizardlier (or more patient) with sh than I can probably work
around that...


Yep, but there are other places where this will get tricky. For 
example, rc.conf(5) is parsed as a classic shell script [1]. Given that 
someone has two possibilities to configure interfaces:


/etc/ifconfig.xxx
or
ifconfig_xxx # in rc.conf(5)

If we follow the same "xxx" naming convention for both, we would have 
to 'tr' certain lines of rc.conf for interfaces that contains a 
[^a-zA-Z0-9_] char. This complexifies rc.conf parsing for no real 
benefit. Putting a ifconfig_xxx-yyy="auto" in rc.conf(5) will likely 
result in a parsing error.


I am about to revert my patch anyway, and use '_' instead. I will also 
update ifconfig.if(5) to clearly indicate that only chars accepted in 
shell variable names should be used.


[1] http://nxr.netbsd.org/xref/src/etc/rc#22

--
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/sys/arch/xen/xen

2011-04-04 Thread Jean-Yves Migeon
On 04.04.2011 23:21, Taylor R Campbell wrote:
>Date: Sun, 3 Apr 2011 23:21:39 +
>From: "Jean-Yves Migeon" 
> 
>Now that pkgsrc-2011Q1 has arrived, and before -6 chimes in, change
>ifxname for xvif(4) from xvif%d.%d to xvif%d-%d. This is needed
>to avoid sysctl(9) EINVAL errors when creating interface nodes.
> 
> This change came awfully close to fixing PR misc/39376 too...
> 
> (Hyphens may be valid in sysctl nodes while dots are not, but neither
> hyphens nor dots are valid in sh variables, for /etc/rc.d/network's
> ifconfig_ifN.)

Good catch; I missed it...

The newer scripts for Xen read the interface value from the vifname
entry in Xenstore, so changing the name is not that critical. But this
should be stabilized sooner than later. Personally, I find '_' rather
ugly in an interface name... but I am open to suggestions, even fixing
rc.d/network if it needs to.

-- 
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/sys/arch/i386/conf

2011-02-23 Thread Jean-Yves Migeon
On Wed, 23 Feb 2011 12:17:43 +0900, Masao Uebayashi 
 wrote:

IMHO, for x86, the kernel cannot assume that the bootloader loaded
certain modules, nor can the bootloader expect that kernel XYZ is in 
a

specific configuration. They should be agnostic from one another.


I think that TRT here is that kernel tells bootloader what to load.
This should be possible by allocating a static buffer shared by
bootloader/kernel, and kernel does reboot (== calling bootloader).


There are, at least, two non trivial points to solve here:

- as said, for x86, you are pushing for an interface that does not yet 
exist between kernel and bootloader. I highly doubt that Grub/Grub2, 
syslinux, or any other bootloader not under direct control of TNF, will 
follow such a model.


- you have to "configure", but also to "unconfigure" device upon each 
reboot. You have to teach the interface not only "what to load", but 
also, "what is the state" of a driver module. Module's loading can 
change the state of devices, and rebooting/calling bootloader will _not_ 
reset that state.


--
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/sys/arch/amd64/conf

2011-02-21 Thread Jean-Yves Migeon
  recompiling half a dozen files actually affected by the 
change.
 > >  This is a non-starter; furthermore, in such an environment 
if I
 > >  screw up with the versions I can easily render my test 
machine

 > >  unbootable.
 >
 > 100% agreed. Regular problem with Xen kernels.

Glad at least some things can be agreed on :-)


Good, so we are making progress :)

 > MONOLITHIC was added in a rush to shut up complaints, because 
people

 > couldn't make a difference between these types:
 > 1 - MONOLITHIC kernel, where options are compiled in (_provided_ 
you

 > enabled them), with module support disabled
 > 2 - MODULAR kernel, with most common options(7) compiled as 
"builtin"

 > modules (essentially making it like 1, except "options MODULAR")
 > 3 - MODULAR kernel, with most options(7) compiled as third party 
modules.


Well right, except that case 2 isn't really a modular kernel. The
existence of "options MODULAR" is not the issue that matters (mostly,
modulo the autoload issues) -- it's case 3 that causes serious
complications.

Even without a fix for the autoloading issues, I don't think anyone
really objects much to having "options MODULAR" turned on. Anyone
who's setting up their own kernel config to disable things and 
doesn't

want the autoloader providing them anyway can turn off "options
MODULAR" at the same time they turn off whatever else they're
concerned about.

(Knowing that they need to do this isn't desirable and the 
autoloading

issues should be resolved, but it's not critical at the same level.)

 > Please stop opposing MONOLITHIC and MODULAR. Given the current
 > architecture, modules can be turned into builtins, which is, in 
essence,
 > the same thing as a monolithic kernel, when you turn on most 
options.


I do not understand what you mean by this.


That if you want a MONOLITHIC kernel, options MODULAR does not oppose 
that. The only exception being that, as you said above, commenting out 
an option won't remove it, but rather move it to a filesystem module.



The premise of your commit was that GENERIC should be modular, that
is, you took out a large number of standard features so they'd be 
only

available via moduldes. Any MONOLITHIC config should have those
features compiled in.


I removed file-systems nobody did not seem to care about when 
externalized as modules (after sending a notice to the relative ML). 
Attempt is made to add modules to i386 GENERIC, and remove some from 
amd64, so they become closer in concept.


If you want me to revert that part, and reflect it in i386 GENERIC 
(essentially turning most modules as builtins), I am fine with that... 
Bu after receiving a buckloads of mails with "no need for accf_* , nor 
CODA" or "please leave file-systems as builtins", well, I ended there 
[1], and asked core@ what I should leave as builtins, and what I can 
turn into modules (and mark them so in config(5), so I could come back 
later if config(5) gets extended -- as you proposed).



There shouldn't be any difference between a compiled-in feature and
the same feature present as a "builtin module".

 > I also stated (also privately with Matthew) that "MONOLITHIC" was 
a bad
 > name. "NOMODULAR" would be more appropriate. If we enable most 
options

 > in GENERIC, MONOLITHIC would just be a "include <...>/GENERIC\n no
 > options MODULAR", with almost zero difference regarding code 
(well,

 > except that you won't have module support anymore).

No, that's not what MONOLITHIC is meant to be. It should turn on the
options that, in GENERIC, are turned off so as to be modules.


So why nobody complained for amd64, which does not provide MONOLITHIC? 
Because amd64 GENERIC turns modules into builtins, making the necessity 
of a MONOLITHIC kernel irrelevant.


 > So, one more time: the current issue is to define what people 
consider
 > as options that should be enabled by default, and what could stay 
out
 > (or as a third party module if you urgently need it back). One 
example:

 > accf_* is a questionable choice, whether its inside GENERIC (as a
 > builtin), or enabled by default in MONOLITHIC.

I don't think that's the issue. Running GENERIC is supposed to make
all stable features and functionality available. The question at hand
is whether those features are compiled in (which would be
"monolithic") or loaded as modules (which would be "modular").


Exactly.

[1] http://en.wikipedia.org/wiki/Voting_paradox

--
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/sys/arch/amd64/conf

2011-02-20 Thread Jean-Yves Migeon
one, which is MODULAR. Now why is that?

MONOLITHIC was added in a rush to shut up complaints, because people
couldn't make a difference between these types:
1 - MONOLITHIC kernel, where options are compiled in (_provided_ you
enabled them), with module support disabled
2 - MODULAR kernel, with most common options(7) compiled as "builtin"
modules (essentially making it like 1, except "options MODULAR")
3 - MODULAR kernel, with most options(7) compiled as third party modules.

Comparatively, the amd64 GENERIC never caused complaints, although it
uses modules: critical options(7), like FFS, EXEC_ELF32/64, COMPAT_32,
TMPFS/MFS were _builtin_.


Please stop opposing MONOLITHIC and MODULAR. Given the current
architecture, modules can be turned into builtins, which is, in essence,
the same thing as a monolithic kernel, when you turn on most options.

I also stated (also privately with Matthew) that "MONOLITHIC" was a bad
name. "NOMODULAR" would be more appropriate. If we enable most options
in GENERIC, MONOLITHIC would just be a "include <...>/GENERIC\n no
options MODULAR", with almost zero difference regarding code (well,
except that you won't have module support anymore).

So, one more time: the current issue is to define what people consider
as options that should be enabled by default, and what could stay out
(or as a third party module if you urgently need it back). One example:
accf_* is a questionable choice, whether its inside GENERIC (as a
builtin), or enabled by default in MONOLITHIC.

-- 
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/sys/arch/amd64/conf

2011-02-20 Thread Jean-Yves Migeon
On 20.02.2011 15:45, matthew green wrote:
>> Have you measured how much modules supposedly increase the size compared to
>> compiling things directly to the kernel? This seems like a rather silly point
>> to me (without numbers, at least).
> 
> well, i dunno about others but i've found that the old modules
> lying around tends to fill up space pretty quickly, but ignoring
> that problem and looking at recent i386 builds, i see that the
> MONOLITHIC kernel set is only 440kb larger than GENERIC, yet the
> modules set is 3500kb.
> 
> i didn't look into why, but there's some numbers to go from.

(for i386)

When I tried to use GENERIC as a replacement for MONOLITHIC for install
floppies, I took a look at some of the duplication that happens between
"filesys" and "builtin" modules.

All in all, there are approx. ~700kiB of modules in /stand that are also
accountable (e.g. "builtin") inside GENERIC.

Taking the options(4) embedded inside MONOLITHIC and comparing them to
their modules counterpart:

- COMPAT_* => 400kiB as standalone modules
- various fs => 720kiB
- other options (PPP, accf_*, pseudo-devices): 200kiB

=> 1.3MiB. So, a total of 1.3M + 700k = 2MiB. Still missing 1.5MiB.
MODULAR options seems to consume ~70kiB , so I would assume that the
rest is due to PIC mode and ELF headers... ?

That's still 1.5MiB bigger (a complete monolithic kernel vs a full-blown
modular kernel). Needs to be xchecked though.

Note that /stand also includes modules that cannot be included inside a
monolithic kernel, like zfs and solaris compat layer. And these makes up
for ~ half the size of /stand (5MiB).

-- 
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/sys/arch/amd64/conf

2011-02-19 Thread Jean-Yves Migeon
On 19.02.2011 17:13, Matthias Drochner wrote:
> 
> jeanyves.mig...@free.fr said:
>> I can't see why MONOLITHIC is needed in the first place
> 
> I think before modular kernels are forced onto the masses,
> at least 2 design problems should be fixed:
> 1. Autoloading needs to be done differently: The kernel doesn't
>have the smarts to know which module is needed in which situation,
>and there is no fine-grained administrative control. I've
>complained in some other mail about the idiotic behavior
>in case of exec format loaders.
> 2. I don't want tons of modules which I'll never need installed
>into my root file system. As it was common in good old times (tm),
>my root filesystems are as small as possible. Now, with modules
>being added to the build, I have to squeeze out stuff after
>each update to keep some percent free space in /.

1. modules can be turned into builtins, and so you will end up in the
same situation as MONOLITHIC, without sacrificing MODULAR. You can
preserve monolithic behavior, but still load modules if you want to.

2. issue is the same as with monolithic-like kernels: instead of having
code provided as third party files, code is directly embedded in. So
what you are gaining as space in one place, you are losing it in
another. Smaller file-system, bigger kernel, or the other way around.
Granted, kernel does not necessarily reside under / .


As said, it's all a matter of choice (which I cannot make). i386 GENERIC
was a PITA as all options(4) were modules, thereby affecting boot at the
smallest occasion. MONOLITHIC was brought back. On the contrary, amd64
GENERIC was monolithic in nature (most modules are builtins), mimicking
i386 MONOLITHIC (but still having modules as possibility), and that did
not seem to bother people.

So either way, it's fine; but please -- i386 and amd64 should share
common grounds. If some want a MONOLITHIC amd64, it's even possible,
although I can't see the point given the arguments above. That would
also save us a kernel build for i386 release.

-- 
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/sys/arch/amd64/conf

2011-02-19 Thread Jean-Yves Migeon
On 19.02.2011 10:27, Bernd Ernesti wrote:
> On Wed, Feb 16, 2011 at 03:16:58AM +0000, Jean-Yves Migeon wrote:
>> Module Name: src
>> Committed By:jym
>> Date:Wed Feb 16 03:16:58 UTC 2011
>>
>> Modified Files:
>>  src/sys/arch/amd64/conf: GENERIC INSTALL
>>
>> Log Message:
>> Build certain file-systems and options(7) as module(7). 32 and 64 bits
>> support are still builtin, as well as FFS, NFS, TMPFS, and most COMPAT
>> code. Saves approx. 750kiB.
>>
>> Reflect this in INSTALL kernel, where we have to support more file systems
>> statically.
>>
>> See http://mail-index.netbsd.org/port-amd64/2011/02/13/msg001318.html
> 
> Are you going to add a MONOLITHIC kernel to match i386?

No. And honestly, I can't see why MONOLITHIC is needed in the first
place: it was introduced as a quick fix for those that wanted to bluntly
replace their old kernel with a new one, without risking crippling their
system on reboot because ELF and FFS kmods were gone missing.

I dare say that MONOLITHIC was a step in the wrong direction: instead of
cutting down MODULAR options(4) from GENERIC, it would have been better
to include everything as builtin modules, while still offering modular
support.
I can't see the difference of having MONOLITHIC instead of GENERIC with
modules as builtins, except for cases where you want to block module
loading, for security purposes. But there, you are better off removing
most COMPAT code also, which means a new kernel build, anyway.

I perfectly know that the question of "what should be in, and what
should stay as a third party kmod?" is entirely debatable. I notified
core@ about that, and wait for their answer. We could have the bare
minimum builtin inside GENERIC, or provide most options(7) as kmods by
default.

-- 
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/sys/arch/i386/conf

2011-02-13 Thread Jean-Yves Migeon
On 13.02.2011 17:02, Paul Goyette wrote:
> On Sun, 13 Feb 2011, Jean-Yves Migeon wrote:
> 
>> ...
>> For order of preference, see module(7):
>>
>>The loader will look first for a built-in module with the specified
>>name that has not been disabled (see module_unload() below). If a
>>built-in module with that name is not found, the list of modules
>>prepared by the boot loader is searched.  If the named module is
>>still not found, an attempt is made to locate the module within the
>>file system.
> 
> There should be one additional qualification here.
> 
>  Searching for modules within the file system can only occur after
>  the root file system has been mounted by the initialization code.

Sorry, this part is from module(9). For module(7), there is a mention in
the CAVEATS section:

  If an attempt is made to boot the operating system from a file
  system for which the module is not built into the kernel, the boot
  may fail with the message ``Cannot mount root, error 79''.  On
  certain architectures (currently, i386 and amd64), one may be able
  to recover from this error by using the ``load xxxfs'' command
  before trying to boot.  This command is only available on newer
  bootloaders.


Wording seems a bit redundant, so I condensed this into:

Index: man/man9/module.9
===
RCS file: /cvsroot/src/share/man/man9/module.9,v
retrieving revision 1.26
diff -u -p -u -p -r1.26 module.9
--- man/man9/module.9   9 Jan 2011 05:05:10 -   1.26
+++ man/man9/module.9   13 Feb 2011 16:39:01 -
@@ -175,7 +175,8 @@ If a built-in module with that
 .Fa name
 is not found, the list of modules prepared by the boot loader is searched.
 If the named module is still not found, an attempt is made to locate the
-module within the file system.
+module within the file system, provided it has been mounted by the
+initialization code.
 .Pp
 The
 .Fa flags

-- 
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/sys/arch/i386/conf

2011-02-13 Thread Jean-Yves Migeon
On 13.02.2011 14:42, David Laight wrote:
> On Sun, Feb 13, 2011 at 04:37:21AM +0000, Jean-Yves Migeon wrote:
>> Module Name: src Committed By:   jym Date:   Sun Feb 13 
>> 04:37:21 UTC 
>> 2011
>> 
>> Modified Files: src/sys/arch/i386/conf: GENERIC
>> 
>> Log Message: Compile FFS and NFS statically (e.g. not modular) for 
>> GENERIC. These file-systems can be critical for mountroot; as 
>> kernel cannot have access to module(7)s without having / mounted 
>> first... yes, you see the point.
> 
> Does this cause grief with existing installations where the 
> bootloader also loads these as modules?

No; if it does, it should not: the bootloader is free to load whatever
modules it wants to, and this happens also when the user specifies it to
load certain modules manually. It could print errors messages (although,
in my tests, it did not), but should not hamper boot in any way.

IMHO, for x86, the kernel cannot assume that the bootloader loaded
certain modules, nor can the bootloader expect that kernel XYZ is in a
specific configuration. They should be agnostic from one another.
Besides, that would be problematic if it were not the case: given that
boot(8) can auto-load ffs or nfs kmods, the bootloader would be bound
with modular kernels.

For order of preference, see module(7):

The loader will look first for a built-in module with the specified
name that has not been disabled (see module_unload() below). If a
built-in module with that name is not found, the list of modules
prepared by the boot loader is searched.  If the named module is
still not found, an attempt is made to locate the module within the
file system.

-- 
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/sys/arch/i386/conf

2011-02-11 Thread Jean-Yves Migeon
On 11.02.2011 07:30, Matthias Scheler wrote:
> On Fri, Feb 11, 2011 at 05:06:43AM +0100, Jean-Yves Migeon wrote:
>> Indeed, it would be good to have at least some exec formats and
>> file-systems builtin by default in GENERIC:
>>
>> EXEC_ELF32, _SCRIPT  # obvious
>> FFS, CD9660, MFS, TMPFS, NFS, EXT2FS
> 
> FFS should be compiled into the kernel. But I doubt that somebody uses
> both MFS and TMPFS. I hardly ever use CD9660 or EXT2FS while I use
> NFS a lot. There will however be people with is opposite requirements
> which is why those should really be modules.
> 
>> e.g. the typical file-systems we may have to use, without having to rely
>> on modules that could be absent or non accessible at boot time.
> 
> I don't think it needs typical file-systems. IMHO the kernel should
> contain the file-systems required for booting. This will solve most
> of the update issues.

This will allow booting, but not necessarily in a sufficient environment
to fix update issues: if someone needs to mount an ISO or NFS share to
get updated modules, he will need these.

Besides, it's more or less the list of file-systems that have associated
mount_* in the INSTALL ramdisk; I removed some though: ntfs, kernfs, lfs
and msdos.

-- 
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/sys/arch/i386/conf

2011-02-10 Thread Jean-Yves Migeon
On 11.02.2011 02:02, Paul Goyette wrote:
> On Fri, 11 Feb 2011, matthew green wrote:
>>> I'm not 100% sure it is worth having FFS and ELF support as modules
>>> in GENERIC.
>>> It may be nice that they CAN be modules, but I suspect 99.99% of systems
>>> will need them.
>>> Anyone who wants them as modules can build a kernel without them.
>>
>> i strongly agree with this.
> 
> Me too.

Indeed, it would be good to have at least some exec formats and
file-systems builtin by default in GENERIC:

EXEC_ELF32, _SCRIPT  # obvious
FFS, CD9660, MFS, TMPFS, NFS, EXT2FS

e.g. the typical file-systems we may have to use, without having to rely
on modules that could be absent or non accessible at boot time.

This has a good chance of making a GENERIC kernel work better out of the
box, without big surprises during hand-made updates where kernel gets
replaced without its associated modules. IIRC, this is why MONOLITHIC
was brought back, due to the number of complaints where GENERIC fails
booting because ffs.kmod or exec_elf32.kmod could not be loaded.

This is what amd64 is approx. doing, but with most file-systems included
by default.

-- 
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/sys/arch/i386/conf

2011-02-10 Thread Jean-Yves Migeon
On 10.02.2011 22:23, David Laight wrote:
> On Thu, Feb 10, 2011 at 04:49:19PM +0000, Jean-Yves Migeon wrote:
>> Module Name: src
>> Committed By:jym
>> Date:Thu Feb 10 16:49:19 UTC 2011
>>
>> Modified Files:
>>  src/sys/arch/i386/conf: INSTALL
>>
>> Log Message:
>> For i386, include MONOLITHIC for INSTALL rather than GENERIC. While here,
>> remove drm drivers, we don't need them for install.
>>
>> i386 GENERIC has FFS and ELF support compiled as modules, so we hit
>> an interesting "chicken-egg" situation when the kernel attempts to mount
>> a ffs ramdisk, while the module might be contained inside... the ramdisk.
> 
> I'm not 100% sure it is worth having FFS and ELF support as modules
> in GENERIC.
> It may be nice that they CAN be modules, but I suspect 99.99% of systems
> will need them.
> Anyone who wants them as modules can build a kernel without them.

It's something I mentioned privately with Jared. I think we could come
up with a golden mean for GENERIC kernels: leave most "third party"
drivers/systems as modules, while keeping critical ones included by
default. FFS and ELF come to mind, but there are others too. amd64
GENERIC is close to this.

This would open up the possibility to provide a modular kernel, without
going the "all or nothing" MONOLITHIC way (and avoid many complaints
like "I just replaced my kernel for testing, and it returns an error
when attempting to mount / or exec init").

BTW, I wonder whether modules shouldn't be part of the kern-GENERIC.tgz
set. But this is an orthogonal issue.

-- 
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/etc/powerd/scripts

2010-12-31 Thread Jean-Yves Migeon
On 31.12.2010 19:51, David Young wrote:
> IMO, we should put the system to sleep by sending a
> power-saving/wakeup-latency goal and a set of waking events (e.g.,
> keystroke, mouse movement, LAN activity) to the root device_t using
> drvctl.  To put any smaller set of devices to sleep, send the goal &
> wake events to some subtree.

I haven't looked at the details of the ioctl(), but I would expect it to
be used against /dev/drvctl, no?

This seems plausible to me; at least, suspend/sleep should be part of
pmf(9) (even in my Xen's case, all driver sleep/wake coded uses pmf(9)).

> FWIW, the sleep states that ACPI names are not sufficient even to
> describe all of the potential sleep states of ACPI hardware.  I have a
> laptop that's perfectly capable of an "S3-like" sleep, but the ACPI BIOS
> doesn't support S3, and the HDD is not formatted properly for the S4
> sleep.

-- 
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/etc/powerd/scripts

2010-12-31 Thread Jean-Yves Migeon
On 31.12.2010 11:10, Jukka Ruohonen wrote:
> On Fri, Dec 31, 2010 at 11:01:08AM +0100, Jean-Yves Migeon wrote:
>> I am using machdep.sleep_state as node to put a domU into suspend mode.
>> Up to now, putting sleep_state under machdep allowed powerd(8)
>> sleep_button to be used regardless of the environment (eg. ACPI sleep or
>> Xen domU sleep).
> 
> So Xen uses a *machine-dependent* sysctl(8) variable for purposes entirely
> different from the intended one?

machine-dependent: yes
for purposes entirely different from the intended one: not really,
purpose is arguably the same, put system into "sleep" ("sleep" state
being not well defined, I admit).

It's only in my local tree though. This can be done in entirely
different ways, nothing is set in stone.

The side effect of your change is that the sleep_state node will move
under hw.acpi, which is not right in Xen domU case.

>> While retiring sleep_state from machdep goes in the right direction
>> IMHO, will it be replaced by a MI interface to put a system into sleep,
>> as it is not a feature specific to ACPI?
> 
> Definitely agreed. Maybe we could steal zzz(8) from APM?

Seems reasonable to me. We could have a more featureful binary later,
and just alias zzz(8) to it.

> Generally, most of the problems ("the mess") in the area of power management
> have been directly caused by the lack of proper MI interfaces and the overall
> lack of planning. The move towards sysmon_pswitch(9), pmf(9), and powerd(8)
> were a step to the right direction; the mistakes done with the i386-specific
> apm(4) should not be repeated.

Of course not; that's why I am asking.

-- 
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/etc/powerd/scripts

2010-12-31 Thread Jean-Yves Migeon
On 31.12.2010 10:36, Jukka Ruohonen wrote:
> Module Name:  src
> Committed By: jruoho
> Date: Fri Dec 31 09:36:15 UTC 2010
> 
> Modified Files:
>   src/etc/powerd/scripts: sleep_button
> 
> Log Message:
> Use hw.acpi.sleep.state instead of machdep.sleep_state.

And so it begins :)

I am using machdep.sleep_state as node to put a domU into suspend mode.
Up to now, putting sleep_state under machdep allowed powerd(8)
sleep_button to be used regardless of the environment (eg. ACPI sleep or
Xen domU sleep).

While retiring sleep_state from machdep goes in the right direction
IMHO, will it be replaced by a MI interface to put a system into sleep,
as it is not a feature specific to ACPI?

-- 
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/sys/dev/mii

2010-11-27 Thread Jean-Yves Migeon
On 27.11.2010 18:38, Jean-Yves Migeon wrote:
> Module Name:  src
> Committed By: jym
> Date: Sat Nov 27 17:38:49 UTC 2010
> 
> Modified Files:
>   src/sys/dev/mii: miidevs.h
> 
> Log Message:
> Correct string for BCM6709S.

s/BCM6709S/BCM5709S/

-- 
Jean-Yves Migeon
j...@netbsd.org


Re: CVS commit: src/sys/dev/pci

2010-10-10 Thread Jean-Yves Migeon
On 10.10.2010 23:24, Christos Zoulas wrote:
> Module Name:  src
> Committed By: christos
> Date: Sun Oct 10 21:24:35 UTC 2010
> 
> Modified Files:
>   src/sys/dev/pci: agp.c
> 
> Log Message:
> restore binary compatibility for amd64; requested by joerg.

Thanks; I was about to commit the fix.

Second time I got burned on that one :/

-- 
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src

2010-10-06 Thread Jean-Yves Migeon
On 06.10.2010 12:16, Manuel Bouyer wrote:
> On Tue, Oct 05, 2010 at 11:48:17PM +0000, Jean-Yves Migeon wrote:
>> [...]
>>
>> XXX Currently, savecore(8) will fail to dump a PAE kernel in a !PAE
>> environment (and reciprocally). So you need to sync and reboot
>> with a kernel of the same mode as the one that crashed. Once the dump
>> is successful, this does not matter anymore.
> 
> Doesn't it work with savecore -N /the_right_kernel ?

No; in fact, dumplo is complete garbage, so savecore fails dumping the
image.

I suspect that the KREAD() in savecore.c does not read at the correct
offset, and just sets garbage for dumplo.

I have to see what code path takes savecore with -N, and where it
differs from the case where it goes through getbootfile(3) when the
booted kernel is different from the one that is about to get dumped.

-- 
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/sys

2010-09-27 Thread Jean-Yves Migeon
On 27.09.2010 23:25, Christos Zoulas wrote:
> Module Name:  src
> Committed By: christos
> Date: Mon Sep 27 21:25:39 UTC 2010
>
> Modified Files:
>   src/sys/dev/pci: agp.c
>   src/sys/sys: agpio.h
>
> Log Message:
> backwards compat code for paddr_t being 32 bits.
>
>

Thanks!

-- 
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/sys/arch

2010-08-17 Thread Jean-Yves Migeon

On Tue, 17 Aug 2010 18:18:22 +0100, David Laight  wrote:
>> Unifying sizes of certain types up to 64-bits would be a good choice
>> from engineering standpoint.  Actually measuring the overhead would
>> answer whether it is worth.
> 
> Why not use a 64bit type that is the union of a 64bit number and two
> 32bit numbers.
> A kernel that uses 32bit paddr_t would use tha approriate (for the
> endianness) 32bit value, having initialised the unused half to zero.
> 
> That would remove almost everything except the structure size overhead

Which is precisely the "problem" here :)

> while maintaining binary compatibility.

I don't think so; modules that manipulates 32 bits bus_addr_t will likely
fail with PAE when you cross the 4GB boundary. I don't see how the union
could solve that.

-- 
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/sys/arch

2010-08-17 Thread Jean-Yves Migeon

On Tue, 17 Aug 2010 14:26:53 +, Andrew Doran  wrote:
> Hi,
> 
> Why are any types other than in the pmap different between PAE and !PAE?
> 
> When this was originally proposed I asked for stuff like paddr_t to be
64
> bits no matter what the kernel compile settings where.
> 
> Thanks.


I need more time!



Let's do this /gradually/. There are already regressions with PAE that
needs fixing before moving further, especially "bugs" that people reported
to me I cannot reproduce with my hosts (a.out breakage with Xen PAE + NX,
and invalid TLB entries that only happens 'randomly' with some AMD 64
Athlon CPUs).

That, and also:
- rmind@ branch work on uvm/pmap. Such a change will interfere with his
work.
- the issue is wider, bus_addr_t will move to "all 64 bits" too
(performance impact?)
- p{d,t}_entry_t vs paddr_t size mix up in pmap, which is bad in the !PAE
case (32 vs 64 bits). Breaking i386 pmap will not be particularly good for
my life expectancy.
- I want to merge xensuspend before entering another bug hunting session
:) One at a time is enough :)
- this is orthogonal to the kvm(3) issue: having 64 bits in-kernel paddr_t
will not automagically solve the "all PAs are unsigned long" assumption of
kvm(3).


-- 
Jean-Yves Migeon
jeanyves.mig...@free.fr


re: CVS commit: src/sys/arch

2010-08-17 Thread Jean-Yves Migeon

On Tue, 17 Aug 2010 09:59:54 +1000, matthew green 
wrote:
>> For kernel core dumps, at the present time, no; this needs thinking,
and
>> testing:
>> - kvm(3) API will have to move from using 'unsigned long' to 'paddr_t'.
I
>> am currently checking that for all architectures except i386,
>> sizeof(paddr_t) == sizeof(u_long).
> 
> sparc and 32 bit sparc64 kernels use 64 bit paddr_t.  there are some
> mips ports that do too.  you can't assume the above equality at all,
> not since "paddr_t" was introduced a decade+ ago.

Eeek. I can't see how I can fix that then. Would it be acceptable to say
"the affected functions are private to kvm(3), so don't bother with this
anyway?"

>> - i386 will use 64 bits paddr_t for kvm(3), so we will have to discuss
on
z> how to make it work without too much fuzz with pre-64 bits i386 dumps.
> 
> i think this seems the sanest way.  make userland see the 64 bit
> definition in all cases, and have the libkvm code DTRT.

Thing is, kvm(3) has two functions (kvm_kvatop and kvm_pa2off) that use
PAs internally and assume they are unsigned long. I can't move i386 to 64
bits PA without having to tweak kvm_private.h prototypes, which will affect
all ports in turn.

IMHO, the path of least resistance would be to move to paddr_t.

>> - sparse dumps for memory above 4GB.
>> 
>> Eventually, PAE and !PAE kernels core files should be handled by kvm,
if
>> that's what you are asking.
> 
> right.  i was suggesting that if you simply used nlist(3) to find out
the
> value of "i386_use_pae" then the same code will work for live debugging
> as well as core dumps.
> 
> since core file debugging basically requires this method, and it works
> just as well for live debugging, the sysctl seems like something we
don't
> need to do.

I insist: the sysctl(7) was never meant to be used for checking whether
PAE is enabled for "live" kvm(3) session. I admit, my commit message is not
very clear: the " Will be used by i386 kvm(3) to know the functions that
should [...]" sentence applies to i386_use_pae, not machdep.pae.

It is only there for convenience, eg. when someone wants to know easily if
its system is running under PAE.

-- 
Jean-Yves Migeon
jeanyves.mig...@free.fr


re: CVS commit: src/sys/arch

2010-08-16 Thread Jean-Yves Migeon

On Tue, 17 Aug 2010 06:55:21 +1000, matthew green 
wrote:
>> >> Log Message:
>> >> Add machdep.pae sysctl(7) for i386. Thanks to Paul and Joerg for
their
>> >> reviews.
>> >>
>> >> In kernel, it matches the 'i386_use_pae' variable (0: kernel does
not
>> >> use
>> >> PAE, 1: kernel uses PAE). Will be used by i386 kvm(3) to know the
>> >> functions that should get called for VA => PA translations.
>> > 
>> > does this work for core files as well?
> 
> could you answer this part?  thanks.

For kernel core dumps, at the present time, no; this needs thinking, and
testing:
- kvm(3) API will have to move from using 'unsigned long' to 'paddr_t'. I
am currently checking that for all architectures except i386,
sizeof(paddr_t) == sizeof(u_long).
- i386 will use 64 bits paddr_t for kvm(3), so we will have to discuss on
how to make it work without too much fuzz with pre-64 bits i386 dumps.
- sparse dumps for memory above 4GB.

Eventually, PAE and !PAE kernels core files should be handled by kvm, if
that's what you are asking.

PAE does not affect program core files.

-- 
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/sys/arch

2010-08-16 Thread Jean-Yves Migeon
On 16.08.2010 22:22, matthew green wrote:
>> Module Name: src
>> Committed By:jym
>> Date:Mon Aug 16 19:39:06 UTC 2010
>>
>> Modified Files:
>>  src/sys/arch/i386/i386: machdep.c
>>  src/sys/arch/x86/include: cpu.h
>>
>> Log Message:
>> Add machdep.pae sysctl(7) for i386. Thanks to Paul and Joerg for their
>> reviews.
>>
>> In kernel, it matches the 'i386_use_pae' variable (0: kernel does not use
>> PAE, 1: kernel uses PAE). Will be used by i386 kvm(3) to know the functions
>> that should get called for VA => PA translations.
> 
> does this work for core files as well?
>
> if not, wouldn't this feature be better done use ugly kvm/nlist so
> that it works with the same code on dead kernels?  getting a 0/1
> value via kvm shouldn't really be considered any real problem.

That's the purpose; I don't know of any other way. The sysctl is only
there for practical purposes: cpuctl(8) can indicate that the CPU
supports PAE, but there is no easy way to know if it is active or not
(unless playing with config -x together with the booted kernel, not very
practical...). Hence, the sysctl machdep.

> (why the extern in cpu.h?  i doesn't seem necessary.)

Yes; I wanted to place it at the same level as i386_use_fxsave.
i386_use_pae may get use elsewhere eventually, so I added it to cpu.h.

-- 
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/sys/arch

2010-07-24 Thread Jean-Yves Migeon
On 24.07.2010 20:47, matthew green wrote:
>> On 24.07.2010 09:50, Christoph Egger wrote:
>> Not necessarily, the kvm(3) API manipulates PA as u_long (see
>> _kvm_kvatop in kvm_i386.c). Changing the paddr_t will need a
>> modification of this API too, and basically, all ports will have to move
>> to uint64_t PA (or put casts everywhere...)
> 
> ah, i thought that it already used paddr_t for this.  it should, imo.
> 
> and i'd think that libkvm should be compiled with 64bit paddr_t (ie,
> have the useland definition of paddr_t always be 64 bit) and be smart
> enough to deal with the kernel you feed it.  32 bit sparc has to deal
> with this a lot, since the page tables are considerably more different
> from sparc v8/v9 than i386 vs amd64.

That would be a solution, yes.

FWIW, I am making all PAE related macros and types prefixed with "pae_"
(~ some kind of namespace), and alias the relevant function with them in
_KERNEL when option PAE is enabled.

For kvm(3), both could be used; I could extract a value from a symbol
like "pae_enabled" out of a core file through kvm_nlist, then use the
relevant vatop functions.

-- 
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/sys/arch

2010-07-24 Thread Jean-Yves Migeon
On 24.07.2010 21:00, matthew green wrote:
>> On 24.07.2010 17:54, Matthias Drochner wrote:
>>>
>>> c...@cubidou.net said:
>>>> Kernel modules, on the other hand...
>>>
>>> Hmm, didn't think of that. (using them myself only for testing)
>>>
>>>> a gaping ABI incompatibility is completely unacceptable
>>>
>>> There are two ways to fix this without the 64-bit-paddr overhead,
>>> a short-term and a long-term one.
>>> The short-term fix would be to use another module load path.
>>> This is close to calling it a different port, but it would
>>> not affect userland.
>>
>> "i386pae" and "i386"? Huh... then we will get i386xenpae, i386xen, and
>> so on?
> 
> FWIW, having a different module load path and choosing the right set
> of modules is an incoming feature but i have no time line for it.
> 
> i plan for it to be useful for ppc oea vs. 4xx, sparc32 vs sparc64
> in 32 bitmode, and there was someone else wanting it too.

IMHO, it makes sense for these platforms. But here, i386 and PAE are so
close in concept that it feels like a quirk (at least to me).

-- 
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/sys/arch

2010-07-24 Thread Jean-Yves Migeon
On 24.07.2010 18:40, Antti Kantee wrote:
> On Sat Jul 24 2010 at 15:23:50 +, Quentin Garnier wrote:
>>> PAE is a "bridge technology", interesting only for few systems.
>>> Users of low-end i386 systems would be penalized, and most users
>>> of boxes with >4G memory would gain nothing because they run
>>> a 64-bit system anyway.
>>> So, imho, no kernel overhead is acceptable.
>>
>> Well, that's one opinion.  My own, probably just as humble, is that such
>> a gaping ABI incompatibility is completely unacceptable, especially
>> after all the work that has been done to make some kernel subsystems a
>> little bit more responsible regarding ABI.
>>
>> I'm really curious to see some actual measurements about that alleged
>> overhead.  The way I see it right now, we have a known lethal ABI
>> incompatibility versus an alleged overhead of unknown size.
> 
> Didn't anyone else read the mail from a week or so ago containing detailed
> measurements of the overhead?

It measures overhead of PAE, not turning paddr_t to 64 bits on !PAE i386.

I don't think that paddr_t moving to 64 bits add much overhead. I would
rather incriminate TLB flushes and 64 bits atomic ops...

> (I'm not 100% sure if I believe the results without further analysis,
> but at least there are benchmarks)

I am not sure that benchmarking is a matter of believing :)

I could make something out of the phoronix tests [1] and track
performance regressions on a revision (or week) basis, atf-style, but as
always, the hardest part about benchmarks is interpretation, not running
them (once everything is in place)

While sysbench announces 20% less memory bandwidth, build.sh runs have
no more than 3% overhead with PAE. So, hmm, between one specific
measurement and real world use... there is quite a gap.

-- 
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/sys/arch

2010-07-24 Thread Jean-Yves Migeon
On 24.07.2010 17:54, Matthias Drochner wrote:
> 
> c...@cubidou.net said:
>> Kernel modules, on the other hand...
> 
> Hmm, didn't think of that. (using them myself only for testing)
> 
>> a gaping ABI incompatibility is completely unacceptable
> 
> There are two ways to fix this without the 64-bit-paddr overhead,
> a short-term and a long-term one.
> The short-term fix would be to use another module load path.
> This is close to calling it a different port, but it would
> not affect userland.

"i386pae" and "i386"? Huh... then we will get i386xenpae, i386xen, and
so on?

> A more correct but more expensive fix would be to keep out
> paddr_t from the kernel ABI relevant to modules.

Then bus_addr_t and paddr_t should be split.

I do not think that there are many modules that make use of
paddr_t-bound variables; in itself, it should remain within uvm and pmap.

My biggest concern is that there is no real "stable KPI" we could rely
on. rmind@ is reworking the pmap/uvm code in its branch, but told me (in
a private mail) that it may take some time before he merges it. So I
asked if I could commit PAE first and come back to it later.

A long term fix would be to reuse (and probably extend) the modular
world, and have some kind of "one kernel fits all" possibility for x86,
because paddr_t/PD/PT handling also differs with Xen. But making that
work in a safe and consistant manner, without breaking ABI, needs more
thought that I currently could put in my patch.

Cheers,

-- 
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/sys/arch

2010-07-24 Thread Jean-Yves Migeon
On 24.07.2010 09:50, Christoph Egger wrote:
>> XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
>> cannot be solved easily, and needs lots of thinking before being declared
>> safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).
> 
> How about making paddr_t always 64bit? That makes it much easier to deal
> with in libkvm.

Not necessarily, the kvm(3) API manipulates PA as u_long (see
_kvm_kvatop in kvm_i386.c). Changing the paddr_t will need a
modification of this API too, and basically, all ports will have to move
to uint64_t PA (or put casts everywhere...)

> paddr_t being always 64bit wide also allows to switch PAE on and off at
> runtime like Solaris does.

And *off* at runtime?

That's one requirement, however you would have tons of extra work to
make pmap "dynamically" switch from !PAE to PAE, due to paddr_t, pd/pt,
bus_addr_t modifications. That would need a modular pmap, and load that
module very early.

IMHO, "if it ever gets done", we should add Xen to the loop (and fix
that module path issue). So that we would have only one kernel for DOM0,
GENERIC and GENERIC PAE.

Besides, I suspect that turning paddr_t to uint64_t will add overhead,
even if the upper 32 bits are always 0 in GENERIC.

> Pleaes fix the amd64 build error reported on current-us...@.
> The build error is related to rump.

Investigating. rumptest just finished for i386 and amd64, and no error
whatsoever. Guess I'll have to dig further...

-- 
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/sys/arch

2010-07-24 Thread Jean-Yves Migeon
On 24.07.2010 12:30, Manuel Bouyer wrote:
>> How about making paddr_t always 64bit? That makes it much easier to deal
>> with in libkvm.
>
> The overhead would need to be evaluated first.
> Also, I'm not sure this would fix all the libkvm issues (the page table
> format is still different).

For the page table format, the PAE types and macros could be redefined
with a prefix, and then aliased to the ones used in kernel when "options
PAE" is defined.

kvatop/pa2off functions could have access to both macros/typedefs, by
just having pae_ prefixed in front (pae_pl*_pi, pae_pd_entry_t, and so
on). It becomes a matter of calling the proper code within kvm(3), by
checking if PAE was enabled within the kernel dump (through kvm_nlist,
for example).

-- 
Jean-Yves Migeon
jeanyves.mig...@free.fr


Re: CVS commit: src/sys

2010-04-18 Thread Jean-Yves Migeon

On 04/19/10 02:02, matthew green wrote:


XXX Should kernel rev be bumped?

did you change any ABIs?  ie, will any old modules of the same version
now fail to load due to your change, or fail to work properly?


No


it seems unlikely, but i didn't read the big diff.


It fixes a bug that could be labeled as security-sensitive (not 
exploitable directly though). Bumping revision makes it easier to say 
"-current under rev 5.99.x is affected, 5.99.x+1 is not".


--
Jean-Yves Migeon
jeanyves.mig...@free.fr


CVS commit: src/sys/arch/xen/x86

2010-03-09 Thread Jean-Yves Migeon
Module Name:src
Committed By:   jym
Date:   Tue Mar  9 23:12:06 UTC 2010

Modified Files:
src/sys/arch/xen/x86: xen_bus_dma.c

Log Message:
Although Xen's documentation states that the address_bits field is not used
by XENMEM_decrease_reservation, it is checked by the hypervisor. In certain
circumstances (stack leak), the field could have an improper value, leading
to a fail of the hypercall.

Set it to 0 ("no addressing restriction") to avoid that.

Patch tested by Sam Fourman and h...@.

This should fix the rare "failed allocating DMA memory" encountered
under NetBSD dom0. Will ask for a pull-up.


To generate a diff of this commit:
cvs rdiff -u -r1.19 -r1.20 src/sys/arch/xen/x86/xen_bus_dma.c

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.

Modified files:

Index: src/sys/arch/xen/x86/xen_bus_dma.c
diff -u src/sys/arch/xen/x86/xen_bus_dma.c:1.19 src/sys/arch/xen/x86/xen_bus_dma.c:1.20
--- src/sys/arch/xen/x86/xen_bus_dma.c:1.19	Tue Mar  2 00:13:50 2010
+++ src/sys/arch/xen/x86/xen_bus_dma.c	Tue Mar  9 23:12:06 2010
@@ -1,4 +1,4 @@
-/*	$NetBSD: xen_bus_dma.c,v 1.19 2010/03/02 00:13:50 jym Exp $	*/
+/*	$NetBSD: xen_bus_dma.c,v 1.20 2010/03/09 23:12:06 jym Exp $	*/
 /*	NetBSD bus_dma.c,v 1.21 2005/04/16 07:53:35 yamt Exp */
 
 /*-
@@ -32,7 +32,7 @@
  */
 
 #include 
-__KERNEL_RCSID(0, "$NetBSD: xen_bus_dma.c,v 1.19 2010/03/02 00:13:50 jym Exp $");
+__KERNEL_RCSID(0, "$NetBSD: xen_bus_dma.c,v 1.20 2010/03/09 23:12:06 jym Exp $");
 
 #include 
 #include 
@@ -95,6 +95,7 @@
 		xenguest_handle(res.extent_start) = &mfn;
 		res.nr_extents = 1;
 		res.extent_order = 0;
+		res.address_bits = 0;
 		res.domid = DOMID_SELF;
 		error = HYPERVISOR_memory_op(XENMEM_decrease_reservation, &res);
 		if (error != 1) {



CVS commit: src/sys/arch/xen/x86

2010-03-09 Thread Jean-Yves Migeon
Module Name:src
Committed By:   jym
Date:   Tue Mar  9 23:12:06 UTC 2010

Modified Files:
src/sys/arch/xen/x86: xen_bus_dma.c

Log Message:
Although Xen's documentation states that the address_bits field is not used
by XENMEM_decrease_reservation, it is checked by the hypervisor. In certain
circumstances (stack leak), the field could have an improper value, leading
to a fail of the hypercall.

Set it to 0 ("no addressing restriction") to avoid that.

Patch tested by Sam Fourman and h...@.

This should fix the rare "failed allocating DMA memory" encountered
under NetBSD dom0. Will ask for a pull-up.


To generate a diff of this commit:
cvs rdiff -u -r1.19 -r1.20 src/sys/arch/xen/x86/xen_bus_dma.c

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.



CVS commit: src/sys/arch/xen/x86

2010-03-02 Thread Jean-Yves Migeon
Module Name:src
Committed By:   jym
Date:   Wed Mar  3 00:09:03 UTC 2010

Modified Files:
src/sys/arch/xen/x86: cpu.c

Log Message:
Use roundup2() instead of hardcoding the CACHE_LINE_SIZE rounding
operation.


To generate a diff of this commit:
cvs rdiff -u -r1.41 -r1.42 src/sys/arch/xen/x86/cpu.c

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.

Modified files:

Index: src/sys/arch/xen/x86/cpu.c
diff -u src/sys/arch/xen/x86/cpu.c:1.41 src/sys/arch/xen/x86/cpu.c:1.42
--- src/sys/arch/xen/x86/cpu.c:1.41	Wed Feb 24 22:37:55 2010
+++ src/sys/arch/xen/x86/cpu.c	Wed Mar  3 00:09:03 2010
@@ -1,4 +1,4 @@
-/*	$NetBSD: cpu.c,v 1.41 2010/02/24 22:37:55 dyoung Exp $	*/
+/*	$NetBSD: cpu.c,v 1.42 2010/03/03 00:09:03 jym Exp $	*/
 /* NetBSD: cpu.c,v 1.18 2004/02/20 17:35:01 yamt Exp  */
 
 /*-
@@ -66,7 +66,7 @@
  */
 
 #include 
-__KERNEL_RCSID(0, "$NetBSD: cpu.c,v 1.41 2010/02/24 22:37:55 dyoung Exp $");
+__KERNEL_RCSID(0, "$NetBSD: cpu.c,v 1.42 2010/03/03 00:09:03 jym Exp $");
 
 #include "opt_ddb.h"
 #include "opt_multiprocessor.h"
@@ -236,8 +236,7 @@
 		aprint_naive(": Application Processor\n");
 		ptr = (uintptr_t)kmem_zalloc(sizeof(*ci) + CACHE_LINE_SIZE - 1,
 		KM_SLEEP);
-		ci = (struct cpu_info *)((ptr + CACHE_LINE_SIZE - 1) &
-		~(CACHE_LINE_SIZE - 1));
+		ci = (struct cpu_info *)roundup2(ptr, CACHE_LINE_SIZE);
 		ci->ci_curldt = -1;
 	} else {
 		aprint_naive(": %s Processor\n",
@@ -372,8 +371,7 @@
 		aprint_naive(": Application Processor\n");
 		ptr = (uintptr_t)kmem_alloc(sizeof(*ci) + CACHE_LINE_SIZE - 1,
 		KM_SLEEP);
-		ci = (struct cpu_info *)((ptr + CACHE_LINE_SIZE - 1) &
-		~(CACHE_LINE_SIZE - 1));
+		ci = (struct cpu_info *)roundup2(ptr, CACHE_LINE_SIZE);
 		memset(ci, 0, sizeof(*ci));
 #ifdef TRAPLOG
 		ci->ci_tlog_base = kmem_zalloc(sizeof(struct tlog), KM_SLEEP);



CVS commit: src/sys/arch/xen/x86

2010-03-02 Thread Jean-Yves Migeon
Module Name:src
Committed By:   jym
Date:   Wed Mar  3 00:09:03 UTC 2010

Modified Files:
src/sys/arch/xen/x86: cpu.c

Log Message:
Use roundup2() instead of hardcoding the CACHE_LINE_SIZE rounding
operation.


To generate a diff of this commit:
cvs rdiff -u -r1.41 -r1.42 src/sys/arch/xen/x86/cpu.c

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.



CVS commit: src/sys/arch/xen/x86

2010-03-01 Thread Jean-Yves Migeon
Module Name:src
Committed By:   jym
Date:   Tue Mar  2 00:13:50 UTC 2010

Modified Files:
src/sys/arch/xen/x86: xen_bus_dma.c

Log Message:
Catch the return value from the XENMEM_decrease_reservation hypercall,
and not some error value stored earlier.

While here, fix a typo in a comment.


To generate a diff of this commit:
cvs rdiff -u -r1.18 -r1.19 src/sys/arch/xen/x86/xen_bus_dma.c

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.



CVS commit: src/sys/arch/xen/x86

2010-03-01 Thread Jean-Yves Migeon
Module Name:src
Committed By:   jym
Date:   Tue Mar  2 00:13:50 UTC 2010

Modified Files:
src/sys/arch/xen/x86: xen_bus_dma.c

Log Message:
Catch the return value from the XENMEM_decrease_reservation hypercall,
and not some error value stored earlier.

While here, fix a typo in a comment.


To generate a diff of this commit:
cvs rdiff -u -r1.18 -r1.19 src/sys/arch/xen/x86/xen_bus_dma.c

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.

Modified files:

Index: src/sys/arch/xen/x86/xen_bus_dma.c
diff -u src/sys/arch/xen/x86/xen_bus_dma.c:1.18 src/sys/arch/xen/x86/xen_bus_dma.c:1.19
--- src/sys/arch/xen/x86/xen_bus_dma.c:1.18	Sat Feb 27 09:22:40 2010
+++ src/sys/arch/xen/x86/xen_bus_dma.c	Tue Mar  2 00:13:50 2010
@@ -1,4 +1,4 @@
-/*	$NetBSD: xen_bus_dma.c,v 1.18 2010/02/27 09:22:40 jym Exp $	*/
+/*	$NetBSD: xen_bus_dma.c,v 1.19 2010/03/02 00:13:50 jym Exp $	*/
 /*	NetBSD bus_dma.c,v 1.21 2005/04/16 07:53:35 yamt Exp */
 
 /*-
@@ -32,7 +32,7 @@
  */
 
 #include 
-__KERNEL_RCSID(0, "$NetBSD: xen_bus_dma.c,v 1.18 2010/02/27 09:22:40 jym Exp $");
+__KERNEL_RCSID(0, "$NetBSD: xen_bus_dma.c,v 1.19 2010/03/02 00:13:50 jym Exp $");
 
 #include 
 #include 
@@ -81,7 +81,7 @@
 	npagesreq = (size >> PAGE_SHIFT);
 	KASSERT(npages >= npagesreq);
 
-	/* get npages from UWM, and give them back to the hypervisor */
+	/* get npages from UVM, and give them back to the hypervisor */
 	error = uvm_pglistalloc(((psize_t)npages) << PAGE_SHIFT,
 0, avail_end, 0, 0, mlistp, npages, (flags & BUS_DMA_NOWAIT) == 0);
 	if (error)
@@ -96,8 +96,8 @@
 		res.nr_extents = 1;
 		res.extent_order = 0;
 		res.domid = DOMID_SELF;
-		if (HYPERVISOR_memory_op(XENMEM_decrease_reservation, &res)
-		!= 1) {
+		error = HYPERVISOR_memory_op(XENMEM_decrease_reservation, &res);
+		if (error != 1) {
 #ifdef DEBUG
 			printf("xen_alloc_contig: XENMEM_decrease_reservation "
 			"failed: err %d (pa %#" PRIxPADDR " mfn %#lx)\n",



CVS commit: src/sys/arch

2010-02-28 Thread Jean-Yves Migeon
Module Name:src
Committed By:   jym
Date:   Mon Mar  1 01:35:11 UTC 2010

Modified Files:
src/sys/arch/amd64/amd64: machdep.c
src/sys/arch/i386/i386: machdep.c

Log Message:
Do not forget that ptoa() casts the result to vaddr_t, which is bad
for paddr_t values under i386 PAE. Use ctob() instead.

Although amd64 is not affected by this vaddr_t vs paddr_t issue (both
having the same size), for the sake of completeness, switch to
ctob() when manipulating paddr_t/psize_t entities in amd64 machdep.c.

Compile tested for i386 and amd64. No regression expected.


To generate a diff of this commit:
cvs rdiff -u -r1.142 -r1.143 src/sys/arch/amd64/amd64/machdep.c
cvs rdiff -u -r1.683 -r1.684 src/sys/arch/i386/i386/machdep.c

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.

Modified files:

Index: src/sys/arch/amd64/amd64/machdep.c
diff -u src/sys/arch/amd64/amd64/machdep.c:1.142 src/sys/arch/amd64/amd64/machdep.c:1.143
--- src/sys/arch/amd64/amd64/machdep.c:1.142	Mon Feb  8 19:02:26 2010
+++ src/sys/arch/amd64/amd64/machdep.c	Mon Mar  1 01:35:11 2010
@@ -1,4 +1,4 @@
-/*	$NetBSD: machdep.c,v 1.142 2010/02/08 19:02:26 joerg Exp $	*/
+/*	$NetBSD: machdep.c,v 1.143 2010/03/01 01:35:11 jym Exp $	*/
 
 /*-
  * Copyright (c) 1996, 1997, 1998, 2000, 2006, 2007, 2008
@@ -107,7 +107,7 @@
  */
 
 #include 
-__KERNEL_RCSID(0, "$NetBSD: machdep.c,v 1.142 2010/02/08 19:02:26 joerg Exp $");
+__KERNEL_RCSID(0, "$NetBSD: machdep.c,v 1.143 2010/03/01 01:35:11 jym Exp $");
 
 /* #define XENDEBUG_LOW  */
 
@@ -906,7 +906,7 @@
 	if ((error = cpu_dump()) != 0)
 		goto err;
 
-	totalbytesleft = ptoa(cpu_dump_mempagecnt());
+	totalbytesleft = ctob(cpu_dump_mempagecnt());
 	blkno = dumplo + cpu_dumpsize();
 	dump = bdev->d_dump;
 	error = 0;
@@ -1151,7 +1151,7 @@
 
 	for (x = 0; x < vm_nphysseg; x++) {
 		vps = &vm_physmem[x];
-		if (ptoa(vps->avail_end) == avail_end)
+		if (ctob(vps->avail_end) == avail_end)
 			break;
 	}
 	if (x == vm_nphysseg)
@@ -1159,12 +1159,12 @@
 
 	/* Shrink so it'll fit in the last segment. */
 	if ((vps->avail_end - vps->avail_start) < atop(sz))
-		sz = ptoa(vps->avail_end - vps->avail_start);
+		sz = ctob(vps->avail_end - vps->avail_start);
 
 	vps->avail_end -= atop(sz);
 	vps->end -= atop(sz);
 msgbuf_p_seg[msgbuf_p_cnt].sz = sz;
-msgbuf_p_seg[msgbuf_p_cnt++].paddr = ptoa(vps->avail_end);
+msgbuf_p_seg[msgbuf_p_cnt++].paddr = ctob(vps->avail_end);
 
 	/* Remove the last segment if it now has no pages. */
 	if (vps->start == vps->end) {
@@ -1176,7 +1176,7 @@
 	for (avail_end = 0, x = 0; x < vm_nphysseg; x++)
 		if (vm_physmem[x].avail_end > avail_end)
 			avail_end = vm_physmem[x].avail_end;
-	avail_end = ptoa(avail_end);
+	avail_end = ctob(avail_end);
 
 	if (sz == reqsz)
 		return;
@@ -1293,7 +1293,7 @@
 
 	/* Determine physical address space */
 	avail_start = first_avail;
-	avail_end = ptoa(xen_start_info.nr_pages);
+	avail_end = ctob(xen_start_info.nr_pages);
 	pmap_pa_start = (KERNTEXTOFF - KERNBASE);
 	pmap_pa_end = avail_end;
 	__PRINTK(("pmap_pa_start 0x%lx avail_start 0x%lx avail_end 0x%lx\n",

Index: src/sys/arch/i386/i386/machdep.c
diff -u src/sys/arch/i386/i386/machdep.c:1.683 src/sys/arch/i386/i386/machdep.c:1.684
--- src/sys/arch/i386/i386/machdep.c:1.683	Mon Mar  1 01:15:23 2010
+++ src/sys/arch/i386/i386/machdep.c	Mon Mar  1 01:35:11 2010
@@ -1,4 +1,4 @@
-/*	$NetBSD: machdep.c,v 1.683 2010/03/01 01:15:23 jym Exp $	*/
+/*	$NetBSD: machdep.c,v 1.684 2010/03/01 01:35:11 jym Exp $	*/
 
 /*-
  * Copyright (c) 1996, 1997, 1998, 2000, 2004, 2006, 2008, 2009
@@ -67,7 +67,7 @@
  */
 
 #include 
-__KERNEL_RCSID(0, "$NetBSD: machdep.c,v 1.683 2010/03/01 01:15:23 jym Exp $");
+__KERNEL_RCSID(0, "$NetBSD: machdep.c,v 1.684 2010/03/01 01:35:11 jym Exp $");
 
 #include "opt_beep.h"
 #include "opt_compat_ibcs2.h"
@@ -1188,7 +1188,7 @@
 	vps = NULL;
 	for (x = 0; x < vm_nphysseg; ++x) {
 		vps = &vm_physmem[x];
-		if (ptoa(vps->avail_end) == avail_end) {
+		if (ctob(vps->avail_end) == avail_end) {
 			break;
 		}
 	}
@@ -1197,12 +1197,12 @@
 
 	/* Shrink so it'll fit in the last segment. */
 	if (vps->avail_end - vps->avail_start < atop(sz))
-		sz = ptoa(vps->avail_end - vps->avail_start);
+		sz = ctob(vps->avail_end - vps->avail_start);
 
 	vps->avail_end -= atop(sz);
 	vps->end -= atop(sz);
 	msgbuf_p_seg[msgbuf_p_cnt].sz = sz;
-	msgbuf_p_seg[msgbuf_p_cnt++].paddr = ptoa(vps->avail_end);
+	msgbuf_p_seg[msgbuf_p_cnt++].paddr = ctob(vps->avail_end);
 
 	/* Remove the last segment if it now has no pages. */
 	if (vps->start == vps->end) {
@@ -1214,7 +1214,7 @@
 	for (avail_end = 0, x = 0; x < vm_nphysseg; x++)
 		if (vm_physmem[x].avail_end > avail_end)
 			avail_end = vm_physmem[x].avail_end;
-	avail_end = ptoa(avail_end);
+	avail_end = ctob(avail_end);
 
 	if (sz == reqsz)
 		return;
@@ -1386,7 +1386,7 @@
 	/* Make sure the end of the space used by the kernel is rounded. */
 	first_av

CVS commit: src/sys/arch

2010-02-28 Thread Jean-Yves Migeon
Module Name:src
Committed By:   jym
Date:   Mon Mar  1 01:35:11 UTC 2010

Modified Files:
src/sys/arch/amd64/amd64: machdep.c
src/sys/arch/i386/i386: machdep.c

Log Message:
Do not forget that ptoa() casts the result to vaddr_t, which is bad
for paddr_t values under i386 PAE. Use ctob() instead.

Although amd64 is not affected by this vaddr_t vs paddr_t issue (both
having the same size), for the sake of completeness, switch to
ctob() when manipulating paddr_t/psize_t entities in amd64 machdep.c.

Compile tested for i386 and amd64. No regression expected.


To generate a diff of this commit:
cvs rdiff -u -r1.142 -r1.143 src/sys/arch/amd64/amd64/machdep.c
cvs rdiff -u -r1.683 -r1.684 src/sys/arch/i386/i386/machdep.c

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.



CVS commit: src/sys/arch/i386

2010-02-28 Thread Jean-Yves Migeon
Module Name:src
Committed By:   jym
Date:   Mon Mar  1 01:15:24 UTC 2010

Modified Files:
src/sys/arch/i386/i386: machdep.c rbus_machdep.c
src/sys/arch/i386/include: rbus_machdep.h

Log Message:
Change rbus_min_start_hint() semantic for i386. "ram" is now psize_t
(instead of size_t) to avoid possible overflow when system may have more
than 4GB of memory (like PAE).

The behavior of rbus_min_start_hint() remains the same. While here, fix
printf's format strings (paddr_t => PRIxPADDR).

Use ctob() and cast physmem to psize_t to avoid losing bits above 4GB.

Comes from PAE patch from Jeremy Morse; adaptation by me.

Compile tested for GENERIC only. No regression expected.


To generate a diff of this commit:
cvs rdiff -u -r1.682 -r1.683 src/sys/arch/i386/i386/machdep.c
cvs rdiff -u -r1.24 -r1.25 src/sys/arch/i386/i386/rbus_machdep.c
cvs rdiff -u -r1.8 -r1.9 src/sys/arch/i386/include/rbus_machdep.h

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.



CVS commit: src/sys/arch/i386

2010-02-28 Thread Jean-Yves Migeon
Module Name:src
Committed By:   jym
Date:   Mon Mar  1 01:15:24 UTC 2010

Modified Files:
src/sys/arch/i386/i386: machdep.c rbus_machdep.c
src/sys/arch/i386/include: rbus_machdep.h

Log Message:
Change rbus_min_start_hint() semantic for i386. "ram" is now psize_t
(instead of size_t) to avoid possible overflow when system may have more
than 4GB of memory (like PAE).

The behavior of rbus_min_start_hint() remains the same. While here, fix
printf's format strings (paddr_t => PRIxPADDR).

Use ctob() and cast physmem to psize_t to avoid losing bits above 4GB.

Comes from PAE patch from Jeremy Morse; adaptation by me.

Compile tested for GENERIC only. No regression expected.


To generate a diff of this commit:
cvs rdiff -u -r1.682 -r1.683 src/sys/arch/i386/i386/machdep.c
cvs rdiff -u -r1.24 -r1.25 src/sys/arch/i386/i386/rbus_machdep.c
cvs rdiff -u -r1.8 -r1.9 src/sys/arch/i386/include/rbus_machdep.h

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.

Modified files:

Index: src/sys/arch/i386/i386/machdep.c
diff -u src/sys/arch/i386/i386/machdep.c:1.682 src/sys/arch/i386/i386/machdep.c:1.683
--- src/sys/arch/i386/i386/machdep.c:1.682	Mon Feb  8 19:02:29 2010
+++ src/sys/arch/i386/i386/machdep.c	Mon Mar  1 01:15:23 2010
@@ -1,4 +1,4 @@
-/*	$NetBSD: machdep.c,v 1.682 2010/02/08 19:02:29 joerg Exp $	*/
+/*	$NetBSD: machdep.c,v 1.683 2010/03/01 01:15:23 jym Exp $	*/
 
 /*-
  * Copyright (c) 1996, 1997, 1998, 2000, 2004, 2006, 2008, 2009
@@ -67,7 +67,7 @@
  */
 
 #include 
-__KERNEL_RCSID(0, "$NetBSD: machdep.c,v 1.682 2010/02/08 19:02:29 joerg Exp $");
+__KERNEL_RCSID(0, "$NetBSD: machdep.c,v 1.683 2010/03/01 01:15:23 jym Exp $");
 
 #include "opt_beep.h"
 #include "opt_compat_ibcs2.h"
@@ -481,7 +481,7 @@
 
 #if NCARDBUS > 0
 	/* Tell RBUS how much RAM we have, so it can use heuristics. */
-	rbus_min_start_hint(ptoa(physmem));
+	rbus_min_start_hint(ctob((psize_t)physmem));
 #endif
 
 	minaddr = 0;

Index: src/sys/arch/i386/i386/rbus_machdep.c
diff -u src/sys/arch/i386/i386/rbus_machdep.c:1.24 src/sys/arch/i386/i386/rbus_machdep.c:1.25
--- src/sys/arch/i386/i386/rbus_machdep.c:1.24	Tue Dec 15 22:17:12 2009
+++ src/sys/arch/i386/i386/rbus_machdep.c	Mon Mar  1 01:15:24 2010
@@ -1,4 +1,4 @@
-/*	$NetBSD: rbus_machdep.c,v 1.24 2009/12/15 22:17:12 snj Exp $	*/
+/*	$NetBSD: rbus_machdep.c,v 1.25 2010/03/01 01:15:24 jym Exp $	*/
 
 /*
  * Copyright (c) 1999
@@ -26,7 +26,7 @@
  */
 
 #include 
-__KERNEL_RCSID(0, "$NetBSD: rbus_machdep.c,v 1.24 2009/12/15 22:17:12 snj Exp $");
+__KERNEL_RCSID(0, "$NetBSD: rbus_machdep.c,v 1.25 2010/03/01 01:15:24 jym Exp $");
 
 #include "opt_pcibios.h"
 #include "opt_pcifixup.h"
@@ -82,10 +82,10 @@
  * or 2046 vs 2048.
  */
 void
-rbus_min_start_hint(size_t ram)
+rbus_min_start_hint(psize_t ram)
 {
 #ifdef RBUS_MIN_START_FORCED
-	aprint_debug("rbus: rbus_min_start from config at 0x%0lx\n",
+	aprint_debug("rbus: rbus_min_start from config at %#0" PRIxPADDR "\n",
 	rbus_min_start);
 #else
 if (ram <= 192*1024*1024UL) {
@@ -113,7 +113,7 @@
 		rbus_min_start =  3 * 1024 * 1024 * 1024UL;
 	}
 
-	aprint_debug("rbus: rbus_min_start set to 0x%0lx\n",
+	aprint_debug("rbus: rbus_min_start set to %#0" PRIxPADDR "\n",
 	   rbus_min_start);
 #endif
 }

Index: src/sys/arch/i386/include/rbus_machdep.h
diff -u src/sys/arch/i386/include/rbus_machdep.h:1.8 src/sys/arch/i386/include/rbus_machdep.h:1.9
--- src/sys/arch/i386/include/rbus_machdep.h:1.8	Tue Dec 15 22:17:12 2009
+++ src/sys/arch/i386/include/rbus_machdep.h	Mon Mar  1 01:15:24 2010
@@ -1,4 +1,4 @@
-/*	$NetBSD: rbus_machdep.h,v 1.8 2009/12/15 22:17:12 snj Exp $	*/
+/*	$NetBSD: rbus_machdep.h,v 1.9 2010/03/01 01:15:24 jym Exp $	*/
 
 /*
  * Copyright (c) 1999
@@ -40,6 +40,6 @@
 rbus_tag_t rbus_pccbb_parent_io(struct pci_attach_args *);
 rbus_tag_t rbus_pccbb_parent_mem(struct pci_attach_args *);
 
-void rbus_min_start_hint(size_t);
+void rbus_min_start_hint(psize_t);
 
 #endif /* _ARCH_I386_I386_RBUS_MACHDEP_H_ */



CVS commit: src/sys/arch/i386/include

2010-02-28 Thread Jean-Yves Migeon
Module Name:src
Committed By:   jym
Date:   Mon Mar  1 00:55:33 UTC 2010

Modified Files:
src/sys/arch/i386/include: pmap.h

Log Message:
Use PDP_SIZE for NTOPLEVEL_PDES (number of top level PDEs) instead of
#ifdef'ing PAE.


To generate a diff of this commit:
cvs rdiff -u -r1.104 -r1.105 src/sys/arch/i386/include/pmap.h

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.

Modified files:

Index: src/sys/arch/i386/include/pmap.h
diff -u src/sys/arch/i386/include/pmap.h:1.104 src/sys/arch/i386/include/pmap.h:1.105
--- src/sys/arch/i386/include/pmap.h:1.104	Tue Feb  9 22:51:13 2010
+++ src/sys/arch/i386/include/pmap.h	Mon Mar  1 00:55:33 2010
@@ -1,4 +1,4 @@
-/*	$NetBSD: pmap.h,v 1.104 2010/02/09 22:51:13 jym Exp $	*/
+/*	$NetBSD: pmap.h,v 1.105 2010/03/01 00:55:33 jym Exp $	*/
 
 /*
  *
@@ -279,11 +279,7 @@
 #define NKL2_START_ENTRIES	0	/* XXX computed on runtime */
 #define NKL1_START_ENTRIES	0	/* XXX unused */
 
-#ifdef PAE
-#define NTOPLEVEL_PDES		(PAGE_SIZE * 4 / (sizeof (pd_entry_t)))
-#else
-#define NTOPLEVEL_PDES		(PAGE_SIZE / (sizeof (pd_entry_t)))
-#endif
+#define NTOPLEVEL_PDES		(PAGE_SIZE * PDP_SIZE / (sizeof (pd_entry_t)))
 
 #define NPDPG			(PAGE_SIZE / sizeof (pd_entry_t))
 



  1   2   >