Re: CVS commit: src/sys/arch/x86/x86
Le 09/10/2015 06:49, Masanobu SAITOH a écrit : > On 2015/10/06 6:10, Jean-Yves Migeon wrote: >>> Log Message: >>> kmem_free() the address returned by kmem_alloc(). found by Brainy. >>> use the newly aligned location if we needed it. found by kre. >>> >>> >>> To generate a diff of this commit: >>> cvs rdiff -u -r1.8 -r1.9 src/sys/arch/x86/x86/cpu_ucode_intel.c >> >> IMHO this should be pulled-up to -6 and -7. >> >> Any argument against? If the old code worked, it's pure luck. > > netbsd-6 doesn't support the microcode update function for Intel > CPU. That bug should be pulled up to netbsd-7 (and netbsd-7-1) branch. Makes sense. I'll check and ask for this pullup. -- Jean-Yves Migeon
Re: CVS commit: src/sys/arch/x86/x86
Le 04/10/2015 19:52, matthew green a écrit : > Module Name: src > Committed By: mrg > Date: Sun Oct 4 17:52:50 UTC 2015 > > Modified Files: > src/sys/arch/x86/x86: cpu_ucode_intel.c > > Log Message: > kmem_free() the address returned by kmem_alloc(). found by Brainy. > use the newly aligned location if we needed it. found by kre. > > > To generate a diff of this commit: > cvs rdiff -u -r1.8 -r1.9 src/sys/arch/x86/x86/cpu_ucode_intel.c IMHO this should be pulled-up to -6 and -7. Any argument against? If the old code worked, it's pure luck. -- Jean-Yves Migeon
Re: CVS commit: src/sys/arch/xen
On 24.06.2012 23:44, Christoph Egger wrote: > On 24.06.12 20:31, Jean-Yves Migeon wrote: > >> Module Name:src >> Committed By:jym >> Date:Sun Jun 24 18:31:53 UTC 2012 >> >> Modified Files: >> src/sys/arch/xen/include: xenpmap.h >> src/sys/arch/xen/x86: xen_pmap.c >> >> Log Message: >> Enable the map/unmap recursive mapping functions for all Xen ports for >> save/restore. >> >> For an unknown reason (to me) Xen refuses to update VM translations >> when the entry is pointing back to itself (which is precisely >> what our recursive VM model does). So enable the functions that take >> care of this, which will avoid all sort of memory corruption upon restore >> leading domU to trample upon itself. > > Please report this Xen behaviour to xen-devel. It has always been a tricky step anyway; it already happened with PAE. I am not even sure that Xen is at fault. IIRC the translations fixup step is done by the low level tools upon restore, not the hypervisor directly. IMHO correcting the fixup step will never be possible nor perfect anyway; save operation is like a snapshot: it loses all the history of how the mappings were created. It can get accepted by hypervisor when you validate it in a given order, and not when done it is done in another. -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys/arch/x86/x86
Le 16/05/12 10:45, Christoph Egger a écrit : On 05/13/12 13:24, Martin Husemann wrote: On Sun, May 13, 2012 at 01:04:15PM +0200, Jean-Yves Migeon wrote: Are you sure that moving to low priority xcalls is the way to go? You can end up with CPUs not being updated because they are offline. Curiously, while I could reproduce the crash before this commit, I am unable to reproduce it in any testing without the actual ucode update happening - and I can not spot a bug in the xcall code that tries to make sure the number of cpus that did run the callback is == the expected count before returning. This clearly needs full analyzis. I am pleased to revert this change once this xcall(9) issue has been fixed. Sure, however I can't see where the xcall(9) code goes wrong. Care to give more details, please? I cannot reproduce it on my side. I am using xcall(9) to flush CPU-bound pool caches and having this sort of bug can definitely cause serious cache incoherency that are hard to track down afterwards. Is it specific to high priority xcalls? -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys/arch/x86/x86
Le 13/05/12 13:24, Martin Husemann a écrit : On Sun, May 13, 2012 at 01:04:15PM +0200, Jean-Yves Migeon wrote: Are you sure that moving to low priority xcalls is the way to go? You can end up with CPUs not being updated because they are offline. Curiously, while I could reproduce the crash before this commit, I am unable to reproduce it in any testing without the actual ucode update happening - and I can not spot a bug in the xcall code that tries to make sure the number of cpus that did run the callback is == the expected count before returning. This clearly needs full analyzis. Any quick pointer on the crash in question so I can take a look? I probably missed it on port-i386/amd64. -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys/arch/x86/x86
Le 10/05/12 14:35, Christoph Egger a écrit : + /* Check for race: +* When we booted with -x a cpu might already finished +* (why doesn't xc_wait() wait for *all* cpus?) +* and sc->sc_blob is already freed. +* In this case the calculation below touches +* non-allocated memory and results in a crash. +*/ xc_wait should wait for all CPUs, except low priority xcalls and offline CPUs. Are you sure that moving to low priority xcalls is the way to go? You can end up with CPUs not being updated because they are offline. -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys/arch/x86/include
Le 05/05/12 17:08, Jean-Yves Migeon a écrit : Module Name:src Committed By: jym Date: Sat May 5 15:08:29 UTC 2012 Modified Files: src/sys/arch/x86/include: specialreg.h Log Message: Add latest CR4 bits: [snip] From Intel® 64 and IA-32 Architectures Software Developer’s Manual, March 2012. Align declarations. CPUID_* bits for these features follow. Sorry, I was not very clear: the CPUID feature flags follow these declarations, ie. they are already present in headers (and can be used to check the feature's availability). CR4 values can be used to enable them when needed. -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/tests/lib/libc/gen
On Mon, 23 Apr 2012 19:02:04 +0100, David Laight wrote: On Mon, Apr 23, 2012 at 10:09:09AM +0200, Jean-Yves Migeon wrote: > > But, as has been pointed out before, code in libc will generate > alignment traps - because it is faster that way. I did not know -- care to give an example? #AC exception from amd64 was not working, so I would guess that this was not used for this port at least. memcpy() between misaligned addresses will do misaligned reads, writes or both. (aligning reads is best on amd cpus) memset() will do an unaligned store to the last word of the buffer prior to filling the rest of the buffer. (IIRC memcpy/memmove do similar.) Any access to a packed structure relies on misaligned transfers. eg: the 'sector number' fields in the disk mbr. in 64bit mode any code that has to match the alignment of 64bit items from 32bit code. Basically, on i386 and amd64 (and other cpus that allow misaligned accesses) if a buffer has to match an external representation that contains misaligned items, then misaligned tranfers are very likely to be used. So although the cpu supports the trap, the os doesn't expect it to be enabled. I see; thanks for the details. That explains why this feature is rarily seen or used on x86. -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/tests/lib/libc/gen
Le 23/04/12 09:48, Martin Husemann a écrit : On Sun, Apr 22, 2012 at 08:25:11PM +, Christos Zoulas wrote: There is no portable way to do this; sigbus according to ToG does not define what si_addr contains. I read it differently: The header shall define the siginfo_t type as a structure, which shall include at least the following members: int si_signo Signal number. int si_code Signal code. int si_errno If non-zero, an errno value associated with this signal, as described in. pid_t si_pidSending process ID. uid_t si_uidReal user ID of sending process. void *si_addr Address of faulting instruction. int si_status Exit value or signal. long si_band Band event for SIGPOLL. union sigval si_value Signal value. This is from: http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/signal.h.html I think we should modify the test to use RAS_START() and RAS_END() to verify the faulting code address and ignore the associated data address. Careful here; Some Other Operating Systems report data address fault with si_addr + SIGBUS, not instruction fault. -- jym@
Re: CVS commit: src/tests/lib/libc/gen
Le 22/04/12 18:38, David Laight a écrit : > On Sun, Apr 22, 2012 at 02:22:30PM +0100, Jean-Yves Migeon wrote: >> >> Hmm, siginfo(2) states the following: >> >> For SIGBUS and SIGSEGV the siginfo structure contains the >> following additional members: >> >> void *si_addr; >> int si_trap; >> >> si_addr contains the address of the faulting data > > That certainly needs an 'if available' comment. I will add it later. >> and si_trap contains a hardware specific reason. > > But, as has been pointed out before, code in libc will generate > alignment traps - because it is faster that way. I did not know -- care to give an example? #AC exception from amd64 was not working, so I would guess that this was not used for this port at least. -- jym@
Re: CVS commit: src/tests/lib/libc/gen
On Sun, 22 Apr 2012 08:52:26 +, Martin Husemann wrote: Module Name:src Committed By: martin Date: Sun Apr 22 08:52:26 UTC 2012 Modified Files: src/tests/lib/libc/gen: t_siginfo.c Log Message: Do not compare si_addr (address of faulting instruction) against the unaligned data address causing the fault - this will always fail. If anybody knows a portable way to get the data address involved in the fault, please fix the test case as originally intended. Hmm, siginfo(2) states the following: For SIGBUS and SIGSEGV the siginfo structure contains the following addi- tional members: void *si_addr; int si_trap; si_addr contains the address of the faulting data and si_trap contains a hardware specific reason. If the address of the instruction is returned and not the address of the faulting data, either the man page is incorrect, or ambiguous. Concerning x86 (Christoph can perhaps confirm here), I am not aware of any CPU that report the faulty address; rather, %cr2 contains the address of the fault that happened just before, typically a page fault. IMHO this is one of the reason why the mechanism never got much traction under x86: it's unreliable. Reliably obtaining the address would require to fetch eip/rip and decode the faulty opcode. Not impossible, but do we want that? Regarding other architectures, I do not know. I would expect the si_addr to contain the address of the unaligned data access, not the one from the faulting instruction. SIGBUS is used, not SIGILL. -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys/arch/amd64/amd64
Le 21/04/12 23:25, Christos Zoulas a écrit : In article<4f930a8c.6040...@free.fr>, Jean-Yves Migeon wrote: Le 21/04/12 20:52, Christos Zoulas a écrit : Module Name:src Committed By: christos Date: Sat Apr 21 18:52:37 UTC 2012 Modified Files: src/sys/arch/amd64/amd64: vector.S Log Message: Alignment fault traps push the error code automatically, so don't use ZTRAP! Meh, the fix was awaiting Paul testing... Alright, so I guess this one is right. Even if Paul's testing discovered that the fix did not work for the emulator, wouldn't you commit it so that at least things work on real hardware? It's the other way around; the bug was rather harmless in VMs (kills the process with a SIGILL), while it force-reboot the host on a native platform. I could not know that the fix works on real hardware, that's why I was waiting for Paul's response. Do you want me to ask for a pull-up? Sure, thanks. Will do. -- jym@
Re: CVS commit: src/sys/arch/amd64/amd64
Le 21/04/12 20:52, Christos Zoulas a écrit : > Module Name: src > Committed By: christos > Date: Sat Apr 21 18:52:37 UTC 2012 > > Modified Files: >src/sys/arch/amd64/amd64: vector.S > > Log Message: > Alignment fault traps push the error code automatically, so don't use ZTRAP! Meh, the fix was awaiting Paul testing... Alright, so I guess this one is right. Do you want me to ask for a pull-up? -- jym@
Re: CVS commit: src/tests/lib/libc/gen (address alignment)
Le 21/04/12 19:47, Christoph Egger a écrit : >> rip 0x0 and rsp 0x50202 look really abnormal to me. I'll have a look in >> FreeBSD, that's probably a group of exceptions that have to be handled >> differently. > > rip 0x0 often means that a function pointer has been called which is NULL. > > Christoph Yep, but the bug seems to be a displaced stack here; the information is pushed correctly, but with an offset. Looking at FreeBSD interrupt code, some exceptions have the tf_err value already pushed by the CPU, so no need to do it twice. I have sent a small patch to Paul for testing, it fixes the bug in my VM. Hope that this fixes the bug natively too. -- jym@
Re: CVS commit: src/tests/lib/libc/gen (address alignment)
Le 21/04/12 16:31, Jean-Yves Migeon a écrit : Okay, thanks for the report. So this rules out Virtual Box, it seems to happen on native amd64 too. I am taking a look right now. This seems to be a bug in the trap handling code. The signal is caught correctly (it reaches T_ALIGNFLT|T_USER in trap()), but things blow up just after: we end signalling the process with a SIGILL (which does not come from trap()). Using 32 bits compat mode (cc -m 32) also causes the crash. So something in e_trapsignal() or userret() goes wrong. Still digging. Interesting. The "uncatchable" SIGILL comes from sendsig_siginfo [1] and is used in a way that the signal will force-exit the program (when the copyout fails). It's a "feature" I was not aware of. Anyway, upon entering trap() for this exception (trap11), it looks like the trapframe is a mess: pid 470 (malign): BUS/SEGV (107) at rip 0 addr 7f7ff7701020 rip 0x0 rsp 0x50202 rfl 0x1f rdi 0x7f7ff7701080 rsi 0x8 rdx 0xf7701078 rcx 0x0 r8 0x7 r9 0x7f7ff770178 [...] rip 0x0 and rsp 0x50202 look really abnormal to me. I'll have a look in FreeBSD, that's probably a group of exceptions that have to be handled differently. Given the stack dump above, I tend to believe that #AC was never really stressed. [1] http://nxr.netbsd.org/xref/src/sys/arch/amd64/amd64/machdep.c#695 -- jym@
Re: CVS commit: src/tests/lib/libc/gen (address alignment)
Le 21/04/12 14:50, Jean-Yves Migeon a écrit : The machine did not drop into ddb, it simply rebooted. Unfortunately it did not leave a core dump behind, so I don't have much to look at just yet. When I get home later today, I will try to get more info. BTW, this occurred while running the ATF test from a non-privileged user, so if there's a bug lurking in these recent changes, it could be considered to be a security vulnerability - non-priv user should not be able to crash the box... :) Okay, thanks for the report. So this rules out Virtual Box, it seems to happen on native amd64 too. I am taking a look right now. This seems to be a bug in the trap handling code. The signal is caught correctly (it reaches T_ALIGNFLT|T_USER in trap()), but things blow up just after: we end signalling the process with a SIGILL (which does not come from trap()). Using 32 bits compat mode (cc -m 32) also causes the crash. So something in e_trapsignal() or userret() goes wrong. Still digging. -- jym@
Re: CVS commit: othersrc/share/examples/ec2
Le 20/04/12 23:01, Jeff Rizzo a écrit : Module Name:othersrc Committed By: riz Date: Fri Apr 20 21:01:03 UTC 2012 Added Files: othersrc/share/examples/ec2: README TODO build_ec2_img.sh create-new.sh create_ec2_ami.sh set_ec2_env.sh othersrc/share/examples/ec2/files: bootstrap.sh ec2_bootmail ec2_firstboot ec2_init motd spec.in Log Message: Example scripts and supporting files for creating Amazon EC2 AMIs (Amazon Machine Images) running NetBSD. Requires an Amazon EC2 account. Please test and extend these scripts! The original scripts were written by Jean-Yves Migeon, heavily modified Thanks for your work on this, riz. Heavily appreciated. -- jym@
Re: CVS commit: src/tests/lib/libc/gen (address alignment)
On Fri, 20 Apr 2012 15:33:05 -0700 (PDT), Paul Goyette wrote: On Fri, 20 Apr 2012, Paul Goyette wrote: XXX I would appreciate if someone could test it under a real amd64 host with an up-to-date kernel, so I can reasonably assume that the culprit is Virtual Box and not our amd64 port (my test machine being off line I cannot do it myself). Results from other arches would be a plus too. I just updated one of my machines (running a bulldozer FX-8150) to -current from an hour ago (with the most up-to-date rev of psl.h) and ran atf-run on the libc/gen tests. The machine hung solid at this point: ... t_siginfo (28/32): 8 test cases sigalarm: [1.006950s] Passed. sigbus_adraln: and I am no longer able to telnet into the box from elsewhere. I am not physically at the machine, so I cannot tell if it has dropped into ddb. The machine did not drop into ddb, it simply rebooted. Unfortunately it did not leave a core dump behind, so I don't have much to look at just yet. When I get home later today, I will try to get more info. BTW, this occurred while running the ATF test from a non-privileged user, so if there's a bug lurking in these recent changes, it could be considered to be a security vulnerability - non-priv user should not be able to crash the box... :) Okay, thanks for the report. So this rules out Virtual Box, it seems to happen on native amd64 too. I am taking a look right now. -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src or a tale on NetBSD/usermode
o plant a "int 0x80" in a MMAP_NOSYSCALLS >> executable region, just make it to a "call __syscall". At the expense of a >> few more arguments, you will get the same result. > > It depends on the implementation. Do you f.e. allow the linkage of this code > to functions outside a designated list, or outside a designated area? If it > manages to find __syscall by itself in its host program and patch up a direct > call to that constant then yes it could call the OS. Static linking and > strip(1) is your friend then. > > In NetBSD/usermode it would then still only be > able to call the NetBSD/usermode kernel and not the host kernel. I think that this needs clarification. Please correct my mistakes: - NetBSD/usermode kernel is started as a userland process, and uses POSIX API to setup its environment. - it then proceeds to setting the userland memory regions with MMAP_NOSYSCALLS flags, so userland cannot make direct syscalls to host kernel - passes execution to init(1) and userland - all userland code making direct syscalls, this raises a SIGILL each time which gives a chance to the usermode kernel to handle the userland syscalls. Right? So how can you implement the MMAP_NOSYSCALLS step on other POSIXy systems? Merry Christmas to you (and everyone else too) :) -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src
On Wed, 21 Dec 2011 16:47:49 +0100, Reinoud Zandijk wrote: The patch is written to allow for multiple non-UVM flags to be attached to mappings and allow the kernel to react on them. NetBSD/usermode uses this to disallow system calls to be made from within mapped regions and get them returned as illegal instructions so it can analyse and emulate the system calls. To prevent every process to be scrutinized this way a process flag has been introduced to mark if a process needs this check since the detection involve acuiring a lock to walk the uvm map. Why make this a memory-level property, and not a process-level property? If you want to proxy syscalls between host and usermode kernel, why make it exclusive to certain mem regions? I am probably missing something with the way usermode processes, usermode kernel host kernel interact. On the enhancing security argument, malicious source code could trigger compiler bugs that allow for code to be modified or otherwise manipulated to issue system calls where they shouldn't. Although it wouldn't nessiarily pose a system security issue, it could be used for extracting info or for malicious behaviour where with the patch it would simply bomb out. That's the part I have trouble with. It looks like a weaker form of W^X (or PaX's mprotect), and I can't see the "additional" security benefits. Malicious code is free to trigger compiler bugs that can make calls to valid memory areas. If you manage to plant a "int 0x80" in a MMAP_NOSYSCALLS executable region, just make it to a "call __syscall". At the expense of a few more arguments, you will get the same result. As for the panic in sys_mmap(), as pointed out by Joerg and David Young, yes, that should return a EOPNOTSUPP or an EINVAL. Panicing is indeed far too crude and i'll change that. Hope this answers most of your questions. Waiting for mines :) -- Jean-Yves Migeon j...@netbsd.org
Re: CVS commit: src
On 20.12.2011 16:39, Reinoud Zandijk wrote: Module Name:src Committed By: reinoud Date: Tue Dec 20 15:39:36 UTC 2011 Modified Files: src/lib/libc/sys: mmap.2 src/sys/sys: mman.h proc.h src/sys/uvm: uvm_extern.h uvm_map.c uvm_mmap.c Log Message: Add a MAP_NOSYSCALLS flag to mmap. This flag prohibits executing of system calls from the mapped region. This can be used for emulation perposed or for extra security in the case of generated code. IMHO, this change should have been discussed first. Can you please elaborate on its usage? I fail to see the point about emulation, and even more so about the alleged extra security where this can be trivially bypassed. Return to libfoo and ROP are quite mainstream techniques these days... -- Jean-Yves Migeon j...@netbsd.org
Re: CVS commit: src/sys
On 04.12.2011 21:07, Alan Barrett wrote: On Sun, 04 Dec 2011, Jean-Yves Migeon wrote: Log Message: Implement the register/deregister/evaluation API for secmodel(9). It allows registration of callbacks that can be used later for cross-secmodel "safe" communication. Where and when was this discussed? See commit log: http://mail-index.netbsd.org/tech-security/2011/11/29/msg000422.html -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys/arch
On 19.11.2011 18:13, Cherry G. Mathew wrote: Module Name:src Committed By: cherry Date: Sat Nov 19 17:13:39 UTC 2011 Modified Files: src/sys/arch/x86/include: cpu.h src/sys/arch/xen/include: hypervisor.h src/sys/arch/xen/x86: hypervisor_machdep.c src/sys/arch/xen/xen: evtchn.c Log Message: [merging from cherry-xenmp] bring in bouyer@'s changes via: http://mail-index.netbsd.org/source-changes/2011/10/22/msg028271.html From the Log: Log Message: Various interrupt fixes, mainly: keep a per-cpu mask of enabled events, and use it to get pending events. A cpu-specific event (all of them at this time) should not be ever masked by another CPU, because it may prevent the target CPU from seeing it (the clock events all fires at once for example). This patch breaks various xenstore/xenbus related functions, like ballooning. Steps to reproduce: - connect to a dom0 system - try ballooning up or down, with: sysctl -w machdep.xen.balloon.target=10 All newly created processes will then stay in tstile, and the sysctl never returns, waiting on "rplq" wait channel. Observed in QEMU, Virtualbox, and my amd64 spare host. -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/lib/libutil
On 13.11.2011 23:03, Christos Zoulas wrote: Module Name:src Committed By: christos Date: Sun Nov 13 22:03:34 UTC 2011 Modified Files: src/lib/libutil: Makefile Added Files: src/lib/libutil: getfstypename.3 Log Message: add manual page Just wondering: is there a different rule applicable to man pages for 4-clause vs 2-clause BSD? I occasionally see new man pages written with a 4-clause BSD, however, most newly written code is 2-clause. -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys/arch
On Wed, 9 Nov 2011 07:52:13 +0530, Cherry G. Mathew wrote: The cleanest way would be to share the code between x86 and Xen, keep the allocation "below 4GiB" boundary for both, and use it everywhere in the pmap code. Only the manipulation of the vcpu_guest_context_t ctrlregs members would have to force this use. Fair enough. Although the <4G tests would be a bit deceptive (since they're cosmetic) - I guess leaving a note in the code about the rationale behind this will help. Well, the cpu_info struct will be shared by all implementations of x86 eventually (at least for PAE & !PAE, native & Xen). So having lots of #ifdef XEN/#ifdef PAE means more work for me later on :) - are you sure that these have to be "defined(PAE) || defined(__x86_64__)" ? That's a crude way of making pmap_cpu_init_late() do nothing for i386 (non-pae) since i386 doesn't need shadow PMDs. In that case, test against "i386_use_pae" (0 == !PAE, 1 == PAE), and simply return if !PAE. Avoid having loonnng macro blocks with different levels of #ifdef. It's fairly difficult to untangle and unpleasant to read. I agree - this looks better. I always wondered whether i386_use_pae should be set for amd64. Strictly speaking, PAE is enabled when we are running in long mode. I set the sysctl(7) a while ago for NX regression tests in ATF, but not the variable. Maybe I should. -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys/arch
On 08.11.2011 14:53, Cherry G. Mathew wrote: On 8 November 2011 15:20, Jean-Yves Migeon wrote: On Tue, 8 Nov 2011 08:49:22 +0530, Cherry G. Mathew wrote: Please keep ci_pae_l3_pdir to a uint32_t and back out its paddr_t type. Unless you can convince me that your code will do the right thing(tm), the PDP for PAE should to be allocated below the 4GiB physical boundary; unless mistaken, the uvm_km_alloc(kernel_map, ...) does not enforce that. Besides, %cr3 is still a 32 bits entity under PAE. Trying to stash a 64 bits paddr_t in %cr3 is a really bad idea. See comments http://nxr.netbsd.org/xref/src/sys/arch/x86/x86/pmap.c#1588 This doesn't hold true for Xen. See: http://lxr.xensource.com/lxr/source/xen/arch/x86/mm.c#L509 Xen does the<4GB translation for the VM by maintaining its own shadow. I just took the easier route ( after initially using the x86 pmap< 4GB cr3 code ), but if you're thinking of a unified a xen/non-xen implementation we'll have to re-think this one. Okay; then all users of cr3 have to be documented (_especially_ the native x86 code), as there at least two places where there's implicit uint64_t => uint32_t truncation: lcr3() and pcb_cr3. A comment above these should be enough. Same goes for having paddr_t for L3 PD in struct cpu_info. hmm... I wonder if it would be "cleaner" to do this within a #ifdef XEN/#endif ? The cleanest way would be to share the code between x86 and Xen, keep the allocation "below 4GiB" boundary for both, and use it everywhere in the pmap code. Only the manipulation of the vcpu_guest_context_t ctrlregs members would have to force this use. Avoiding the use of macros is a big plus; it helps modularity. - are you sure that these have to be "defined(PAE) || defined(__x86_64__)" ? That's a crude way of making pmap_cpu_init_late() do nothing for i386 (non-pae) since i386 doesn't need shadow PMDs. In that case, test against "i386_use_pae" (0 == !PAE, 1 == PAE), and simply return if !PAE. Avoid having loonnng macro blocks with different levels of #ifdef. It's fairly difficult to untangle and unpleasant to read. Macros should only be used to fix compile-time issues (like: symbol missing), or help code readability by reuse. Not for jumping around C code and take shortcuts. This has to be done at execution time rather than compile time. - please use "for (i = 0; i< PTP_LEVELS - 1; i++ ) { ... }". I will have to identify these blocks later when GENERIC will include both PAE and !PAE pmaps. PTP_LEVELS makes it a lot easier. ok, will watchout for it - do you want to do this one yourself ? Please do :) ok, I'll take flak for further breakage ;-P Expect stress testing for MP in the near future :o -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys/arch
On Tue, 8 Nov 2011 08:49:22 +0530, Cherry G. Mathew wrote: Please keep ci_pae_l3_pdir to a uint32_t and back out its paddr_t type. Unless you can convince me that your code will do the right thing(tm), the PDP for PAE should to be allocated below the 4GiB physical boundary; unless mistaken, the uvm_km_alloc(kernel_map, ...) does not enforce that. Besides, %cr3 is still a 32 bits entity under PAE. Trying to stash a 64 bits paddr_t in %cr3 is a really bad idea. See comments http://nxr.netbsd.org/xref/src/sys/arch/x86/x86/pmap.c#1588 This doesn't hold true for Xen. See: http://lxr.xensource.com/lxr/source/xen/arch/x86/mm.c#L509 Xen does the <4GB translation for the VM by maintaining its own shadow. I just took the easier route ( after initially using the x86 pmap < 4GB cr3 code ), but if you're thinking of a unified a xen/non-xen implementation we'll have to re-think this one. Okay; then all users of cr3 have to be documented (_especially_ the native x86 code), as there at least two places where there's implicit uint64_t => uint32_t truncation: lcr3() and pcb_cr3. A comment above these should be enough. Same goes for having paddr_t for L3 PD in struct cpu_info. - are you sure that these have to be "defined(PAE) || defined(__x86_64__)" ? - please use "for (i = 0; i < PTP_LEVELS - 1; i++ ) { ... }". I will have to identify these blocks later when GENERIC will include both PAE and !PAE pmaps. PTP_LEVELS makes it a lot easier. ok, will watchout for it - do you want to do this one yourself ? Please do :) -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys/arch
On 06.11.2011 16:18, Cherry G. Mathew wrote: > Module Name: src > Committed By: cherry > Date: Sun Nov 6 15:18:19 UTC 2011 > > Modified Files: >src/sys/arch/amd64/include: pmap.h >src/sys/arch/i386/include: pmap.h >src/sys/arch/x86/include: cpu.h >src/sys/arch/x86/x86: pmap.c >src/sys/arch/xen/x86: cpu.c x86_xpmap.c > > Log Message: > [merging from cherry-xenmp] make pmap_kernel() shadow PMD per-cpu and MP aware. Some comments. > -#ifdef PAE > -/* Address of the static kernel's L2 page */ > -pd_entry_t *pmap_kl2pd; > -paddr_t pmap_kl2paddr; > -#endif > - > - [snip] > #ifdef PAE > - uint32_tci_pae_l3_pdirpa; /* PA of L3 PD */ > + paddr_t ci_pae_l3_pdirpa; /* PA of L3 PD */ >pd_entry_t *ci_pae_l3_pdir; /* VA pointer to L3 PD */ > #endif > [snip] > + > +#if defined(PAE) > + ci->ci_pae_l3_pdir = (paddr_t *)uvm_km_alloc(kernel_map, PAGE_SIZE, 0, > + UVM_KMF_WIRED | UVM_KMF_ZERO | UVM_KMF_NOWAIT); > + > + if (ci->ci_pae_l3_pdir == NULL) { > + panic("%s: failed to allocate L3 per-cpu PD for CPU %d\n", > +__func__, cpu_index(ci)); > + } Please keep ci_pae_l3_pdir to a uint32_t and back out its paddr_t type. Unless you can convince me that your code will do the right thing(tm), the PDP for PAE should to be allocated below the 4GiB physical boundary; unless mistaken, the uvm_km_alloc(kernel_map, ...) does not enforce that. Besides, %cr3 is still a 32 bits entity under PAE. Trying to stash a 64 bits paddr_t in %cr3 is a really bad idea. See comments http://nxr.netbsd.org/xref/src/sys/arch/x86/x86/pmap.c#1588 > -/* flag to be used for kernel mappings: PG_u on Xen/amd64, 0 otherwise */ > -#if defined(XEN)&& defined(__x86_64__) > -#define PG_k PG_u > -#else > -#define PG_k 0 > -#endif > - Are you sure that all the mapping sites are safe (PT/PD bits), given the pmap split between pmap/xen_xpmap.c? On a side note, please stick to KNF especially for 80 columns. [snip] > +void > +pmap_cpu_init_late(struct cpu_info *ci) > +{ > +#if defined(PAE) || defined(__x86_64__) > + /* > + * The BP has already its own PD page allocated during early > + * MD startup. > + */ > + > + if (ci ==&cpu_info_primary) > + return; > + > + KASSERT(ci != NULL); > + ci->ci_pae_l3_pdirpa = vtophys((vaddr_t) ci->ci_pae_l3_pdir); > + KASSERT(ci->ci_pae_l3_pdirpa != 0); > + > + /* Initialise L2 entries 0 - 2: Point them to pmap_kernel() */ > + ci->ci_pae_l3_pdir[0] = > + xpmap_ptom_masked(pmap_kernel()->pm_pdirpa[0]) | PG_V; > + ci->ci_pae_l3_pdir[1] = > + xpmap_ptom_masked(pmap_kernel()->pm_pdirpa[1]) | PG_V; > + ci->ci_pae_l3_pdir[2] = > + xpmap_ptom_masked(pmap_kernel()->pm_pdirpa[2]) | PG_V; > +#endif /* PAE */ - are you sure that these have to be "defined(PAE) || defined(__x86_64__)" ? - please use "for (i = 0; i < PTP_LEVELS - 1; i++ ) { ... }". I will have to identify these blocks later when GENERIC will include both PAE and !PAE pmaps. PTP_LEVELS makes it a lot easier. That's all for the first quick review :) Thanks for starting the merge of your branch. -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys/kern
On 24.10.2011 03:53, YAMAMOTO Takashi wrote: Log Message: Turn a workqueue(9) name into an array in the struct workqueue, rather than a const char *. This avoids keeping a reference to a string owned by caller (string could be allocated on stack). what needs it? Me, in a not-so-distant future. The purpose of this change is to allow passing strings that could be constructed that way: char name[16]; ... snprintf(name, sizeof(name), "xbdback%d.%d", domid, instance); workqueue_create(&foo, name, funcfoo, NULL, PRI_NONE, SPL_BIO, 0); I'd like to construct workqueues based on a context; the old code does not permit that, unless you build and store the string somewhere, forcing the caller to *know* that it only keeps a pointer and does not copy the content. This will get misused, sooner or later. -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src
On 29.09.2011 22:05, Alexander Nasonov wrote: Jukka Ruohonen wrote: What are the options to handle cases like this? While looking for examples, I also noticed few bugs where the possibly blocking kmem_free(9) is used while holding a mutex (e.g. ras_purgeall() in kern_ras.c). Disclaimer: I'm a complete newbie in this area. Blocking kmem is not necessarily a bug is you hold an adaptive lock. Your mutex is IPL_NONE, if I remember correctly. You probably don't need to be smart. I am not sure that this lock will be released if the LWP needs to sleep for the allocation. Which means that the lock will block other LWPs from running even when they do not need to allocate memory. -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys/arch
On 21.09.2011 00:23, Matt Thomas wrote: XXX PAE suspend does not work in amd64 currently, due to (yet again!) page validation issues with hypervisor. Will fix. Got it, pool cache invalidation is not working as expected during save. I remember discussing this matter with Mindaugas a while back, and due to pool_cache(9) limitations back then the code was commented out (see the #if 0 ... #endif part in pool_cache_invalidate). I had to implement a pool_cache_invalidate_local(), but this wasn't technically appreciated by many as it made pool_cache abstraction leaky (and they were right) I'll discuss this matter again, I think I have an alternative solution for that one. I may hit a pmf(9) limitation though :/ Maybe add pmf hook for cpu device_t in mi_cpu_attach That's the idea; that way I could write initial code for vcpu suspend when Xen MP gets committed. Unfortunately, the limitation I currently have is for invalidation: only the CPU can invalidate its own caches, so I have to teach the pmf hook to somehow execute the "pool_cache(9) deplete" on the right CPU when pmf_system_suspend is called, or use xcall(9). rmind@ implemented high priority xcall(9) some time ago, so IMHO the pool_cache(9) invalidation should now be able to handle all kind of pools situations, including high IPL pools. But I'll have to check this one first before uncommenting the xcall part in pool_cache_invalidate(). There's already MD code that tried to circumvent the invalidation limitation (eg. not handling per-CPU pools, like x86 pmap_create "try_again" loop). -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys/arch
On 20.09.2011 02:12, Jean-Yves Migeon wrote: Module Name:src Committed By: jym Date: Tue Sep 20 00:12:25 UTC 2011 Modified Files: [snip] Log Message: Merge jym-xensuspend branch in -current. ok bouyer@. Goal: save/restore support in NetBSD domUs, for i386, i386 PAE and amd64. [snip] XXX PAE suspend does not work in amd64 currently, due to (yet again!) page validation issues with hypervisor. Will fix. Got it, pool cache invalidation is not working as expected during save. I remember discussing this matter with Mindaugas a while back, and due to pool_cache(9) limitations back then the code was commented out (see the #if 0 ... #endif part in pool_cache_invalidate). I had to implement a pool_cache_invalidate_local(), but this wasn't technically appreciated by many as it made pool_cache abstraction leaky (and they were right) I'll discuss this matter again, I think I have an alternative solution for that one. I may hit a pmf(9) limitation though :/ Anyway, for the reckless people that already tried my code and reported breaks: yes, I am aware of that specific issue :) -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/etc
On 06.09.2011 21:29, Thomas Klausner wrote: >> If the source ('-s' of postinstall(8)) is not specified at command line, >> according to code it takes "/usr/src" as default. Do you have a >> stale/old rc.conf(5) file in there, /usr/src/etc/defaults/rc.conf? > > I have the CVS checkout there that was used to build the snapshot I > installed, so yes there is a file there but no it is not stale. > >> Now, if this case should be handled properly is another question. There >> are multiple solutions, all of them end up tweaking postinstall(8): >> - raise a warning when overriding certain configuration files with older >> versions (perhaps by using the CVS tags?) > > It's not an older version. > >> - make source an explicit arg to etcupdate/postinstall >> - something else? > > Or perhaps prefer ./etc to /usr/src/etc, that would solve the issue > for me at least :) postinstall(8) and etcupdate(8) all take /usr/src/ as default, so I won't change that. The problem is that rc.conf is now generated at build time, and I am always using postinstall(8) via -s /usr/cvs/dest (my default DESTDIR), so this went unnoticed during my tests as the correct /etc/defaults gets parsed. I'll add the required code to handle this; thanks for reporting! -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/etc
On 06.09.2011 12:54, Thomas Klausner wrote: > On Mon, Aug 22, 2011 at 06:54:07PM +0000, Jean-Yves Migeon wrote: >> Module Name: src >> Committed By:jym >> Date:Mon Aug 22 18:54:06 UTC 2011 >> >> [snip] >> >> Log Message: >> Modify etc/defaults/Makefile so that architectures can specify an additional >> rc.conf file. This one should reside under etc/etc.${MACHINE}/, and will >> get automatically appended to etc/defaults/rc.conf at build time if present. >> >>[snip] >> >> See also >> http://mail-index.netbsd.org/tech-userlevel/2011/07/25/msg005246.html > > In a snapshot from a few hours ago I installed, "postinstall fix" > without any other arguments removed the architecture specific section > (the rc.conf included in the build output directory is fine). > Thomas Hmmm, /etc/defaults/rc.conf should be transparent to postinstall(8), as it gets installed via during build.sh. If the source ('-s' of postinstall(8)) is not specified at command line, according to code it takes "/usr/src" as default. Do you have a stale/old rc.conf(5) file in there, /usr/src/etc/defaults/rc.conf? Now, if this case should be handled properly is another question. There are multiple solutions, all of them end up tweaking postinstall(8): - raise a warning when overriding certain configuration files with older versions (perhaps by using the CVS tags?) - make source an explicit arg to etcupdate/postinstall - something else? -- Jean-Yves Migeon j...@netbsd.org
Re: CVS commit: [cherry-xenmp] src/sys/arch
On 30.08.2011 14:53, Cherry G. Mathew wrote: > Module Name: src > Committed By: cherry > Date: Tue Aug 30 12:53:46 UTC 2011 > > Modified Files: > src/sys/arch/i386/i386 [cherry-xenmp]: machdep.c > src/sys/arch/xen/x86 [cherry-xenmp]: cpu.c x86_xpmap.c > > Log Message: > Add per-cpu mmu queues Thanks for looking into it! -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/share/man/man4
On 29.08.2011 22:27, Manuel Bouyer wrote: > On Mon, Aug 29, 2011 at 03:17:16PM +0100, Jean-Yves Migeon wrote: >> Both; usually I am using the VGA-emulated display, but sometimes >> (like I did lately) I switch to console, so I can keep a reasonable >> amount of history of the dom0 output in my terminal. > > The the dom0 kernel is using xencons, and the ddb sequence is '+'. > See the cn_set_magic() call in xencons.c Yes, I can confirm now. I'll update ddb(4) to reflect this. -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys/arch/xen
On 29.08.2011 15:03, Cherry G. Mathew wrote: > I'm a bit confused now - are we assuming that the pmap lock protects the > (pte update op queue-push(es) + pmap_pte_flush()) as a single atomic > operation (ie; no invlpg/tlbflush or out-of-band-read can occur between > the update(s) and the pmap_pte_flush()) ? > > If so, I think I've slightly misunderstood the scope of the mmu queue > design - I assumed that the queue is longer-lived, and the sync points > (for the queue flush) can span across pmap locking - a sort of lazy pte > update, with the queue being flushed at out-of-band or in-band read > time ( I guess that won't work though - how does one know when the > hardware walks the page table ?) . It seems that the queue is meant for > pte updates in loops, for eg:, quickly followed by a flush. Is this > correct ? IMHO, this should be regarded this way, and nothing else. x86 and xen pmap(9) share a lot in common, low level operations (like these: PT/PD editing, TLB flushes, MMU updates...) should not leak through this abstraction. Said differently, the way Xen handles MMU must remain transparent to pmap, except in a few places. Remember, although we are adding a level of indirection through hypervisor, the calls should remain close to native x86 semantic. However, for convenience, Xen offers "multiseats" MMU hypercalls, where you can schedule more than one op at a time to avoid unneeded context switches, like in the pmap_alloc_level() function. This is our problematic part. > If so, there's just one hazard afaict - the synchronous TLB_FLUSH_MULTI > could beat the race between the queue update and the queue flush via > pmap_tlb_shootnow() (see pmap_tlb.c on the cherry-xenmp branch, and *if* > other CPUs reload their TLBs before the flush, they'll have stale info. What stale info? If a VCPU queue isn't empty while another VCPU has scheduled a TLB_FLUSH_MULTI op, the stale content of the queue will eventually be depleted later after a pmap_pte_flush() followed by a local invalidation. This is the part that should be carefully reviewed. For clarity, I would expect a queue to be empty when leaving pmap (e.g. when releasing its lock). Assertions might be necessary to catch all corner cases. > So the important question (rmind@ ?) is, is pmap_tlb_shootnow() > guaranteed to be always called with the pmap lock held ? > > In real life, I removed the global xpq_queue_lock() and the pmap was > falling apart. So a bit of debugging ahead. Did you keep the same queue for all CPUs? If that was the case, yes, this is a call for trouble. Anyway, we can't put a giant lock around xpq_queue. This doesn't make any sense in a MP system, especially for operations that are frequently called and still may take a while to complete. Just imagine: all CPUs waiting for one to finish its TLB flush before they can edit their PD/PT again... ouch. -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/share/man/man4
On Mon, 29 Aug 2011 15:25:03 +0200, Manuel Bouyer wrote: On Mon, Aug 29, 2011 at 12:45:27PM +0100, Jean-Yves Migeon wrote: Hmm, I'll have to cross-check that one this afternoon. IIRC, I am also using the the default "break" command when I am running the dom0 inside QEMU, and '+' is only used when I want to break in the domU (through xencons(4)). But when running inside qemu, you're using qemu's VGA display, not its serial port, right ? Both; usually I am using the VGA-emulated display, but sometimes (like I did lately) I switch to console, so I can keep a reasonable amount of history of the dom0 output in my terminal. in this case I guess the dom0 is using wscons, not the xencons. xencons is (should) be used by dom0 only on serial console. -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/share/man/man4
On Mon, 29 Aug 2011 11:46:06 +0200, Manuel Bouyer wrote: On Mon, Aug 29, 2011 at 09:01:47AM +0200, Jean-Yves Migeon wrote: What kind of console is attaching for you in dom0? I can't see how '+' would get wired in by default. At least when either started from bare metal, or QEMU. '+' is used for serial console (or, to be more precise, xencons which is what is used by dom0 when the hypervisor is configured to use serial console), where break won't work because the hypervisor doesn't forward it to the dom0. Hmm, I'll have to cross-check that one this afternoon. IIRC, I am also using the the default "break" command when I am running the dom0 inside QEMU, and '+' is only used when I want to break in the domU (through xencons(4)). In that case, I'll have to make adjustments. Having the same key sequence for both dom0 and domU is... problematic. -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/share/man/man4
On 29.08.2011 06:30, Christoph Egger wrote: > On 29.08.11 06:26, Christoph Egger wrote: >> On 29.08.11 00:09, Jean-Yves Migeon wrote: >>> Module Name:src >>> Committed By: jym >>> Date: Sun Aug 28 22:09:37 UTC 2011 >>> >>> Modified Files: >>> src/share/man/man4: ddb.4 >>> >>> Log Message: >>> Be more precise for Xen: + is only valid for Xen domUs. dom0 uses >>> the same key sequences as i386/amd64. >> >> + also works for me in a dom0. > > I think I have to mention I am using serial console regularly. What kind of console is attaching for you in dom0? I can't see how '+' would get wired in by default. At least when either started from bare metal, or QEMU. -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys/arch/usermode/usermode
On 27.08.2011 20:28, Joerg Sonnenberger wrote: > On Sat, Aug 27, 2011 at 08:13:28PM +0200, Jean-Yves Migeon wrote: >> On 27.08.2011 19:57, Reinoud Zandijk wrote: >>> Fix copystring routines to NOT just copy all since not all space might be >>> writable. This can be fixed by implementing/importing strnlen(3) in the >>> kernel >> >> Any reason no to? If there's none, I can do it. >> >> At first sight it's straightforward to add to common/, and I am more at >> peace knowing that we have a "valid" strnlen() in kernel rather than a >> bogus macro that may spread elsewhere... > > Or it could use memchr for that same purpose. Right; note that the question about strnlen(3) remains valid though: strlen/strnlen appear in the same man page. So one's could expect to have both accessible, even in kernel. -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys/arch/usermode/usermode
On 27.08.2011 19:57, Reinoud Zandijk wrote: > Fix copystring routines to NOT just copy all since not all space might be > writable. This can be fixed by implementing/importing strnlen(3) in the kernel Any reason no to? If there's none, I can do it. At first sight it's straightforward to add to common/, and I am more at peace knowing that we have a "valid" strnlen() in kernel rather than a bogus macro that may spread elsewhere... -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys/arch/xen
On Mon, 22 Aug 2011 12:47:40 +0200, Manuel Bouyer wrote: This is slightly more complicated than it appears. Some of the "ops" in a per-cpu queue may have ordering dependencies with other cpu queues, and I think this would be hard to express trivially. (an example would be a pte update on one queue, and reading the same pte read on another queue - these cases are quite analogous (although completely unrelated) read don't go through the xpq queue, don't they ? Nope, PTE are directly obtained from the recursive mappings (vtopte/kvtopte). Content is "obviously" only writable by hypervisor (so it can keep control of his mapping alone). I think this is similar to a tlb flush but the other way round, I guess we could use a IPI for this too. IIRC that's what the current native x86 code does: it uses an IPI to signal other processors that a shootdown is necessary. I'm thinking that it might be easier and more justifiable to nuke the current queue scheme and implement shadow page tables, which would fit more naturally and efficiently with CAS pte updates, etc. I'm not sure this would completely fis the issue: with shadow page tables you can't use a CAS to assure atomic operation with the hardware TLB, as this is, precisely, a shadow PT and not the one used by hardware. -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys/arch/xen
On 21.08.2011 21:39, Cherry G. Mathew wrote: > JM> An alternative would be to have per-CPU xpq_queue[] also. This > JM> is not completely stupid, xpq_queue is meant as a way to put > JM> multiple MMU operations in a queue asynchronously before issuing > JM> only one hypercall to handle them all. > > This is slightly more complicated than it appears. Some of the "ops" in > a per-cpu queue may have ordering dependencies with other cpu queues, > and I think this would be hard to express trivially. (an example would > be a pte update on one queue, and reading the same pte read on another > queue - these cases are quite analogous (although completely unrelated) > to classic RAW and other ordering dependencies in out-of-order execution > scenarios due to pipelining, etc.). Yes; however this should be handled by the pmap layer. xpq_queue is fairly low level and has no access to all the knowledge/semantic required to ensure proper PT/PD synchronization. It is rather a way to handle multiple MMU operations into one hypercall, like readv/writev can do for multiple read/write(s). Mapping different pmaps in the same VA space requires to lock them, so you should not see multiple CPUs concurrently reading/writing shared entries. One thing remains with the old => new PTE updates that need CAS, but this is Xen's mission to do that correctly as it manages the MMU for us. All we have to ensure is that there's no stale operation in the queue before leaving pmap. > I'm thinking that it might be easier and more justifiable to nuke the > current queue scheme and implement shadow page tables, which would fit > more naturally and efficiently with CAS pte updates, etc. Beware with those; shadow page tables were mainly designed with Linux in mind, and the way it manages virtual memory is completely different to ours: it maps the entire physical memory in the kernel virtual space (with tricks when there's more than 1GiB involved), while we use recursive mappings. And Xen has problems validating these. -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys/arch/xen
On 21.08.2011 12:26, Jean-Yves Migeon wrote: > - second, the lock is not placed at the correct abstraction level IMHO, > it is way too high in the caller/callee hierarchy. It should remain > hidden from most consumers of the xpq_queue API, and should only be used > to protect the xpq_queue array together with its counters (and > everything that isn't safe for all memory operations happening in xpq). > > Reason behind this is that your lock protects calls to hypervisor MMU > operations, which are hypercalls (hence, a "slow" operation with regard > to kernel). You are serializing lots of memory operations, something > that should not happen from a performance point of view (some may take a > faire amount of cycles to complete, like TLB flushes). I'd expect all > Xen MMU hypercalls to be reentrant. An alternative would be to have per-CPU xpq_queue[] also. This is not completely stupid, xpq_queue is meant as a way to put multiple MMU operations in a queue asynchronously before issuing only one hypercall to handle them all. -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys/arch/xen
On 10.08.2011 11:50, Cherry G. Mathew wrote: > Module Name: src > Committed By: cherry > Date: Wed Aug 10 09:50:37 UTC 2011 > > Modified Files: > src/sys/arch/xen/include: xenpmap.h > src/sys/arch/xen/x86: x86_xpmap.c > > Log Message: > Introduce locking primitives for Xen pte operations, and xen helper calls for > MP related MMU ops Hi Cherry, Sorry for this very late comment, as I was AFK most of the time for the last three weeks. I have two concerns with your patch: - first, you introduce locking primitives based on simple_lock(9). This mechanism ought to go in the not-so-distant future, so I would replace all these with a properly chosen mutex, likely at IPL_VM (depending on the priority of all consumers of the xpq_*() functions). This must be carefully reviewed first. - second, the lock is not placed at the correct abstraction level IMHO, it is way too high in the caller/callee hierarchy. It should remain hidden from most consumers of the xpq_queue API, and should only be used to protect the xpq_queue array together with its counters (and everything that isn't safe for all memory operations happening in xpq). Reason behind this is that your lock protects calls to hypervisor MMU operations, which are hypercalls (hence, a "slow" operation with regard to kernel). You are serializing lots of memory operations, something that should not happen from a performance point of view (some may take a faire amount of cycles to complete, like TLB flushes). I'd expect all Xen MMU hypercalls to be reentrant. -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys
On 30.07.2011 19:01, Christos Zoulas wrote: > Module Name: src > Committed By: christos > Date: Sat Jul 30 17:01:05 UTC 2011 > > Modified Files: > src/sys/conf: files > src/sys/kern: init_main.c kern_lwp.c kern_synch.c > Added Files: > src/sys/kern: subr_pserialize.c > src/sys/sys: pserialize.h > > Log Message: > Add an implementation of passive serialization as described in expired > US patent 4809168. This is a reader / writer synchronization mechanism, > designed for lock-less read operations. Interesting! What would be the direct consumers of such an implementation? For example, can pserialize be a drop-in replacement for everything that is rwlock(9) based, or are they limitations? (I am aware that passive serialization is not equivalent to RCU). I remember reading articles mentioning how "wonderful" lockless algorithms are, except in situation where the additional bus locking/atomic ops involved did not really improve the situation in highly concurrent systems (and could even make it worse). -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys/arch
On Mon, 18 Jul 2011 07:41:30 +0100, David Laight wrote: On Mon, Jul 18, 2011 at 02:18:54AM +0200, Jean-Yves Migeon wrote: On 18.07.2011 02:00, David Young wrote: >> Can we please use ansi function definitions in newly committed code? > > This was tedious enough without converting to ANSI function definitions. > A good job for Coccinelle (spatch)? Sadly, no: last time I tried (when moving kvm(3) code to ANSI style), I had to do it manually. It's even the opposite, spatch has issues when parsing non-ANSI declarations, so you have to do the conversion all by yourself first... I've a sed script that will change most of them see ftp.netbsd.org:~dsl/protoz Ah yes, forgot about it. Works pretty well, thanks for pointing it out. -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys/arch
On 18.07.2011 02:00, David Young wrote: >> Can we please use ansi function definitions in newly committed code? > > This was tedious enough without converting to ANSI function definitions. > A good job for Coccinelle (spatch)? Sadly, no: last time I tried (when moving kvm(3) code to ANSI style), I had to do it manually. It's even the opposite, spatch has issues when parsing non-ANSI declarations, so you have to do the conversion all by yourself first... -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/crypto/external/bsd/openssl/lib/libcrypto/arch/x86_64
On 17.07.2011 23:18, David Laight wrote: >>> The .byte streams are required for the inclusion of the AES NI >>> instructions, which are not supported with our current gcc version. >>> >>> Should be fixed once we have stabilized gcc 4.5 (dunno about other >>> compilers though, especially pcc). >> >> That doesn't make any sense. This are *assembler* instructions, not GCC >> intrinsics. nm, sorry; was thinking about gas, not gcc. > Also, having looked at the file, even if it is using instructions that > the assembler can't process, it is a horrid mess. > There are much better ways to specify instructions than just .byte sequences. > Even if you aren't using CPP, the assmembler will support local constants > and expressions. > Even a few comments would help. IIRC, this is the code as generated by the Perl scripts in openssl (byte streams and the resulting ugliness are neither my own nor spz@). I tend to steer away from manipulating code (particularly crypto) when I don't have good knowledge of it. And this is far from being the case for me with OpenSSL. Anyway, I'll look into it next week for cleanup. -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/crypto/external/bsd/openssl/lib/libcrypto/arch/x86_64
On 17.07.2011 21:48, Joerg Sonnenberger wrote: > Module Name: src > Committed By: joerg > Date: Sun Jul 17 19:48:31 UTC 2011 > > Modified Files: > src/crypto/external/bsd/openssl/lib/libcrypto/arch/x86_64: aes.inc > > Log Message: > Disable Clang's integrated assembler for the AES-NI files for now. > Somewhere in this mess of .byte streams, corruption happens. Disassembly > only shows slightly different filling of alignment sequences, further > analysis is needed. > > XXX This should be rewritten to be proper assembler code The .byte streams are required for the inclusion of the AES NI instructions, which are not supported with our current gcc version. Should be fixed once we have stabilized gcc 4.5 (dunno about other compilers though, especially pcc). -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/usr.bin/pmap
On 24.06.2011 00:50, Christos Zoulas wrote: > Module Name: src > Committed By: christos > Date: Thu Jun 23 22:50:54 UTC 2011 > > Modified Files: > src/usr.bin/pmap: main.c > > Log Message: > Don't give out information about processes we can't control. Thanks to Aleksey and you for fixing the procfs leak. I wonder whether pmap's code is the right place to check for "information" access control. It's difficult to modify except by patching the source, does not protect from abusing/finding exploits to circumvent the check (any executable that has kmem sgid rights is a target), and there are other potential tools usable out there (lsof(1), maybe?). Isn't it something that rather fits the kauth(9) ACLs? -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys/arch/xen/xen
On 15.05.2011 21:11, Mindaugas Rasiukevicius wrote: > This is not correct. Atomic op might decrease the reference count to > 1 while other thread executes xbdi_put() before xbdi_refcnt is fetched, > thus decreasing it to 0. In such case, both threads would fetch 0. > > The following is a correct way: > > if (atomic_dec_uint_nv(&xbdip->xbdi_refcnt) == 0) > xbdback_finish_disconnect(xbdip); > > Also, it seems there is no need for xbdi_refcnt to be volatile. Good point; I was pondering about the _nv() version when I made the change, but forgot about the possible concurrency regarding refcnt fetch (not actually possible as port-xen is not MP, but will become soonish) I'll remove the volatile declaration too, only xbdi_put/_get use the refcnt for G/C anyway. Thanks for pointing that out! -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys/arch/xen/xen
On 18.04.2011 10:35, Mindaugas Rasiukevicius wrote: > We used to check the return of big size allocations, when kmem(9) could > fail even with KM_SLEEP due to KVA starvation. However, pretty much all > kernel does not perform checks for smaller allocations and since the bug > was fixed - we are no longer checking for big ones as well. > > This applies to all allocators. So, any thread sleeping for an allocation cannot be interrupted? -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys/arch/xen/xen
On 18.04.2011 05:04, Mindaugas Rasiukevicius wrote: > Module Name: src > Committed By: rmind > Date: Mon Apr 18 03:04:31 UTC 2011 > > Modified Files: > src/sys/arch/xen/xen: balloon.c > > Log Message: > balloon_xenbus_attach: use KM_SLEEP for allocation. > > Note: please do not use KM_NOSLEEP. Ah yes, forgot about this one, thanks. Although I am still unsure about the check, in-kernel NULL deref is... problematic. I am not so sure whether it is safe to assume non-NULL return if caller can sleep. It's something that ought to be specified for all available memoryallocators(9) especially as the code behind can evolve (hey, Lars :) ). -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys/arch/xen/xen
On 18.04.2011 09:23, Cherry G. Mathew wrote: > On 18 April 2011 08:00, Cherry G. Mathew wrote: >> On 18 April 2011 04:21, Mindaugas Rasiukevicius wrote: >>> Masao Uebayashi wrote: >>>> On Mon, Apr 18, 2011 at 03:04:32AM +, Mindaugas Rasiukevicius wrote: >>>>> Module Name:src >>>>> Committed By: rmind >>>>> Date: Mon Apr 18 03:04:31 UTC 2011 >>>>> >>>>> Modified Files: >>>>> src/sys/arch/xen/xen: balloon.c >>>>> >>>>> Log Message: >>>>> balloon_xenbus_attach: use KM_SLEEP for allocation. >>>>> >>>>> Note: please do not use KM_NOSLEEP. >>>> >>>> And, according to yamt@, KM_SLEEP can fail in the current design... >>> >>> IIRC yamt@ fixed it a year or few ago. >> >> And in the more specific immediate context: >> http://mail-index.netbsd.org/port-xen/2011/04/07/msg006613.html >> > PS: Can you please revert ? Unless it's breaking anything else, or you > have a fix for the problem mentioned in the thread above, ie; Uho, I forgot to mention in my commit log that I fixed it. I am allocating bpg_entries via pool_cache(9), and the constructor bpge_ctor() will return an error if uvm(9) fails to find a free page. In that case, the thread will just bail out and start waiting again. -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys/arch/xen/xen
On 06.04.2011 20:01, Manuel Bouyer wrote: > You could also use > xvifxiy (e.g. xvif5i2, where i stands for 'index'). > or any other letter ... Huh hmm indeed... I wonder why I did not think about this approach before... -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys/arch/xen/xen
On Wed, 6 Apr 2011 16:29:22 +, Taylor R Campbell wrote: Date: Mon, 04 Apr 2011 23:46:19 +0200 From: Jean-Yves Migeon The newer scripts for Xen read the interface value from the vifname entry in Xenstore, so changing the name is not that critical. But this should be stabilized sooner than later. Personally, I find '_' rather ugly in an interface name... but I am open to suggestions, even fixing rc.d/network if it needs to. One could declare that ifnames must match [a-z][-a-z0-9]*[0-9], and then in /etc/rc.d/network pass them through `tr - _' before evaluating `args=\$ifconfig_$int'. This guarantees that setting ifconfig_ifN in /etc/rc.conf always works, and depends only on agreeing to the ifname policy and ensuring that all the existing ifnames actually follow it. Of course, tr might not be available yet in /etc/rc.d/network, but someone wizardlier (or more patient) with sh than I can probably work around that... Yep, but there are other places where this will get tricky. For example, rc.conf(5) is parsed as a classic shell script [1]. Given that someone has two possibilities to configure interfaces: /etc/ifconfig.xxx or ifconfig_xxx # in rc.conf(5) If we follow the same "xxx" naming convention for both, we would have to 'tr' certain lines of rc.conf for interfaces that contains a [^a-zA-Z0-9_] char. This complexifies rc.conf parsing for no real benefit. Putting a ifconfig_xxx-yyy="auto" in rc.conf(5) will likely result in a parsing error. I am about to revert my patch anyway, and use '_' instead. I will also update ifconfig.if(5) to clearly indicate that only chars accepted in shell variable names should be used. [1] http://nxr.netbsd.org/xref/src/etc/rc#22 -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys/arch/xen/xen
On 04.04.2011 23:21, Taylor R Campbell wrote: >Date: Sun, 3 Apr 2011 23:21:39 + >From: "Jean-Yves Migeon" > >Now that pkgsrc-2011Q1 has arrived, and before -6 chimes in, change >ifxname for xvif(4) from xvif%d.%d to xvif%d-%d. This is needed >to avoid sysctl(9) EINVAL errors when creating interface nodes. > > This change came awfully close to fixing PR misc/39376 too... > > (Hyphens may be valid in sysctl nodes while dots are not, but neither > hyphens nor dots are valid in sh variables, for /etc/rc.d/network's > ifconfig_ifN.) Good catch; I missed it... The newer scripts for Xen read the interface value from the vifname entry in Xenstore, so changing the name is not that critical. But this should be stabilized sooner than later. Personally, I find '_' rather ugly in an interface name... but I am open to suggestions, even fixing rc.d/network if it needs to. -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys/arch/i386/conf
On Wed, 23 Feb 2011 12:17:43 +0900, Masao Uebayashi wrote: IMHO, for x86, the kernel cannot assume that the bootloader loaded certain modules, nor can the bootloader expect that kernel XYZ is in a specific configuration. They should be agnostic from one another. I think that TRT here is that kernel tells bootloader what to load. This should be possible by allocating a static buffer shared by bootloader/kernel, and kernel does reboot (== calling bootloader). There are, at least, two non trivial points to solve here: - as said, for x86, you are pushing for an interface that does not yet exist between kernel and bootloader. I highly doubt that Grub/Grub2, syslinux, or any other bootloader not under direct control of TNF, will follow such a model. - you have to "configure", but also to "unconfigure" device upon each reboot. You have to teach the interface not only "what to load", but also, "what is the state" of a driver module. Module's loading can change the state of devices, and rebooting/calling bootloader will _not_ reset that state. -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys/arch/amd64/conf
recompiling half a dozen files actually affected by the change. > > This is a non-starter; furthermore, in such an environment if I > > screw up with the versions I can easily render my test machine > > unbootable. > > 100% agreed. Regular problem with Xen kernels. Glad at least some things can be agreed on :-) Good, so we are making progress :) > MONOLITHIC was added in a rush to shut up complaints, because people > couldn't make a difference between these types: > 1 - MONOLITHIC kernel, where options are compiled in (_provided_ you > enabled them), with module support disabled > 2 - MODULAR kernel, with most common options(7) compiled as "builtin" > modules (essentially making it like 1, except "options MODULAR") > 3 - MODULAR kernel, with most options(7) compiled as third party modules. Well right, except that case 2 isn't really a modular kernel. The existence of "options MODULAR" is not the issue that matters (mostly, modulo the autoload issues) -- it's case 3 that causes serious complications. Even without a fix for the autoloading issues, I don't think anyone really objects much to having "options MODULAR" turned on. Anyone who's setting up their own kernel config to disable things and doesn't want the autoloader providing them anyway can turn off "options MODULAR" at the same time they turn off whatever else they're concerned about. (Knowing that they need to do this isn't desirable and the autoloading issues should be resolved, but it's not critical at the same level.) > Please stop opposing MONOLITHIC and MODULAR. Given the current > architecture, modules can be turned into builtins, which is, in essence, > the same thing as a monolithic kernel, when you turn on most options. I do not understand what you mean by this. That if you want a MONOLITHIC kernel, options MODULAR does not oppose that. The only exception being that, as you said above, commenting out an option won't remove it, but rather move it to a filesystem module. The premise of your commit was that GENERIC should be modular, that is, you took out a large number of standard features so they'd be only available via moduldes. Any MONOLITHIC config should have those features compiled in. I removed file-systems nobody did not seem to care about when externalized as modules (after sending a notice to the relative ML). Attempt is made to add modules to i386 GENERIC, and remove some from amd64, so they become closer in concept. If you want me to revert that part, and reflect it in i386 GENERIC (essentially turning most modules as builtins), I am fine with that... Bu after receiving a buckloads of mails with "no need for accf_* , nor CODA" or "please leave file-systems as builtins", well, I ended there [1], and asked core@ what I should leave as builtins, and what I can turn into modules (and mark them so in config(5), so I could come back later if config(5) gets extended -- as you proposed). There shouldn't be any difference between a compiled-in feature and the same feature present as a "builtin module". > I also stated (also privately with Matthew) that "MONOLITHIC" was a bad > name. "NOMODULAR" would be more appropriate. If we enable most options > in GENERIC, MONOLITHIC would just be a "include <...>/GENERIC\n no > options MODULAR", with almost zero difference regarding code (well, > except that you won't have module support anymore). No, that's not what MONOLITHIC is meant to be. It should turn on the options that, in GENERIC, are turned off so as to be modules. So why nobody complained for amd64, which does not provide MONOLITHIC? Because amd64 GENERIC turns modules into builtins, making the necessity of a MONOLITHIC kernel irrelevant. > So, one more time: the current issue is to define what people consider > as options that should be enabled by default, and what could stay out > (or as a third party module if you urgently need it back). One example: > accf_* is a questionable choice, whether its inside GENERIC (as a > builtin), or enabled by default in MONOLITHIC. I don't think that's the issue. Running GENERIC is supposed to make all stable features and functionality available. The question at hand is whether those features are compiled in (which would be "monolithic") or loaded as modules (which would be "modular"). Exactly. [1] http://en.wikipedia.org/wiki/Voting_paradox -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys/arch/amd64/conf
one, which is MODULAR. Now why is that? MONOLITHIC was added in a rush to shut up complaints, because people couldn't make a difference between these types: 1 - MONOLITHIC kernel, where options are compiled in (_provided_ you enabled them), with module support disabled 2 - MODULAR kernel, with most common options(7) compiled as "builtin" modules (essentially making it like 1, except "options MODULAR") 3 - MODULAR kernel, with most options(7) compiled as third party modules. Comparatively, the amd64 GENERIC never caused complaints, although it uses modules: critical options(7), like FFS, EXEC_ELF32/64, COMPAT_32, TMPFS/MFS were _builtin_. Please stop opposing MONOLITHIC and MODULAR. Given the current architecture, modules can be turned into builtins, which is, in essence, the same thing as a monolithic kernel, when you turn on most options. I also stated (also privately with Matthew) that "MONOLITHIC" was a bad name. "NOMODULAR" would be more appropriate. If we enable most options in GENERIC, MONOLITHIC would just be a "include <...>/GENERIC\n no options MODULAR", with almost zero difference regarding code (well, except that you won't have module support anymore). So, one more time: the current issue is to define what people consider as options that should be enabled by default, and what could stay out (or as a third party module if you urgently need it back). One example: accf_* is a questionable choice, whether its inside GENERIC (as a builtin), or enabled by default in MONOLITHIC. -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys/arch/amd64/conf
On 20.02.2011 15:45, matthew green wrote: >> Have you measured how much modules supposedly increase the size compared to >> compiling things directly to the kernel? This seems like a rather silly point >> to me (without numbers, at least). > > well, i dunno about others but i've found that the old modules > lying around tends to fill up space pretty quickly, but ignoring > that problem and looking at recent i386 builds, i see that the > MONOLITHIC kernel set is only 440kb larger than GENERIC, yet the > modules set is 3500kb. > > i didn't look into why, but there's some numbers to go from. (for i386) When I tried to use GENERIC as a replacement for MONOLITHIC for install floppies, I took a look at some of the duplication that happens between "filesys" and "builtin" modules. All in all, there are approx. ~700kiB of modules in /stand that are also accountable (e.g. "builtin") inside GENERIC. Taking the options(4) embedded inside MONOLITHIC and comparing them to their modules counterpart: - COMPAT_* => 400kiB as standalone modules - various fs => 720kiB - other options (PPP, accf_*, pseudo-devices): 200kiB => 1.3MiB. So, a total of 1.3M + 700k = 2MiB. Still missing 1.5MiB. MODULAR options seems to consume ~70kiB , so I would assume that the rest is due to PIC mode and ELF headers... ? That's still 1.5MiB bigger (a complete monolithic kernel vs a full-blown modular kernel). Needs to be xchecked though. Note that /stand also includes modules that cannot be included inside a monolithic kernel, like zfs and solaris compat layer. And these makes up for ~ half the size of /stand (5MiB). -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys/arch/amd64/conf
On 19.02.2011 17:13, Matthias Drochner wrote: > > jeanyves.mig...@free.fr said: >> I can't see why MONOLITHIC is needed in the first place > > I think before modular kernels are forced onto the masses, > at least 2 design problems should be fixed: > 1. Autoloading needs to be done differently: The kernel doesn't >have the smarts to know which module is needed in which situation, >and there is no fine-grained administrative control. I've >complained in some other mail about the idiotic behavior >in case of exec format loaders. > 2. I don't want tons of modules which I'll never need installed >into my root file system. As it was common in good old times (tm), >my root filesystems are as small as possible. Now, with modules >being added to the build, I have to squeeze out stuff after >each update to keep some percent free space in /. 1. modules can be turned into builtins, and so you will end up in the same situation as MONOLITHIC, without sacrificing MODULAR. You can preserve monolithic behavior, but still load modules if you want to. 2. issue is the same as with monolithic-like kernels: instead of having code provided as third party files, code is directly embedded in. So what you are gaining as space in one place, you are losing it in another. Smaller file-system, bigger kernel, or the other way around. Granted, kernel does not necessarily reside under / . As said, it's all a matter of choice (which I cannot make). i386 GENERIC was a PITA as all options(4) were modules, thereby affecting boot at the smallest occasion. MONOLITHIC was brought back. On the contrary, amd64 GENERIC was monolithic in nature (most modules are builtins), mimicking i386 MONOLITHIC (but still having modules as possibility), and that did not seem to bother people. So either way, it's fine; but please -- i386 and amd64 should share common grounds. If some want a MONOLITHIC amd64, it's even possible, although I can't see the point given the arguments above. That would also save us a kernel build for i386 release. -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys/arch/amd64/conf
On 19.02.2011 10:27, Bernd Ernesti wrote: > On Wed, Feb 16, 2011 at 03:16:58AM +0000, Jean-Yves Migeon wrote: >> Module Name: src >> Committed By:jym >> Date:Wed Feb 16 03:16:58 UTC 2011 >> >> Modified Files: >> src/sys/arch/amd64/conf: GENERIC INSTALL >> >> Log Message: >> Build certain file-systems and options(7) as module(7). 32 and 64 bits >> support are still builtin, as well as FFS, NFS, TMPFS, and most COMPAT >> code. Saves approx. 750kiB. >> >> Reflect this in INSTALL kernel, where we have to support more file systems >> statically. >> >> See http://mail-index.netbsd.org/port-amd64/2011/02/13/msg001318.html > > Are you going to add a MONOLITHIC kernel to match i386? No. And honestly, I can't see why MONOLITHIC is needed in the first place: it was introduced as a quick fix for those that wanted to bluntly replace their old kernel with a new one, without risking crippling their system on reboot because ELF and FFS kmods were gone missing. I dare say that MONOLITHIC was a step in the wrong direction: instead of cutting down MODULAR options(4) from GENERIC, it would have been better to include everything as builtin modules, while still offering modular support. I can't see the difference of having MONOLITHIC instead of GENERIC with modules as builtins, except for cases where you want to block module loading, for security purposes. But there, you are better off removing most COMPAT code also, which means a new kernel build, anyway. I perfectly know that the question of "what should be in, and what should stay as a third party kmod?" is entirely debatable. I notified core@ about that, and wait for their answer. We could have the bare minimum builtin inside GENERIC, or provide most options(7) as kmods by default. -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys/arch/i386/conf
On 13.02.2011 17:02, Paul Goyette wrote: > On Sun, 13 Feb 2011, Jean-Yves Migeon wrote: > >> ... >> For order of preference, see module(7): >> >>The loader will look first for a built-in module with the specified >>name that has not been disabled (see module_unload() below). If a >>built-in module with that name is not found, the list of modules >>prepared by the boot loader is searched. If the named module is >>still not found, an attempt is made to locate the module within the >>file system. > > There should be one additional qualification here. > > Searching for modules within the file system can only occur after > the root file system has been mounted by the initialization code. Sorry, this part is from module(9). For module(7), there is a mention in the CAVEATS section: If an attempt is made to boot the operating system from a file system for which the module is not built into the kernel, the boot may fail with the message ``Cannot mount root, error 79''. On certain architectures (currently, i386 and amd64), one may be able to recover from this error by using the ``load xxxfs'' command before trying to boot. This command is only available on newer bootloaders. Wording seems a bit redundant, so I condensed this into: Index: man/man9/module.9 === RCS file: /cvsroot/src/share/man/man9/module.9,v retrieving revision 1.26 diff -u -p -u -p -r1.26 module.9 --- man/man9/module.9 9 Jan 2011 05:05:10 - 1.26 +++ man/man9/module.9 13 Feb 2011 16:39:01 - @@ -175,7 +175,8 @@ If a built-in module with that .Fa name is not found, the list of modules prepared by the boot loader is searched. If the named module is still not found, an attempt is made to locate the -module within the file system. +module within the file system, provided it has been mounted by the +initialization code. .Pp The .Fa flags -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys/arch/i386/conf
On 13.02.2011 14:42, David Laight wrote: > On Sun, Feb 13, 2011 at 04:37:21AM +0000, Jean-Yves Migeon wrote: >> Module Name: src Committed By: jym Date: Sun Feb 13 >> 04:37:21 UTC >> 2011 >> >> Modified Files: src/sys/arch/i386/conf: GENERIC >> >> Log Message: Compile FFS and NFS statically (e.g. not modular) for >> GENERIC. These file-systems can be critical for mountroot; as >> kernel cannot have access to module(7)s without having / mounted >> first... yes, you see the point. > > Does this cause grief with existing installations where the > bootloader also loads these as modules? No; if it does, it should not: the bootloader is free to load whatever modules it wants to, and this happens also when the user specifies it to load certain modules manually. It could print errors messages (although, in my tests, it did not), but should not hamper boot in any way. IMHO, for x86, the kernel cannot assume that the bootloader loaded certain modules, nor can the bootloader expect that kernel XYZ is in a specific configuration. They should be agnostic from one another. Besides, that would be problematic if it were not the case: given that boot(8) can auto-load ffs or nfs kmods, the bootloader would be bound with modular kernels. For order of preference, see module(7): The loader will look first for a built-in module with the specified name that has not been disabled (see module_unload() below). If a built-in module with that name is not found, the list of modules prepared by the boot loader is searched. If the named module is still not found, an attempt is made to locate the module within the file system. -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys/arch/i386/conf
On 11.02.2011 07:30, Matthias Scheler wrote: > On Fri, Feb 11, 2011 at 05:06:43AM +0100, Jean-Yves Migeon wrote: >> Indeed, it would be good to have at least some exec formats and >> file-systems builtin by default in GENERIC: >> >> EXEC_ELF32, _SCRIPT # obvious >> FFS, CD9660, MFS, TMPFS, NFS, EXT2FS > > FFS should be compiled into the kernel. But I doubt that somebody uses > both MFS and TMPFS. I hardly ever use CD9660 or EXT2FS while I use > NFS a lot. There will however be people with is opposite requirements > which is why those should really be modules. > >> e.g. the typical file-systems we may have to use, without having to rely >> on modules that could be absent or non accessible at boot time. > > I don't think it needs typical file-systems. IMHO the kernel should > contain the file-systems required for booting. This will solve most > of the update issues. This will allow booting, but not necessarily in a sufficient environment to fix update issues: if someone needs to mount an ISO or NFS share to get updated modules, he will need these. Besides, it's more or less the list of file-systems that have associated mount_* in the INSTALL ramdisk; I removed some though: ntfs, kernfs, lfs and msdos. -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys/arch/i386/conf
On 11.02.2011 02:02, Paul Goyette wrote: > On Fri, 11 Feb 2011, matthew green wrote: >>> I'm not 100% sure it is worth having FFS and ELF support as modules >>> in GENERIC. >>> It may be nice that they CAN be modules, but I suspect 99.99% of systems >>> will need them. >>> Anyone who wants them as modules can build a kernel without them. >> >> i strongly agree with this. > > Me too. Indeed, it would be good to have at least some exec formats and file-systems builtin by default in GENERIC: EXEC_ELF32, _SCRIPT # obvious FFS, CD9660, MFS, TMPFS, NFS, EXT2FS e.g. the typical file-systems we may have to use, without having to rely on modules that could be absent or non accessible at boot time. This has a good chance of making a GENERIC kernel work better out of the box, without big surprises during hand-made updates where kernel gets replaced without its associated modules. IIRC, this is why MONOLITHIC was brought back, due to the number of complaints where GENERIC fails booting because ffs.kmod or exec_elf32.kmod could not be loaded. This is what amd64 is approx. doing, but with most file-systems included by default. -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys/arch/i386/conf
On 10.02.2011 22:23, David Laight wrote: > On Thu, Feb 10, 2011 at 04:49:19PM +0000, Jean-Yves Migeon wrote: >> Module Name: src >> Committed By:jym >> Date:Thu Feb 10 16:49:19 UTC 2011 >> >> Modified Files: >> src/sys/arch/i386/conf: INSTALL >> >> Log Message: >> For i386, include MONOLITHIC for INSTALL rather than GENERIC. While here, >> remove drm drivers, we don't need them for install. >> >> i386 GENERIC has FFS and ELF support compiled as modules, so we hit >> an interesting "chicken-egg" situation when the kernel attempts to mount >> a ffs ramdisk, while the module might be contained inside... the ramdisk. > > I'm not 100% sure it is worth having FFS and ELF support as modules > in GENERIC. > It may be nice that they CAN be modules, but I suspect 99.99% of systems > will need them. > Anyone who wants them as modules can build a kernel without them. It's something I mentioned privately with Jared. I think we could come up with a golden mean for GENERIC kernels: leave most "third party" drivers/systems as modules, while keeping critical ones included by default. FFS and ELF come to mind, but there are others too. amd64 GENERIC is close to this. This would open up the possibility to provide a modular kernel, without going the "all or nothing" MONOLITHIC way (and avoid many complaints like "I just replaced my kernel for testing, and it returns an error when attempting to mount / or exec init"). BTW, I wonder whether modules shouldn't be part of the kern-GENERIC.tgz set. But this is an orthogonal issue. -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/etc/powerd/scripts
On 31.12.2010 19:51, David Young wrote: > IMO, we should put the system to sleep by sending a > power-saving/wakeup-latency goal and a set of waking events (e.g., > keystroke, mouse movement, LAN activity) to the root device_t using > drvctl. To put any smaller set of devices to sleep, send the goal & > wake events to some subtree. I haven't looked at the details of the ioctl(), but I would expect it to be used against /dev/drvctl, no? This seems plausible to me; at least, suspend/sleep should be part of pmf(9) (even in my Xen's case, all driver sleep/wake coded uses pmf(9)). > FWIW, the sleep states that ACPI names are not sufficient even to > describe all of the potential sleep states of ACPI hardware. I have a > laptop that's perfectly capable of an "S3-like" sleep, but the ACPI BIOS > doesn't support S3, and the HDD is not formatted properly for the S4 > sleep. -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/etc/powerd/scripts
On 31.12.2010 11:10, Jukka Ruohonen wrote: > On Fri, Dec 31, 2010 at 11:01:08AM +0100, Jean-Yves Migeon wrote: >> I am using machdep.sleep_state as node to put a domU into suspend mode. >> Up to now, putting sleep_state under machdep allowed powerd(8) >> sleep_button to be used regardless of the environment (eg. ACPI sleep or >> Xen domU sleep). > > So Xen uses a *machine-dependent* sysctl(8) variable for purposes entirely > different from the intended one? machine-dependent: yes for purposes entirely different from the intended one: not really, purpose is arguably the same, put system into "sleep" ("sleep" state being not well defined, I admit). It's only in my local tree though. This can be done in entirely different ways, nothing is set in stone. The side effect of your change is that the sleep_state node will move under hw.acpi, which is not right in Xen domU case. >> While retiring sleep_state from machdep goes in the right direction >> IMHO, will it be replaced by a MI interface to put a system into sleep, >> as it is not a feature specific to ACPI? > > Definitely agreed. Maybe we could steal zzz(8) from APM? Seems reasonable to me. We could have a more featureful binary later, and just alias zzz(8) to it. > Generally, most of the problems ("the mess") in the area of power management > have been directly caused by the lack of proper MI interfaces and the overall > lack of planning. The move towards sysmon_pswitch(9), pmf(9), and powerd(8) > were a step to the right direction; the mistakes done with the i386-specific > apm(4) should not be repeated. Of course not; that's why I am asking. -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/etc/powerd/scripts
On 31.12.2010 10:36, Jukka Ruohonen wrote: > Module Name: src > Committed By: jruoho > Date: Fri Dec 31 09:36:15 UTC 2010 > > Modified Files: > src/etc/powerd/scripts: sleep_button > > Log Message: > Use hw.acpi.sleep.state instead of machdep.sleep_state. And so it begins :) I am using machdep.sleep_state as node to put a domU into suspend mode. Up to now, putting sleep_state under machdep allowed powerd(8) sleep_button to be used regardless of the environment (eg. ACPI sleep or Xen domU sleep). While retiring sleep_state from machdep goes in the right direction IMHO, will it be replaced by a MI interface to put a system into sleep, as it is not a feature specific to ACPI? -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys/dev/mii
On 27.11.2010 18:38, Jean-Yves Migeon wrote: > Module Name: src > Committed By: jym > Date: Sat Nov 27 17:38:49 UTC 2010 > > Modified Files: > src/sys/dev/mii: miidevs.h > > Log Message: > Correct string for BCM6709S. s/BCM6709S/BCM5709S/ -- Jean-Yves Migeon j...@netbsd.org
Re: CVS commit: src/sys/dev/pci
On 10.10.2010 23:24, Christos Zoulas wrote: > Module Name: src > Committed By: christos > Date: Sun Oct 10 21:24:35 UTC 2010 > > Modified Files: > src/sys/dev/pci: agp.c > > Log Message: > restore binary compatibility for amd64; requested by joerg. Thanks; I was about to commit the fix. Second time I got burned on that one :/ -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src
On 06.10.2010 12:16, Manuel Bouyer wrote: > On Tue, Oct 05, 2010 at 11:48:17PM +0000, Jean-Yves Migeon wrote: >> [...] >> >> XXX Currently, savecore(8) will fail to dump a PAE kernel in a !PAE >> environment (and reciprocally). So you need to sync and reboot >> with a kernel of the same mode as the one that crashed. Once the dump >> is successful, this does not matter anymore. > > Doesn't it work with savecore -N /the_right_kernel ? No; in fact, dumplo is complete garbage, so savecore fails dumping the image. I suspect that the KREAD() in savecore.c does not read at the correct offset, and just sets garbage for dumplo. I have to see what code path takes savecore with -N, and where it differs from the case where it goes through getbootfile(3) when the booted kernel is different from the one that is about to get dumped. -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys
On 27.09.2010 23:25, Christos Zoulas wrote: > Module Name: src > Committed By: christos > Date: Mon Sep 27 21:25:39 UTC 2010 > > Modified Files: > src/sys/dev/pci: agp.c > src/sys/sys: agpio.h > > Log Message: > backwards compat code for paddr_t being 32 bits. > > Thanks! -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys/arch
On Tue, 17 Aug 2010 18:18:22 +0100, David Laight wrote: >> Unifying sizes of certain types up to 64-bits would be a good choice >> from engineering standpoint. Actually measuring the overhead would >> answer whether it is worth. > > Why not use a 64bit type that is the union of a 64bit number and two > 32bit numbers. > A kernel that uses 32bit paddr_t would use tha approriate (for the > endianness) 32bit value, having initialised the unused half to zero. > > That would remove almost everything except the structure size overhead Which is precisely the "problem" here :) > while maintaining binary compatibility. I don't think so; modules that manipulates 32 bits bus_addr_t will likely fail with PAE when you cross the 4GB boundary. I don't see how the union could solve that. -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys/arch
On Tue, 17 Aug 2010 14:26:53 +, Andrew Doran wrote: > Hi, > > Why are any types other than in the pmap different between PAE and !PAE? > > When this was originally proposed I asked for stuff like paddr_t to be 64 > bits no matter what the kernel compile settings where. > > Thanks. I need more time! Let's do this /gradually/. There are already regressions with PAE that needs fixing before moving further, especially "bugs" that people reported to me I cannot reproduce with my hosts (a.out breakage with Xen PAE + NX, and invalid TLB entries that only happens 'randomly' with some AMD 64 Athlon CPUs). That, and also: - rmind@ branch work on uvm/pmap. Such a change will interfere with his work. - the issue is wider, bus_addr_t will move to "all 64 bits" too (performance impact?) - p{d,t}_entry_t vs paddr_t size mix up in pmap, which is bad in the !PAE case (32 vs 64 bits). Breaking i386 pmap will not be particularly good for my life expectancy. - I want to merge xensuspend before entering another bug hunting session :) One at a time is enough :) - this is orthogonal to the kvm(3) issue: having 64 bits in-kernel paddr_t will not automagically solve the "all PAs are unsigned long" assumption of kvm(3). -- Jean-Yves Migeon jeanyves.mig...@free.fr
re: CVS commit: src/sys/arch
On Tue, 17 Aug 2010 09:59:54 +1000, matthew green wrote: >> For kernel core dumps, at the present time, no; this needs thinking, and >> testing: >> - kvm(3) API will have to move from using 'unsigned long' to 'paddr_t'. I >> am currently checking that for all architectures except i386, >> sizeof(paddr_t) == sizeof(u_long). > > sparc and 32 bit sparc64 kernels use 64 bit paddr_t. there are some > mips ports that do too. you can't assume the above equality at all, > not since "paddr_t" was introduced a decade+ ago. Eeek. I can't see how I can fix that then. Would it be acceptable to say "the affected functions are private to kvm(3), so don't bother with this anyway?" >> - i386 will use 64 bits paddr_t for kvm(3), so we will have to discuss on z> how to make it work without too much fuzz with pre-64 bits i386 dumps. > > i think this seems the sanest way. make userland see the 64 bit > definition in all cases, and have the libkvm code DTRT. Thing is, kvm(3) has two functions (kvm_kvatop and kvm_pa2off) that use PAs internally and assume they are unsigned long. I can't move i386 to 64 bits PA without having to tweak kvm_private.h prototypes, which will affect all ports in turn. IMHO, the path of least resistance would be to move to paddr_t. >> - sparse dumps for memory above 4GB. >> >> Eventually, PAE and !PAE kernels core files should be handled by kvm, if >> that's what you are asking. > > right. i was suggesting that if you simply used nlist(3) to find out the > value of "i386_use_pae" then the same code will work for live debugging > as well as core dumps. > > since core file debugging basically requires this method, and it works > just as well for live debugging, the sysctl seems like something we don't > need to do. I insist: the sysctl(7) was never meant to be used for checking whether PAE is enabled for "live" kvm(3) session. I admit, my commit message is not very clear: the " Will be used by i386 kvm(3) to know the functions that should [...]" sentence applies to i386_use_pae, not machdep.pae. It is only there for convenience, eg. when someone wants to know easily if its system is running under PAE. -- Jean-Yves Migeon jeanyves.mig...@free.fr
re: CVS commit: src/sys/arch
On Tue, 17 Aug 2010 06:55:21 +1000, matthew green wrote: >> >> Log Message: >> >> Add machdep.pae sysctl(7) for i386. Thanks to Paul and Joerg for their >> >> reviews. >> >> >> >> In kernel, it matches the 'i386_use_pae' variable (0: kernel does not >> >> use >> >> PAE, 1: kernel uses PAE). Will be used by i386 kvm(3) to know the >> >> functions that should get called for VA => PA translations. >> > >> > does this work for core files as well? > > could you answer this part? thanks. For kernel core dumps, at the present time, no; this needs thinking, and testing: - kvm(3) API will have to move from using 'unsigned long' to 'paddr_t'. I am currently checking that for all architectures except i386, sizeof(paddr_t) == sizeof(u_long). - i386 will use 64 bits paddr_t for kvm(3), so we will have to discuss on how to make it work without too much fuzz with pre-64 bits i386 dumps. - sparse dumps for memory above 4GB. Eventually, PAE and !PAE kernels core files should be handled by kvm, if that's what you are asking. PAE does not affect program core files. -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys/arch
On 16.08.2010 22:22, matthew green wrote: >> Module Name: src >> Committed By:jym >> Date:Mon Aug 16 19:39:06 UTC 2010 >> >> Modified Files: >> src/sys/arch/i386/i386: machdep.c >> src/sys/arch/x86/include: cpu.h >> >> Log Message: >> Add machdep.pae sysctl(7) for i386. Thanks to Paul and Joerg for their >> reviews. >> >> In kernel, it matches the 'i386_use_pae' variable (0: kernel does not use >> PAE, 1: kernel uses PAE). Will be used by i386 kvm(3) to know the functions >> that should get called for VA => PA translations. > > does this work for core files as well? > > if not, wouldn't this feature be better done use ugly kvm/nlist so > that it works with the same code on dead kernels? getting a 0/1 > value via kvm shouldn't really be considered any real problem. That's the purpose; I don't know of any other way. The sysctl is only there for practical purposes: cpuctl(8) can indicate that the CPU supports PAE, but there is no easy way to know if it is active or not (unless playing with config -x together with the booted kernel, not very practical...). Hence, the sysctl machdep. > (why the extern in cpu.h? i doesn't seem necessary.) Yes; I wanted to place it at the same level as i386_use_fxsave. i386_use_pae may get use elsewhere eventually, so I added it to cpu.h. -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys/arch
On 24.07.2010 20:47, matthew green wrote: >> On 24.07.2010 09:50, Christoph Egger wrote: >> Not necessarily, the kvm(3) API manipulates PA as u_long (see >> _kvm_kvatop in kvm_i386.c). Changing the paddr_t will need a >> modification of this API too, and basically, all ports will have to move >> to uint64_t PA (or put casts everywhere...) > > ah, i thought that it already used paddr_t for this. it should, imo. > > and i'd think that libkvm should be compiled with 64bit paddr_t (ie, > have the useland definition of paddr_t always be 64 bit) and be smart > enough to deal with the kernel you feed it. 32 bit sparc has to deal > with this a lot, since the page tables are considerably more different > from sparc v8/v9 than i386 vs amd64. That would be a solution, yes. FWIW, I am making all PAE related macros and types prefixed with "pae_" (~ some kind of namespace), and alias the relevant function with them in _KERNEL when option PAE is enabled. For kvm(3), both could be used; I could extract a value from a symbol like "pae_enabled" out of a core file through kvm_nlist, then use the relevant vatop functions. -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys/arch
On 24.07.2010 21:00, matthew green wrote: >> On 24.07.2010 17:54, Matthias Drochner wrote: >>> >>> c...@cubidou.net said: >>>> Kernel modules, on the other hand... >>> >>> Hmm, didn't think of that. (using them myself only for testing) >>> >>>> a gaping ABI incompatibility is completely unacceptable >>> >>> There are two ways to fix this without the 64-bit-paddr overhead, >>> a short-term and a long-term one. >>> The short-term fix would be to use another module load path. >>> This is close to calling it a different port, but it would >>> not affect userland. >> >> "i386pae" and "i386"? Huh... then we will get i386xenpae, i386xen, and >> so on? > > FWIW, having a different module load path and choosing the right set > of modules is an incoming feature but i have no time line for it. > > i plan for it to be useful for ppc oea vs. 4xx, sparc32 vs sparc64 > in 32 bitmode, and there was someone else wanting it too. IMHO, it makes sense for these platforms. But here, i386 and PAE are so close in concept that it feels like a quirk (at least to me). -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys/arch
On 24.07.2010 18:40, Antti Kantee wrote: > On Sat Jul 24 2010 at 15:23:50 +, Quentin Garnier wrote: >>> PAE is a "bridge technology", interesting only for few systems. >>> Users of low-end i386 systems would be penalized, and most users >>> of boxes with >4G memory would gain nothing because they run >>> a 64-bit system anyway. >>> So, imho, no kernel overhead is acceptable. >> >> Well, that's one opinion. My own, probably just as humble, is that such >> a gaping ABI incompatibility is completely unacceptable, especially >> after all the work that has been done to make some kernel subsystems a >> little bit more responsible regarding ABI. >> >> I'm really curious to see some actual measurements about that alleged >> overhead. The way I see it right now, we have a known lethal ABI >> incompatibility versus an alleged overhead of unknown size. > > Didn't anyone else read the mail from a week or so ago containing detailed > measurements of the overhead? It measures overhead of PAE, not turning paddr_t to 64 bits on !PAE i386. I don't think that paddr_t moving to 64 bits add much overhead. I would rather incriminate TLB flushes and 64 bits atomic ops... > (I'm not 100% sure if I believe the results without further analysis, > but at least there are benchmarks) I am not sure that benchmarking is a matter of believing :) I could make something out of the phoronix tests [1] and track performance regressions on a revision (or week) basis, atf-style, but as always, the hardest part about benchmarks is interpretation, not running them (once everything is in place) While sysbench announces 20% less memory bandwidth, build.sh runs have no more than 3% overhead with PAE. So, hmm, between one specific measurement and real world use... there is quite a gap. -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys/arch
On 24.07.2010 17:54, Matthias Drochner wrote: > > c...@cubidou.net said: >> Kernel modules, on the other hand... > > Hmm, didn't think of that. (using them myself only for testing) > >> a gaping ABI incompatibility is completely unacceptable > > There are two ways to fix this without the 64-bit-paddr overhead, > a short-term and a long-term one. > The short-term fix would be to use another module load path. > This is close to calling it a different port, but it would > not affect userland. "i386pae" and "i386"? Huh... then we will get i386xenpae, i386xen, and so on? > A more correct but more expensive fix would be to keep out > paddr_t from the kernel ABI relevant to modules. Then bus_addr_t and paddr_t should be split. I do not think that there are many modules that make use of paddr_t-bound variables; in itself, it should remain within uvm and pmap. My biggest concern is that there is no real "stable KPI" we could rely on. rmind@ is reworking the pmap/uvm code in its branch, but told me (in a private mail) that it may take some time before he merges it. So I asked if I could commit PAE first and come back to it later. A long term fix would be to reuse (and probably extend) the modular world, and have some kind of "one kernel fits all" possibility for x86, because paddr_t/PD/PT handling also differs with Xen. But making that work in a safe and consistant manner, without breaking ABI, needs more thought that I currently could put in my patch. Cheers, -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys/arch
On 24.07.2010 09:50, Christoph Egger wrote: >> XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This >> cannot be solved easily, and needs lots of thinking before being declared >> safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions). > > How about making paddr_t always 64bit? That makes it much easier to deal > with in libkvm. Not necessarily, the kvm(3) API manipulates PA as u_long (see _kvm_kvatop in kvm_i386.c). Changing the paddr_t will need a modification of this API too, and basically, all ports will have to move to uint64_t PA (or put casts everywhere...) > paddr_t being always 64bit wide also allows to switch PAE on and off at > runtime like Solaris does. And *off* at runtime? That's one requirement, however you would have tons of extra work to make pmap "dynamically" switch from !PAE to PAE, due to paddr_t, pd/pt, bus_addr_t modifications. That would need a modular pmap, and load that module very early. IMHO, "if it ever gets done", we should add Xen to the loop (and fix that module path issue). So that we would have only one kernel for DOM0, GENERIC and GENERIC PAE. Besides, I suspect that turning paddr_t to uint64_t will add overhead, even if the upper 32 bits are always 0 in GENERIC. > Pleaes fix the amd64 build error reported on current-us...@. > The build error is related to rump. Investigating. rumptest just finished for i386 and amd64, and no error whatsoever. Guess I'll have to dig further... -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys/arch
On 24.07.2010 12:30, Manuel Bouyer wrote: >> How about making paddr_t always 64bit? That makes it much easier to deal >> with in libkvm. > > The overhead would need to be evaluated first. > Also, I'm not sure this would fix all the libkvm issues (the page table > format is still different). For the page table format, the PAE types and macros could be redefined with a prefix, and then aliased to the ones used in kernel when "options PAE" is defined. kvatop/pa2off functions could have access to both macros/typedefs, by just having pae_ prefixed in front (pae_pl*_pi, pae_pd_entry_t, and so on). It becomes a matter of calling the proper code within kvm(3), by checking if PAE was enabled within the kernel dump (through kvm_nlist, for example). -- Jean-Yves Migeon jeanyves.mig...@free.fr
Re: CVS commit: src/sys
On 04/19/10 02:02, matthew green wrote: XXX Should kernel rev be bumped? did you change any ABIs? ie, will any old modules of the same version now fail to load due to your change, or fail to work properly? No it seems unlikely, but i didn't read the big diff. It fixes a bug that could be labeled as security-sensitive (not exploitable directly though). Bumping revision makes it easier to say "-current under rev 5.99.x is affected, 5.99.x+1 is not". -- Jean-Yves Migeon jeanyves.mig...@free.fr
CVS commit: src/sys/arch/xen/x86
Module Name:src Committed By: jym Date: Tue Mar 9 23:12:06 UTC 2010 Modified Files: src/sys/arch/xen/x86: xen_bus_dma.c Log Message: Although Xen's documentation states that the address_bits field is not used by XENMEM_decrease_reservation, it is checked by the hypervisor. In certain circumstances (stack leak), the field could have an improper value, leading to a fail of the hypercall. Set it to 0 ("no addressing restriction") to avoid that. Patch tested by Sam Fourman and h...@. This should fix the rare "failed allocating DMA memory" encountered under NetBSD dom0. Will ask for a pull-up. To generate a diff of this commit: cvs rdiff -u -r1.19 -r1.20 src/sys/arch/xen/x86/xen_bus_dma.c Please note that diffs are not public domain; they are subject to the copyright notices on the relevant files. Modified files: Index: src/sys/arch/xen/x86/xen_bus_dma.c diff -u src/sys/arch/xen/x86/xen_bus_dma.c:1.19 src/sys/arch/xen/x86/xen_bus_dma.c:1.20 --- src/sys/arch/xen/x86/xen_bus_dma.c:1.19 Tue Mar 2 00:13:50 2010 +++ src/sys/arch/xen/x86/xen_bus_dma.c Tue Mar 9 23:12:06 2010 @@ -1,4 +1,4 @@ -/* $NetBSD: xen_bus_dma.c,v 1.19 2010/03/02 00:13:50 jym Exp $ */ +/* $NetBSD: xen_bus_dma.c,v 1.20 2010/03/09 23:12:06 jym Exp $ */ /* NetBSD bus_dma.c,v 1.21 2005/04/16 07:53:35 yamt Exp */ /*- @@ -32,7 +32,7 @@ */ #include -__KERNEL_RCSID(0, "$NetBSD: xen_bus_dma.c,v 1.19 2010/03/02 00:13:50 jym Exp $"); +__KERNEL_RCSID(0, "$NetBSD: xen_bus_dma.c,v 1.20 2010/03/09 23:12:06 jym Exp $"); #include #include @@ -95,6 +95,7 @@ xenguest_handle(res.extent_start) = &mfn; res.nr_extents = 1; res.extent_order = 0; + res.address_bits = 0; res.domid = DOMID_SELF; error = HYPERVISOR_memory_op(XENMEM_decrease_reservation, &res); if (error != 1) {
CVS commit: src/sys/arch/xen/x86
Module Name:src Committed By: jym Date: Tue Mar 9 23:12:06 UTC 2010 Modified Files: src/sys/arch/xen/x86: xen_bus_dma.c Log Message: Although Xen's documentation states that the address_bits field is not used by XENMEM_decrease_reservation, it is checked by the hypervisor. In certain circumstances (stack leak), the field could have an improper value, leading to a fail of the hypercall. Set it to 0 ("no addressing restriction") to avoid that. Patch tested by Sam Fourman and h...@. This should fix the rare "failed allocating DMA memory" encountered under NetBSD dom0. Will ask for a pull-up. To generate a diff of this commit: cvs rdiff -u -r1.19 -r1.20 src/sys/arch/xen/x86/xen_bus_dma.c Please note that diffs are not public domain; they are subject to the copyright notices on the relevant files.
CVS commit: src/sys/arch/xen/x86
Module Name:src Committed By: jym Date: Wed Mar 3 00:09:03 UTC 2010 Modified Files: src/sys/arch/xen/x86: cpu.c Log Message: Use roundup2() instead of hardcoding the CACHE_LINE_SIZE rounding operation. To generate a diff of this commit: cvs rdiff -u -r1.41 -r1.42 src/sys/arch/xen/x86/cpu.c Please note that diffs are not public domain; they are subject to the copyright notices on the relevant files. Modified files: Index: src/sys/arch/xen/x86/cpu.c diff -u src/sys/arch/xen/x86/cpu.c:1.41 src/sys/arch/xen/x86/cpu.c:1.42 --- src/sys/arch/xen/x86/cpu.c:1.41 Wed Feb 24 22:37:55 2010 +++ src/sys/arch/xen/x86/cpu.c Wed Mar 3 00:09:03 2010 @@ -1,4 +1,4 @@ -/* $NetBSD: cpu.c,v 1.41 2010/02/24 22:37:55 dyoung Exp $ */ +/* $NetBSD: cpu.c,v 1.42 2010/03/03 00:09:03 jym Exp $ */ /* NetBSD: cpu.c,v 1.18 2004/02/20 17:35:01 yamt Exp */ /*- @@ -66,7 +66,7 @@ */ #include -__KERNEL_RCSID(0, "$NetBSD: cpu.c,v 1.41 2010/02/24 22:37:55 dyoung Exp $"); +__KERNEL_RCSID(0, "$NetBSD: cpu.c,v 1.42 2010/03/03 00:09:03 jym Exp $"); #include "opt_ddb.h" #include "opt_multiprocessor.h" @@ -236,8 +236,7 @@ aprint_naive(": Application Processor\n"); ptr = (uintptr_t)kmem_zalloc(sizeof(*ci) + CACHE_LINE_SIZE - 1, KM_SLEEP); - ci = (struct cpu_info *)((ptr + CACHE_LINE_SIZE - 1) & - ~(CACHE_LINE_SIZE - 1)); + ci = (struct cpu_info *)roundup2(ptr, CACHE_LINE_SIZE); ci->ci_curldt = -1; } else { aprint_naive(": %s Processor\n", @@ -372,8 +371,7 @@ aprint_naive(": Application Processor\n"); ptr = (uintptr_t)kmem_alloc(sizeof(*ci) + CACHE_LINE_SIZE - 1, KM_SLEEP); - ci = (struct cpu_info *)((ptr + CACHE_LINE_SIZE - 1) & - ~(CACHE_LINE_SIZE - 1)); + ci = (struct cpu_info *)roundup2(ptr, CACHE_LINE_SIZE); memset(ci, 0, sizeof(*ci)); #ifdef TRAPLOG ci->ci_tlog_base = kmem_zalloc(sizeof(struct tlog), KM_SLEEP);
CVS commit: src/sys/arch/xen/x86
Module Name:src Committed By: jym Date: Wed Mar 3 00:09:03 UTC 2010 Modified Files: src/sys/arch/xen/x86: cpu.c Log Message: Use roundup2() instead of hardcoding the CACHE_LINE_SIZE rounding operation. To generate a diff of this commit: cvs rdiff -u -r1.41 -r1.42 src/sys/arch/xen/x86/cpu.c Please note that diffs are not public domain; they are subject to the copyright notices on the relevant files.
CVS commit: src/sys/arch/xen/x86
Module Name:src Committed By: jym Date: Tue Mar 2 00:13:50 UTC 2010 Modified Files: src/sys/arch/xen/x86: xen_bus_dma.c Log Message: Catch the return value from the XENMEM_decrease_reservation hypercall, and not some error value stored earlier. While here, fix a typo in a comment. To generate a diff of this commit: cvs rdiff -u -r1.18 -r1.19 src/sys/arch/xen/x86/xen_bus_dma.c Please note that diffs are not public domain; they are subject to the copyright notices on the relevant files.
CVS commit: src/sys/arch/xen/x86
Module Name:src Committed By: jym Date: Tue Mar 2 00:13:50 UTC 2010 Modified Files: src/sys/arch/xen/x86: xen_bus_dma.c Log Message: Catch the return value from the XENMEM_decrease_reservation hypercall, and not some error value stored earlier. While here, fix a typo in a comment. To generate a diff of this commit: cvs rdiff -u -r1.18 -r1.19 src/sys/arch/xen/x86/xen_bus_dma.c Please note that diffs are not public domain; they are subject to the copyright notices on the relevant files. Modified files: Index: src/sys/arch/xen/x86/xen_bus_dma.c diff -u src/sys/arch/xen/x86/xen_bus_dma.c:1.18 src/sys/arch/xen/x86/xen_bus_dma.c:1.19 --- src/sys/arch/xen/x86/xen_bus_dma.c:1.18 Sat Feb 27 09:22:40 2010 +++ src/sys/arch/xen/x86/xen_bus_dma.c Tue Mar 2 00:13:50 2010 @@ -1,4 +1,4 @@ -/* $NetBSD: xen_bus_dma.c,v 1.18 2010/02/27 09:22:40 jym Exp $ */ +/* $NetBSD: xen_bus_dma.c,v 1.19 2010/03/02 00:13:50 jym Exp $ */ /* NetBSD bus_dma.c,v 1.21 2005/04/16 07:53:35 yamt Exp */ /*- @@ -32,7 +32,7 @@ */ #include -__KERNEL_RCSID(0, "$NetBSD: xen_bus_dma.c,v 1.18 2010/02/27 09:22:40 jym Exp $"); +__KERNEL_RCSID(0, "$NetBSD: xen_bus_dma.c,v 1.19 2010/03/02 00:13:50 jym Exp $"); #include #include @@ -81,7 +81,7 @@ npagesreq = (size >> PAGE_SHIFT); KASSERT(npages >= npagesreq); - /* get npages from UWM, and give them back to the hypervisor */ + /* get npages from UVM, and give them back to the hypervisor */ error = uvm_pglistalloc(((psize_t)npages) << PAGE_SHIFT, 0, avail_end, 0, 0, mlistp, npages, (flags & BUS_DMA_NOWAIT) == 0); if (error) @@ -96,8 +96,8 @@ res.nr_extents = 1; res.extent_order = 0; res.domid = DOMID_SELF; - if (HYPERVISOR_memory_op(XENMEM_decrease_reservation, &res) - != 1) { + error = HYPERVISOR_memory_op(XENMEM_decrease_reservation, &res); + if (error != 1) { #ifdef DEBUG printf("xen_alloc_contig: XENMEM_decrease_reservation " "failed: err %d (pa %#" PRIxPADDR " mfn %#lx)\n",
CVS commit: src/sys/arch
Module Name:src Committed By: jym Date: Mon Mar 1 01:35:11 UTC 2010 Modified Files: src/sys/arch/amd64/amd64: machdep.c src/sys/arch/i386/i386: machdep.c Log Message: Do not forget that ptoa() casts the result to vaddr_t, which is bad for paddr_t values under i386 PAE. Use ctob() instead. Although amd64 is not affected by this vaddr_t vs paddr_t issue (both having the same size), for the sake of completeness, switch to ctob() when manipulating paddr_t/psize_t entities in amd64 machdep.c. Compile tested for i386 and amd64. No regression expected. To generate a diff of this commit: cvs rdiff -u -r1.142 -r1.143 src/sys/arch/amd64/amd64/machdep.c cvs rdiff -u -r1.683 -r1.684 src/sys/arch/i386/i386/machdep.c Please note that diffs are not public domain; they are subject to the copyright notices on the relevant files. Modified files: Index: src/sys/arch/amd64/amd64/machdep.c diff -u src/sys/arch/amd64/amd64/machdep.c:1.142 src/sys/arch/amd64/amd64/machdep.c:1.143 --- src/sys/arch/amd64/amd64/machdep.c:1.142 Mon Feb 8 19:02:26 2010 +++ src/sys/arch/amd64/amd64/machdep.c Mon Mar 1 01:35:11 2010 @@ -1,4 +1,4 @@ -/* $NetBSD: machdep.c,v 1.142 2010/02/08 19:02:26 joerg Exp $ */ +/* $NetBSD: machdep.c,v 1.143 2010/03/01 01:35:11 jym Exp $ */ /*- * Copyright (c) 1996, 1997, 1998, 2000, 2006, 2007, 2008 @@ -107,7 +107,7 @@ */ #include -__KERNEL_RCSID(0, "$NetBSD: machdep.c,v 1.142 2010/02/08 19:02:26 joerg Exp $"); +__KERNEL_RCSID(0, "$NetBSD: machdep.c,v 1.143 2010/03/01 01:35:11 jym Exp $"); /* #define XENDEBUG_LOW */ @@ -906,7 +906,7 @@ if ((error = cpu_dump()) != 0) goto err; - totalbytesleft = ptoa(cpu_dump_mempagecnt()); + totalbytesleft = ctob(cpu_dump_mempagecnt()); blkno = dumplo + cpu_dumpsize(); dump = bdev->d_dump; error = 0; @@ -1151,7 +1151,7 @@ for (x = 0; x < vm_nphysseg; x++) { vps = &vm_physmem[x]; - if (ptoa(vps->avail_end) == avail_end) + if (ctob(vps->avail_end) == avail_end) break; } if (x == vm_nphysseg) @@ -1159,12 +1159,12 @@ /* Shrink so it'll fit in the last segment. */ if ((vps->avail_end - vps->avail_start) < atop(sz)) - sz = ptoa(vps->avail_end - vps->avail_start); + sz = ctob(vps->avail_end - vps->avail_start); vps->avail_end -= atop(sz); vps->end -= atop(sz); msgbuf_p_seg[msgbuf_p_cnt].sz = sz; -msgbuf_p_seg[msgbuf_p_cnt++].paddr = ptoa(vps->avail_end); +msgbuf_p_seg[msgbuf_p_cnt++].paddr = ctob(vps->avail_end); /* Remove the last segment if it now has no pages. */ if (vps->start == vps->end) { @@ -1176,7 +1176,7 @@ for (avail_end = 0, x = 0; x < vm_nphysseg; x++) if (vm_physmem[x].avail_end > avail_end) avail_end = vm_physmem[x].avail_end; - avail_end = ptoa(avail_end); + avail_end = ctob(avail_end); if (sz == reqsz) return; @@ -1293,7 +1293,7 @@ /* Determine physical address space */ avail_start = first_avail; - avail_end = ptoa(xen_start_info.nr_pages); + avail_end = ctob(xen_start_info.nr_pages); pmap_pa_start = (KERNTEXTOFF - KERNBASE); pmap_pa_end = avail_end; __PRINTK(("pmap_pa_start 0x%lx avail_start 0x%lx avail_end 0x%lx\n", Index: src/sys/arch/i386/i386/machdep.c diff -u src/sys/arch/i386/i386/machdep.c:1.683 src/sys/arch/i386/i386/machdep.c:1.684 --- src/sys/arch/i386/i386/machdep.c:1.683 Mon Mar 1 01:15:23 2010 +++ src/sys/arch/i386/i386/machdep.c Mon Mar 1 01:35:11 2010 @@ -1,4 +1,4 @@ -/* $NetBSD: machdep.c,v 1.683 2010/03/01 01:15:23 jym Exp $ */ +/* $NetBSD: machdep.c,v 1.684 2010/03/01 01:35:11 jym Exp $ */ /*- * Copyright (c) 1996, 1997, 1998, 2000, 2004, 2006, 2008, 2009 @@ -67,7 +67,7 @@ */ #include -__KERNEL_RCSID(0, "$NetBSD: machdep.c,v 1.683 2010/03/01 01:15:23 jym Exp $"); +__KERNEL_RCSID(0, "$NetBSD: machdep.c,v 1.684 2010/03/01 01:35:11 jym Exp $"); #include "opt_beep.h" #include "opt_compat_ibcs2.h" @@ -1188,7 +1188,7 @@ vps = NULL; for (x = 0; x < vm_nphysseg; ++x) { vps = &vm_physmem[x]; - if (ptoa(vps->avail_end) == avail_end) { + if (ctob(vps->avail_end) == avail_end) { break; } } @@ -1197,12 +1197,12 @@ /* Shrink so it'll fit in the last segment. */ if (vps->avail_end - vps->avail_start < atop(sz)) - sz = ptoa(vps->avail_end - vps->avail_start); + sz = ctob(vps->avail_end - vps->avail_start); vps->avail_end -= atop(sz); vps->end -= atop(sz); msgbuf_p_seg[msgbuf_p_cnt].sz = sz; - msgbuf_p_seg[msgbuf_p_cnt++].paddr = ptoa(vps->avail_end); + msgbuf_p_seg[msgbuf_p_cnt++].paddr = ctob(vps->avail_end); /* Remove the last segment if it now has no pages. */ if (vps->start == vps->end) { @@ -1214,7 +1214,7 @@ for (avail_end = 0, x = 0; x < vm_nphysseg; x++) if (vm_physmem[x].avail_end > avail_end) avail_end = vm_physmem[x].avail_end; - avail_end = ptoa(avail_end); + avail_end = ctob(avail_end); if (sz == reqsz) return; @@ -1386,7 +1386,7 @@ /* Make sure the end of the space used by the kernel is rounded. */ first_av
CVS commit: src/sys/arch
Module Name:src Committed By: jym Date: Mon Mar 1 01:35:11 UTC 2010 Modified Files: src/sys/arch/amd64/amd64: machdep.c src/sys/arch/i386/i386: machdep.c Log Message: Do not forget that ptoa() casts the result to vaddr_t, which is bad for paddr_t values under i386 PAE. Use ctob() instead. Although amd64 is not affected by this vaddr_t vs paddr_t issue (both having the same size), for the sake of completeness, switch to ctob() when manipulating paddr_t/psize_t entities in amd64 machdep.c. Compile tested for i386 and amd64. No regression expected. To generate a diff of this commit: cvs rdiff -u -r1.142 -r1.143 src/sys/arch/amd64/amd64/machdep.c cvs rdiff -u -r1.683 -r1.684 src/sys/arch/i386/i386/machdep.c Please note that diffs are not public domain; they are subject to the copyright notices on the relevant files.
CVS commit: src/sys/arch/i386
Module Name:src Committed By: jym Date: Mon Mar 1 01:15:24 UTC 2010 Modified Files: src/sys/arch/i386/i386: machdep.c rbus_machdep.c src/sys/arch/i386/include: rbus_machdep.h Log Message: Change rbus_min_start_hint() semantic for i386. "ram" is now psize_t (instead of size_t) to avoid possible overflow when system may have more than 4GB of memory (like PAE). The behavior of rbus_min_start_hint() remains the same. While here, fix printf's format strings (paddr_t => PRIxPADDR). Use ctob() and cast physmem to psize_t to avoid losing bits above 4GB. Comes from PAE patch from Jeremy Morse; adaptation by me. Compile tested for GENERIC only. No regression expected. To generate a diff of this commit: cvs rdiff -u -r1.682 -r1.683 src/sys/arch/i386/i386/machdep.c cvs rdiff -u -r1.24 -r1.25 src/sys/arch/i386/i386/rbus_machdep.c cvs rdiff -u -r1.8 -r1.9 src/sys/arch/i386/include/rbus_machdep.h Please note that diffs are not public domain; they are subject to the copyright notices on the relevant files.
CVS commit: src/sys/arch/i386
Module Name:src Committed By: jym Date: Mon Mar 1 01:15:24 UTC 2010 Modified Files: src/sys/arch/i386/i386: machdep.c rbus_machdep.c src/sys/arch/i386/include: rbus_machdep.h Log Message: Change rbus_min_start_hint() semantic for i386. "ram" is now psize_t (instead of size_t) to avoid possible overflow when system may have more than 4GB of memory (like PAE). The behavior of rbus_min_start_hint() remains the same. While here, fix printf's format strings (paddr_t => PRIxPADDR). Use ctob() and cast physmem to psize_t to avoid losing bits above 4GB. Comes from PAE patch from Jeremy Morse; adaptation by me. Compile tested for GENERIC only. No regression expected. To generate a diff of this commit: cvs rdiff -u -r1.682 -r1.683 src/sys/arch/i386/i386/machdep.c cvs rdiff -u -r1.24 -r1.25 src/sys/arch/i386/i386/rbus_machdep.c cvs rdiff -u -r1.8 -r1.9 src/sys/arch/i386/include/rbus_machdep.h Please note that diffs are not public domain; they are subject to the copyright notices on the relevant files. Modified files: Index: src/sys/arch/i386/i386/machdep.c diff -u src/sys/arch/i386/i386/machdep.c:1.682 src/sys/arch/i386/i386/machdep.c:1.683 --- src/sys/arch/i386/i386/machdep.c:1.682 Mon Feb 8 19:02:29 2010 +++ src/sys/arch/i386/i386/machdep.c Mon Mar 1 01:15:23 2010 @@ -1,4 +1,4 @@ -/* $NetBSD: machdep.c,v 1.682 2010/02/08 19:02:29 joerg Exp $ */ +/* $NetBSD: machdep.c,v 1.683 2010/03/01 01:15:23 jym Exp $ */ /*- * Copyright (c) 1996, 1997, 1998, 2000, 2004, 2006, 2008, 2009 @@ -67,7 +67,7 @@ */ #include -__KERNEL_RCSID(0, "$NetBSD: machdep.c,v 1.682 2010/02/08 19:02:29 joerg Exp $"); +__KERNEL_RCSID(0, "$NetBSD: machdep.c,v 1.683 2010/03/01 01:15:23 jym Exp $"); #include "opt_beep.h" #include "opt_compat_ibcs2.h" @@ -481,7 +481,7 @@ #if NCARDBUS > 0 /* Tell RBUS how much RAM we have, so it can use heuristics. */ - rbus_min_start_hint(ptoa(physmem)); + rbus_min_start_hint(ctob((psize_t)physmem)); #endif minaddr = 0; Index: src/sys/arch/i386/i386/rbus_machdep.c diff -u src/sys/arch/i386/i386/rbus_machdep.c:1.24 src/sys/arch/i386/i386/rbus_machdep.c:1.25 --- src/sys/arch/i386/i386/rbus_machdep.c:1.24 Tue Dec 15 22:17:12 2009 +++ src/sys/arch/i386/i386/rbus_machdep.c Mon Mar 1 01:15:24 2010 @@ -1,4 +1,4 @@ -/* $NetBSD: rbus_machdep.c,v 1.24 2009/12/15 22:17:12 snj Exp $ */ +/* $NetBSD: rbus_machdep.c,v 1.25 2010/03/01 01:15:24 jym Exp $ */ /* * Copyright (c) 1999 @@ -26,7 +26,7 @@ */ #include -__KERNEL_RCSID(0, "$NetBSD: rbus_machdep.c,v 1.24 2009/12/15 22:17:12 snj Exp $"); +__KERNEL_RCSID(0, "$NetBSD: rbus_machdep.c,v 1.25 2010/03/01 01:15:24 jym Exp $"); #include "opt_pcibios.h" #include "opt_pcifixup.h" @@ -82,10 +82,10 @@ * or 2046 vs 2048. */ void -rbus_min_start_hint(size_t ram) +rbus_min_start_hint(psize_t ram) { #ifdef RBUS_MIN_START_FORCED - aprint_debug("rbus: rbus_min_start from config at 0x%0lx\n", + aprint_debug("rbus: rbus_min_start from config at %#0" PRIxPADDR "\n", rbus_min_start); #else if (ram <= 192*1024*1024UL) { @@ -113,7 +113,7 @@ rbus_min_start = 3 * 1024 * 1024 * 1024UL; } - aprint_debug("rbus: rbus_min_start set to 0x%0lx\n", + aprint_debug("rbus: rbus_min_start set to %#0" PRIxPADDR "\n", rbus_min_start); #endif } Index: src/sys/arch/i386/include/rbus_machdep.h diff -u src/sys/arch/i386/include/rbus_machdep.h:1.8 src/sys/arch/i386/include/rbus_machdep.h:1.9 --- src/sys/arch/i386/include/rbus_machdep.h:1.8 Tue Dec 15 22:17:12 2009 +++ src/sys/arch/i386/include/rbus_machdep.h Mon Mar 1 01:15:24 2010 @@ -1,4 +1,4 @@ -/* $NetBSD: rbus_machdep.h,v 1.8 2009/12/15 22:17:12 snj Exp $ */ +/* $NetBSD: rbus_machdep.h,v 1.9 2010/03/01 01:15:24 jym Exp $ */ /* * Copyright (c) 1999 @@ -40,6 +40,6 @@ rbus_tag_t rbus_pccbb_parent_io(struct pci_attach_args *); rbus_tag_t rbus_pccbb_parent_mem(struct pci_attach_args *); -void rbus_min_start_hint(size_t); +void rbus_min_start_hint(psize_t); #endif /* _ARCH_I386_I386_RBUS_MACHDEP_H_ */
CVS commit: src/sys/arch/i386/include
Module Name:src Committed By: jym Date: Mon Mar 1 00:55:33 UTC 2010 Modified Files: src/sys/arch/i386/include: pmap.h Log Message: Use PDP_SIZE for NTOPLEVEL_PDES (number of top level PDEs) instead of #ifdef'ing PAE. To generate a diff of this commit: cvs rdiff -u -r1.104 -r1.105 src/sys/arch/i386/include/pmap.h Please note that diffs are not public domain; they are subject to the copyright notices on the relevant files. Modified files: Index: src/sys/arch/i386/include/pmap.h diff -u src/sys/arch/i386/include/pmap.h:1.104 src/sys/arch/i386/include/pmap.h:1.105 --- src/sys/arch/i386/include/pmap.h:1.104 Tue Feb 9 22:51:13 2010 +++ src/sys/arch/i386/include/pmap.h Mon Mar 1 00:55:33 2010 @@ -1,4 +1,4 @@ -/* $NetBSD: pmap.h,v 1.104 2010/02/09 22:51:13 jym Exp $ */ +/* $NetBSD: pmap.h,v 1.105 2010/03/01 00:55:33 jym Exp $ */ /* * @@ -279,11 +279,7 @@ #define NKL2_START_ENTRIES 0 /* XXX computed on runtime */ #define NKL1_START_ENTRIES 0 /* XXX unused */ -#ifdef PAE -#define NTOPLEVEL_PDES (PAGE_SIZE * 4 / (sizeof (pd_entry_t))) -#else -#define NTOPLEVEL_PDES (PAGE_SIZE / (sizeof (pd_entry_t))) -#endif +#define NTOPLEVEL_PDES (PAGE_SIZE * PDP_SIZE / (sizeof (pd_entry_t))) #define NPDPG (PAGE_SIZE / sizeof (pd_entry_t))